snap-research / articulated-animation

Code for Motion Representations for Articulated Animation paper
https://snap-research.github.io/articulated-animation/

Not able to fine tune ted-youtube384.pth model. #77

Closed hacker009-sudo closed 8 months ago

hacker009-sudo commented 8 months ago

Below is the issue:

```
  File "/content/articulated-animation/logger.py", line 80, in load_cpk
    optimizer_reconstruction.load_state_dict(checkpoint['optimizer_reconstruction'])
  File "/usr/local/lib/python3.7/site-packages/torch/optim/optimizer.py", line 116, in load_state_dict
    raise ValueError("loaded state dict contains a parameter group "
ValueError: loaded state dict contains a parameter group that doesn't match the size of optimizer's group
```

@AliaksandrSiarohin @sergeytulyakov @stulyakovsc , please guide.

hacker009-sudo commented 8 months ago

@AliaksandrSiarohin @sergeytulyakov @stulyakovsc, can you please guide? It's really urgent. There seems to be some mismatch in the config files present in the repo.

AliaksandrSiarohin commented 8 months ago

I don't know what config you are using, so I can't tell whether there is a mismatch.

hacker009-sudo commented 8 months ago

Hi @AliaksandrSiarohin, thanks for replying.

I am using the unmodified config at config/ted-youtube384.yml. I didn't change anything, and I am able to fine-tune all the other models; I get this error only with the ted-youtube384.pth model.

Could you please let me know what config changes I need to make in ted-youtube384.yml to fine-tune this model?

AliaksandrSiarohin commented 8 months ago

Probably the number of regions in the config should be 15 or 10, but I can't say for sure because I don't know the shapes.

hacker009-sudo commented 8 months ago

@AliaksandrSiarohin, I tried changing the number of regions to 15 and to 10, but now I am getting the error below:

```
RuntimeError: Error(s) in loading state_dict for Generator:
    size mismatch for pixelwise_flow_predictor.hourglass.encoder.down_blocks.0.conv.weight: copying a param with shape torch.Size([128, 84, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 44, 3, 3]).
    size mismatch for pixelwise_flow_predictor.mask.weight: copying a param with shape torch.Size([21, 148, 7, 7]) from checkpoint, the shape in current model is torch.Size([11, 108, 7, 7]).
    size mismatch for pixelwise_flow_predictor.mask.bias: copying a param with shape torch.Size([21]) from checkpoint, the shape in current model is torch.Size([11]).
    size mismatch for pixelwise_flow_predictor.occlusion.weight: copying a param with shape torch.Size([1, 148, 7, 7]) from checkpoint, the shape in current model is torch.Size([1, 108, 7, 7]).
```

Please guide how to fix it.
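(Editor's note: if, as in FOMM-style flow predictors, the pixelwise mask head outputs one channel per region plus one background channel, then the checkpoint's 21 mask channels would correspond to `num_regions: 20` rather than 15 or 10. The channel convention here is an assumption, not something the thread confirms; a minimal sketch of the arithmetic:)

```python
def regions_from_mask_channels(mask_out_channels):
    """Invert the assumed convention mask_channels = num_regions + 1
    (one channel per region plus one for the background)."""
    return mask_out_channels - 1

# Shapes taken from the size-mismatch error above:
checkpoint_channels = 21  # pixelwise_flow_predictor.mask.weight -> [21, 148, 7, 7]
current_channels = 11     # model built from the config -> [11, 108, 7, 7]

print(regions_from_mask_channels(checkpoint_channels))  # 20
print(regions_from_mask_channels(current_channels))     # 10
```

The hourglass input channels in the same error scale consistently with this reading (84 vs 44, i.e. 4 × (20 + 1) vs 4 × (10 + 1)), which supports the guess.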

AliaksandrSiarohin commented 8 months ago

What was the error before? If the only error is in the optimizer, you can skip loading the optimizer params.
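(Editor's note: a minimal sketch of that workaround for `load_cpk` in logger.py, based on the traceback above; the helper name is hypothetical:)

```python
def load_optimizer_state(optimizer, state_dict):
    """Hypothetical helper: try to restore optimizer state from a checkpoint.
    If the parameter groups don't match the current model (e.g. a different
    num_regions), keep the freshly initialized optimizer instead of crashing."""
    try:
        optimizer.load_state_dict(state_dict)
        return True
    except ValueError:
        # Mismatched parameter groups -> fall back to fresh optimizer state.
        return False

# In logger.py's load_cpk, the hard-failing call
#   optimizer_reconstruction.load_state_dict(checkpoint['optimizer_reconstruction'])
# would then become
#   load_optimizer_state(optimizer_reconstruction, checkpoint['optimizer_reconstruction'])
```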

hacker009-sudo commented 8 months ago

@AliaksandrSiarohin, will it affect the performance of the model if I skip loading the optimizer params?

AliaksandrSiarohin commented 8 months ago

Not much. Since you are probably using a different dataset, there is not much sense in loading the optimizer state anyway.

hacker009-sudo commented 8 months ago

Sure, disabling it. Thanks for the help :)