Checkpoint state mismatch with code config

kdcyberdude commented 8 months ago

It seems the released checkpoint was trained with different hyperparameters than the current code config:


```
python main.py --config configs/config_demo.yaml --mode test
/home/kd/anaconda3/envs/pop/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1695392036766/work/aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]

------------------------Loading checkpoint POP_pretrained_CAPEdata_14outfits_epoch00400_model.pt
Traceback (most recent call last):
  File "main.py", line 341, in <module>
    main()
  File "main.py", line 125, in main
    model.load_state_dict(ckpt_loaded['model_state'])
  File "/home/kd/anaconda3/envs/pop/lib/python3.8/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for POP:
        size mismatch for unet_posefeat.upconvC5.up.1.weight: copying a param with shape torch.Size([64, 384, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 3, 3]).
```
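
To see every layer where the checkpoint and the current model disagree, not just the first one reported, the shapes in the two state dicts can be compared directly. The following is only a minimal sketch: `find_shape_mismatches` is a hypothetical helper, not part of the POP repo, and the demo dicts are stand-ins that mirror the shapes from the traceback above. It assumes only that each entry exposes a `.shape` attribute, as PyTorch tensors do.

```python
from types import SimpleNamespace


def find_shape_mismatches(ckpt_state, model_state):
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ.

    Hypothetical helper for illustration; not part of the POP code base.
    """
    mismatches = []
    for name, ckpt_param in ckpt_state.items():
        if name not in model_state:
            continue  # keys present on only one side are a separate problem
        ckpt_shape = tuple(ckpt_param.shape)
        model_shape = tuple(model_state[name].shape)
        if ckpt_shape != model_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Stand-in objects with a .shape attribute, using the shapes from the traceback.
ckpt_demo = {
    "unet_posefeat.upconvC5.up.1.weight": SimpleNamespace(shape=(64, 384, 3, 3)),
}
model_demo = {
    "unet_posefeat.upconvC5.up.1.weight": SimpleNamespace(shape=(64, 256, 3, 3)),
}

for name, ckpt_shape, model_shape in find_shape_mismatches(ckpt_demo, model_demo):
    print(f"{name}: checkpoint {ckpt_shape} vs. model {model_shape}")
```

With PyTorch, the same function should work on `ckpt_loaded['model_state']` and `model.state_dict()` (the names used in `main.py` in the traceback), called just before `load_state_dict`.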

nsarafianos commented 7 months ago

@kdcyberdude did you resolve this by any chance? I'm getting the same error

kdcyberdude commented 7 months ago

I haven't been able to resolve this yet, @nsarafianos.

qianlim commented 7 months ago

Hey guys, it looks like this is the same as the previous issue here. For now, the easy solution is to check out an earlier commit, e.g. a05404d, which is compatible with the pre-trained models (if you just want to test and compare with what we showed in the paper), or to use the latest commit and re-train the model. Let me think of a more elegant way to fix this issue soon. Sorry for the confusion!

nsarafianos commented 7 months ago

Thanks a lot @qianlim !
