layumi / Person_reID_baseline_pytorch

:bouncing_ball_person: Pytorch ReID: A tiny, friendly, strong pytorch implement of person re-id / vehicle re-id baseline. Tutorial 👉https://github.com/layumi/Person_reID_baseline_pytorch/tree/master/tutorial
https://www.zdzheng.xyz
MIT License

RuntimeError: Error(s) in loading state_dict for ft_net_swin #382

Open saminheydarian97 opened 1 year ago

saminheydarian97 commented 1 year ago

Hi. I used ft_net_swin to load the model. When I run my code on my own machine there is no problem, but on Kaggle I get this error.

File /opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)
   2036         error_msgs.insert(
   2037             0, 'Missing key(s) in state_dict: {}. '.format(
   2038                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   2040 if len(error_msgs) > 0:
-> 2041     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2042                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2043 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for ft_net_swin:
    Missing key(s) in state_dict: "model.layers.3.downsample.norm.weight", "model.layers.3.downsample.norm.bias", "model.layers.3.downsample.reduction.weight". 
    Unexpected key(s) in state_dict: "model.layers.0.downsample.norm.weight", "model.layers.0.downsample.norm.bias", "model.layers.0.downsample.reduction.weight". 
    size mismatch for model.layers.1.downsample.norm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for model.layers.1.downsample.norm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([512]).
    size mismatch for model.layers.1.downsample.reduction.weight: copying a param with shape torch.Size([512, 1024]) from checkpoint, the shape in current model is torch.Size([256, 512]).
    size mismatch for model.layers.2.downsample.norm.weight: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for model.layers.2.downsample.norm.bias: copying a param with shape torch.Size([2048]) from checkpoint, the shape in current model is torch.Size([1024]).
    size mismatch for model.layers.2.downsample.reduction.weight: copying a param with shape torch.Size([1024, 2048]) from checkpoint, the shape in current model is torch.Size([512, 1024]).

layumi commented 1 year ago

The downsample layers seem to have a different structure in the two models. The likely cause is a different version of the timm package, as in https://github.com/layumi/Person_reID_baseline_pytorch/issues/334; timm updated its swin model.

pip install git+https://github.com/rwightman/pytorch-image-models.git
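
Before and after installing, it may help to confirm which package versions each environment actually has. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `installed_versions` is made up for illustration and is not part of this repository or of timm.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run this both locally and on Kaggle, then compare the two outputs.
    for name, version in installed_versions(["torch", "timm"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If the timm versions differ between the two machines, that would be consistent with the key and shape mismatches reported above.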

I suggest re-training your model in the same environment. That usually works.
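
To see how a checkpoint and the current model differ before calling `load_state_dict`, one option is to compare parameter names and shapes directly. This is only a sketch: `diff_state_dicts` is a hypothetical helper, and it works on plain dictionaries that map parameter names to shape tuples so that it runs without PyTorch. The toy entries reuse key names from the error message; shapes the log does not show are left as `None`.

```python
def diff_state_dicts(checkpoint_shapes, model_shapes):
    """Compare two {parameter name: shape} maps.

    Returns (missing, unexpected, mismatched), mirroring the three kinds of
    problems that load_state_dict reports.
    """
    # Keys the model expects but the checkpoint does not provide.
    missing = sorted(k for k in model_shapes if k not in checkpoint_shapes)
    # Keys the checkpoint provides but the model does not have.
    unexpected = sorted(k for k in checkpoint_shapes if k not in model_shapes)
    # Keys present on both sides whose shapes disagree.
    mismatched = {
        k: (checkpoint_shapes[k], model_shapes[k])
        for k in checkpoint_shapes
        if k in model_shapes and checkpoint_shapes[k] != model_shapes[k]
    }
    return missing, unexpected, mismatched


# Toy example built from names in the error message above.
checkpoint = {
    "model.layers.0.downsample.norm.weight": None,  # shape not shown in the log
    "model.layers.1.downsample.norm.weight": (1024,),
}
model = {
    "model.layers.1.downsample.norm.weight": (512,),
    "model.layers.3.downsample.norm.weight": None,  # shape not shown in the log
}

missing, unexpected, mismatched = diff_state_dicts(checkpoint, model)
print("missing:", missing)
print("unexpected:", unexpected)
print("size mismatch:", mismatched)
```

With real PyTorch objects, the shape maps can be built as `{k: tuple(v.shape) for k, v in state_dict.items()}`, once from the loaded checkpoint and once from `model.state_dict()`.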