Key Mismatch Issue When Loading Pre-trained Model Weights

SeunghanYu commented 2 months ago

Hi @shenyehui and @Chen-Xieyuanli !

I encountered an error when trying to load a pre-trained model downloaded from this Dropbox link as listed in the pre-trained models section.

Here's the code I'm using:

network = importlib.import_module('tmp.models.{}'.format(self.opt.net))
model = network.deliver_model(self.opt, self.opt.phase[-3:]).to(self.device)
checkpoint = torch.load(self.opt.resume)
model.load_state_dict(checkpoint['state_dict'])

The error suggests that the keys in the model do not match the keys in checkpoint['state_dict']. The specific error message is:

RuntimeError: Error(s) in loading state_dict for StudentNet:
    Missing key(s) in state_dict: ...
    Unexpected key(s) in state_dict: ...

Total keys in model: 1153
Total keys in checkpoint: 1217
Missing keys count: 294
Unexpected keys count: 358

It seems that the model and the checkpoint have a significant discrepancy in their keys. Is the issue due to a difference in the model code, or could the pre-trained model provided in the link be different from what the code expects?
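For reference, the counts above are internally consistent: 1153 - 294 = 859 and 1217 - 358 = 859, so 859 keys match by name and the rest differ. Counts like these can be reproduced by comparing the two key sets directly. The following is a minimal sketch, not code from the TSCM repository; the helper name `diff_state_dict_keys` and the toy key names are hypothetical and only illustrate the comparison.

```python
from typing import Iterable, List, Tuple


def diff_state_dict_keys(
    model_keys: Iterable[str], checkpoint_keys: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) key lists.

    missing:    keys the model expects but the checkpoint lacks
    unexpected: keys the checkpoint has but the model does not define
    """
    model_set = set(model_keys)
    ckpt_set = set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Toy key names (hypothetical, for illustration only).
model_keys = ["backbone.conv1.weight", "backbone.bn1.weight", "head.fc.weight"]
checkpoint_keys = [
    "backbone.conv1.weight",
    "backbone.bn1.weight",
    "backbone.bn1.num_batches_tracked",
    "classifier.weight",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("Missing keys:", missing)
print("Unexpected keys:", unexpected)
```

With the real objects, the same helper could be called as `diff_state_dict_keys(model.state_dict().keys(), checkpoint['state_dict'].keys())`. Printing the first few entries of each list usually shows whether whole submodules differ or only a naming prefix does.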

Looking forward to your reply. Thank you in advance!

shenyehui commented 2 months ago

Hello, I'm very glad that you're interested in our code. I didn't encounter this issue when running it on my computer, so the problem might be related to the downloaded pre-trained ResNet50 model. You can try running tscm.py directly to see if the issue persists. If there is no issue, you can check the torch hub checkpoint directory to see which version of the ResNet50 model was loaded and whether it is the same version as mine:

Linux/macOS: ~/.cache/torch/hub/checkpoints/
Windows: C:\Users\<username>\.cache\torch\hub\checkpoints\

I downloaded the resnet50-0676ba61.pth version, which you can get from this URL: "https://download.pytorch.org/models/resnet50-0676ba61.pth".
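The check described above can be scripted. The following is a minimal sketch, assuming the default torch hub cache layout (the `TORCH_HOME` and `XDG_CACHE_HOME` environment variables override the `~/.cache/torch` location). The function names are hypothetical helpers, not part of PyTorch or TSCM. It also relies on the torch hub naming convention in which the suffix after the last hyphen is the leading hex digits of the file's SHA256 hash, so a truncated or corrupted download can be spotted.

```python
import hashlib
import os
import re
from pathlib import Path
from typing import List, Union


def default_checkpoint_dir() -> Path:
    """Best-effort guess at the torch hub checkpoint directory."""
    torch_home = os.environ.get("TORCH_HOME")
    if torch_home:
        base = Path(torch_home).expanduser()
    else:
        cache_home = os.environ.get("XDG_CACHE_HOME", str(Path.home() / ".cache"))
        base = Path(cache_home).expanduser() / "torch"
    return base / "hub" / "checkpoints"


def list_resnet50_checkpoints(checkpoint_dir: Union[str, Path]) -> List[str]:
    """Return the sorted file names of cached ResNet50 weights, if any."""
    directory = Path(checkpoint_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("resnet50-*.pth"))


def hash_prefix_matches(path: Path) -> bool:
    """Check that the SHA256 of the file starts with the hex suffix in its name."""
    match = re.search(r"-([0-9a-f]{8,})\.pth$", path.name)
    if match is None:
        return False
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha256.update(chunk)
    return sha256.hexdigest().startswith(match.group(1))


checkpoint_dir = default_checkpoint_dir()
for name in list_resnet50_checkpoints(checkpoint_dir):
    status = "ok" if hash_prefix_matches(checkpoint_dir / name) else "hash mismatch"
    print(f"{name}: {status}")
```

If the listing shows a file other than resnet50-0676ba61.pth, or a hash mismatch, replacing it with the file from the URL above is the first thing to try.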

SeunghanYu commented 2 months ago

Thanks for your fast response! @shenyehui

I was wondering if you could share the environment setup for the TSCM code. I am currently using Conda, and I would like to match my environment exactly with yours to ensure compatibility.

shenyehui commented 2 months ago

Did my response above solve your problem? Since the cloud server I rented before has expired, I am unable to check the environment and libraries used when running TSCM. I remember the configuration of the cloud server I rented was as follows: PyTorch 1.11.0, Python 3.8 (Ubuntu 20.04), and CUDA 11.3. If you encounter any missing libraries while running the code, you can simply install them as needed.
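To compare a local setup against the configuration recalled above, a short script can print the installed versions side by side. This is a minimal sketch: the `REPORTED` values are copied from this comment (recalled from memory, so not guaranteed exact), and the helper names are hypothetical. PyTorch is imported lazily so the script still runs when it is not installed.

```python
import importlib
import platform
from typing import Dict, Optional

# Versions recalled in the comment above; treat them as approximate.
REPORTED = {"python": "3.8", "torch": "1.11.0", "cuda": "11.3"}


def collect_environment() -> Dict[str, Optional[str]]:
    """Collect local Python, PyTorch and CUDA versions (None if unavailable)."""
    env: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return env
    env["torch"] = str(torch.__version__)
    env["cuda"] = torch.version.cuda  # None for CPU-only builds
    return env


def version_matches(have: Optional[str], want: str) -> bool:
    """True if `have` starts with the components of `want`, ignoring '+cu113'-style tags."""
    if have is None:
        return False
    have_parts = have.split("+")[0].split(".")
    want_parts = want.split(".")
    return have_parts[: len(want_parts)] == want_parts


local = collect_environment()
for name, want in REPORTED.items():
    have = local[name]
    flag = "match" if version_matches(have, want) else "differs"
    print(f"{name}: local={have} reported={want} ({flag})")
```

A difference in one of these versions does not by itself explain a key mismatch, but it narrows down what separates the two setups.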

shenyehui commented 2 months ago

I have added a requirements.txt file, reconstructed from memory. You can refer to it if you run into missing libraries.

nubot-nudt / TSCM

Key Mismatch Issue When Loading Pre-trained Model Weights #1