Closed leefly072 closed 3 years ago
I have the same problem. Did you solve this issue? I printed the state_dicts of LTE and LTE_copy, together with the GPU each replica is running on:
LTE 1 odict_keys([])
LTE_Copy 1 odict_keys([])
LTE 0 odict_keys(['sub_mean.weight', 'sub_mean.bias'])
LTE_Copy 0 odict_keys(['slice1.0.weight', 'slice1.0.bias', 'slice2.2.weight', 'slice2.2.bias', 'slice2.5.weight', 'slice2.5.bias', 'slice3.7.weight', 'slice3.7.bias', 'slice3.10.weight', 'slice3.10.bias', 'sub_mean.weight', 'sub_mean.bias'])
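For anyone who wants to reproduce this kind of diagnostic, here is a minimal CPU sketch that logs a module's state_dict keys from inside forward. The class and attribute names are illustrative, not the actual TTSR code:

```python
import torch
import torch.nn as nn

class Probe(nn.Module):
    """Toy module that prints its own state_dict keys from inside forward,
    mirroring the diagnostic print above (names are illustrative)."""
    def __init__(self):
        super().__init__()
        self.sub_mean = nn.Conv2d(3, 3, kernel_size=1)

    def forward(self, x):
        device = x.device.index if x.is_cuda else 'cpu'
        print('Probe', device, self.state_dict().keys())
        return self.sub_mean(x)

probe = Probe()
_ = probe(torch.randn(1, 3, 4, 4))
# On a single CPU/GPU the full key set appears; under nn.DataParallel,
# each replica would print its own (possibly incomplete) view.
```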
If I set strict=False in load_state_dict, everything runs smoothly. But isn't that just ignoring the problem?
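For what it's worth, strict=False doesn't have to mean ignoring the mismatch blindly: load_state_dict returns a named tuple listing the missing and unexpected keys, so you can log or assert on exactly what was skipped. A minimal sketch with toy modules (not the TTSR code):

```python
import torch.nn as nn

# Toy stand-ins: the destination has an extra layer, so the source
# state_dict is missing keys relative to it.
src = nn.Sequential(nn.Conv2d(3, 8, 3))
dst = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Conv2d(8, 8, 3))

# strict=False skips mismatched keys and reports them instead of raising.
result = dst.load_state_dict(src.state_dict(), strict=False)
print("missing:", result.missing_keys)        # keys dst expected but src lacked
print("unexpected:", result.unexpected_keys)  # keys src had that dst lacks
```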
@23vil Hey, I ran into exactly the same problem, and I saw your question posted on Stack Overflow as well as the PyTorch forum. Did you ever find out the cause?
Dear author: this is a very interesting paper, and thank you very much for sharing the code. However, I ran into a problem when trying to run the model. When I train on four GPUs, I get the error below. I would greatly appreciate any help. Thank you very much again.
Traceback (most recent call last):
  File "/data/lifei/TTSR-master/main.py", line 51, in <module>
    t.train(current_epoch=epoch, is_init=False)
  File "/data/lifei/TTSR-master/trainer.py", line 97, in train
    sr_lv1, sr_lv2, sr_lv3 = self.model(sr=sr).cuda()
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 155, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/lifei/TTSR-master/model/TTSR.py", line 22, in forward
    self.LTE_copy.load_state_dict(self.LTE.state_dict()).cuda()
  File "/home/lifei/.conda/envs/TTSR/lib/python3.8/site-packages/torch/nn/modules/module.py", line 846, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LTE:
    Missing key(s) in state_dict: "slice1.0.weight", "slice1.0.bias", "slice2.2.weight", "slice2.2.bias", "slice2.5.weight", "slice2.5.bias", "slice3.7.weight", "slice3.7.bias", "slice3.10.weight", "slice3.10.bias".
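For what it's worth, a commonly reported cause of this pattern is that the state-dict copy runs inside forward, which under nn.DataParallel executes on per-device replicas; in newer PyTorch versions (reportedly around 1.5 and later), replicas hold their parameters as plain attributes rather than registered parameters, so state_dict() called on a replica comes back incomplete or empty, matching the empty odict_keys([]) printed above. One possible workaround, sketched here on a toy model (TinyLTE/TinyModel are stand-ins, not the actual TTSR classes), is to do the weight copy outside forward, on the un-replicated module:

```python
import torch
import torch.nn as nn

class TinyLTE(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)

    def forward(self, x):
        return self.conv(x)

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.LTE = TinyLTE()
        self.LTE_copy = TinyLTE()

    def sync_copy(self):
        # Copy the weights here, NOT inside forward, so the call runs on
        # the full module rather than on a DataParallel replica.
        self.LTE_copy.load_state_dict(self.LTE.state_dict())

    def forward(self, x):
        return self.LTE(x) + self.LTE_copy(x)

model = nn.DataParallel(TinyModel()) if torch.cuda.is_available() else TinyModel()
# Reach through the DataParallel wrapper (.module) to the real model.
core = model.module if isinstance(model, nn.DataParallel) else model
core.sync_copy()  # safe: operates on the un-replicated module
out = model(torch.randn(2, 3, 8, 8))
```

The same idea applied to TTSR would mean calling the copy step from the trainer (via `self.model.module...` when wrapped in DataParallel) before the forward pass, instead of at TTSR.py line 22.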