Issues with the pretrained model

chunweit commented 4 years ago

First of all, thank you for your amazing work and for sharing the resources needed to reproduce it. However, I have run into issues just running the test with your pretrained model, even though I followed the exact instructions in the readme. I first tried a newer version of PyTorch and TorchVision, then set up a new virtual environment to install the exact versions stated. That did not run smoothly either, and I got the following error: AttributeError: module 'torch.nn' has no attribute 'SyncBatchNorm'. After some searching, I found that the recommended workaround is to upgrade PyTorch from 1.0 to 1.1, which did resolve the SyncBatchNorm issue. Have you encountered the same issue, given that I used exactly the PyTorch and TorchVision versions you specified?
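For readers hitting the same AttributeError, a small version guard can make the failure clearer before the model is built. This is only a sketch: the helper names are hypothetical, and the minimum version (1, 1) is an assumption taken from the report above that upgrading from 1.0 to 1.1 resolved the error.

```python
import re


def version_tuple(version):
    """Turn a version string such as '1.1.0+cu92' into (1, 1, 0)."""
    # Drop any local build suffix after '+', then keep the first three numbers.
    numbers = re.findall(r"\d+", version.split("+")[0])
    return tuple(int(n) for n in numbers[:3])


def new_enough_for_sync_batchnorm(version, minimum=(1, 1)):
    """Assumption: SyncBatchNorm is available from `minimum` onwards."""
    return version_tuple(version) >= minimum


print(new_enough_for_sync_batchnorm("1.0.0"))       # version reported by the author
print(new_enough_for_sync_batchnorm("1.1.0+cu92"))  # hypothetical upgraded build
```

In a real environment the string would come from `torch.__version__`. When torch is importable, `hasattr(torch.nn, "SyncBatchNorm")` is a more direct check.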

Another issue I encountered is the following: RuntimeError: Error(s) in loading state_dict for baseline: Missing key(s) in state_dict: "total_ops", "total_params", " ... The problem seems to come from self.Model.load_state_dict(torch.load(L.mpath, map_location='cpu')) in Experiment.py. I would appreciate more details on how to resolve this. Thank you.

BarCodeReader commented 4 years ago

@chunweit I think the provided PretrainModel file (not the ImageNet one) is wrong. If you print the model parameters in Experiment.py, you get:

encoder.encoder.conv1.weight
encoder.encoder.bn1.weight
encoder.encoder.bn1.bias
encoder.encoder.layer1.0.conv1.weight
encoder.encoder.layer1.0.bn1.weight
encoder.encoder.layer1.0.bn1.bias
...

However, if you print the keys in the provided PretrainModel file, you get:

encoder.encoder.layers.0.0.weight
encoder.encoder.layers.0.1.weight
encoder.encoder.layers.0.1.bias
encoder.encoder.layers.0.1.running_mean
encoder.encoder.layers.0.1.running_var
...

They do not match.

@moothes, please take a look at the model file you provided. Thanks a lot!
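A comparison like the one above can be automated. The sketch below works on plain lists of key names, so it runs without PyTorch; the function name is hypothetical, and the sample keys are copied from the two listings in this comment.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) lists of key names.

    missing:    keys the model expects but the checkpoint lacks.
    unexpected: keys the checkpoint has but the model does not.
    """
    model_keys = list(model_keys)
    checkpoint_keys = list(checkpoint_keys)
    model_set, checkpoint_set = set(model_keys), set(checkpoint_keys)
    missing = [k for k in model_keys if k not in checkpoint_set]
    unexpected = [k for k in checkpoint_keys if k not in model_set]
    return missing, unexpected


# Key names copied from the listings above (both lists are truncated there).
model_keys = [
    "encoder.encoder.conv1.weight",
    "encoder.encoder.bn1.weight",
    "encoder.encoder.bn1.bias",
]
checkpoint_keys = [
    "encoder.encoder.layers.0.0.weight",
    "encoder.encoder.layers.0.1.weight",
    "encoder.encoder.layers.0.1.bias",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

With PyTorch available, the two arguments would be `self.Model.state_dict().keys()` and `torch.load(L.mpath, map_location='cpu').keys()`. This assumes the file stores a plain state dict, as the `load_state_dict` call quoted earlier in the thread suggests.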

moothes commented 4 years ago

@chunweit Sorry, I have never encountered the first issue. My environment is PyTorch 1.0.0 and torchvision 0.2.1.

For the second issue, "total_ops" and "total_params" are added by thop, which is used to count the number of parameters. You can try deleting lines 131-133 in Loader.py:

 input = torch.randn(1, 3, opt.size, opt.size)
 flops, params = profile(self.Model, inputs=(input, ))
 print('FLOPs: {:.2f}, Params: {:.2f}.'.format(flops / 1e9, params / 1e6))
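Before deleting those lines, it may help to confirm that the missing keys really are only the thop counters. The sketch below is a hypothetical helper, not part of the repository. It assumes the counters can also appear under a module prefix, and the key "encoder.total_ops" is an invented example of that.

```python
THOP_COUNTERS = {"total_ops", "total_params"}


def is_thop_counter(key):
    """True if the last dotted component of the key is a thop counter name."""
    return key.rsplit(".", 1)[-1] in THOP_COUNTERS


def split_missing_keys(missing_keys):
    """Separate thop counters from keys that are genuinely missing."""
    counters = [k for k in missing_keys if is_thop_counter(k)]
    real = [k for k in missing_keys if not is_thop_counter(k)]
    return counters, real


# The first two keys come from the error message in this thread;
# "encoder.total_ops" is a hypothetical nested counter.
counters, real = split_missing_keys(["total_ops", "total_params", "encoder.total_ops"])
print("thop counters:", counters)
print("genuinely missing:", real)
```

If the second list is empty, the weights themselves are probably fine. In that case, loading the weights before calling `profile`, or passing `strict=False` to `load_state_dict`, might also avoid the error. Note that `strict=False` would hide genuine mismatches as well.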

moothes commented 4 years ago

@BarCodeReader Sorry, when did you download the trained weights? We have already updated the weight file, so it should work. Please try downloading the weights again. If it still doesn't work, I will look into it.

chunweit commented 4 years ago

@moothes the weights I downloaded are dated "2020-08-13" for both resnet and vgg16, and they still don't seem to work, even after another attempt.

moothes commented 4 years ago

@BarCodeReader @chunweit I have updated the weights. You can check it again. Sorry for this mistake.

chunweit commented 4 years ago

@moothes thanks for fixing the resnet-based pretrained model. However, I think a similar issue affects the VGG-based pretrained model, as the keys in the state dict do not match at all.

encoder.convs.0.0.weight
encoder.convs.0.0.bias
encoder.convs.0.2.weight
encoder.convs.0.2.bias
encoder.convs.1.0.weight
encoder.convs.1.0.bias
encoder.convs.1.2.weight
encoder.convs.1.2.bias
encoder.convs.2.0.weight
encoder.convs.2.0.bias
encoder.convs.2.2.weight
encoder.convs.2.2.bias
encoder.convs.2.4.weight
encoder.convs.2.4.bias
encoder.convs.3.0.weight
...

In the provided vgg.pkl:

encoder.encoder.features.0.bias
encoder.encoder.features.0.weight
encoder.encoder.features.10.bias
encoder.encoder.features.10.weight
encoder.encoder.features.12.bias
encoder.encoder.features.12.weight
encoder.encoder.features.14.bias
encoder.encoder.features.14.weight
encoder.encoder.features.17.bias
encoder.encoder.features.17.weight
encoder.encoder.features.19.bias
encoder.encoder.features.19.weight
encoder.encoder.features.2.bias
encoder.encoder.features.2.weight
encoder.encoder.features.21.bias
...
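The vgg.pkl keys above appear in plain string order, so features.10 comes before features.2, which makes a side-by-side comparison harder to read. A natural sort, sketched below with a hypothetical helper name, puts the layer indices in numeric order. The sample keys are a subset of the listing above.

```python
import re


def natural_key(name):
    """Split a key into text and integer parts so that 2 sorts before 10."""
    # re.split with a capturing group alternates text and digit runs, so
    # items at the same position are always the same type when compared.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


keys = [
    "encoder.encoder.features.0.bias",
    "encoder.encoder.features.0.weight",
    "encoder.encoder.features.10.bias",
    "encoder.encoder.features.10.weight",
    "encoder.encoder.features.2.bias",
    "encoder.encoder.features.2.weight",
]

ordered = sorted(keys, key=natural_key)
for key in ordered:
    print(key)
```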

moothes commented 4 years ago

@chunweit Please check lines 4-10 of the model.baseline.py file. If you import the official VGG implementation there, this weight file will work.

moothes / ITSD-pytorch

Issues with the pretrained model #4