teddykoker / u-noise

Official PyTorch code for U-Noise: Learnable Noise Masks for Interpretable Image Segmentation (ICIP 2021)
https://arxiv.org/abs/2101.05791

Loading pre-trained models #3

Closed · jvmedenilla closed 2 years ago

jvmedenilla commented 2 years ago

Hello! First of all, great work. While running `train_noise.py` with a pretrained model, I ran into a problem where the keys in the state_dict were wrong. For example, the keys should be something like `model.downs.0.0.weight`, but it's giving me an error saying my keys are `util_model.model.downs.0.0.weight`. I am running this on Python 3.8; could this error be due to a different Python version?

teddykoker commented 2 years ago

Hi @jvmedenilla. I don't think this would be an issue with the Python version. Would you mind sharing the exact command you ran to reproduce the problem, and ensuring that the versions of your libraries match those in `requirements.txt`?

jvmedenilla commented 2 years ago

> Hi @jvmedenilla. I don't think this would be an issue with the Python version. Would you mind sharing the exact command you ran to reproduce the problem, and ensuring that the versions of your libraries match those in `requirements.txt`?

Yes, I used `pip install -r requirements.txt` to install the correct versions of the packages. To run the training on the pretrained models, I ran `python src/train_noise.py --depth 4 --channel_factor 4 --batch_size 8 --pretrained /path/to/pretrained --learning_rate 1e-3`.

teddykoker commented 2 years ago

What are you using as the pretrained model in this case? Would you mind providing the full stack trace as well? There are a few places in the code this mismatch could happen. I imagine one potential cause would be attempting to use a pretrained noise model as the pretrained model for a new noise model. The code is built to use a pretrained utility model as an initialization for the noise model, but there is currently no functionality for using a pretrained noise model as an initialization for a new noise model.

jvmedenilla commented 2 years ago

In this case, I am using `unoise_large_pretrained.ckpt`. So I first run `python src/train_util.py --depth 4 --channel_factor 4 --batch_size 8 --epochs 5`, then I run `python src/train_noise.py --depth 4 --channel_factor 4 --batch_size 8 --pretrained /path/to/unoise_large_pretrained.ckpt --learning_rate 1e-3`. The error I keep getting is:

```
Traceback (most recent call last):
  File "src/train_noise.py", line 168, in <module>
    main(args)
  File "src/train_noise.py", line 117, in main
    args.pretrained = UtilityModel.load_from_checkpoint(
  File "/home/ubuntu/anaconda3/envs/unoise1/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 154, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/home/ubuntu/anaconda3/envs/unoise1/lib/python3.8/site-packages/pytorch_lightning/core/saving.py", line 200, in _load_model_state
    model.load_state_dict(checkpoint['state_dict'], strict=strict)
  File "/home/ubuntu/anaconda3/envs/unoise1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1044, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UtilityModel:
    Missing key(s) in state_dict: "model.downs.0.0.weight", (and there's a bunch of these)
    Unexpected key(s) in state_dict: "util_model.model.downs.0.0.weight", (and there's a bunch of these)
```

It seems like both the utility model parameters and the noise model parameters are being stored in the state_dict that UtilityModel is trying to load.
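
(As a quick diagnostic, a few lines like the following, not from the repo, can show which prefix a given checkpoint's weights carry; the path is a placeholder. Keys starting with `model.` suggest a utility checkpoint, while `util_model.` suggests the checkpoint came out of the noise training.)

```python
import torch

# Minimal diagnostic sketch (not from the repo): print a few state_dict keys
# of the checkpoint being passed to --pretrained to see which prefix they carry.
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")  # placeholder path
for key in list(ckpt["state_dict"].keys())[:5]:
    print(key)
```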

teddykoker commented 2 years ago

When you write `--pretrained /path/to/unoise_large_pretrained.ckpt`, is it the path to one of the U-Noise models you downloaded, or the path to the utility model you trained in the first step? You will need to use the path to the utility model you trained in the first step, which will show up in `lightning_logs/version_X/checkpoints/epoch=X.ckpt`.

jvmedenilla commented 2 years ago

I tried doing that, but I am still getting the same error. So I ran `python src/train_noise.py --depth 4 --channel_factor 4 --batch_size 8 --pretrained lightning_logs/version_1/checkpoints/epoch=74.ckpt --learning_rate 1e-3`.

teddykoker commented 2 years ago

Is `lightning_logs/version_1/checkpoints/epoch=74.ckpt` a model you trained using `train_util.py`? Would you mind sharing the file? Thank you for your patience.

jvmedenilla commented 2 years ago

Yes, that is the model I trained using train_util.py. What file do you want to see? I did not change any of the source files yet. Thanks for being helpful too!

teddykoker commented 2 years ago

Hmm.. I'm having trouble here. When I look at a checkpoint for the utility model, I see the expected keys in the state dict:

```python
>>> import torch
>>> torch.load("epoch=0.ckpt")["state_dict"].keys()
odict_keys(['model.downs.0.0.weight', 'model.downs.0.0.bias', 'model.downs.0.1.weight', 'model.downs.0.1.bias', 'model.downs.0.1.running_mean', 'model.downs.0.1.running_var', 'model.downs.0.1.num_batches_tracked', 'model.downs.0.3.weight', 'model.downs.0.3.bias', 'model.downs.0.4.weight', 'model.downs.0.4.bias', 'model.downs.0.4.running_mean', 'model.downs.0.4.running_var', 'model.downs.0.4.num_batches_tracked', 'model.downs.1.0.weight', 'model.downs.1.0.bias', 'model.downs.1.1.weight', 'model.downs.1.1.bias', 'model.downs.1.1.running_mean', 'model.downs.1.1.running_var', 'model.downs.1.1.num_batches_tracked', 'model.downs.1.3.weight', 'model.downs.1.3.bias', 'model.downs.1.4.weight', 'model.downs.1.4.bias', 'model.downs.1.4.running_mean', 'model.downs.1.4.running_var', 'model.downs.1.4.num_batches_tracked', 'model.downs.2.0.weight', 'model.downs.2.0.bias', 'model.downs.2.1.weight', 'model.downs.2.1.bias', 'model.downs.2.1.running_mean', 'model.downs.2.1.running_var', 'model.downs.2.1.num_batches_tracked', 'model.downs.2.3.weight', 'model.downs.2.3.bias', 'model.downs.2.4.weight', 'model.downs.2.4.bias', 'model.downs.2.4.running_mean', 'model.downs.2.4.running_var', 'model.downs.2.4.num_batches_tracked', 'model.downs.3.0.weight', 'model.downs.3.0.bias', 'model.downs.3.1.weight', 'model.downs.3.1.bias', 'model.downs.3.1.running_mean', 'model.downs.3.1.running_var', 'model.downs.3.1.num_batches_tracked', 'model.downs.3.3.weight', 'model.downs.3.3.bias', 'model.downs.3.4.weight', 'model.downs.3.4.bias', 'model.downs.3.4.running_mean', 'model.downs.3.4.running_var', 'model.downs.3.4.num_batches_tracked', 'model.ups.0.up.1.weight', 'model.ups.0.up.1.bias', 'model.ups.0.up.2.weight', 'model.ups.0.up.2.bias', 'model.ups.0.up.2.running_mean', 'model.ups.0.up.2.running_var', 'model.ups.0.up.2.num_batches_tracked', 'model.ups.0.conv.0.weight', 'model.ups.0.conv.0.bias', 'model.ups.0.conv.1.weight', 'model.ups.0.conv.1.bias', 'model.ups.0.conv.1.running_mean', 'model.ups.0.conv.1.running_var', 'model.ups.0.conv.1.num_batches_tracked', 'model.ups.0.conv.3.weight', 'model.ups.0.conv.3.bias', 'model.ups.0.conv.4.weight', 'model.ups.0.conv.4.bias', 'model.ups.0.conv.4.running_mean', 'model.ups.0.conv.4.running_var', 'model.ups.0.conv.4.num_batches_tracked', 'model.ups.1.up.1.weight', 'model.ups.1.up.1.bias', 'model.ups.1.up.2.weight', 'model.ups.1.up.2.bias', 'model.ups.1.up.2.running_mean', 'model.ups.1.up.2.running_var', 'model.ups.1.up.2.num_batches_tracked', 'model.ups.1.conv.0.weight', 'model.ups.1.conv.0.bias', 'model.ups.1.conv.1.weight', 'model.ups.1.conv.1.bias', 'model.ups.1.conv.1.running_mean', 'model.ups.1.conv.1.running_var', 'model.ups.1.conv.1.num_batches_tracked', 'model.ups.1.conv.3.weight', 'model.ups.1.conv.3.bias', 'model.ups.1.conv.4.weight', 'model.ups.1.conv.4.bias', 'model.ups.1.conv.4.running_mean', 'model.ups.1.conv.4.running_var', 'model.ups.1.conv.4.num_batches_tracked', 'model.ups.2.up.1.weight', 'model.ups.2.up.1.bias', 'model.ups.2.up.2.weight', 'model.ups.2.up.2.bias', 'model.ups.2.up.2.running_mean', 'model.ups.2.up.2.running_var', 'model.ups.2.up.2.num_batches_tracked', 'model.ups.2.conv.0.weight', 'model.ups.2.conv.0.bias', 'model.ups.2.conv.1.weight', 'model.ups.2.conv.1.bias', 'model.ups.2.conv.1.running_mean', 'model.ups.2.conv.1.running_var', 'model.ups.2.conv.1.num_batches_tracked', 'model.ups.2.conv.3.weight', 'model.ups.2.conv.3.bias', 'model.ups.2.conv.4.weight', 'model.ups.2.conv.4.bias', 'model.ups.2.conv.4.running_mean', 'model.ups.2.conv.4.running_var', 
'model.ups.2.conv.4.num_batches_tracked', 'model.conv1x1.weight', 'model.conv1x1.bias'])
```

All of the weights have the `model.` prefix as expected. Could you try running these few lines for the utility model you are using? I'm having trouble understanding how the `util_model.` prefix would appear on a model trained with the `train_util.py` script, since the UtilityModel itself doesn't actually have a `util_model` attribute.

jvmedenilla commented 2 years ago

I got the same output you displayed! Did you change anything else before running `python src/train_noise.py --depth 4 --channel_factor 4 --batch_size 8 --pretrained lightning_logs/version_X/checkpoints/epoch=X.ckpt --learning_rate 1e-3`?

Also, I am not quite sure I understand these parameters correctly: [screenshot of the script's argument descriptions]

So the utility model would be the pretrained UNet provided in the models directory, but the `pretrained` hyperparameter pertains to the pretrained noise model to use, according to the comment/description. However, you said it was the utility model I trained?

teddykoker commented 2 years ago

I think I understand the confusion. The U-Noise architecture consists of two models:

  1. the utility model, which performs the primary task (e.g. pancreas segmentation in this case);
  2. the noise model, which learns to generate a noise mask covering the input image such that the performance of the utility model remains as high as possible.

However, one thing we observed in our paper is that initializing the noise model with a smaller, pretrained utility model provides better noise masks than random initialization. This is where the `--pretrained` argument comes into play: it expects a pretrained utility model, which it uses to initialize the noise model in the architecture. The code currently does not support using a pretrained noise model to initialize the noise model, which would cause the mismatch of keys you are seeing. This would be easy to add, but I'm not sure it makes sense in this case.
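
(If someone did want to reuse the utility weights nested inside a noise-model checkpoint, a rough sketch of the key renaming might look like the following. This is a hypothetical workaround, not something the repo's scripts support; the paths are placeholders, and the `util_model.` prefix is taken from the error message above.)

```python
import torch

# Hypothetical workaround (not part of the repo): pull the utility weights out
# of a noise-model checkpoint by stripping the "util_model." prefix observed in
# the error above. Note that the hyperparameters stored in the checkpoint still
# belong to the noise model, so this only sketches the key renaming; it is not
# a drop-in replacement for a real utility checkpoint.
ckpt = torch.load("path/to/noise_model.ckpt", map_location="cpu")  # placeholder path
prefix = "util_model."
ckpt["state_dict"] = {
    key[len(prefix):]: value
    for key, value in ckpt["state_dict"].items()
    if key.startswith(prefix)
}
torch.save(ckpt, "path/to/extracted_utility.ckpt")  # placeholder path
```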

jvmedenilla commented 2 years ago

That makes sense! However, I think I am just not going to use the pretrained model. I am getting decent results from training the noise model from scratch anyway. Thank you so much for your help!

teddykoker commented 2 years ago

Great, happy to help :)

jvmedenilla commented 2 years ago

Follow-up question: do you have code for visualizing the utility model's (UNet) output? I want to use U-Noise to see how it does on a very poorly trained UNet model. So in `make_visualizations.py`, I tried using the saved model from running `train_util.py` (i.e. `lightning_logs/version_X/checkpoints/epoch=X.ckpt`), but it's giving me mismatched sizes for the parameters, whereas when I just use the pretrained UNet, `utility.ckpt`, I encounter no problem. Any suggestions?

teddykoker commented 2 years ago

Do you think you could provide a little more detail as to what you are trying to do? Visualize the mask predictions for a utility model? Visualize the mask predictions for a utility model on an image that has noise added? Where exactly is the size mismatch? I'd be happy to have a quick call if it helps.

jvmedenilla commented 2 years ago

I am trying to visualize the mask predictions for a utility model, yes. I want to see where the utility model is focusing when it does a poor job on segmentation. A call would be nice. You can reach me at 919-924-2855. Thanks!
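
(For reference, here is a minimal sketch of visualizing a utility model's predicted mask; it is not the repo's `make_visualizations.py`. The import path, input shape, and the assumption that the model returns per-class logits are all guesses and may need adjusting.)

```python
import torch
import matplotlib.pyplot as plt

from model import UtilityModel  # assumed import path; adjust to the repo layout

# Minimal sketch (not make_visualizations.py): load a trained utility model
# checkpoint and show its predicted segmentation mask for one image.
model = UtilityModel.load_from_checkpoint(
    "lightning_logs/version_X/checkpoints/epoch=X.ckpt"  # placeholder path
)
model.eval()

image = torch.randn(1, 1, 256, 256)  # placeholder for a real input slice
with torch.no_grad():
    logits = model(image)  # assumed to return per-class logits of shape (N, C, H, W)
pred = logits.argmax(dim=1)[0]  # per-pixel class; use a sigmoid threshold if single-channel

plt.subplot(1, 2, 1)
plt.imshow(image[0, 0].numpy(), cmap="gray")
plt.title("input")
plt.subplot(1, 2, 2)
plt.imshow(pred.numpy())
plt.title("predicted mask")
plt.show()
```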

jvmedenilla commented 2 years ago

Hi Teddy! Can you send me an email at jonvincent29@gmail.com? I am still stuck on this problem. Thank you!