qubvel-org / segmentation_models.pytorch

Semantic segmentation models with 500+ pretrained convolutional and transformer-based backbones.
https://smp.readthedocs.io/
MIT License

RuntimeError: Error(s) in loading state_dict for Unet #105

Closed mobassir94 closed 2 years ago

mobassir94 commented 4 years ago

I get the error below when I try to load the trained weight file of my Unet with an se_resnext50 encoder for ensembling:

RuntimeError: Error(s) in loading state_dict for Unet: Missing key(s) in state_dict: "decoder.blocks.0.conv1.0.weight", "decoder.blocks.0.conv1.1.weight", "decoder.blocks.0.conv1.1.bias", "decoder.blocks.0.conv1.1.running_mean", "decoder.blocks.0.conv1.1.running_var", "decoder.blocks.0.conv2.0.weight", "decoder.blocks.0.conv2.1.weight", "decoder.blocks.0.conv2.1.bias", "decoder.blocks.0.conv2.1.running_mean", "decoder.blocks.0.conv2.1.running_var", "decoder.blocks.1.conv1.0.weight", "decoder.blocks.1.conv1.1.weight", "decoder.blocks.1.conv1.1.bias", "decoder.blocks.1.conv1.1.running_mean", "decoder.blocks.1.conv1.1.running_var", "decoder.blocks.1.conv2.0.weight", "decoder.blocks.1.conv2.1.weight", "decoder.blocks.1.conv2.1.bias", "decoder.blocks.1.conv2.1.running_mean", "decoder.blocks.1.conv2.1.running_var", "decoder.blocks.2.conv1.0.weight", "decoder.blocks.2.conv1.1.weight", "decoder.blocks.2.conv1.1.bias", "decoder.blocks.2.conv1.1.running_mean", "decoder.blocks.2.conv1.1.running_var", "decoder.blocks.2.conv2.0.weight", "decoder.blocks.2.conv2.1.weight", "decoder.blocks.2.conv2.1.bias", "decoder.blocks.2.conv2.1.running_mean", "decoder.blocks.2.conv2.1.running_var", "decoder.blocks.3.conv1.0.weight", "decoder.blocks.3.conv1.1.weight", "decoder.blocks.3.conv1.1.bias", "decoder.blocks.3.conv1.1.running_mean", "decoder.blocks.3.conv1.1.running_var", "decoder.blocks.3.conv2.0.weight", "decoder.blocks.3.conv2.1.weight", "decoder.blocks.3.conv2.1.bias", "decoder.blocks.3.conv2.1.running_mean", "decoder.blocks.3.conv2.1.running_var", "decoder.blocks.4.conv1.0.weight", "decoder.blocks.4.conv1.1.weight", "decoder.blocks.4.conv1.1.bias", "decoder.blocks.4.conv1.1.running_mean", "decoder.blocks.4.conv1.1.running_var", "decoder.blocks.4.conv2.0.weight", "decoder.blocks.4.conv2.1.weight", "decoder.blocks.4.conv2.1.bias", "decoder.blocks.4.conv2.1.running_mean", "decoder.blocks.4.conv2.1.running_var", "segmentation_head.0.weight", "segmentation_head.0.bias". 
Unexpected key(s) in state_dict: "decoder.layer1.block.0.block.0.weight", "decoder.layer1.block.0.block.1.weight", "decoder.layer1.block.0.block.1.bias", "decoder.layer1.block.0.block.1.running_mean", "decoder.layer1.block.0.block.1.running_var", "decoder.layer1.block.0.block.1.num_batches_tracked", "decoder.layer1.block.1.block.0.weight", "decoder.layer1.block.1.block.1.weight", "decoder.layer1.block.1.block.1.bias", "decoder.layer1.block.1.block.1.running_mean", "decoder.layer1.block.1.block.1.running_var", "decoder.layer1.block.1.block.1.num_batches_tracked", "decoder.layer2.block.0.block.0.weight", "decoder.layer2.block.0.block.1.weight", "decoder.layer2.block.0.block.1.bias", "decoder.layer2.block.0.block.1.running_mean", "decoder.layer2.block.0.block.1.running_var", "decoder.layer2.block.0.block.1.num_batches_tracked", "decoder.layer2.block.1.block.0.weight", "decoder.layer2.block.1.block.1.weight", "decoder.layer2.block.1.block.1.bias", "decoder.layer2.block.1.block.1.running_mean", "decoder.layer2.block.1.block.1.running_var", "decoder.layer2.block.1.block.1.num_batches_tracked", "decoder.layer3.block.0.block.0.weight", "decoder.layer3.block.0.block.1.weight", "decoder.layer3.block.0.block.1.bias", "decoder.layer3.block.0.block.1.running_mean", "decoder.layer3.block.0.block.1.running_var", "decoder.layer3.block.0.block.1.num_batches_tracked", "decoder.layer3.block.1.block.0.weight", "decoder.layer3.block.1.block.1.weight", "decoder.layer3.block.1.block.1.bias", "decoder.layer3.block.1.block.1.running_mean", "decoder.layer3.block.1.block.1.running_var", "decoder.layer3.block.1.block.1.num_batches_tracked", "decoder.layer4.block.0.block.0.weight", "decoder.layer4.block.0.block.1.weight", "decoder.layer4.block.0.block.1.bias", "decoder.layer4.block.0.block.1.running_mean", "decoder.layer4.block.0.block.1.running_var", "decoder.layer4.block.0.block.1.num_batches_tracked", "decoder.layer4.block.1.block.0.weight", "decoder.layer4.block.1.block.1.weight", "decoder.layer4.block.1.block.1.bias", "decoder.layer4.block.1.block.1.running_mean", "decoder.layer4.block.1.block.1.running_var", "decoder.layer4.block.1.block.1.num_batches_tracked", "decoder.layer5.block.0.block.0.weight", "decoder.layer5.block.0.block.1.weight", "decoder.layer5.block.0.block.1.bias", "decoder.layer5.block.0.block.1.running_mean", "decoder.layer5.block.0.block.1.running_var", "decoder.layer5.block.0.block.1.num_batches_tracked", "decoder.layer5.block.1.block.0.weight", "decoder.layer5.block.1.block.1.weight", "decoder.layer5.block.1.block.1.bias", "decoder.layer5.block.1.block.1.running_mean", "decoder.layer5.block.1.block.1.running_var", "decoder.layer5.block.1.block.1.num_batches_tracked", "decoder.final_conv.weight", "decoder.final_conv.bias".

qubvel commented 4 years ago

Hi @mobassir94, the models have been updated. Use the previous version (0.0.3) for weight compatibility; for newly trained models you can use the new one (with the new features described in the README).

mobassir94 commented 4 years ago

That hurts, brother. I have trained several models for my research, some with your previous version and some with the new one. Each model took almost 9 hours to train in a Kaggle kernel, and I have also lost a lot of GPU hours training them :( Now I can't ensemble anymore! I trained an FPN with inceptionresnetv2 last night and can load that weight file easily, but I can't load the se_resnext50 model I trained 3-4 weeks ago, after Kaggle finished the Pneumothorax competition. Any help with ensembling models from your different versions?

mobassir94 commented 4 years ago

I have also tried your model averaging code, this one: https://gist.github.com/qubvel/70c3d5e4cddcde731408f478e12ef87b

But you could make it clearer. For example, I don't understand how to use this line: checkpoints_weights_paths: List[str] = ...  # sorted in descending order by score. Should I replace the ... with weight paths? Would the weight path look like '../input/siimensemble/' if it is in a Kaggle kernel? Also, how do I use this line: model: torch.nn.Module = ...? I don't understand what to put there in place of the ... placeholder. A real working demo video, and ideally a working Kaggle kernel using that code, would help. Thanks.

qubvel commented 4 years ago

You can trace your models (read about tracing in PyTorch); traced models do not require the source code.

  1. Install previous version, create (old) models, load weights, trace, save.
  2. Install new version, create (new) models, load weights, trace, save.
  3. Load all models with torch.jit.load
  4. Ensemble
qubvel commented 4 years ago

This gist (https://gist.github.com/qubvel/70c3d5e4cddcde731408f478e12ef87b) demonstrates how to average weights for a SINGLE model and a SINGLE experiment:

  1. Save N best checkpoints
  2. Average them

This is not "take and use" code; it is just meant to demonstrate the concept.
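For illustration, a minimal sketch of the idea (paths are hypothetical, and each file is assumed to hold a plain state dict from the same architecture and the same experiment):

import torch
import segmentation_models_pytorch as smp

# hypothetical paths: the N best checkpoints of one training run
paths = ["ckpt_epoch_12.pth", "ckpt_epoch_17.pth", "ckpt_epoch_21.pth"]
state_dicts = [torch.load(p, map_location="cpu") for p in paths]

averaged = {}
for key in state_dicts[0]:
    tensors = [sd[key] for sd in state_dicts]
    if tensors[0].is_floating_point():
        averaged[key] = torch.stack(tensors).mean(dim=0)
    else:
        # integer buffers (e.g. num_batches_tracked) cannot be meaningfully averaged
        averaged[key] = tensors[0]

model = smp.Unet("se_resnext50_32x4d", encoder_weights=None)  # must match the checkpoints' architecture
model.load_state_dict(averaged)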

mobassir94 commented 4 years ago

Are you asking me to do something like this for loading the weights?

from torch.jit import load

unet_se_resnext50_32x4d = load('/kaggle/input/severstalmodels/unet_se_resnext50_32x4d.pth').cuda()
unet_mobilenet2 = load('/kaggle/input/severstalmodels/unet_mobilenet2.pth').cuda()

mobassir94 commented 4 years ago

If your answer is yes, then I get this error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-f32c4ffea105> in <module>
      1 from torch.jit import load
      2 
----> 3 unet_se_resnext50_32x4d = load('/kaggle/input/siimensemble/fpninceptionresnetv2.pth').cuda()
      4 unet_mobilenet2 = load('/kaggle/input/siimensemble/unetse_resnext50_32x4d.pth').cuda()

/opt/conda/lib/python3.6/site-packages/torch/jit/__init__.py in load(f, map_location, _extra_files)
    160             (sys.version_info[0] == 2 and isinstance(f, unicode)) or \
    161             (sys.version_info[0] == 3 and isinstance(f, pathlib.Path)):
--> 162         cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
    163     else:
    164         cpp_module = torch._C.import_ir_module_from_buffer(cu, f.read(), map_location, _extra_files)

RuntimeError: [enforce fail at inline_container.cc:137] . PytorchStreamReader failed reading zip archive: failed finding central directory
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x5b (0x7f0fc1bc6bcb in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x6b (0x7f0fc471e0db in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::init() + 0x9a (0x7f0fc4721b8a in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x60 (0x7f0fc4724bf0 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #4: torch::jit::import_ir_module(std::shared_ptr<torch::jit::script::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x38 (0x7f0fc58039d8 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x4d6903 (0x7f0ff1606903 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1c8316 (0x7f0ff12f8316 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
... (frames #7-#63 are Python interpreter internals)
qubvel commented 4 years ago

First, you have to save the model with torch.jit.save; then you can load it without any code dependencies (similar to a Keras model file).

mobassir94 commented 4 years ago

Does that mean I will have to retrain all my previously trained models and use torch.jit.save to make the code above work with the new weight files? Do you have any plans to release more versions within the next 2-3 weeks?

qubvel commented 4 years ago

No, read carefully.

  1. Install previous version, create (old) models, load weights, trace, save.
  2. Install new version, create (new) models, load weights, trace, save.
  3. Load all models with torch.jit.load
  4. Ensemble

You can install version 0.0.3, load the weights for the models that were trained with that version, then trace and save them as traced modules. Then repeat the same for version 0.1.0.

Traced modules can be loaded without segmentation_models.pytorch even being installed. I recommend always saving your models as traced modules; this way you do not depend on the code that created them.
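A minimal sketch of that workflow (paths are hypothetical; run this once under each installed version):

import torch
import segmentation_models_pytorch as smp

# 1. rebuild the model exactly as it was trained under the currently installed smp version
model = smp.Unet("se_resnext50_32x4d", encoder_weights=None, activation=None)
state = torch.load("old_weights.pth", map_location="cpu")  # hypothetical checkpoint file
model.load_state_dict(state["state_dict"])                 # assumes the checkpoint wraps a "state_dict" key
model.eval()

# 2. trace with a dummy input and save; loading the .ptt later does not require smp at all
traced = torch.jit.trace(model, torch.ones(1, 3, 64, 64))
torch.jit.save(traced, "unet_se_resnext50_traced.ptt")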

qubvel commented 4 years ago

It is also important to pin the version of the package you are working with. Always use pip install package==x.x.x or pip install git+path/to/repo@<commit_hash>. This will help you avoid such problems in the future. All packages introduce new features, and it is not always possible to preserve backward compatibility.

mobassir94 commented 4 years ago

Installed the previous version, then used this code:

ckpt_path = "/kaggle/input/siimensemble/unetse_resnext50_32x4d.pth"
device = torch.device("cuda")
unet_resnext = smp.Unet("se_resnext50_32x4d", encoder_weights=None, activation=None)
unet_resnext.to(device)
unet_resnext.eval()
state = torch.load(ckpt_path, map_location=lambda storage, loc: storage)
unet_resnext.load_state_dict(state["state_dict"])

I loaded the weights and got the message "All keys matched successfully".

But how do I trace and save? Will you write the code in the next comment so that I can add it to mine?

mobassir94 commented 4 years ago

I trained a Unet with resnet50 last night using the latest version of your library, but today I can't load the weights. I am getting the same error as before:

RuntimeError: Error(s) in loading state_dict for Unet: (the same missing and unexpected decoder keys as in the first report above)

qubvel commented 4 years ago
model.eval()
for p in model.parameters():   # freeze parameters; gradients are not needed for tracing
    p.requires_grad = False

sample = torch.ones([1, 3, 64, 64]).to(device)
traced_module = torch.jit.trace(model, sample)
torch.jit.save(traced_module, "path/to/model.ptt")

P.S. Then load with the following code in your ensemble script

model = torch.jit.load("path/to/model.ptt")
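For the final ensembling step, something like the following sketch could work (file names are hypothetical; it averages the models' predicted probabilities):

import torch

# load the traced modules saved under the different smp versions
models = [torch.jit.load(path).cuda().eval()
          for path in ["model_v003_traced.ptt", "model_v010_traced.ptt"]]

with torch.no_grad():
    batch = torch.ones(1, 3, 64, 64).cuda()        # stand-in for a real image batch
    probs = [m(batch).sigmoid() for m in models]   # per-model mask probabilities
    mask = torch.stack(probs).mean(dim=0)          # simple average ensemble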
qubvel commented 4 years ago

Check the version:

smp.__version__
qubvel commented 4 years ago

Looks like it was not the latest version; you may have a problem in your environment setup.

mobassir94 commented 4 years ago

Should the file extension be .ptt and not .pth?

mobassir94 commented 4 years ago

I used this command to install smp for training my model (Unet with resnet50):

!pip install git+https://github.com/qubvel/segmentation_models.pytorch > /dev/null 2>&1  # install segmentation_models.pytorch with no bash output

Using the same command, I reinstalled smp in another Kaggle kernel to access the weight file I trained last night, but I am getting the RuntimeError mentioned above. Why?

mobassir94 commented 4 years ago

Got the error AttributeError: module 'segmentation_models_pytorch' has no attribute '__version__' while trying smp.__version__

qubvel commented 4 years ago

try with:

from segmentation_models_pytorch import __version__
print(__version__.__version__)
qubvel commented 4 years ago

And use the -U option when (re-)installing the package.

qubvel commented 4 years ago

Actually, PyTorch creates a zip archive, but you can save with any extension. I chose .ptt to be able to distinguish whether a file is a traced module or not.
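As an aside, you can sanity-check whether a saved file really is such an archive, since the "failed finding central directory" error above means the reader did not find one:

import zipfile

# True for files written by torch.jit.save
print(zipfile.is_zipfile("path/to/model.ptt"))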

mobassir94 commented 4 years ago

Trying. But can you please tell me two things?

  1. Why does installing your library with the same command in both the training kernel and the inference kernel (for loading weights) lead to a RuntimeError?
  2. The line sample = torch.ones([1, 3, 64, 64]).to(device): do we use the same code for all models' weight files, and why is that?
qubvel commented 4 years ago
  1. I don't know why you have a RuntimeError.
  2. This is just a sample of data used to record the operations on a tensor (so-called tracing). Yes, you can use it for all segmentation models, or you can use any image from your dataset.
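For example, a real image could be used instead (a sketch; val_loader here is a hypothetical DataLoader, and model/device are as defined earlier):

# any tensor of the right shape works as the example input for tracing
images, _ = next(iter(val_loader))   # hypothetical DataLoader yielding (images, masks)
sample = images[:1].to(device)       # a single real image instead of torch.ones
traced_module = torch.jit.trace(model, sample)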
mobassir94 commented 4 years ago

After trying:

from segmentation_models_pytorch import __version__
print(__version__.__version__)

it shows version 0.1.0.

Now please tell me one thing: why am I getting the RuntimeError? Except for the efficientnet encoders, I tried all the other encoders of this library with version 0.1.0.

In the inference kernel I also used version 0.1.0 to load those weights, but except for inceptionresnetv2 it seems that for most of the other architectures I get the RuntimeError mentioned above! I hope you can solve this issue ASAP, thanks.

qubvel commented 4 years ago

I am not able to fix this; it is a version mismatch problem on your side. If you create a model, save its state dict, and load it back within one kernel, it will work. So the reason is that you have a different model, and that is why you get the runtime error. Be more careful with version control next time.

phenomenal-manish commented 3 years ago
model.eval()
for p in model.parameters():   # freeze parameters; gradients are not needed for tracing
    p.requires_grad = False

sample = torch.ones([1, 3, 64, 64]).to(device)
traced_module = torch.jit.trace(model, sample)
torch.jit.save(traced_module, "path/to/model.ptt")

P.S. Then load with the following code in your ensemble script

model = torch.jit.load("path/to/model.ptt")

After loading the model, it does not go to GPU. Any suggestions on how to do that?

github-actions[bot] commented 2 years ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

JMBokhorst commented 1 year ago

Hi,

I have a similar problem and hope you can help. I'm using Python 3.8, PyTorch 1.9, and segmentation_models.pytorch 0.3.3 (installed via pip3 install git+https://github.com/qubvel/segmentation_models.pytorch).

I trained a Unet++ model with a mobilenetv2 backbone. Directly after training, I wanted to apply the model, but I got this error:


RuntimeError: Error(s) in loading state_dict for Unet:
    Missing key(s) in state_dict: "decoder.blocks.0.conv1.0.weight", "decoder.blocks.0.conv1.1.weight", "decoder.blocks.0.conv1.1.bias", "decoder.blocks.0.conv1.1.running_mean", "decoder.blocks.0.conv1.1.running_var", "decoder.blocks.0.conv2.0.weight", "decoder.blocks.0.conv2.1.weight", "decoder.blocks.0.conv2.1.bias", "decoder.blocks.0.conv2.1.running_mean", "decoder.blocks.0.conv2.1.running_var", "decoder.blocks.1.conv1.0.weight", "decoder.blocks.1.conv1.1.weight", "decoder.blocks.1.conv1.1.bias", "decoder.blocks.1.conv1.1.running_mean", "decoder.blocks.1.conv1.1.running_var", "decoder.blocks.1.conv2.0.weight", "decoder.blocks.1.conv2.1.weight", "decoder.blocks.1.conv2.1.bias", "decoder.blocks.1.conv2.1.running_mean", "decoder.blocks.1.conv2.1.running_var", "decoder.blocks.2.conv1.0.weight", "decoder.blocks.2.conv1.1.weight", "decoder.blocks.2.conv1.1.bias", "decoder.blocks.2.conv1.1.running_mean", "decoder.blocks.2.conv1.1.running_var", "decoder.blocks.2.conv2.0.weight", "decoder.blocks.2.conv2.1.weight", "decoder.blocks.2.conv2.1.bias", "decoder.blocks.2.conv2.1.running_mean", "decoder.blocks.2.conv2.1.running_var", "decoder.blocks.3.conv1.0.weight", "decoder.blocks.3.conv1.1.weight", "decoder.blocks.3.conv1.1.bias", "decoder.blocks.3.conv1.1.running_mean", "decoder.blocks.3.conv1.1.running_var", "decoder.blocks.3.conv2.0.weight", "decoder.blocks.3.conv2.1.weight", "decoder.blocks.3.conv2.1.bias", "decoder.blocks.3.conv2.1.running_mean", "decoder.blocks.3.conv2.1.running_var", "decoder.blocks.4.conv1.0.weight", "decoder.blocks.4.conv1.1.weight", "decoder.blocks.4.conv1.1.bias", "decoder.blocks.4.conv1.1.running_mean", "decoder.blocks.4.conv1.1.running_var", "decoder.blocks.4.conv2.0.weight", "decoder.blocks.4.conv2.1.weight", "decoder.blocks.4.conv2.1.bias", "decoder.blocks.4.conv2.1.running_mean", "decoder.blocks.4.conv2.1.running_var". 
    Unexpected key(s) in state_dict: "decoder.blocks.x_0_0.conv1.0.weight", "decoder.blocks.x_0_0.conv1.1.weight", "decoder.blocks.x_0_0.conv1.1.bias", "decoder.blocks.x_0_0.conv1.1.running_mean", "decoder.blocks.x_0_0.conv1.1.running_var", "decoder.blocks.x_0_0.conv1.1.num_batches_tracked", "decoder.blocks.x_0_0.conv2.0.weight", "decoder.blocks.x_0_0.conv2.1.weight", "decoder.blocks.x_0_0.conv2.1.bias", "decoder.blocks.x_0_0.conv2.1.running_mean", "decoder.blocks.x_0_0.conv2.1.running_var", "decoder.blocks.x_0_0.conv2.1.num_batches_tracked", "decoder.blocks.x_0_1.conv1.0.weight", "decoder.blocks.x_0_1.conv1.1.weight", "decoder.blocks.x_0_1.conv1.1.bias", "decoder.blocks.x_0_1.conv1.1.running_mean", "decoder.blocks.x_0_1.conv1.1.running_var", "decoder.blocks.x_0_1.conv1.1.num_batches_tracked", "decoder.blocks.x_0_1.conv2.0.weight", "decoder.blocks.x_0_1.conv2.1.weight", "decoder.blocks.x_0_1.conv2.1.bias", "decoder.blocks.x_0_1.conv2.1.running_mean", "decoder.blocks.x_0_1.conv2.1.running_var", "decoder.blocks.x_0_1.conv2.1.num_batches_tracked", "decoder.blocks.x_1_1.conv1.0.weight", "decoder.blocks.x_1_1.conv1.1.weight", "decoder.blocks.x_1_1.conv1.1.bias", "decoder.blocks.x_1_1.conv1.1.running_mean", "decoder.blocks.x_1_1.conv1.1.running_var", "decoder.blocks.x_1_1.conv1.1.num_batches_tracked", "decoder.blocks.x_1_1.conv2.0.weight", "decoder.blocks.x_1_1.conv2.1.weight", "decoder.blocks.x_1_1.conv2.1.bias", "decoder.blocks.x_1_1.conv2.1.running_mean", "decoder.blocks.x_1_1.conv2.1.running_var", "decoder.blocks.x_1_1.conv2.1.num_batches_tracked", "decoder.blocks.x_0_2.conv1.0.weight", "decoder.blocks.x_0_2.conv1.1.weight", "decoder.blocks.x_0_2.conv1.1.bias", "decoder.blocks.x_0_2.conv1.1.running_mean", "decoder.blocks.x_0_2.conv1.1.running_var", "decoder.blocks.x_0_2.conv1.1.num_batches_tracked", "decoder.blocks.x_0_2.conv2.0.weight", "decoder.blocks.x_0_2.conv2.1.weight", "decoder.blocks.x_0_2.conv2.1.bias", "decoder.blocks.x_0_2.conv2.1.running_mean", "decoder.blocks.x_0_2.conv2.1.running_var", "decoder.blocks.x_0_2.conv2.1.num_batches_tracked", "decoder.blocks.x_1_2.conv1.0.weight", "decoder.blocks.x_1_2.conv1.1.weight", "decoder.blocks.x_1_2.conv1.1.bias", "decoder.blocks.x_1_2.conv1.1.running_mean", "decoder.blocks.x_1_2.conv1.1.running_var", "decoder.blocks.x_1_2.conv1.1.num_batches_tracked", "decoder.blocks.x_1_2.conv2.0.weight", "decoder.blocks.x_1_2.conv2.1.weight", "decoder.blocks.x_1_2.conv2.1.bias", "decoder.blocks.x_1_2.conv2.1.running_mean", "decoder.blocks.x_1_2.conv2.1.running_var", "decoder.blocks.x_1_2.conv2.1.num_batches_tracked", "decoder.blocks.x_2_2.conv1.0.weight", "decoder.blocks.x_2_2.conv1.1.weight", "decoder.blocks.x_2_2.conv1.1.bias", "decoder.blocks.x_2_2.conv1.1.running_mean", "decoder.blocks.x_2_2.conv1.1.running_var", "decoder.blocks.x_2_2.conv1.1.num_batches_tracked", "decoder.blocks.x_2_2.conv2.0.weight", "decoder.blocks.x_2_2.conv2.1.weight", "decoder.blocks.x_2_2.conv2.1.bias", "decoder.blocks.x_2_2.conv2.1.running_mean", "decoder.blocks.x_2_2.conv2.1.running_var", "decoder.blocks.x_2_2.conv2.1.num_batches_tracked", "decoder.blocks.x_0_3.conv1.0.weight", "decoder.blocks.x_0_3.conv1.1.weight", "decoder.blocks.x_0_3.conv1.1.bias", "decoder.blocks.x_0_3.conv1.1.running_mean", "decoder.blocks.x_0_3.conv1.1.running_var", "decoder.blocks.x_0_3.conv1.1.num_batches_tracked", "decoder.blocks.x_0_3.conv2.0.weight", "decoder.blocks.x_0_3.conv2.1.weight", "decoder.blocks.x_0_3.conv2.1.bias", "decoder.blocks.x_0_3.conv2.1.running_mean", 
"decoder.blocks.x_0_3.conv2.1.running_var", "decoder.blocks.x_0_3.conv2.1.num_batches_tracked", "decoder.blocks.x_1_3.conv1.0.weight", "decoder.blocks.x_1_3.conv1.1.weight", "decoder.blocks.x_1_3.conv1.1.bias", "decoder.blocks.x_1_3.conv1.1.running_mean", "decoder.blocks.x_1_3.conv1.1.running_var", "decoder.blocks.x_1_3.conv1.1.num_batches_tracked", "decoder.blocks.x_1_3.conv2.0.weight", "decoder.blocks.x_1_3.conv2.1.weight", "decoder.blocks.x_1_3.conv2.1.bias", "decoder.blocks.x_1_3.conv2.1.running_mean", "decoder.blocks.x_1_3.conv2.1.running_var", "decoder.blocks.x_1_3.conv2.1.num_batches_tracked", "decoder.blocks.x_2_3.conv1.0.weight", "decoder.blocks.x_2_3.conv1.1.weight", "decoder.blocks.x_2_3.conv1.1.bias", "decoder.blocks.x_2_3.conv1.1.running_mean", "decoder.blocks.x_2_3.conv1.1.running_var", "decoder.blocks.x_2_3.conv1.1.num_batches_tracked", "decoder.blocks.x_2_3.conv2.0.weight", "decoder.blocks.x_2_3.conv2.1.weight", "decoder.blocks.x_2_3.conv2.1.bias", "decoder.blocks.x_2_3.conv2.1.running_mean", "decoder.blocks.x_2_3.conv2.1.running_var", "decoder.blocks.x_2_3.conv2.1.num_batches_tracked", "decoder.blocks.x_3_3.conv1.0.weight", "decoder.blocks.x_3_3.conv1.1.weight", "decoder.blocks.x_3_3.conv1.1.bias", "decoder.blocks.x_3_3.conv1.1.running_mean", "decoder.blocks.x_3_3.conv1.1.running_var", "decoder.blocks.x_3_3.conv1.1.num_batches_tracked", "decoder.blocks.x_3_3.conv2.0.weight", "decoder.blocks.x_3_3.conv2.1.weight", "decoder.blocks.x_3_3.conv2.1.bias", "decoder.blocks.x_3_3.conv2.1.running_mean", "decoder.blocks.x_3_3.conv2.1.running_var", "decoder.blocks.x_3_3.conv2.1.num_batches_tracked", "decoder.blocks.x_0_4.conv1.0.weight", "decoder.blocks.x_0_4.conv1.1.weight", "decoder.blocks.x_0_4.conv1.1.bias", "decoder.blocks.x_0_4.conv1.1.running_mean", "decoder.blocks.x_0_4.conv1.1.running_var", "decoder.blocks.x_0_4.conv1.1.num_batches_tracked", "decoder.blocks.x_0_4.conv2.0.weight", "decoder.blocks.x_0_4.conv2.1.weight", "decoder.blocks.x_0_4.conv2.1.bias", "decoder.blocks.x_0_4.conv2.1.running_mean", "decoder.blocks.x_0_4.conv2.1.running_var", "decoder.blocks.x_0_4.conv2.1.num_batches_tracked". 

I see that the names are slightly different (x_0_0 vs 0, for example), which is what I would expect if I had used different versions between the runs. However, everything is identical in terms of versions between training and inference.

Could you help?