Which Pytorch is used in HDDM 0.9.0?

hcp4715 commented 2 years ago

Hi there,

I am trying to package HDDM 0.9.0 into a Docker image. The image builds without error, but when I run the HDDMnn example (https://hddm.readthedocs.io/en/latest/lan_new_classes.html#short-example), a version error occurs.

The PyTorch was installed via conda:

conda install pytorch torchvision torchaudio cpuonly -c pytorch

The installed PyTorch version is 1.4.0, as returned by the code below:

import torch
print("PyTorch's version is ", torch.__version__)

Here is a screenshot of the short example:

Screenshot from 2021-11-29 18-23-05

I suspect this error is related to the PyTorch version, but I am not sure.

The error message of the last cell is pasted below:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-a5cdde3d374b> in <module>
----> 1 network = load_torch_mlp(model=model)

/opt/conda/lib/python3.8/site-packages/hddm/torch/mlp_inference_class.py in load_torch_mlp(model)
     37     def load_torch_mlp(model=None):
     38         cfg = TorchConfig(model=model)
---> 39         infer_model = LoadTorchMLPInfer(
     40             model_file_path=cfg.network_path,
     41             network_config=cfg.network_config,

/opt/conda/lib/python3.8/site-packages/hddm/torch/mlp_inference_class.py in __init__(self, model_file_path, network_config, input_dim)
     24             )
     25             self.net.load_state_dict(
---> 26                 torch.load(self.model_file_path, map_location=self.dev)
     27             )
     28             self.net.to(self.dev)

/opt/conda/lib/python3.8/site-packages/torch/serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    525     with _open_file_like(f, 'rb') as opened_file:
    526         if _is_zipfile(opened_file):
--> 527             with _open_zipfile_reader(f) as opened_zipfile:
    528                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    529         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)

/opt/conda/lib/python3.8/site-packages/torch/serialization.py in __init__(self, name_or_buffer)
    222 class _open_zipfile_reader(_opener):
    223     def __init__(self, name_or_buffer):
--> 224         super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
    225 
    226 

RuntimeError: version_ <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /opt/conda/conda-bld/pytorch_1579022027171/work/caffe2/serialize/inline_container.cc:132, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 2. Your PyTorch installation may be too old. (init at /opt/conda/conda-bld/pytorch_1579022027171/work/caffe2/serialize/inline_container.cc:132)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f212304e627 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x1f5b (0x7f2124ee5cbb in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x64 (0x7f2124ee6ed4 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x69a466 (0x7f2129282466 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: <unknown function> + 0x26b097 (0x7f2128e53097 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: PyCFunction_Call + 0x54 (0x556866fefdf4 in /opt/conda/bin/python)
frame #6: _PyObject_MakeTpCall + 0x31e (0x556866ffef2e in /opt/conda/bin/python)
frame #7: <unknown function> + 0x1b26be (0x5568670766be in /opt/conda/bin/python)
frame #8: PyObject_Call + 0x5e (0x556866fe90be in /opt/conda/bin/python)
frame #9: <unknown function> + 0x1b3ac0 (0x556867077ac0 in /opt/conda/bin/python)
frame #10: _PyObject_MakeTpCall + 0x2eb (0x556866ffeefb in /opt/conda/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x534b (0x556867098f6b in /opt/conda/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #13: <unknown function> + 0x1b3213 (0x556867077213 in /opt/conda/bin/python)
frame #14: _PyObject_MakeTpCall + 0x2eb (0x556866ffeefb in /opt/conda/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x4f2e (0x556867098b4e in /opt/conda/bin/python)
frame #16: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #17: _PyFunction_Vectorcall + 0x378 (0x5568670758d8 in /opt/conda/bin/python)
frame #18: _PyEval_EvalFrameDefault + 0x1782 (0x5568670953a2 in /opt/conda/bin/python)
frame #19: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #20: _PyFunction_Vectorcall + 0x378 (0x5568670758d8 in /opt/conda/bin/python)
frame #21: <unknown function> + 0x1b32a7 (0x5568670772a7 in /opt/conda/bin/python)
frame #22: _PyObject_MakeTpCall + 0x2eb (0x556866ffeefb in /opt/conda/bin/python)
frame #23: _PyEval_EvalFrameDefault + 0x562d (0x55686709924d in /opt/conda/bin/python)
frame #24: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #25: _PyFunction_Vectorcall + 0x378 (0x5568670758d8 in /opt/conda/bin/python)
frame #26: _PyEval_EvalFrameDefault + 0x1782 (0x5568670953a2 in /opt/conda/bin/python)
frame #27: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #28: PyEval_EvalCodeEx + 0x39 (0x556867075559 in /opt/conda/bin/python)
frame #29: PyEval_EvalCode + 0x1b (0x5568671189ab in /opt/conda/bin/python)
frame #30: <unknown function> + 0x2731de (0x5568671371de in /opt/conda/bin/python)
frame #31: <unknown function> + 0x128d4b (0x556866fecd4b in /opt/conda/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x92f (0x55686709454f in /opt/conda/bin/python)
frame #33: <unknown function> + 0x182ea3 (0x556867046ea3 in /opt/conda/bin/python)
frame #34: _PyEval_EvalFrameDefault + 0x1d37 (0x556867095957 in /opt/conda/bin/python)
frame #35: <unknown function> + 0x182ea3 (0x556867046ea3 in /opt/conda/bin/python)
frame #36: _PyEval_EvalFrameDefault + 0x1d37 (0x556867095957 in /opt/conda/bin/python)
frame #37: <unknown function> + 0x182ea3 (0x556867046ea3 in /opt/conda/bin/python)
frame #38: <unknown function> + 0x1958c9 (0x5568670598c9 in /opt/conda/bin/python)
frame #39: _PyEval_EvalFrameDefault + 0xa4b (0x55686709466b in /opt/conda/bin/python)
frame #40: _PyFunction_Vectorcall + 0x1a6 (0x556867075706 in /opt/conda/bin/python)
frame #41: _PyEval_EvalFrameDefault + 0x92f (0x55686709454f in /opt/conda/bin/python)
frame #42: _PyFunction_Vectorcall + 0x1a6 (0x556867075706 in /opt/conda/bin/python)
frame #43: _PyEval_EvalFrameDefault + 0xa4b (0x55686709466b in /opt/conda/bin/python)
frame #44: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #45: _PyFunction_Vectorcall + 0x378 (0x5568670758d8 in /opt/conda/bin/python)
frame #46: <unknown function> + 0x1b1f91 (0x556867075f91 in /opt/conda/bin/python)
frame #47: PyObject_Call + 0x5e (0x556866fe90be in /opt/conda/bin/python)
frame #48: _PyEval_EvalFrameDefault + 0x21c1 (0x556867095de1 in /opt/conda/bin/python)
frame #49: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #50: <unknown function> + 0x1b2007 (0x556867076007 in /opt/conda/bin/python)
frame #51: _PyEval_EvalFrameDefault + 0x1782 (0x5568670953a2 in /opt/conda/bin/python)
frame #52: <unknown function> + 0x1925da (0x5568670565da in /opt/conda/bin/python)
frame #53: <unknown function> + 0x128d4b (0x556866fecd4b in /opt/conda/bin/python)
frame #54: <unknown function> + 0x13b3ea (0x556866fff3ea in /opt/conda/bin/python)
frame #55: <unknown function> + 0x21da4f (0x5568670e1a4f in /opt/conda/bin/python)
frame #56: <unknown function> + 0x128fc2 (0x556866fecfc2 in /opt/conda/bin/python)
frame #57: _PyEval_EvalFrameDefault + 0x92f (0x55686709454f in /opt/conda/bin/python)
frame #58: _PyEval_EvalCodeWithName + 0x2c3 (0x556867074503 in /opt/conda/bin/python)
frame #59: _PyFunction_Vectorcall + 0x378 (0x5568670758d8 in /opt/conda/bin/python)
frame #60: _PyEval_EvalFrameDefault + 0xa4b (0x55686709466b in /opt/conda/bin/python)
frame #61: <unknown function> + 0x1925da (0x5568670565da in /opt/conda/bin/python)
frame #62: <unknown function> + 0x128d4b (0x556866fecd4b in /opt/conda/bin/python)
frame #63: <unknown function> + 0x13b3ea (0x556866fff3ea in /opt/conda/bin/python)

2021.11.30 Update:

I looked into the details of the error message: the error is raised by torch.load().

Screenshot from 2021-11-30 08-51-29

I then checked why this happens. The problem is that a model saved with a newer version of PyTorch, e.g., 1.7.0, cannot be loaded directly by an older version such as PyTorch 1.4.0.
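As a rough way to see which on-disk format a given model file uses, the sketch below checks whether the file is a zip archive, which is the same distinction torch.load() makes in the traceback above (the `_is_zipfile` branch). The helper name `uses_zip_container` is made up for illustration, and the check only tells you that the file uses the zip-based container, not which file-format version it carries.

```python
import zipfile


def uses_zip_container(path):
    """Return True if the file at `path` is a zip archive.

    Hypothetical helper: a zip archive suggests the newer zip-based
    serialization format; anything else suggests the legacy format.
    """
    return zipfile.is_zipfile(path)
```

It uses only the standard library, so it can be run in the container even when the installed PyTorch cannot read the file.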

There are two solutions. One is to save the model file in a format compatible with older versions:

torch.save(model, "filename", _use_new_zipfile_serialization=False)

The other is to use a newer version of PyTorch. I tried the latter. After testing with the latest and with older versions of PyTorch, it seems that PyTorch 1.7.0 works with Python 3.8.8 and the other packages.
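A version guard along these lines could be run when building the image, so that a too-old PyTorch fails early rather than at torch.load(). This is only a sketch: the function names are made up, and the minimum of 1.7.0 is an assumption based solely on the version reported to work in this thread.

```python
import re

# Assumption: 1.7.0 is simply the version reported to work in this thread.
MIN_TORCH = (1, 7, 0)


def parse_version(version_string):
    """Turn a string such as '1.7.0+cpu' into (1, 7, 0).

    Local or pre-release suffixes after the numeric part are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def torch_is_new_enough(version_string, minimum=MIN_TORCH):
    """Compare the parsed version against the assumed minimum."""
    return parse_version(version_string) >= minimum


# Usage inside the image (requires torch to be installed):
# import torch
# assert torch_is_new_enough(torch.__version__), torch.__version__
```

Comparing integer tuples instead of raw strings avoids ordering mistakes such as "1.10" sorting before "1.7".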

I've updated the docker image.

panwanke commented 2 years ago

I encountered the same problem, and after digging through the source code I found a few things.

In the hddm.model_config.model_config.py module, there are config definitions for "full_ddm" and "ddm_vanilla".
However, there is no network file for "full_ddm" in hddm.torch.torch_config.py, which only has a network file for "ddm". As a result, an error occurs when the model is set to "full_ddm", but not when it is set to "ddm" or "levy".

My guess is that the contributor forgot to update this part. It is also possible that the problem comes from my installing the CPU-only build of PyTorch instead of the CUDA build.

hcp4715 commented 2 years ago

Hi, Wanke,

I can reproduce your error in my Docker image (see the screenshot below), but it is caused by the content of this script in HDDM rather than by the PyTorch version. As the script shows, full_ddm is not in the config file, which causes an error when running TorchConfig. In contrast, the angle model is in the list and works.
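Until the missing entry is added, a lookup with an explicit error message makes this kind of failure easier to read. The sketch below is not HDDM's code: the mapping `NETWORK_FILES`, its file names, and the function `network_file_for` are all hypothetical stand-ins for whatever the torch config actually contains.

```python
# Hypothetical stand-in for the model -> network-file mapping; the real
# contents of HDDM's torch config are not reproduced here.
NETWORK_FILES = {
    "ddm": "ddm_network.pt",
    "angle": "angle_network.pt",
    "levy": "levy_network.pt",
}


def network_file_for(model, network_files=NETWORK_FILES):
    """Return the network file for `model`, or raise a descriptive error."""
    try:
        return network_files[model]
    except KeyError:
        available = ", ".join(sorted(network_files))
        raise ValueError(
            f"no network file for model {model!r}; available: {available}"
        ) from None
```

With this mapping, asking for "angle" returns its file name, while asking for "full_ddm" raises a ValueError that lists the models that do have a network file.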

Maybe @AlexanderFengler can help to solve this issue.

reproduced error: Screenshot from 2021-12-01 14-10-32

AlexanderFengler commented 2 years ago

model_config has "doc" keys which give some info for each model now. It reflects that full_ddm is missing atm. It will appear soon. full_ddm got lost in updating everything from keras to pytorch.

I assume (as per above) the pytorch error can be considered resolved by using more recent pytorch versions. Will update pytorch dependency to be > 1.7, didn't realize they made breaking changes when it comes to loading models (but I should have learned that lesson from tensorflow / keras earlier..).

Best, Alex

hddm-devs / hddm, issue #78