torchmd / torchmd-net

Training neural network potentials
MIT License

Loading of pre-trained models fails #154

Closed: mbackenkoehler closed this issue 2 years ago

mbackenkoehler commented 2 years ago

Trying to load a pre-trained model as described in the examples folder results in an error.

Code:

from torchmdnet.models.model import load_model

# pre-trained checkpoint from the examples folder README
model = load_model('epoch=649-val_loss=0.0003-test_loss=0.0059.ckpt')

Error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [3], line 1
----> 1 model = load_model('epoch=649-val_loss=0.0003-test_loss=0.0059.ckpt')

File ~/Code/torchmd-net/torchmdnet/models/model.py:104, in load_model(filepath, args, device, **kwargs)
    101 model = create_model(args)
    103 state_dict = {re.sub(r"^model\.", "", k): v for k, v in ckpt["state_dict"].items()}
--> 104 model.load_state_dict(state_dict)
    105 return model.to(device)

File ~/.miniconda3/envs/torchmd-net/lib/python3.9/site-packages/torch/nn/modules/module.py:1497, in Module.load_state_dict(self, state_dict, strict)
   1492         error_msgs.insert(
   1493             0, 'Missing key(s) in state_dict: {}. '.format(
   1494                 ', '.join('"{}"'.format(k) for k in missing_keys)))
   1496 if len(error_msgs) > 0:
-> 1497     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   1498                        self.__class__.__name__, "\n\t".join(error_msgs)))
   1499 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for TorchMD_Net:
    Missing key(s) in state_dict: "prior_model.0.initial_atomref", "prior_model.0.atomref.weight". 
    Unexpected key(s) in state_dict: "prior_model.initial_atomref", "prior_model.atomref.weight". 
PhilippThoelke commented 2 years ago

@peastman is this related to #134?

peastman commented 2 years ago

Can you describe how you created the checkpoint? If you can provide your config file, that would be great. Also, was it created with an earlier code version, or with the same version of the code you're using to load it?

PhilippThoelke commented 2 years ago

It's one of the pretrained checkpoints we provide: https://github.com/torchmd/torchmd-net/tree/main/examples#loading-checkpoints

These were created several months ago with an older version of the code. I think the problem is that we now allow multiple prior models, stored in a ModuleList. The old checkpoints do not contain a ModuleList for the prior model, just the prior model as a single Module. We could probably make load_model handle older checkpoints simply by replacing prior_model with prior_model.0 in the state dict, as sketched below.
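A minimal sketch of that remapping (the helper name is hypothetical; the key layout is taken from the error message above, after load_model has already stripped the "model." prefix):

import re

def remap_legacy_prior_keys(state_dict):
    # Old checkpoints saved the prior as a plain Module ("prior_model.*");
    # the current code expects a ModuleList ("prior_model.0.*").
    # Keys that do not start with "prior_model." pass through unchanged.
    return {
        re.sub(r"^prior_model\.", "prior_model.0.", k): v
        for k, v in state_dict.items()
    }

load_model could apply this right before calling model.load_state_dict.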

peastman commented 2 years ago

That makes sense. I'll try doing it.

giadefa commented 2 years ago

Can't we just manually modify the checkpoints to be compatible with the new class? See the sketch below.

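A rough sketch of that one-off conversion (assumptions: the checkpoint layout shown in the traceback above, and a hypothetical output filename):

import torch

ckpt = torch.load('epoch=649-val_loss=0.0003-test_loss=0.0059.ckpt',
                  map_location='cpu')

# Rewrite the legacy single-Module prior keys ("...prior_model.*") to the
# new ModuleList layout ("...prior_model.0.*"); all other keys are untouched.
ckpt['state_dict'] = {
    k.replace('prior_model.', 'prior_model.0.', 1): v
    for k, v in ckpt['state_dict'].items()
}

torch.save(ckpt, 'epoch=649-val_loss=0.0003-test_loss=0.0059-fixed.ckpt')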

PhilippThoelke commented 2 years ago

Yes, that would work too.

peastman commented 2 years ago

It just depends whether we care about maintaining backward compatibility with old files or not.

PhilippThoelke commented 2 years ago

I think it wouldn't hurt to support the older files, especially if we mark the compatibility code as such with a comment.