ipcamit / kim-nequip

Small utility fork of nequip to port NequIP models to OpenKIM compatible format
MIT License

Debugging NequIP port #4

Closed jvita closed 5 months ago

jvita commented 5 months ago

Moving discussion here for better documentation and code formatting.

Previous error:

forward(__torch__.kim_nequip.nn._graph_mixin.KLIFFGraphNetwork self, Tensor species, Tensor coords, Tensor edge_index0, Tensor edge_index1, Tensor edge_index2, Tensor edge_index3, Tensor batch) -> (Tensor):
Expected a value of type 'Tensor (inferred)' for argument 'species' but instead found type 'Dict[str, Tensor]'.
Inferred 'species' to be of type 'Tensor' because it was not annotated with an explicit type.
:
  File "/usr/WS1/vita1/programs/kim-nequip/kim_nequip/nn/_grad_output.py", line 75

        if self.skip:
            return self.func(data)
                   ~~~~~~~~~ <--- HERE

        # set req grad

Amit's response:

I have modified the code to explicitly ignore ForceOutput etc., and the original error message no longer appears. I will test it further to make sure the output is correct. In the meantime, if you want to give it a try, please check the "josh-4-layer-port" GitHub branch of kim-nequip.

New error:

Saved the model as kim_deployed.pth
You can check the architecture in : kim_deployed.txt
Traceback (most recent call last):
  File "/g/g20/vita1/ws/programs/anaconda/envs/iap-uq-nequip/bin/kim-nequip-port", line 33, in <module>
    sys.exit(load_entry_point('kim-nequip', 'console_scripts', 'kim-nequip-port')())
  File "/usr/WS1/vita1/programs/kim-nequip/kim_nequip/scripts/convert.py", line 59, in main
    trainer = fresh_start(config)
  File "/usr/WS1/vita1/programs/kim-nequip/kim_nequip/scripts/convert.py", line 131, in fresh_start
    final_model = copy_weights(deployed_model, final_model, config)
  File "/usr/WS1/vita1/programs/kim-nequip/kim_nequip/scripts/convert.py", line 206, in copy_weights
    model_wrapped = WrappedModel(final_model, deployed_model.scale_by)
  File "/g/g20/vita1/ws/programs/anaconda/envs/iap-uq-nequip/lib/python3.9/site-packages/torch/jit/_script.py", line 785, in __getattr__
    return super(RecursiveScriptModule, self).__getattr__(attr)
  File "/g/g20/vita1/ws/programs/anaconda/envs/iap-uq-nequip/lib/python3.9/site-packages/torch/jit/_script.py", line 502, in __getattr__
    return super(ScriptModule, self).__getattr__(attr)
  File "/g/g20/vita1/ws/programs/anaconda/envs/iap-uq-nequip/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1207, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RecursiveScriptModule' object has no attribute 'scale_by'
ipcamit commented 5 months ago

I think the problem is a missing RescaleEnergyEtc module. With the default "example.yaml" or "full.yaml" configuration, a global scale_by factor is added to the model, but in your config file it has been removed. I could not find any explicit keyword that controls this, other than the manual model_builder list. For now I have added exception handling that sets the global scale to 1 if it is missing. With that change I can get a KIM API model from the configuration you provided, so I think this should work now. Please give it a go.
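For documentation, here is a minimal sketch of the kind of fallback described above, not the actual code in convert.py. `FakeDeployedModel` and `resolve_scale` are hypothetical names used only for illustration; the stand-in class just mimics how a scripted module raises AttributeError for a missing attribute, so the example runs without torch.

```python
class FakeDeployedModel:
    """Hypothetical stand-in for a deployed model (not a real torch module).

    Like torch.nn.Module, it raises AttributeError for unknown attributes.
    """

    def __init__(self, **attrs):
        self._attrs = dict(attrs)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        try:
            return self.__dict__["_attrs"][name]
        except KeyError:
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{name}'"
            )


def resolve_scale(model, default=1.0):
    """Return model.scale_by, or `default` when the model has no such attribute.

    Assumption: a model built without RescaleEnergyEtc carries no global
    scale, so a neutral factor of 1 leaves its output unchanged.
    """
    try:
        return model.scale_by
    except AttributeError:
        return default


with_rescale = FakeDeployedModel(scale_by=2.5)
without_rescale = FakeDeployedModel()

print(resolve_scale(with_rescale))
print(resolve_scale(without_rescale))
```

The same effect could be had with `getattr(model, "scale_by", 1.0)`; the try/except form is shown only because the comment above mentions exception checking.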

ipcamit commented 5 months ago

This also explains the original oversight. When providing the default yaml files, I skipped the explicit declaration of the model_builder list. This forces a default construction of the model_builder list, from which I had removed the ForceOutput builder string. But when you set the list explicitly, that default was overridden.
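To illustrate the override, here is a rough sketch under stated assumptions; it is not the port script's real logic. The config key `model_builders`, the contents of `PORT_DEFAULT_BUILDERS`, and both function names are assumptions made for this example. It contrasts a lookup that only cleans up the default list with one that filters ForceOutput no matter where the list came from.

```python
# Hypothetical default list used by the port when the config has no
# explicit builder list; ForceOutput is already left out of it.
PORT_DEFAULT_BUILDERS = ["SimpleIrrepsConfig", "EnergyModel", "RescaleEnergyEtc"]

# Builders the port should skip (assumption for this sketch).
IGNORED_BUILDERS = {"ForceOutput"}


def builders_default_only(config):
    """Naive lookup: an explicit list in the config replaces the default as-is."""
    return list(config.get("model_builders", PORT_DEFAULT_BUILDERS))


def builders_filtered(config):
    """Lookup that drops ignored builders from default and explicit lists alike."""
    builders = config.get("model_builders", PORT_DEFAULT_BUILDERS)
    return [name for name in builders if name not in IGNORED_BUILDERS]


# A config with an explicit list that includes ForceOutput and omits
# RescaleEnergyEtc (illustrative, not the config from this issue).
explicit_config = {"model_builders": ["SimpleIrrepsConfig", "EnergyModel", "ForceOutput"]}

print(builders_default_only({}))
print(builders_default_only(explicit_config))
print(builders_filtered(explicit_config))
```

With the naive lookup, ForceOutput comes back as soon as the list is written out by hand, which matches the behaviour described above.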

ipcamit commented 5 months ago

@jvita Did you get the time to give it a go?

jvita commented 5 months ago

Yes, the model compiled at least, but I haven't had time to run an actual test with it. I was planning to put together a script to run a few KIM tests, but haven't found the time yet.

ipcamit commented 5 months ago

Great! No issues at all. In that case I will merge these changes back into the main branch and close this issue.
