mir-group / nequip

NequIP is a code for building E(3)-equivariant interatomic potentials
https://www.nature.com/articles/s41467-022-29939-5
MIT License

issue when using nequip-deploy 🐛 [BUG] #346

Open utkarshp1161 opened 1 year ago

utkarshp1161 commented 1 year ago

Describe the bug: issue when using nequip-deploy

To Reproduce: nequip-deploy build --train-dir model_path/ model_path/deployed_model.pth

ERROR:

[W init.cpp:833] Warning: Use _jit_set_fusion_strategy, bailout depth is deprecated. Setting to (STATIC, 2) (function operator())
/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_check.py:172: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
  warnings.warn("The TorchScript type system doesn't support "
Traceback (most recent call last):
  File "/home/anaconda3/envs/bebam/bin/nequip-deploy", line 8, in <module>
    sys.exit(main())
  File "/home/nequip/nequip/nequip/scripts/deploy.py", line 225, in main
    model = _compile_for_deploy(model)
  File "/home/nequip/nequip/nequip/scripts/deploy.py", line 62, in _compile_for_deploy
    model = script(model)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 266, in script
    out = compile(mod, in_place=in_place)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 101, in compile
    compile(
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/e3nn/util/jit.py", line 113, in compile
    mod = torch.jit.script(mod, **script_options)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1284, in script
    return torch.jit._recursive.create_script_module(
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 480, in create_script_module
    return create_script_module_impl(nn_module, concrete_type, stubs_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 542, in create_script_module_impl
    script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 614, in _construct
    init_fn(script_module)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 520, in init_fn
    scripted = create_script_module_impl(orig_value, sub_concrete_type, stubs_fn)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 546, in create_script_module_impl
    create_methods_and_properties_from_stubs(concrete_type, method_stubs, property_stubs)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 397, in create_methods_and_properties_from_stubs
    concrete_type._create_methods_and_properties(property_defs, property_rcbs, method_defs, method_rcbs, method_defaults)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_recursive.py", line 867, in try_compile_fn
    return torch.jit.script(fn, _rcb=rcb)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/_script.py", line 1338, in script
    ast = get_jit_def(obj, obj.__name__)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 297, in get_jit_def
    return build_def(parsed_def.ctx, fn_def, type_line, def_name, self_name=self_name, pdt_arg_types=pdt_arg_types)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 335, in build_def
    param_list = build_param_list(ctx, py_def.args, self_name, pdt_arg_types)
  File "/home/anaconda3/envs/bebam/lib/python3.10/site-packages/torch/jit/frontend.py", line 359, in build_param_list
    raise NotSupportedError(ctx_range, _vararg_kwarg_err)
torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
  File "/home/anaconda3/envs/bebam/lib/python3.10/logging/__init__.py", line 2131
def debug(msg, *args, **kwargs):
                       ~~~~~~~ <--- HERE
    """
    Log a message with severity 'DEBUG' on the root logger. If the logger has

Linux-cpp-lisp commented 1 year ago

This looks like you've edited the code to include logging.debug calls in the model?
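
For context, a standalone sketch (not NequIP code) of why a logging.debug call inside a model produces exactly this error: TorchScript recursively compiles every function the forward pass calls, and logging.debug is defined as debug(msg, *args, **kwargs), whose varargs the TorchScript frontend rejects:

```python
import logging
import torch

class Toy(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # TorchScript tries to compile this call too; logging.debug takes
        # *args/**kwargs, which scripted functions cannot.
        logging.debug("input shape: %s", x.shape)
        return x * 2

# Raises torch.jit.frontend.NotSupportedError: "Compiled functions can't
# take variable number of arguments ..." -- the same failure as above.
torch.jit.script(Toy())
```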

utkarshp1161 commented 1 year ago

Not that, but I have set up my Python environment for nequip such that I can use PyTorch 2.0 (unlike the prescribed PyTorch >= 1.8, != 1.9, <= 1.11.*), due to hardware constraints and to try a few other things with torch geometric. I am able to train NequIP models in this setup, but I get this error when trying to deploy the model. My goal is to run an MD simulation on the trained model, and I thought I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround so that I can do the MD simulation without having to deploy the model?

Linux-cpp-lisp commented 1 year ago

I see--- what hardware constraints? Please note the following upstream issue: https://github.com/mir-group/nequip/discussions/311. Whether or not you encounter that issue, please post in that thread so we can continue to try to understand and resolve this problem. Also note that on AMD GPUs, more recent versions of PyTorch appear to be fine.

Regarding torch_geometric, that is no longer a dependency of nequip, but maybe I am misinterpreting what you mean.

You could try 1.13? I've never seen this issue reported before... besides your PyTorch version, is there anything else custom or unusual about your setup? There should never be a call to logging.debug in the model. Maybe the rest of the stack trace, which isn't included here, says where in the model it is?

utkarshp1161 commented 1 year ago

Thank you, will try and get back with more details.

Can you please answer this: "My goal is to run an MD simulation on the trained model, and I thought I could use NequIPCalculator.from_deployed_model(model, **kwargs). Is there a workaround so that I can do the MD simulation without having to deploy the model?"

Actually, I have already trained quite a number of models, and since nequip-deploy is not working for them, I am looking for a workaround to complete my study without having to set things up again.

Linux-cpp-lisp commented 1 year ago

You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict, then passed to the constructor rather than via from_deployed_model). This will lose you performance in a lot of places, however.

It is not possible to do MD in LAMMPS, OpenMM, etc. without deploying.
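
A rough sketch of that uncompiled-model route, assuming a typical train-dir layout (config.yaml, best_model.pth) and guessing at the calculator's constructor arguments; check every name here against your installed nequip version:

```python
import torch
from nequip.model import model_from_config
from nequip.utils import Config
from nequip.ase import NequIPCalculator

# Rebuild the architecture from the training config; the file names are
# assumptions about a typical train-dir layout.
config = Config.from_file("model_path/config.yaml")
model = model_from_config(config, initialize=False)

# Load the trained weights. If the .pth file is a full checkpoint dict
# rather than a bare state dict, extract the state-dict entry first.
state_dict = torch.load("model_path/best_model.pth", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Hand the uncompiled model to the calculator directly, instead of going
# through NequIPCalculator.from_deployed_model; argument names here are
# assumptions.
calc = NequIPCalculator(
    model=model,
    r_max=config["r_max"],
    device="cpu",
)
```

From there, calc can be attached to an ASE Atoms object as usual (atoms.calc = calc) and driven by any ASE MD integrator; only LAMMPS/OpenMM-style deployment is ruled out.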

Linux-cpp-lisp commented 1 year ago

> Thank you, will try and get back with more details.

Thanks. It's possible that there is a missing @torch.jit.unused, in which case a quick code change will make it possible for you to deploy everything without retraining. (In general most code and version changes will not require retraining.)
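
For reference, a minimal sketch of that kind of fix, using PyTorch's documented @torch.jit.unused / torch.jit.is_scripting() pattern (the Toy class is illustrative, not NequIP code): the decorated helper is never compiled, so Python-only code such as logging can live there without breaking scripting:

```python
import logging
import torch

class Toy(torch.nn.Module):
    @torch.jit.unused
    def _log_shape(self, x: torch.Tensor) -> None:
        # Never compiled; TorchScript replaces calls to it with a stub.
        logging.debug("input shape: %s", x.shape)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The is_scripting() guard keeps the stub from ever being hit
        # at runtime in the compiled model.
        if not torch.jit.is_scripting():
            self._log_shape(x)
        return x * 2

torch.jit.script(Toy())  # now compiles cleanly
```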

utkarshp1161 commented 1 year ago

> You could do inefficient MD by manually constructing a NequIPCalculator from an uncompiled PyTorch model (built using model_from_config and .load_state_dict, then passed to the constructor rather than via from_deployed_model).

Do I need to modify the calculate function in "class NequIPCalculator(Calculator)" if I use an uncompiled PyTorch model?

Linux-cpp-lisp commented 1 year ago

No, you shouldn't need to.

utkarshp1161 commented 1 year ago

> No, you shouldn't need to.

cool