openmm / openmm-ml

High level API for using machine learning models in OpenMM simulations

calling `CustomCVForce_getCollectiveVariableValues(self, context)` throws a `GIL` error when collective variable is a `TorchForce` #22

Closed: dominicrufa closed this issue 2 years ago

dominicrufa commented 2 years ago

I tried to call an energy/force operation on a `TorchForce` wrapped in a `CustomCVForce` and ran into the error below. I'm not sure whether this should be fixed at the OpenMM level or the PyTorch level; my guess is the latter, based on the message. I can construct a minimal working example for further debugging if the solution isn't obvious.

File ~/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/openmm/openmm.py:4935, in CustomCVForce.getCollectiveVariableValues(self, context)
   4920 def getCollectiveVariableValues(self, context):
   4921     r"""
   4922     getCollectiveVariableValues(self, context)
   4923     Get the current values of the collective variables in a Context.
   (...)
   4933         the values of the collective variables are computed and stored into this
   4934     """
-> 4935     return _openmm.CustomCVForce_getCollectiveVariableValues(self, context)

OpenMMException: The autograd engine was called while holding the GIL. If you are using the C++ API, the autograd engine is an expensive operation that does not require the GIL to be held so you should release it with 'pybind11::gil_scoped_release no_gil;'. If you are not using the C++ API, please report a bug to the pytorch team.
Exception raised from execute at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1640869844479/work/torch/csrc/autograd/python_engine.cpp:120 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x68 (0x7fa92b91d8c8 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0xf4 (0x7fa92b8fdac5 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libc10.so)
frame #2: torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, bool, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) + 0xb0 (0x7fa8ba6fee10 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libtorch_python.so)
frame #3: <unknown function> + 0x3b63512 (0x7fa973be9512 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #4: torch::autograd::backward(std::vector<at::Tensor, std::allocator<at::Tensor> > const&, std::vector<at::Tensor, std::allocator<at::Tensor> > const&, c10::optional<bool>, bool, std::vector<at::Tensor, std::allocator<at::Tensor> > const&) + 0x6b (0x7fa973beb81b in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x3bc5866 (0x7fa973c4b866 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #6: at::Tensor::_backward(c10::ArrayRef<at::Tensor>, c10::optional<at::Tensor> const&, c10::optional<bool>, bool) const + 0x49 (0x7fa9711f3d19 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/torch/lib/libtorch_cpu.so)
frame #7: TorchPlugin::ReferenceCalcTorchForceKernel::execute(OpenMM::ContextImpl&, bool, bool) + 0xe7a (0x7fa8c891c70a in /home/dominic/anaconda3/envs/openmm_torch/lib/plugins/libOpenMMTorchReference.so)
frame #8: OpenMM::ContextImpl::calcForcesAndEnergy(bool, bool, int) + 0xc9 (0x7fa9782f06e9 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/openmm/../../../libOpenMM.so.7.7)
frame #9: OpenMM::Context::getState(int, bool, int) const + 0x166 (0x7fa9782ef586 in /home/dominic/anaconda3/envs/openmm_torch/lib/python3.9/site-packages/openmm/../../../libOpenMM.so.7.7)

raimis commented 2 years ago

This is a similar issue to https://github.com/openmm/openmm-torch/issues/61 and I know how to fix it.