openmm / openmm-ml

High level API for using machine learning models in OpenMM simulations
Other
76 stars 25 forks source link

`TorchForce` inside `CustomCVForce` fails with `nnpops` implementation #32

Closed dominicrufa closed 1 year ago

dominicrufa commented 2 years ago

I have been trying to create a minimally-modified implementation to get nnpops working on a conda-installable openmm-ml environment, and i am encountering an error in TestMLPotential.py with the implementation.

If you drop in/replace the anipotential.py with this and run the test, (making sure to set the platform to CUDA), you should see that the test fails on line 26,

======================================================================
ERROR: testCreateMixedSystem (__main__.TestMLPotential)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/lila/home/rufad/github/openmm-ml/test/TestMLPotential.py", line 26, in testCreateMixedSystem
    interpEnergy1 = interpContext.getState(getEnergy=True).getPotentialEnergy().value_in_unit(unit.kilojoules_per_mole)
  File "/home/rufad/miniconda3/envs/omm_dev2/lib/python3.9/site-packages/openmm/openmm.py", line 12318, in getState
    state = _openmm.Context_getState(self, types, enforcePeriodicBox, groups_mask)
openmm.OpenMMException: Error invoking kernel: CUDA_ERROR_INVALID_HANDLE (400)

meaning that the createMixedSystem(..., interpolate=False) is compatible with nnpops, but interpolate=True is not. I have found a way around this (by taking the TorchForce out of the customCVForce and adding a scale parameter to the ANIForce manually), but I don't want to open a PR attempting to merge my modifications unless the compatibility issue with nnpops can't be fixed an easier way.

dominicrufa commented 2 years ago

@peastman , @raimis: turns out ^this is actually the crux of this issue

peastman commented 2 years ago

This may also be related to https://github.com/openmm/openmm/issues/3588.

dominicrufa commented 2 years ago

in the interest of getting openmm-ml working with nnpops, i think the easiest thing would be to just extract the TorchForce from the customCV (as in my fork) and run with it until a more stable solution is found. is that agreeable?

dominicrufa commented 2 years ago

@peastman, have you found a stable workaround for this issue, or should we proceed with removing the TorchForce from the CVForce?

dominicrufa commented 2 years ago

tagging @jchodera

peastman commented 2 years ago

Let me make one more effort at trying to figure it out. I suspect this is related to https://github.com/openmm/openmm/issues/3588, and most especially to https://github.com/openmm/openmm/issues/3588#issuecomment-1151731824. If I can't figure out a proper solution, we can use your suggested workaround.

jchodera commented 2 years ago

@dominicrufa : Can you test the PR here? https://github.com/openmm/openmm-torch/pull/78

dominicrufa commented 2 years ago

@peastman , I am unable to source install the PR and make it compatible with the rest of the conda-installed bits needed for the nnpops test. if you can source install it locally, can you run the test i mentioned at the top of this issue?

peastman commented 2 years ago

It runs without problem now. It looks like that fixes it!

dominicrufa commented 2 years ago

@peastman , awesome! can you merge the PR and make it conda-installable? i can add a PR with the nnpops implementation for anipotential.py here and modify the test to test for the torchani and nnpops implementation if you'd like.

raimis commented 2 years ago

@peastman could reopen this. I don't permissions.

peastman commented 2 years ago

I just added you as a maintainer. You should have full permissions now.

raimis commented 2 years ago

@dominicrufa I can reproduce the issue and created a minimal test https://github.com/openmm/openmm-torch/pull/80/commits/63a49c170232531c12edf5db2d2f5255af1dfa6f.

dominicrufa commented 2 years ago

@raimis , great! let me know if/when i can conda-install the fix to write the openmm-ml tests.

jchodera commented 1 year ago

@dominicrufa : Can you confirm the latest release fixes this? If so, can you close this?

dominicrufa commented 1 year ago

yes, this is fixed. closing