openmm / openmm-ml

High level API for using machine learning models in OpenMM simulations

NNPOps Integration #20

Closed dominicrufa closed 1 year ago

dominicrufa commented 2 years ago

Integrate NNPOps into MLPotential.

I want OptimizedTorchANI to be applied to the TorchANI model under the hood when I create a fully ML or hybrid ANI/OpenMM system.

When I run the following:

  2 import torch
  3 import torchani
  4
  5 #from NNPOps.SpeciesConverter import TorchANISpeciesConverter
  6 #from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions
  7 #from NNPOps.BatchedNN import TorchANIBatchedNN
  8 #from NNPOps.EnergyShifter import TorchANIEnergyShifter
  9 from NNPOps import OptimizedTorchANI
 10
 11 from openmmtools.testsystems import HostGuestExplicit
 12 from openmmtools.integrators import LangevinIntegrator
 13 from openmmml.mlpotential import MLPotential
 14 from simtk import openmm, unit
 15
 16
 17 device = torch.device('cuda')
 18
 19 hgv = HostGuestExplicit(constraints=None)
 20
 21 potential = MLPotential('ani2x')
 22 system = potential.createMixedSystem(hgv.topology, system = hgv.system, atoms = range(126,156), use_OptimizedTorchANI = True)
 23 print(f"done making system")
 24
 25
 26
 27 _int = LangevinIntegrator()
 28 context = openmm.Context(system, _int)
 29 context.setPositions(hgv.positions)
 30 print(f"unminimized pe: {context.getState(getEnergy=True).getPotentialEnergy()}")
 31 openmm.LocalEnergyMinimizer.minimize(context, maxIterations = 100)
 32 context.setVelocitiesToTemperature(298.15*unit.kelvin)
 33 for i in range(10):
 34     _int.step(100)
 35     print(context.getState(getEnergy=True).getPotentialEnergy())

With the new modification, I see:

  File "/lila/home/rufad/nnpops/run.py", line 22, in <module>
    system = potential.createMixedSystem(hgv.topology, system = hgv.system, atoms = range(126,156), use_OptimizedTorchANI = True)
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/openmmml-1.0-py3.9.egg/openmmml/mlpotential.py", line 278, in createMixedSystem
    self._impl.addForces(topology, newSystem, atomList, forceGroup, **args)
  File "/home/rufad/anaconda3/envs/nnpops/lib/python3.9/site-packages/openmmml-1.0-py3.9.egg/openmmml/models/anipotential.py", line 93, in addForces
    raise Exception(e)
Exception: Unknown species found in tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0]])

but not without the modification. I am not modifying the species attribute, so I'm not sure where this is coming from.

dominicrufa commented 2 years ago

The issue persists with periodic_table_index=False as well. I previously forgot to remove the edit.

peastman commented 2 years ago

@raimis is the best person to comment on the implementation. One suggestion I'd make is that we probably don't want a flag for which implementation to use. If the optimized one is available, we should just use it automatically. I don't think there's ever a reason not to?

jchodera commented 2 years ago

One suggestion I'd make is that we probably don't want a flag for which implementation to use. If the optimized one is available, we should just use it automatically. I don't think there's ever a reason not to?

We will presumably need a way to test the optimized implementation against the non-optimized one or use the non-optimized one if circumstances require it, similar to how we can elect to use the Reference platform in OpenMM instead of more performant versions if desired.
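
As a rough sketch of the kind of consistency check that would enable, reusing the script from this thread and its use_OptimizedTorchANI keyword (the final API may expose the switch differently):

from openmmtools.testsystems import HostGuestExplicit
from openmmml.mlpotential import MLPotential
from simtk import openmm, unit

hgv = HostGuestExplicit(constraints=None)
potential = MLPotential('ani2x')

# Build the mixed system once with and once without the optimized kernels,
# then compare single-point energies at the same positions.
energies = {}
for optimized in (False, True):
    system = potential.createMixedSystem(hgv.topology, system=hgv.system,
                                         atoms=range(126, 156),
                                         use_OptimizedTorchANI=optimized)
    integrator = openmm.VerletIntegrator(1.0 * unit.femtoseconds)
    context = openmm.Context(system, integrator)
    context.setPositions(hgv.positions)
    energies[optimized] = context.getState(getEnergy=True).getPotentialEnergy()

print(energies[False], energies[True])  # should agree to a tight tolerance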

raimis commented 2 years ago

@dominicrufa I have created a draft of a tutorial (https://github.com/openmm/openmm-torch/pull/62). This should give an example of how to use NNPOps.

You can see it better here (https://github.com/raimis/openmm-torch/blob/example/tutorials/openmm-torch-nnpops.ipynb) or just Open On Colab
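
For reference, the core pattern the tutorial walks through is roughly the following (minimal sketch; the atomic numbers and coordinates below are only illustrative):

import torch
import torchani
from NNPOps import OptimizedTorchANI

device = torch.device('cuda')

# Atomic numbers of the ML atoms, shape (1, n_atoms); a water molecule here.
species = torch.tensor([[8, 1, 1]], device=device)
# Positions in angstrom, shape (1, n_atoms, 3).
positions = torch.tensor([[[0.0000, 0.0000, 0.0000],
                           [0.9572, 0.0000, 0.0000],
                           [-0.2400, 0.9266, 0.0000]]],
                         dtype=torch.float32, requires_grad=True, device=device)

# Build ANI-2x and wrap it; OptimizedTorchANI swaps in the NNPOps kernels
# while keeping the same (species, positions) calling convention.
model = torchani.models.ANI2x(periodic_table_index=True).to(device)
model = OptimizedTorchANI(model, species).to(device)

energy = model((species, positions)).energies
energy.backward()
forces = -positions.grad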

dominicrufa commented 2 years ago

@dominicrufa I have created a draft of a tutorial (openmm/openmm-torch#62). This should give an example of how to use NNPOps.

You can see it better here (https://github.com/raimis/openmm-torch/blob/example/tutorials/openmm-torch-nnpops.ipynb) or just Open On Colab

Ah, I think I was just treating the atomic_numbers field wrong. I think I have it working now; will update soon.

dominicrufa commented 2 years ago

Alright, so I can now integrate NNPOps into TorchANI. Interestingly, equipping NNPOps costs ~0.11 s/MD step whereas omitting it costs ~0.03 s/MD step (script below). As a reference, MM-only MD costs ~0.0004 s/MD step:
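
(For context, assuming the integrator's default 1 fs timestep, 0.11 s/step works out to roughly 0.8 ns/day and 0.03 s/step to roughly 2.9 ns/day, so the optimized path is the slower one here.)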

  1 #!/usr/bin/env python
  2 import torch
  3 import torchani
  4
  5 #from NNPOps.SpeciesConverter import TorchANISpeciesConverter
  6 #from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions
  7 #from NNPOps.BatchedNN import TorchANIBatchedNN
  8 #from NNPOps.EnergyShifter import TorchANIEnergyShifter
  9 from NNPOps import OptimizedTorchANI
 10
 11 from openmmtools.testsystems import HostGuestExplicit
 12 from openmmtools.integrators import LangevinIntegrator
 13 from openmmml.mlpotential import MLPotential
 14 from simtk import openmm, unit
 15 import time
 16 import numpy as np
 17
 18
 19 device = torch.device('cuda')
 20
 21 hgv = HostGuestExplicit(constraints=None)
 22
 23 potential = MLPotential('ani2x')
 24 system = potential.createMixedSystem(hgv.topology, system = hgv.system, atoms = range(126,156), use_OptimizedTorchANI = True)
 25 print(f"done making system")
 26
 27
 28
 29 _int = LangevinIntegrator()
 30 context = openmm.Context(system, _int)
 31 context.setPositions(hgv.positions)
 32 print(f"unminimized pe: {context.getState(getEnergy=True).getPotentialEnergy()}")
 33 openmm.LocalEnergyMinimizer.minimize(context, maxIterations = 100)
 34 context.setVelocitiesToTemperature(298.15*unit.kelvin)
 35
 36 #timer
 37 timer = []
 38 for i in range(10):
 39     start_time = time.time()
 40     _int.step(100)
 41     print(context.getState(getEnergy=True).getPotentialEnergy())
 42     timer.append(time.time() - start_time)
 43
 44
 45 print(np.mean(timer), np.std(timer))
raimis commented 2 years ago

@dominicrufa I tried your script.

On GPU, it crashes:

Traceback (most recent call last):
  File "/home/user/tmp/ml.py", line 38, in <module>
    _int.step(100)
  File "/home/user/conda/lib/python3.9/site-packages/openmm/openmm.py", line 7788, in step
    return _openmm.CustomIntegrator_step(self, steps)
openmm.OpenMMException: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: CUDA driver error: invalid resource handle

On CPU, it runs at ~0.02 s/MD step in both cases. This is reasonable, because NNPOps.BatchedNN is quite inefficient on CPU: it trades memory bandwidth for speed, and CPUs don't have a lot of memory bandwidth.

not-matt commented 2 years ago

@raimis I modified the bench script, fixing the bug you encountered

import torch
import torchani

#from NNPOps.SpeciesConverter import TorchANISpeciesConverter
#from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions
#from NNPOps.BatchedNN import TorchANIBatchedNN
#from NNPOps.EnergyShifter import TorchANIEnergyShifter
from NNPOps import OptimizedTorchANI

from openmmtools.testsystems import HostGuestExplicit
from openmmtools.integrators import LangevinIntegrator
from openmmml.mlpotential import MLPotential
from simtk import openmm, unit
import time
import numpy as np

device = torch.device('cuda')

hgv = HostGuestExplicit(constraints=None)

potential = MLPotential('ani2x')
system = potential.createMixedSystem(hgv.topology, system = hgv.system, atoms = range(126,156), use_OptimizedTorchANI = True)
print(f"done making system")

# was causing error: RuntimeError: CUDA driver error: invalid resource handle
# _int = LangevinIntegrator()
_int = openmm.LangevinIntegrator(
    300 * unit.kelvin, 
    1 / unit.picosecond, 
    1.0 * unit.femtosecond,
)
context = openmm.Context(system, _int)
context.setPositions(hgv.positions)
print(f"unminimized pe: {context.getState(getEnergy=True).getPotentialEnergy()}")
openmm.LocalEnergyMinimizer.minimize(context, maxIterations = 100)
# was causing error: openmm.OpenMMException: The autograd engine was called while holding the GIL.
# context.setVelocitiesToTemperature(298.15*unit.kelvin)

#timer
timer = []
for i in range(10):
    start_time = time.time()
    _int.step(100)
    print(context.getState(getEnergy=True).getPotentialEnergy())
    timer.append(time.time() - start_time)

print(np.mean(timer), np.std(timer))
dominicrufa commented 2 years ago

I never encountered that bug on GPU; the NNPOps I was using was installed with mamba install -c mmh nnpops, since Mike Henry is trying to get his version onto conda-forge. @raimis, is there an env yaml you can send me to reproduce the error? Mine is attached: nnpops.txt.

dominicrufa commented 2 years ago

@raimis I modified the bench script, fixing the bug you encountered

[not-matt's modified bench script, quoted in full above]

Also, I might be missing something, but if openmmtools.integrators.LangevinIntegrator (or any CustomIntegrator object) is not compatible with NNPOps, this might cause more problems downstream.

not-matt commented 2 years ago

That's also how I installed mine, though I wouldn't put it past my setup to misbehave!

But I think you're right: CustomIntegrators are triggering an issue here.

jchodera commented 2 years ago

I missed the details---what is the issue with CustomIntegrator?

And do I understand @dominicrufa correctly that NNPOps and the optimized TorchANI make things 5x slower, rather than 5x faster?

And are these single ANI models, or ensembles?

not-matt commented 2 years ago

Possible CustomIntegrator bug

Using openmmtools.integrators.LangevinIntegrator rather than manually configuring an openmm.LangevinIntegrator was triggering this error: RuntimeError: CUDA driver error: invalid resource handle

Speed

NNPOps-optimised ANI is significantly faster for me.

For a 30-atom unsolvated system, I get about 5 ns/day with unoptimised TorchANI. The same system with optimised ANI reaches 30 ns/day.

raimis commented 2 years ago

@not-matt thanks for your effort.

# was causing error: RuntimeError: CUDA driver error: invalid resource handle
# _int = LangevinIntegrator()

I guess this might be some incompatibility between OpenMM-Torch and OpenMM-Tools. Could you reduce the script to the minimum that triggers the issue and open a separate issue? For the moment, using openmm.LangevinIntegrator is a viable solution.
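
Something along these lines might be a starting point for that reduction (untested sketch; it just pairs a trivial TorchForce from openmm-torch with openmmtools' CustomIntegrator-based LangevinIntegrator on the CUDA platform, so the file name and energy expression here are placeholders):

import torch
from simtk import openmm
from openmmtorch import TorchForce
from openmmtools.integrators import LangevinIntegrator

class Dummy(torch.nn.Module):
    def forward(self, positions):
        # Trivial energy in kJ/mol: sum of squared coordinates (positions in nm).
        return torch.sum(positions ** 2)

torch.jit.script(Dummy()).save('dummy.pt')

# One-particle system with only the TorchForce attached.
system = openmm.System()
system.addParticle(1.0)
system.addForce(TorchForce('dummy.pt'))

integrator = LangevinIntegrator()  # openmmtools' CustomIntegrator subclass
platform = openmm.Platform.getPlatformByName('CUDA')
context = openmm.Context(system, integrator, platform)
context.setPositions([openmm.Vec3(0.1, 0.0, 0.0)])
integrator.step(100)  # does this alone reproduce the invalid resource handle error?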

# was causing error: openmm.OpenMMException: The autograd engine was called while holding the GIL.
# context.setVelocitiesToTemperature(298.15*unit.kelvin)

This is already fixed by https://github.com/openmm/openmm/pull/3424; we just need to release OpenMM 7.7.1.

mikemhenry commented 2 years ago

I've updated the nnpops package in my channel to 0.2 (https://anaconda.org/mmh/nnpops/files). I've now got a mix of different CUDA/Python packages built. Still waiting on a review from the conda-forge people, but conda update nnpops should pull in the latest version now.

dominicrufa commented 2 years ago

I've updated the nnpops package in my channel to 0.2 (https://anaconda.org/mmh/nnpops/files). I've now got a mix of different CUDA/Python packages built. Still waiting on a review from the conda-forge people, but conda update nnpops should pull in the latest version now.

@mikemhenry, did you replicate the code snippet with your conda installation?

mikemhenry commented 2 years ago

I forgot I was going to try that, can you link the code snippet? There are several in this thread and I want to make sure to test the right one @dominicrufa

dominicrufa commented 2 years ago

I forgot I was going to try that, can you link the code snippet? There are several in this thread and I want to make sure to test the right one @dominicrufa

https://github.com/openmm/openmm-ml/pull/20#issue-1117917500. If it fails (which was reported by others in this thread), then try this: https://github.com/openmm/openmm-ml/pull/20#issuecomment-1029053808.

mikemhenry commented 2 years ago

Using the version from my channel (mamba update nnpops -c mmh, which pulls in nnpops 0.2 cuda112py39h453d82a_0 from mmh/linux-64, 493 KB), both snippets worked for me. With the second snippet (https://github.com/openmm/openmm-ml/pull/20#issuecomment-1029053808) I got a mean time of 2.0347 s per 100 steps (0.083 s std).

With use_OptimizedTorchANI = False, I got 1.9108 s per 100 steps (0.0753 s std).

(This is on my laptop, and I'm not sure whether the warning /home/mmh/miniconda3/envs/nnpops-private/lib/python3.9/site-packages/torchani/__init__.py:55: UserWarning: Dependency not satisfied, torchani.ase will not be available is an issue or not, so take this with a grain of salt; I'm mostly testing the snippet to make sure it works.)

dominicrufa commented 2 years ago

Are there any reasons why conda installs would yield such differences in performance, or in whether errors appear at all?

mikemhenry commented 2 years ago

Lots of reasons. The performance differences are likely hardware differences: I ran the script on an NVIDIA GeForce RTX 2060 card in my laptop, so the GPU was also busy running my X server. The errors I'm less sure about. I did check that I was using a GPU; with the CPU I get 5.669 s with use_OptimizedTorchANI = False and 6.3206 s with use_OptimizedTorchANI = True.
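
For what it's worth, here is a quick sketch (not specific to this thread's script) for double-checking what OpenMM and PyTorch can actually see on a given machine:

from simtk import openmm
import torch

# Which platforms did this OpenMM build manage to load, and did any plugin fail?
print([openmm.Platform.getPlatform(i).getName()
       for i in range(openmm.Platform.getNumPlatforms())])
print(openmm.Platform.getPluginLoadFailures())

# What does PyTorch see? (An existing Context can also be checked with
# context.getPlatform().getName().)
print(torch.cuda.is_available(),
      torch.cuda.get_device_name(0) if torch.cuda.is_available() else None)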

jchodera commented 2 years ago

@dominicrufa: Is this something you can work on interactively with @mikemhenry on lilac to debug? Has anyone tried this on Google Colab, for example?

dominicrufa commented 2 years ago

@dominicrufa: Is this something you can work on interactively with @mikemhenry on lilac to debug? Has anyone tried this on Google Colab, for example?

If this is a hardware/conda version issue, I don't know how I would go about debugging it. My env yaml is posted here. Based on @raimis's tutorial, Google Colab seems to show different performance than what Mike and I see.

dominicrufa commented 2 years ago

@peastman, I implemented the changes/tests to the NNPOps implementation discussed here. I parameterized the unit test here, and while I can get the CUDA-disabled test to pass (using pytest), running CUDA with the NNPOps implementation still throws an openmm.OpenMMException: Error invoking kernel: CUDA_ERROR_INVALID_HANDLE (400) exception with the latest conda-installable nightly build of OpenMM.

jchodera commented 2 years ago

Can you list which CUDA toolkit versions of these packages you installed, and which node/driver you were running on? I think we often see that when there is a mismatch between CUDA build version and driver version.
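
A small sketch for collecting the toolkit-side versions from inside the environment (the driver version itself still has to come from nvidia-smi on the node):

from simtk import openmm
import torch

print(openmm.Platform.getOpenMMVersion())     # OpenMM build version
print(torch.__version__, torch.version.cuda)  # PyTorch build and the CUDA toolkit it was compiled against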

dominicrufa commented 2 years ago

Can you list which CUDA toolkit versions of these packages you installed, and which node/driver you were running on? I think we often see that when there is a mismatch between CUDA build version and driver version.

cudatoolkit=11.3.1, using an lt node. I don't suspect that being the problem, since I can manually equip a TorchForce with NNPOps without openmm-ml and it will run on a GPU just fine.

jchodera commented 2 years ago

The CUDA Toolkit release notes state:

Each release of the CUDA Toolkit requires a minimum version of the CUDA driver. The CUDA driver is backward compatible, meaning that applications compiled against a particular version of the CUDA will continue to work on subsequent (later) driver releases.

CUDA 11.3 requires >=450.80.02, and it looks like lt20 has 465.19.01:

$ nvidia-smi
Fri Apr  8 20:42:13 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:84:00.0 Off |                  N/A |
| 28%   32C    P8     9W / 250W |      1MiB / 11178MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
peastman commented 1 year ago

This is superseded by #35.