I have also tried this in a separate environment with CUDA 11.8 and still ran into the same issue...
```
$ mamba list cuda
# Name          Version   Build         Channel
cuda-version    11.8      h70ddcb2_2    conda-forge
cudatoolkit     11.8.0    h4ba93d1_12   conda-forge
```
What is the output of nvidia-smi? BTW, this is not the first time we have encountered this error; see the related discussion here: https://github.com/openmm/openmm-torch/pull/106. Not sure we really got closure on that...
For additional context: as far as I gathered, the offending kernel came from NNPOps' ANISymmetryFunctions in that case.
Perhaps we can check this by turning off the optimized symmetry function here:
https://github.com/openmm/NNPOps/blob/d15cb9196e283b6b55f88a93d85232458f64fa18/src/pytorch/OptimizedTorchANI.py#L43
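One way to try that without editing the installed package might be to overwrite the attribute after construction. A minimal sketch, assuming the `species` tensor and `device` from your script:

```python
# Hypothetical quick test: build the optimized model as usual, then
# swap NNPOps' symmetry functions back for torchani's reference AEV
# computer, leaving the other optimized components in place.
model = ANI2x(periodic_table_index=True).to(device)
nnp = OptimizedTorchANI(model, species).to(device)
nnp.aev_computer = model.aev_computer  # bypass the suspect CUDA kernel
```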
The problem is that the error was kind of elusive last time, so it was hard to debug.
```
Tue Aug 15 14:32:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.98                 Driver Version: 535.98       CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P0              26W /  90W |      6MiB /  8192MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A       950      G   /usr/lib/Xorg                                 4MiB |
+---------------------------------------------------------------------------------------+
```
Your script runs without issue on a machine with a 4090, both using the nnpops 0.6 conda package and a manually compiled master. However, running on a 2080 Ti like yours reproduces the error.
Replacing the offending component as suggested:

```python
self.aev_computer = model.aev_computer
```

does indeed fix the error.
It seems like the compiler is not targeting the sm_75 arch. That, combined with some kernel in SymmetryFunctions that I guess requires arch-specific directives, produces the issue.
```
$ cuobjdump /home/raul/mambaforge/envs/nnpops06/lib/python3.11/site-packages/NNPOps/libNNPOpsPyTorch.so | grep sm_ | sort | uniq
arch = sm_35
arch = sm_50
arch = sm_80
arch = sm_86
```
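As a quick cross-check from Python, one can compare the local device against the archs a binary ships. Note that `torch.cuda.get_arch_list()` only reports PyTorch's own kernels; for `libNNPOpsPyTorch.so` the `cuobjdump` above is the authoritative check:

```python
import torch

# A 2080 Ti reports compute capability (7, 5), i.e. it needs sm_75,
# which is missing from the arch list dumped above.
print(torch.cuda.get_device_capability(0))  # e.g. (7, 5)
print(torch.cuda.get_arch_list())           # archs baked into the PyTorch build itself
```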
Furthermore, the reason it works on the CI is that by default torch sets the architectures to the ones native to the system. The issue then lies in the conda feedstock not choosing the correct archs. The bug requires a new build in the feedstock. Hopefully https://github.com/conda-forge/nnpops-feedstock/pull/26 does the trick.
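In the meantime, anyone compiling from source can pin the target archs explicitly. A sketch, assuming the build goes through PyTorch's `cpp_extension` tooling, which falls back to the build machine's native archs when `TORCH_CUDA_ARCH_LIST` is unset (the CI behavior described above):

```python
import os

# Set before triggering the build so Turing (sm_75, e.g. a 2080 Ti)
# is compiled in alongside the other architectures.
os.environ["TORCH_CUDA_ARCH_LIST"] = "5.0;7.0;7.5;8.0;8.6"
```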
```python
class NNP(torch.nn.Module):

    def __init__(self, atomic_numbers):
        super().__init__()
        # Store the atomic numbers
        self.atomic_numbers = torch.tensor(atomic_numbers, device=device).unsqueeze(0)
        # Create an ANI-2x model
        self.ani2x = ANI2x(periodic_table_index=True).to(device)
        # Accelerate the model
        self.model = OptimizedTorchANI(self.ani2x, self.atomic_numbers).to(device)
        # AEV computer fix
        self.aev_computer = self.ani2x.aev_computer

    def forward(self, positions):
        # Prepare the positions
        positions = positions.unsqueeze(0).float() * 10  # nm --> Å
        # Run ANI-2x
        result = self.model((self.atomic_numbers, positions))
        # Get the potential energy
        energy = result.energies[0] * 2625.5  # Hartree --> kJ/mol
        return energy

# Create an instance of the model
nnp = NNP(atomic_numbers)
```
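For context, a module shaped like this (positions in nm in, energy in kJ/mol out) is typically wrapped into an OpenMM force via openmm-torch, roughly as follows. A sketch; it assumes an existing OpenMM `system` and that the module is TorchScript-compatible:

```python
from openmmtorch import TorchForce

# Compile the module to TorchScript, serialize it, and attach it to
# the OpenMM System so the simulation calls into the NNP each step.
module = torch.jit.script(nnp)
module.save('model.pt')
system.addForce(TorchForce('model.pt'))
```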
I've tried modifying the `NNP` object to take ani2x's AEV computer, but it still gives the CUDA error. May I know exactly how you are doing `self.aev_computer = model.aev_computer`?
Sorry, I should have been clearer. I changed the definition in the constructor of OptimizedTorchANI.py directly. I think the moment you do `.to(device)` the error will pop up. I changed this line there:

```python
self.aev_computer = TorchANISymmetryFunctions(model.species_converter, model.aev_computer, atomicNumbers)
```

to this:

```python
self.aev_computer = model.aev_computer
```
You can apply this workaround now by just defining your own OptimizedTorchANI, which is a short class currently defined as:
```python
import torch
from torch import Tensor
from typing import Optional, Tuple

from NNPOps.BatchedNN import TorchANIBatchedNN
from NNPOps.EnergyShifter import TorchANIEnergyShifter, SpeciesEnergies
from NNPOps.SpeciesConverter import TorchANISpeciesConverter
from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions


class OptimizedTorchANI(torch.nn.Module):

    from torchani.models import BuiltinModel  # https://github.com/openmm/NNPOps/issues/44

    def __init__(self, model: BuiltinModel, atomicNumbers: Tensor) -> None:
        super().__init__()
        # Optimize the components of an ANI model
        self.species_converter = TorchANISpeciesConverter(model.species_converter, atomicNumbers)
        self.aev_computer = TorchANISymmetryFunctions(model.species_converter, model.aev_computer, atomicNumbers)
        self.neural_networks = TorchANIBatchedNN(model.species_converter, model.neural_networks, atomicNumbers)
        self.energy_shifter = TorchANIEnergyShifter(model.species_converter, model.energy_shifter, atomicNumbers)

    def forward(self, species_coordinates: Tuple[Tensor, Tensor],
                cell: Optional[Tensor] = None,
                pbc: Optional[Tensor] = None) -> SpeciesEnergies:
        species_coordinates = self.species_converter(species_coordinates)
        species_aevs = self.aev_computer(species_coordinates, cell=cell, pbc=pbc)
        species_energies = self.neural_networks(species_aevs)
        species_energies = self.energy_shifter(species_energies)
        return species_energies
```
If I am not missing something, you should be able to just copy-paste this into your script:
```python
import torch
from torch import Tensor
from typing import Optional, Tuple

from NNPOps.BatchedNN import TorchANIBatchedNN
from NNPOps.EnergyShifter import TorchANIEnergyShifter, SpeciesEnergies
from NNPOps.SpeciesConverter import TorchANISpeciesConverter
from NNPOps.SymmetryFunctions import TorchANISymmetryFunctions


class OptimizedTorchANI(torch.nn.Module):

    from torchani.models import BuiltinModel  # https://github.com/openmm/NNPOps/issues/44

    def __init__(self, model: BuiltinModel, atomicNumbers: Tensor) -> None:
        super().__init__()
        # Optimize the components of an ANI model
        self.species_converter = TorchANISpeciesConverter(model.species_converter, atomicNumbers)
        self.aev_computer = model.aev_computer
        self.neural_networks = TorchANIBatchedNN(model.species_converter, model.neural_networks, atomicNumbers)
        self.energy_shifter = TorchANIEnergyShifter(model.species_converter, model.energy_shifter, atomicNumbers)

    def forward(self, species_coordinates: Tuple[Tensor, Tensor],
                cell: Optional[Tensor] = None,
                pbc: Optional[Tensor] = None) -> SpeciesEnergies:
        species_coordinates = self.species_converter(species_coordinates)
        species_aevs = self.aev_computer(species_coordinates, cell=cell, pbc=pbc)
        species_energies = self.neural_networks(species_aevs)
        species_energies = self.energy_shifter(species_energies)
        return species_energies
```
and remove the NNPOps import (`from NNPOps import OptimizedTorchANI`).
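With that class defined locally, the rest of the script should stay unchanged, e.g.:

```python
# Same usage as before, but now resolving to the local class above
# instead of the one shipped with NNPOps.
nnp = ANI2x(periodic_table_index=True).to(device)
nnp = OptimizedTorchANI(nnp, species).to(device)
energy = nnp((species, positions)).energies
```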
Ok, I managed to get hold of an RTX 4090 system to test NNPOps and, like you mentioned, it works seamlessly there.
```
Wed Aug 16 17:40:57 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:01:00.0 Off |                  Off |
|  0%   37C    P3             51W / 450W  |   7269MiB / 24564MiB |     12%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                             |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      2726      G   /usr/lib/xorg/Xorg                         1144MiB |
|    0   N/A  N/A      2867      G   /usr/bin/gnome-shell                        557MiB |
|    0   N/A  N/A      4362      G   ...sion,SpareRendererForSitePerProcess      193MiB |
|    0   N/A  N/A      6695      G   ...5017253,17452613814888696451,262144      392MiB |
|    0   N/A  N/A      8268      C   ...mbaforge/envs/nnff_py310/bin/python     2186MiB |
|    0   N/A  N/A      8864      C   ...mbaforge/envs/nnff_py310/bin/python     2470MiB |
+---------------------------------------------------------------------------------------+
```
However, I am noticing something very strange when running the custom `OptimizedTorchANI` (with `self.aev_computer = model.aev_computer`) versus the default `OptimizedTorchANI` imported from NNPOps. The custom `OptimizedTorchANI` seems to cause the system to explode, with extremely high temperatures.
Custom `OptimizedTorchANI` (with the AEV computer swap):

```
#"Step","Time (ps)","Potential Energy (kJ/mole)","Temperature (K)"
100,0.10000000000000007,-1299286.6337109958,4157.272221653436
200,0.20000000000000015,-1298719.9741632198,9269.906210046393
300,0.3000000000000002,-1297707.8114830707,11445.388596274423
400,0.4000000000000003,-1298399.7325442587,12468.574661346423
500,0.5000000000000003,-1298875.3397286986,12172.953122353025
600,0.6000000000000004,-1299012.1397098755,10125.893963085939
700,0.7000000000000005,-1299297.581734464,9565.762065101066
800,0.8000000000000006,-1298897.1822553729,6906.9651615600405
900,0.9000000000000007,-1299240.9848396038,7325.153176464823
1000,1.0000000000000007,-1299094.3014499997,5827.495844342631
```
Default `OptimizedTorchANI` (imported from NNPOps):

```
#"Step","Time (ps)","Potential Energy (kJ/mole)","Temperature (K)"
100,0.10000000000000007,-1301536.311651804,29.84507966482548
200,0.20000000000000015,-1301530.3516933029,51.569098455429895
300,0.3000000000000002,-1301530.2664442887,83.21092035049655
400,0.4000000000000003,-1301522.8738798206,104.69981264325087
500,0.5000000000000003,-1301516.4535609935,115.24051176337566
600,0.6000000000000004,-1301508.6040790235,157.23755800264192
700,0.7000000000000005,-1301503.9827139233,157.7469660590436
800,0.8000000000000006,-1301514.2675243174,205.74598795558802
900,0.9000000000000007,-1301497.1392407422,163.9890396519284
1000,1.0000000000000007,-1301499.3586102133,201.60413670511966
```
This is from the alanine dipeptide test system. Any idea why this might be happening? I've submitted this as an issue on the openmm-torch repo as well.
Well, my workaround ignores the species converter; I guess one cannot skip it that happily... I am confused though: do both the default and the custom versions explode, just in different ways? Is this also the case if you use the original ANI2x? EDIT: I see the openmm-torch issue shows the same thing happens with the original ANI2x.
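For reference, the unoptimized baseline check would look something like this (a sketch, assuming the same `species` and `positions` tensors as in the original script):

```python
from torchani.models import ANI2x

# Plain torchani ANI-2x with no NNPOps components, to see whether the
# instability is independent of the optimizations.
ani2x = ANI2x(periodic_table_index=True).to(device)
energy = ani2x((species, positions)).energies
```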
You could do as the example in the README does:

```python
# Construct ANI-2x and replace its operations with the optimized ones
nnp = torchani.models.ANI2x(periodic_table_index=True).to(device)
nnp.species_converter = TorchANISpeciesConverter(nnp.species_converter, species).to(device)
# nnp.aev_computer = TorchANISymmetryFunctions(nnp.species_converter, nnp.aev_computer, species).to(device)
nnp.neural_networks = TorchANIBatchedNN(nnp.species_converter, nnp.neural_networks, species).to(device)
nnp.energy_shifter = TorchANIEnergyShifter(nnp.species_converter, nnp.energy_shifter, species).to(device)
```
This should not be necessary once the new build drops, hopefully later today. Then, if the original error is solved, we can move the discussion to the issue you opened in openmm-torch.
The second one, with the default `OptimizedTorchANI`, actually works okay: the trajectory of the dipeptide stays intact and the reported temperatures are in the ballpark of the ones reported in the Colab notebook example. It takes a while to warm up but eventually reaches 290 K or so.
Yes, the same thing happens when I just use the original ANI2x natively, without `OptimizedTorchANI`.
Yeah, the original issue was solved, so I'll close this.
The new build is out; you should be able to run on a 2080 now.
Ok, just downloaded and tested it. Works perfectly, thank you!
CUDA error: no kernel image
Hi NNPOps developers, I was trying to run the NNPOps example with the alanine dipeptide test system, but I am running into CUDA RuntimeErrors indicating that no kernel image is available. I am not sure how to go about debugging this, so I was hoping to get some help.

Best regards,
Joshua
Environment:

…

Code I am trying to run:
```python
# Imports
# ...

# Ensure CUDA is available
# ...

# Use alanine dipeptide as a test system
# ...

# Remove MM forces
while ala2.system.getNumForces() > 0:
    ala2.system.removeForce(0)

# The system should not contain any additional forces or constraints
assert ala2.system.getNumConstraints() == 0
assert ala2.system.getNumForces() == 0

species = torch.tensor([[atom.element.atomic_number for atom in ala2.topology.atoms()]],
                       device=device)
positions = torch.tensor([ala2.positions.tolist()], dtype=torch.float32,
                         requires_grad=True, device=device)

# Alternatively, all the optimizations can be applied with OptimizedTorchANI
nnp = ANI2x(periodic_table_index=True).to(device)
nnp = OptimizedTorchANI(nnp, species).to(device)

# Compute energy and forces again
energy = nnp((species, positions)).energies
positions.grad.zero_()
energy.backward()
forces = -positions.grad.clone()

print(energy, forces)
```
Full error:

```
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 9
      6 nnp = OptimizedTorchANI(nnp, species).to(device)
      8 # Compute energy and forces again
----> 9 energy = nnp((species, positions)).energies
     10 positions.grad.zero_()
     11 energy.backward()

File ~/mambaforge/envs/nnff_py310/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/mambaforge/envs/nnff_py310/lib/python3.10/site-packages/NNPOps/OptimizedTorchANI.py:52, in OptimizedTorchANI.forward(self, species_coordinates, cell, pbc)
    47 def forward(self, species_coordinates: Tuple[Tensor, Tensor],
    48             cell: Optional[Tensor] = None,
    49             pbc: Optional[Tensor] = None) -> SpeciesEnergies:
    51     species_coordinates = self.species_converter(species_coordinates)
---> 52     species_aevs = self.aev_computer(species_coordinates, cell=cell, pbc=pbc)
    53     species_energies = self.neural_networks(species_aevs)
    54     species_energies = self.energy_shifter(species_energies)

File ~/mambaforge/envs/nnff_py310/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
-> 1501     return forward_call(*args, **kwargs)

File ~/mambaforge/envs/nnff_py310/lib/python3.10/site-packages/NNPOps/SymmetryFunctions.py:121, in TorchANISymmetryFunctions.forward(self, species_positions, cell, pbc)
    118     raise ValueError('Only fully periodic systems are supported, i.e. pbc = [True, True, True]')
    120 radial, angular = operation(self.holder, positions[0], cell)
--> 121 features = torch.cat((radial, angular), dim=1).unsqueeze(0)
    123 return species, features

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
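As the message suggests, rerunning with synchronous kernel launches makes the reported call site trustworthy. A sketch; the variable must be set before the first CUDA call:

```python
import os

# Force synchronous kernel launches so the failing kernel is reported
# at its actual call site instead of a later API call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```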