Particle position is NaNs with hybrid system

JSLJ23 commented 1 year ago

Dear openmm-ml developers,

I am trying to use this tool to run a Hybrid MM/ML simulation of a single protein chain in solvent with the protein atoms modelled by torchani and NNPOps, and the solvent with tip3pfb, but the simulation seems to be giving errors even after running energy minimisation. I have attached my code and the input file for the protein PDB which also has been energy minimised in Schrodinger.

Hoping to get some help on this and possibly figure out if the issue is due to the PDB itself or a bug in my code...

import sys
from time import perf_counter

import openmm.unit as unit
from openmm import LangevinMiddleIntegrator
from openmm.app import Simulation, StateDataReporter
from openmm.app.dcdreporter import DCDReporter
from openmm.app.forcefield import ForceField
from openmm.app.pdbfile import PDBFile
from openmm.app.pdbreporter import PDBReporter
from openmm.unit import femtosecond, kelvin, kilojoule_per_mole, picosecond
from openmmml import MLPotential
from pdbfixer import PDBFixer

# Load in the PDB strucure
fixer = PDBFixer(filename="./systems/4KD1_protein_only.pdb")
fixer.addSolvent(padding=10 * unit.angstrom)
fixer.removeHeterogens(True)
PDBFile.writeFile(fixer.topology, fixer.positions, open("./systems/4KD1_protein_only_solvated.pdb", "w"))
pdb = PDBFile(file="./systems/4KD1_protein_only_solvated.pdb")

# Setup force field
forcefield = ForceField("amber14-all.xml", "amber14/tip3pfb.xml")
pdb.system = forcefield.createSystem(pdb.topology)

# Select protein atoms
amino_acid_residues = {
    "ALA", "ASN", "CYS", "GLU", "HIS", "LEU", "MET", "PRO", "THR", "TYR", "ARG", "ASP", "GLN", "GLY", "ILE", "LYS", "PHE",
    "SER", "TRP", "VAL", "ACE", "NME", "NMA"
}
protein_atoms = set()

for residue in pdb.topology.residues():
    if residue.name in amino_acid_residues:
        for atom in residue.atoms():
            protein_atoms.add(atom.index)

# Create ML potential (single model to fit into GPU VRAM)
potential = MLPotential('ani2x')
pdb.system = potential.createMixedSystem(pdb.topology, pdb.system, protein_atoms, modelIndex=0)

# Setup simulation
n_fs = 1

temperature = 298.15 * kelvin
frictionCoeff = 1 / picosecond
timeStep = n_fs * femtosecond
integrator = LangevinMiddleIntegrator(temperature, frictionCoeff, timeStep)

simulation = Simulation(pdb.topology, pdb.system, integrator)
simulation.context.setPositions(pdb.positions)
simulation.context.setVelocitiesToTemperature(temperature)

reporter = StateDataReporter(file=sys.stdout, reportInterval=1000, step=True, time=True, potentialEnergy=True, temperature=True)
simulation.reporters.append(reporter)

trajectory = DCDReporter(file="./trajectories/4KD1_hybrid_1fs.dcd", reportInterval=10, enforcePeriodicBox=True)
simulation.reporters.append(trajectory)

# Run the simulation
n_steps = int(10_000 / n_fs)
simulation.minimizeEnergy()
simulation.step(n_steps)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[12], line 8
      4 simulation.minimizeEnergy()
      6 start = perf_counter()
----> 8 simulation.step(n_steps)
     10 end = perf_counter()
     12 print(end - start)

File ~/miniforge3/envs/nnff_py310/lib/python3.10/site-packages/openmm/app/simulation.py:141, in Simulation.step(self, steps)
    139 def step(self, steps):
    140     """Advance the simulation by integrating a specified number of time steps."""
--> 141     self._simulate(endStep=self.currentStep+steps)

File ~/miniforge3/envs/nnff_py310/lib/python3.10/site-packages/openmm/app/simulation.py:241, in Simulation._simulate(self, endStep, endTime)
    238 # Generate the reports.
    240 if len(wrapped) > 0:
--> 241     self._generate_reports(wrapped, True)
    242 if len(unwrapped) > 0:
    243     self._generate_reports(unwrapped, False)

File ~/miniforge3/envs/nnff_py310/lib/python3.10/site-packages/openmm/app/simulation.py:263, in Simulation._generate_reports(self, reports, periodic)
    259 state = self.context.getState(getPositions=getPositions, getVelocities=getVelocities, getForces=getForces,
    260                               getEnergy=getEnergy, getParameters=True, enforcePeriodicBox=periodic,
    261                               groups=self.context.getIntegrator().getIntegrationForceGroups())
    262 for reporter, next in reports:
--> 263     reporter.report(self, state)

File ~/miniforge3/envs/nnff_py310/lib/python3.10/site-packages/openmm/app/dcdreporter.py:107, in DCDReporter.report(self, simulation, state)
    102 if self._dcd is None:
    103     self._dcd = DCDFile(
    104         self._out, simulation.topology, simulation.integrator.getStepSize(),
    105         simulation.currentStep, self._reportInterval, self._append
    106     )
--> 107 self._dcd.writeModel(state.getPositions(), periodicBoxVectors=state.getPeriodicBoxVectors())

File ~/miniforge3/envs/nnff_py310/lib/python3.10/site-packages/openmm/app/dcdfile.py:125, in DCDFile.writeModel(self, positions, unitCellDimensions, periodicBoxVectors)
    123     positions = positions.value_in_unit(nanometers)
    124 if any(math.isnan(norm(pos)) for pos in positions):
--> 125     raise ValueError('Particle position is NaN.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan')
    126 if any(math.isinf(norm(pos)) for pos in positions):
    127     raise ValueError('Particle position is infinite.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan')

ValueError: Particle position is NaN.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#nan

4KD1_protein_only.zip

peastman commented 1 year ago

ANI isn't designed to simulate proteins. It can't handle charged groups. It won't give realistic forces for them.

JSLJ23 commented 1 year ago

Ah I see, would retraining ANI on the SPICE dataset (or fine tuning it) be a viable strategy to this then?

peastman commented 1 year ago

You could try, though I'm skeptical it would produce useful results. The ANI model uses a very short cutoff distance. When you have charged groups, it's essential to include longer range interactions.

JSLJ23 commented 1 year ago

Hmm that's a really good point! Any other Openmm-torch compatible NNPs that you would suggest then?

peastman commented 1 year ago

Not that I know of. The field is still very young. There are a lot of papers describing model architectures, but very few trained models that are suitable for general purpose use. I'm hoping that will change over the next year or two.

JSLJ23 commented 1 year ago

Thanks for sharing. One more question though, would it be possible to still use OpenMM internal implementation of PME along with a version of TorchANI trained on the SPICE dataset to model both the short range and long range electrostatics if the TorchANI cutoff is set to be the same as the PME cutoff?

peastman commented 1 year ago

We have an implementation of PME in NNPOps. It's a pytorch custom operation, so you can combine it with an ANI model (or any pytorch model) and train them together.

JSLJ23 commented 1 year ago

Wow really cool! Could you elaborate slightly on how combining it actually works? Are there pyTorch weights on this ewald sumation?

peastman commented 1 year ago

See https://github.com/openmm/NNPOps/blob/master/src/pytorch/pme/pme.py. It's just a pytorch op. It takes positions, charges, and box vectors as input and returns energy. You can incorporate that operation into your model however you want.

JSLJ23 commented 1 year ago

When performing backpropagation, this class computes derivatives with respect to atomic positions and charges, but
not to any other parameters (box vectors, alpha, etc.).  In addition, it only computes first derivatives.
Attempting to compute a second derivative will throw an exception.  This means that if you use PME during training,
the loss function can only depend on energy, not forces.

I don't quite understand this block specifically. So when the PME class back-propagation function is called, are the forces computed with respect to the positions? But during training, a different back-propagation is called?

peastman commented 1 year ago

Yes, forces are computed with respect to positions as expected. It also can compute derivatives with respect to charges. That's important if your charges are computed by the model rather than being fixed.

But it won't compute derivatives with respect to anything else. For example, if you wanted a derivative of the energy with respect to the box vectors (used by some barostat algorithms), it won't compute it.

JSLJ23 commented 1 year ago

Make sense, thanks for clarifying.

If I wish to train this in tandem with TorchANI or some other NNP, I could just add a simple linear layer to the end of the output Tensor of this PME class to weight the energies and back-propagate an error against some ab initio ground truth energy, am I right?

Also, how should the compute_direct() and compute_reciprocal() work for something like TorchANI in NNPOps? Would it be something like TorchANI energy + energy from compute_reciprocal() - energy from compute_direct(), with the TorchANI cutoff set to match the compute_direct() cutoff?

peastman commented 1 year ago

It's really up to you. Think of it as a tool to use in building your model. You can use it however you want. A straightforward implementation would be ani_energy + compute_direct() + compute_reciprocal(). In that case the ANI model tries to learn the difference between the total energy and the PME energy.

JSLJ23 commented 1 year ago

Ok this makes sense as well and I'll probably give a shot to both:

ani_energy + compute_direct() + compute_reciprocal()
ani_energy - w1 × compute_direct() + w2 × compute_reciprocal(), where w1 and w2 could be simple linear layers of weights to adjust the PME energies.

Thank you for the input!

peastman commented 1 year ago

Why are you subtracting the direct space energy? The division between direct and reciprocal space is arbitrary, just done for computational convenience.

JSLJ23 commented 1 year ago

Oh I thought the direct space energy is going to be computed by TorchANI (or some other NNP) so my assumption was subtracting it from the sum would prevent double counting.

peastman commented 1 year ago

The ANI model will learn whatever you train it to learn. You can't just take an existing ANI model and add another term to it. That wouldn't be realistic at all. You're creating a new model that will need to be trained.

JSLJ23 commented 1 year ago

The idea I had in mind looks something like this. PME+ANI And yes I do plan to train a new ANI model from scratch in this manner.

openmm / openmm-ml

Particle position is NaNs with hybrid system #59