Initial implementation of MLPotential

peastman commented 3 years ago

This is an initial version of the API discussed in https://github.com/openmm/openmm/issues/2991. It includes support for the ANI1ccx and ANI2x potentials. This only includes the function to create a System modeled entirely with ML. I haven't yet written the function to create a hybrid System (modeling a ligand with ML and everything else with a conventional force field).

ppxasjsm commented 3 years ago

Very naive question:

Where does this conversion factor come from: 2625.5000885335317? I assume it is something very obvious. I think so far this makes sense to me. Will be curious to see how to deal with the hybrid system version will look like.

peastman commented 3 years ago

TorchANI returns energy in Hartrees, and OpenMM expects it in kJ/mol. Rather than having a mysterious constant there, I should probably change it to torchani.units.hartree2kjoulemol.

peastman commented 3 years ago

I added the function for creating Systems that mix an ML potential with a conventional force field.

jchodera commented 3 years ago

@peastman: Apologies for not having a chance to review this earlier. This looks great!

The createMixedSystem method is immediately highly useful since I think it will require minimal changes (if changes are required at all) to consume the resulting System for alchemical absolute and relative factories! @dominicrufa can help verify this.

Some comments:

Until we have proper documentation, it would be useful to update the README.md with a few examples of usage so it's easier to understand how the API is intended to be used. The docstring examples would be just fine for this purpose!
Another useful mode for createMixedSystem would be to allow atoms to be omitted and instead specify a molecular weight cutoff for molecules, such that any (non-solvent) molecules below that cutoff would be assigned as ML potential atoms.
An optional argument to assign the MM and ML potentials to different force groups would be useful.
Finally, it would also be useful create hybrid MM/ML potentials where a global context parameter (e.g. lambda_ml) controls the mixing of MM and ML so we can alchemically interpolate . This would presumably have to use CustomCVForce to mix the two, so this will presumably use a different method (e.g. createFractionallyMixedSystem) since this will be slower and result in a more complicated system.

peastman commented 3 years ago

Another useful mode for createMixedSystem would be to allow atoms to be omitted and instead specify a molecular weight cutoff for molecules, such that any (non-solvent) molecules below that cutoff would be assigned as ML potential atoms.

If we start building in logic for selecting atoms, that very quickly turns into an infinite rabbit hole. It's easy to implement whatever logic you want, but the variety of rules someone might want to implement will never end. For example, a cutoff on residue mass is easy to implement:

atoms = []
for residue in topology.residues():
  mass = sum(atom.element.mass for atom in residue.atoms())
  if mass > cutoff:
    atoms += list(residue.atoms())

But you specified non-solvent molecules, so we should also check for residue.name != 'HOH'. That won't exclude ions though, so we should also check for residue.name not in ('Na', 'Cl'). Or whatever other ions you're using. Maybe we should require the residue to have a minimum number of atoms? Then again, should we be looking at chains rather than residues? You can have multiple residues per molecule, and multiple molecules per chain.

It's best to just keep this separate. You know what atoms you want, and you can use whatever logic or whatever tools you want to select them.

An optional argument to assign the MM and ML potentials to different force groups would be useful.

Good idea.

Finally, it would also be useful create hybrid MM/ML potentials where a global context parameter (e.g. lambda_ml) controls the mixing of MM and ML so we can alchemically interpolate .

Can you give more details about how the system should be structured? What would be the full set of forces in the system, and what would the CustomCVForce contain? For example, do we have two copies of the standard forces, one with all interactions, and one with all interactions except the internal interactions of the ligand? Or do we try to split them up, so one copy has the internal interactions and one has all the other interactions?

jchodera commented 3 years ago

If we start building in logic for selecting atoms, that very quickly turns into an infinite rabbit hole. It's easy to implement whatever logic you want, but the variety of rules someone might want to implement will never end.

Since Topology lacks a useful selection language (like the MDTraj DSL) and is pretty hard to use to identify which atoms are of interest (we have essentially zero good examples), I don't think it's unreasonable to suggest we provide some convenience functions or arguments that would cover 80% of use cases.

The simplest way to tackle what you suggest is simply have min and max molecular weights and/or number of atoms for molecules.

Instead of optional arguments, what about a simple helper method, like MLPotential.guessLigandAtoms(topology) that could do a bit of intelligent filtering? This would be easier than turning Topology into something with a full-featured DSL or writing a host of examples, though both might be good in the long term. (My long-term preference would be to replace the OpenMM versions with a well-supported OpenFF version of these objects, of course.)

Can you give more details about how the system should be structured? What would be the full set of forces in the system, and what would the CustomCVForce contain? For example, do we have two copies of the standard forces, one with all interactions, and one with all interactions except the internal interactions of the ligand? Or do we try to split them up, so one copy has the internal interactions and one has all the other interactions?

I was thinking

System
+ standard forces (unmodified)
+ CustomCVForce
   + (-lambda_ml) * copy of forces with only atom selection not zeroed out
   + (+lambda_ml) * ML force applied to atom selection

Presumably, we would need to restrict this to only dealing with a single molecule (or part thereof) for the atom selection, so that we can simply use NoCutoff for treating the copy of forces within CustomCVForce so that we avoid the expense of PME.

Does something like that seem reasonable? That would work well for a ligand treated with CustomCVForce, but not two reacting species, though adding and subtracting MM potentials would be unsuitable for modeling multiple reacting components anyway because the energies explode.

peastman commented 3 years ago

Since Topology lacks a useful selection language (like the MDTraj DSL)

That's exactly my point. Why create a new half-baked atom selection system when MDTraj already solves that problem really well?

atoms = mdtraj.Topology.from_openmm(topology).select(...)

That's the Python philosophy (and also the Unix philosophy). Create lots of specialized tools, each of which does one thing really well, and make it easy to use them together.

jchodera commented 3 years ago

What would you think of using mdtraj in the examples then?

There are still, unfortunately, incompatibility problems between OpenMM and MDTraj Topology objects---we see a number of failures in conversion that we haven't had time to investigate or fix yet, unfortunately. Hopefully we can interoperate with the OpenFF Topology object at some point soon (cc @j-wags to tag in whoever is heading that up).

peastman commented 3 years ago

What would you think of using mdtraj in the examples then?

That would be fine.

openmm / openmm-ml

Initial implementation of MLPotential #1