ANI isn't designed to simulate proteins. It can't handle charged groups. It won't give realistic forces for them.
Ah, I see. Would retraining ANI on the SPICE dataset (or fine-tuning it) be a viable strategy for this, then?
You could try, though I'm skeptical it would produce useful results. The ANI model uses a very short cutoff distance. When you have charged groups, it's essential to include longer range interactions.
Hmm, that's a really good point! Are there any other OpenMM-Torch compatible NNPs that you would suggest, then?
Not that I know of. The field is still very young. There are a lot of papers describing model architectures, but very few trained models that are suitable for general purpose use. I'm hoping that will change over the next year or two.
Thanks for sharing.
One more question, though: would it be possible to still use OpenMM's internal implementation of PME along with a version of TorchANI trained on the SPICE dataset, to model both the short-range and long-range electrostatics, if the TorchANI cutoff is set to be the same as the PME cutoff?
We have an implementation of PME in NNPOps. It's a pytorch custom operation, so you can combine it with an ANI model (or any pytorch model) and train them together.
Wow, really cool! Could you elaborate slightly on how combining it actually works? Are there PyTorch weights on this Ewald summation?
See https://github.com/openmm/NNPOps/blob/master/src/pytorch/pme/pme.py. It's just a pytorch op. It takes positions, charges, and box vectors as input and returns energy. You can incorporate that operation into your model however you want.
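For anyone following along, a minimal sketch of calling the op directly might look like this. The constructor arguments (grid dimensions, B-spline order, Ewald alpha, Coulomb constant, exclusion list) are illustrative values I'm assuming here; check the pme.py source linked above for the exact signature.

```python
import torch
from NNPOps.pme import PME

num_atoms = 10
positions = torch.rand(num_atoms, 3)       # nm, inside the box
charges = torch.randn(num_atoms)           # partial charges (e)
box_vectors = 2.0 * torch.eye(3)           # 2 nm cubic box
exclusions = torch.full((num_atoms, 1), -1, dtype=torch.int32)  # no excluded pairs

# Illustrative parameters: grid 32x32x32, spline order 5, alpha in 1/nm,
# Coulomb constant in kJ/mol*nm/e^2.
pme = PME(32, 32, 32, 5, 3.1, 138.935, exclusions)

# Total PME energy = direct-space term (within the cutoff) + reciprocal-space term.
energy = (pme.compute_direct(positions, charges, 0.5, box_vectors)
          + pme.compute_reciprocal(positions, charges, box_vectors))
```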
When performing backpropagation, this class computes derivatives with respect to atomic positions and charges, but not to any other parameters (box vectors, alpha, etc.). In addition, it only computes first derivatives. Attempting to compute a second derivative will throw an exception. This means that if you use PME during training, the loss function can only depend on energy, not forces.
I don't quite understand this block specifically. So when the PME class's backpropagation function is called, are the forces computed with respect to the positions? And during training, is a different backpropagation called?
Yes, forces are computed with respect to positions as expected. It also can compute derivatives with respect to charges. That's important if your charges are computed by the model rather than being fixed.
But it won't compute derivatives with respect to anything else. For example, if you wanted a derivative of the energy with respect to the box vectors (used by some barostat algorithms), it won't compute it.
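As a concrete illustration of that point, a sketch (with the same illustrative PME parameters as above) of getting the available first derivatives via autograd:

```python
import torch
from NNPOps.pme import PME

num_atoms = 10
positions = torch.rand(num_atoms, 3, requires_grad=True)
charges = torch.randn(num_atoms, requires_grad=True)
box_vectors = 2.0 * torch.eye(3)
exclusions = torch.full((num_atoms, 1), -1, dtype=torch.int32)
pme = PME(32, 32, 32, 5, 3.1, 138.935, exclusions)  # illustrative parameters

energy = (pme.compute_direct(positions, charges, 0.5, box_vectors)
          + pme.compute_reciprocal(positions, charges, box_vectors))
energy.backward()

forces = -positions.grad  # -dE/dr, as described above
dE_dq = charges.grad      # available, useful when the model predicts the charges
# Second derivatives are not supported (per the docstring quoted above), so a
# training loss can depend on this energy but not on the forces.
```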
Makes sense, thanks for clarifying.
If I wish to train this in tandem with TorchANI or some other NNP, I could just add a simple linear layer to the end of the output tensor of this PME class to weight the energies, and back-propagate an error against some ab initio ground-truth energy. Am I right?
Also, how should compute_direct() and compute_reciprocal() work for something like TorchANI in NNPOps? Would it be something like TorchANI energy + energy from compute_reciprocal() - energy from compute_direct(), with the TorchANI cutoff set to match the compute_direct() cutoff?
It's really up to you. Think of it as a tool to use in building your model. You can use it however you want. A straightforward implementation would be ani_energy + compute_direct() + compute_reciprocal(). In that case the ANI model tries to learn the difference between the total energy and the PME energy.
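A sketch of what that straightforward combination could look like as a module. Hedged assumptions: the ANI term follows TorchANI's (species, coordinates) -> energies calling convention, the charges are fixed rather than predicted, and the PME parameters are again illustrative.

```python
import torch
from NNPOps.pme import PME

class ANIPlusPME(torch.nn.Module):
    """Sketch: total energy = learned ANI term + explicit PME electrostatics."""

    def __init__(self, ani_model, species, charges, exclusions, cutoff):
        super().__init__()
        self.ani = ani_model                      # e.g. a TorchANI model being trained
        self.register_buffer('species', species)  # (1, num_atoms) species indices
        self.register_buffer('charges', charges)  # fixed partial charges
        self.cutoff = cutoff
        self.pme = PME(48, 48, 48, 5, 3.1, 138.935, exclusions)  # illustrative

    def forward(self, positions, box_vectors):
        # TorchANI expects a batch dimension: (species, coordinates) -> energies
        ani_energy = self.ani((self.species, positions.unsqueeze(0))).energies[0]
        direct = self.pme.compute_direct(positions, self.charges,
                                         self.cutoff, box_vectors)
        recip = self.pme.compute_reciprocal(positions, self.charges, box_vectors)
        # During training, the ANI term learns (target QM energy - PME energy).
        return ani_energy + direct + recip
```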
OK, this makes sense as well, and I'll probably give a shot to both:
ani_energy + compute_direct() + compute_reciprocal()
ani_energy - w1 × compute_direct() + w2 × compute_reciprocal()
where w1 and w2 could be simple linear layers of weights to adjust the PME energies. Thank you for the input!
Why are you subtracting the direct space energy? The division between direct and reciprocal space is arbitrary, just done for computational convenience.
Oh, I thought the direct-space energy was going to be computed by TorchANI (or some other NNP), so my assumption was that subtracting it from the sum would prevent double counting.
The ANI model will learn whatever you train it to learn. You can't just take an existing ANI model and add another term to it. That wouldn't be realistic at all. You're creating a new model that will need to be trained.
The idea I had in mind looks something like this. And yes, I do plan to train a new ANI model from scratch in this manner.
Dear openmm-ml developers,
I am trying to use this tool to run a hybrid MM/ML simulation of a single protein chain in solvent, with the protein atoms modelled by torchani and NNPOps, and the solvent with tip3pfb. However, the simulation gives errors even after running energy minimisation. I have attached my code and the input PDB file for the protein, which has also been energy-minimised in Schrodinger. Hoping to get some help on this and possibly figure out whether the issue is due to the PDB itself or a bug in my code...
4KD1_protein_only.zip
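For context, a minimal version of the setup described above might look something like the following. This is a sketch, not the attached code: the PDB file name, force field files, solvent padding, and the ML-region selection are all assumptions.

```python
import openmm
from openmm import unit
from openmm.app import PDBFile, ForceField, Modeller, Simulation, PME
from openmmml import MLPotential

pdb = PDBFile('4KD1_protein_only.pdb')  # hypothetical file name from the zip
forcefield = ForceField('amber14-all.xml', 'amber14/tip3pfb.xml')
modeller = Modeller(pdb.topology, pdb.positions)
modeller.addSolvent(forcefield, padding=1.0*unit.nanometer)

mm_system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME)

# ML region: every atom that is not water or an added ion (an assumption
# about the system's contents).
ml_atoms = [a.index for a in modeller.topology.atoms()
            if a.residue.name not in ('HOH', 'NA', 'CL')]

potential = MLPotential('ani2x')  # TorchANI-based; uses NNPOps when available
system = potential.createMixedSystem(modeller.topology, mm_system, ml_atoms)

integrator = openmm.LangevinMiddleIntegrator(300*unit.kelvin, 1/unit.picosecond,
                                             0.001*unit.picoseconds)
simulation = Simulation(modeller.topology, system, integrator)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy()
simulation.step(1000)
```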