Unit problem in the toolkit showcase notebook?

openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io

http://openforcefield.org

MIT License

311 stars 91 forks source link

Unit problem in the toolkit showcase notebook? #1446

Closed ziyuanzhao2000 closed 1 year ago

ziyuanzhao2000 commented 1 year ago

In the "Minimize the combined system" step of the notebook, there's the code as follows:

simulation.minimizeEnergy(
    tolerance=openmm_unit.Quantity(value=50.0, unit=openmm_unit.kilojoule_per_mole)
)
minimized_state = simulation.context.getState(
    getPositions=True, getEnergy=True, getForces=True
)

However, as I understand, the OpenMM code is supposed to accept the tolerance in the unit of kJ/mol/nm, which has the unit of force rather than energy as currently it stands. Curiously, the code cell does run without problem, and I had no problem running other systems using the wrong unit. Shall we at least fix the notebook to use the correct unit, even if it might be an OpenMM but not OpenFF's problem with unit checking or implicit unit conversion?

ziyuanzhao2000 commented 1 year ago

By the way, I wish to ask a question related to energy minimization using OpenMM on the system produced from OpenFF's Interchange object. As of now it seems relatively slow, which is mentioned in another issue about the showcase notebook in September. I'm currently writing a tool based upon OpenFF's API for modifying chemical structures (will work for proteins, for sure). A function I've implemented is to procedurally grow functional groups on the existing structure. I hope to do a local energy minimization after every such modification, but as of now using OpenMM seems pretty slow relative to, say, similar minimization functionality provided by softwares like pymol and MOE. For example, for a modified dipeptide, I need 90s to minimize the system... Please kindly advise how I can make it faster, and maybe if we can discuss this in a separate issue. Thanks!

Yoshanuikabundi commented 1 year ago

Good catch, thanks! Looks like this was previously incorrectly documented by OpenMM. We should definitely fix this - I can do it, or if you'd like to we'd welcome a PR!

I'm not aware of anything that would cause an OpenMM System produced by our tooling to be slower than any other System, but that doesn't mean it doesn't exist! If you have an example of a system that is slower to minimize with a force field produced by Interchange than by one of the default OpenMM force fields, I'm sure @mattwthompson would love you to raise an issue on the Interchange issue tracker.

If our force fields are as fast as any other OpenMM force field and the issue is comparing OpenMM to other software, the most likely reason is that OpenMM by default minimizes a system to machine precision, whereas the other software may be less strict. I'd try setting a higher value for the tolerance parameter you asked about above - that's how we solved a similar issue. If you've done that, maybe OpenMM isn't detecting or working with your GPU?

ziyuanzhao2000 commented 1 year ago

Thanks for the quick response! I just submitted a PR with that small fix of typo.

It's good to know that OpenFF shouldn't produce an OpenMM System slower than using OpenMM alone, based on your experience. I did try to increase the tolerance, but then I got some less ideal (by visual inspection) optimized structure so I'm not sure if that's the solution. I was also trying to understand OpenFF's Sage forcefield and compare that to Amber ff14sb that I had experience using in OpenMM and other common options like MMFF94 to see if the reason might be with the forcefield used by the different softwares. Anyway, I will open a new issue at the proper place if I find anything definitive that explains the relative speed of energy minimization...

mattwthompson commented 1 year ago

Energy minimization is generally not something that MM engines strive to be absolutely perfect about; the goal is to remove high-energy steric clashes and get to a reasonable enough starting configuration that the system doesn't immediately crash during MD, something that does not require getting close to the global minimum.

I was also trying to understand OpenFF's Sage forcefield and compare that to Amber ff14sb

Do keep in mind that Sage is a small molecule force field by design, having been trained on QM-optimized structures of drug-like molecules, physical properties of condensed-phase systems of similar small molecules, etc. It'll probably give you parameters for peptides and other biopolymers, but these are not designed at all for use in macromolecules so their performance will probably be quite poor for things like backbone structure. Rosemary will include protein-specific torsions and we hope it will perform as well as or better than the existing protein force fields.

ziyuanzhao2000 commented 1 year ago

Thanks for the reminder. I'm certainly looking forward to the later release of Rosemary as the definitive protein force field.

Energy minimization is generally not something that MM engines strive to be absolutely perfect about

This I understand, but I wonder why other chemical toolboxes, say, openbabel, that also supports energy minimization can be way faster than OpenMM's minimizer? If I use openbabel, I almost get a "minimized" configuration that is good enough immediately. I've yet to understand the difference in the computation performed by the different tools. Sorry about nagging on, but I hope you may provide some insight about this. @mattwthompson

j-wags commented 1 year ago

the most likely reason is that OpenMM by default minimizes a system to machine precision, whereas the other software may be less strict

Another contender may be that other minimizers operate in different coordinate spaces. For "long" molecules starting from a naively-generated conformer, it's likely that lots of torsions need to be rotated to reach a local energy minimum. But I believe that OpenMM uses a cartesian minimizer, which would need a lot more steps to perform a torsion rotation. I'd imagine that the minimizer in a tool like MOE that's more purpose-built for drug discovery/biomolecules/polymers might operate natively in dihedral coordinates.

For example, for a modified dipeptide, I need 90s to minimize the system...

Are you including running create_openmm_system or create_interchange in these cases? Each time those are run, the OpenFF Toolkit will need to run an AM1 optimization, which could take tens of seconds for a dipeptide. I could imagine the timing here being dominated by that call, whereas other tools like openbabel use faster and more approximate charge methods.

Unfortunately the use of AM1 charges is defined as part of the Sage force field, so AM1 opts will need to be run at least once for each unique chemical species in your workflow. If you're running the same molecule repeatedly, consider making the Interchange or OpenMM System object just once (since this is the expensive step), then saving it, and loading it whenever you need it in the future.

ziyuanzhao2000 commented 1 year ago

Hi @j-wags , thanks for your additional insight. Unfortunately, this single call is taking me that much time

%%time
simulation.minimizeEnergy(
    tolerance=openmm_unit.Quantity(value=1e2, unit=openmm_unit.kilojoules_per_mole / openmm_unit.nanometer),
)

I compared to using AM1-BCC + GAFF for on the fly parametrization of my molecule, as implemented by openmmforcefield and that seems faster. I will prepare a minimal example and post that to the Interchange issue tracker for further discussion, as I'm not sure if it's an OpenFF problem at this point.

j-wags commented 1 year ago

Thanks, @ziyuanzhao2000 - The reproducing example would be particularly useful here!

j-wags commented 1 year ago

I think this has been closed by #1447, and we can debug the slow runtime in follow-up issues.