openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
301 stars 88 forks source link

Fastest way of getting interchange['Electrostatics'] values? #1853

Closed David-Araripe closed 3 months ago

David-Araripe commented 3 months ago

I'm using openff 2.1.0 to parametrize some ligands for molecular dynamics simulations. For this, I also need the Electrostatics of the interchange object. However, I noticed that the parametrization takes quite a long time to run.

Is there a more straightforward (and faster) way to get these values? I'm not going to use the Interchange object later on, so I would just like to extract the Electrostatics values that are returned. I also tried doing this with multiple threads, but there was an error accessing other tools in the toolkit required for calculating the electrostatics.

Here's how I'm currently doing it:

from openff.toolkit import ForceField, Topology
from openff.interchange import Interchange
from openff.interchange.smirnoff._create import _electrostatics

forcefield = ForceField('openff-2.1.0.offxml')
mol = Molecule.from_file("ligand.sdf")  # where ligand.sdf contains only one structure
topology = Topology.from_molecules(mol)
parameters = forcefield.label_molecules(topology)[0]

interchange = Interchange()
_topology = Interchange.validate_topology(topology)
_electrostatics(
    interchange,
    forcefield,
    _topology,
    None,
    None,
)
partial_charges = interchange['Electrostatics'].charges.values()

Do you have any feedback on how could I do this more efficiently?

mattwthompson commented 3 months ago

You're taking more or less the most direct API path to the charges[^1] but probably running into the fundamental issue of assigning charges with AM1-BCC, especially if your ligand(s) is large^2. In all likelihood, the time it takes to assign vdW and all valence parameters is less than the time it takes to assign partial charges. So it's not you - it's AM1!

If you're using only free and open-source tools, AmberTool's sqm is assigning the partial charges. OpenEye's commercial tools can do the same thing[^3] but faster, and they offer free licenses for academics. (If OpenEye Toolkits are installed and properly licensed, ELF10 will be used in these sorts of code paths).

In the future, directly running AM1-BCC may be replaced by a lightning-fast graph neutral net from NAGL, but that's not released yet.

Until then, you could just use the Molecule API to assign partial charges directly, and save those out to a file:

molecule.assign_partial_charges(partial_charge_method="am1bcc")  # or "am1bccelf10" if you have OpenEye licesed
molecule.to_file('ligand_with_charges.sdf', file_format='sdf')

Even though it's hidden away, this is basically what Interchange ultimately does when using the high-level calls like ForceField.create_openmm_system.

[^1]: There is also ForceField.get_partial_charges, which parameterizes the entire force field and just returns the partial charges in one go. Looking at your code makes me realize it could skip the other handlers and just go straight for the electrostatics ...

[^3]: Okay, it's actually a slightly different variant of AM1-BCC, but it's similar and people have argued it's better. You can read more here: https://openforcefield.github.io/standards/standards/smirnoff/#toolkitam1bcc-temporary-support-for-toolkit-based-am1-bcc-partial-charges

David-Araripe commented 3 months ago

Thank you very much for the very detailed answer! I tried approach 1) you mentioned;

There is also ForceField.get_partial_charges, which parameterizes the entire force field and just returns the partial charges in one go. Looking at your code makes me realize it could skip the other handlers and just go straight for the electrostatics ...

and I noticed a significant speed-up. That is what I was looking for! Thanks! I'm closing this issue since your answer already solved it.

mattwthompson commented 3 months ago

Happy to hear this is working, though I admit I'm a little surprised. There shouldn't be a functional difference in timings - if anything, ForceField.get_partial_charges should be (edit) slower!

Are you able to share examples of ligand(s) you're working with, or maybe some general features (size, chemistry, whether it's a bunch of rings or not, etc.)? It's fine if you can't, I can run comparisons on some public datasets.

David-Araripe commented 3 months ago

Hi @mattwthompson, I will make a more robust test here by timing how much each of the approaches takes to get the ligand's charges. I will come back to you hopefully by the end of the day (CET) here!

Is there any caching being done in the background that I should be aware of while testing this?

mattwthompson commented 3 months ago

There's a little bit of caching. Interchange caches charges based on the parameters of this function: https://github.com/openforcefield/openff-interchange/blob/v0.3.25/openff/interchange/smirnoff/_nonbonded.py#L464-L476

Which is to say that presenting the same Molecule object with the same mappes SMILES and the same partial charge method in the same Python process, the result will be looked up from a cache instead of re-computed. Since ForceField.get_partial_charges calls Interchange, so that may explain the speedup you saw if you were calling it from the same interpreter. (Note I had a typo above, I'd expect this to be slower than your original approach on the first call, though I haven't tried it yet).

There's no caching if using (only) the Molecule API.

David-Araripe commented 2 months ago

Hi @mattwthompson thanks for your patience and support. I ran a comparison, this time on two different Python processes to make sure I don't have caching. Here are the results, so you were right about assign_partial_charges being a slower approach than calling directly _electrostatics.

image

Please let me know if you're also interested in which molecular structures I used for this. I ran both of them locally (but one after the other) while I was using the machine so it might not be a 100% accurate experimental, but I think it can give a good idea

mattwthompson commented 2 months ago

This looks really cool! Are the blue points on the right from Molecule.assign_partial_charges or ForceField.get_partial_charges?

David-Araripe commented 2 months ago

Oh, I didn't realize they would be different on the back-end, otherwise I would've included it in the test. But those points were from ForceField.get_partial_charges

mattwthompson commented 2 months ago

Okay, great. Molecule.assign_partial_charges should be the quickest of them all (if marginal) so I wanted to be sure.

I suppose we can attribute the earlier result to something like run-to-run variance or (waves hands) magic happening under the hood. There's some magic that happens with sqm that causes slight differences at runtime (#1842 being a crazy and recent example) which are likely minor but a hassle to understand deeply.

Thanks for satisfying my curiosity on this - I have nothing more to ask of you, but please reach out if you run into any future questions about using OpenFF with your work!