openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
302 stars 88 forks source link

assigning partial charges fails if conformer generation fails #1741

Open richardjgowers opened 9 months ago

richardjgowers commented 9 months ago

Description

I've got reports of multiple simulations failing when conformer generation fails as part of the partial charge assignment.

Reproduction

I think the actual error happens in SmirnoffTemplateGenerator, but I think this is now running through Interchange(?)

ff = ForceField('openff-2.0.0.offxml')
cubic_box = unit.Quantity(30 * np.eye(3), unit.angstrom)

rdk_mol = Chem.MolFromSmiles('CC')
off_mol = toolkit.topology.Molecule(rdk_mol)

with mock.patch('rdkit.Chem.AllChem.EmbedMultipleConfs', return_value=0) as m:
    i = interchange.Interchange.from_smirnoff(topology=[off_mol], force_field=ff, box=cubic_box)

Output

Please include the output, including full tracebacks of any errors, resulting from the reproduction.

ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name xx and SMILES xx

'partial_charge_method': 'am1bcc', 'use_conformers': None, 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.Molecule'>}"

Available toolkits are: [ToolkitWrapper around The RDKit version 2023.03.2, ToolkitWrapper around AmberTools version 22.0, ToolkitWrapper around Built-in Toolkit version None]

ToolkitWrapper around The RDKit version 2023.03.2 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1bcc' is not available from RDKitToolkitWrapper. Available charge methods are ['gasteiger', 'mmff94']

 ToolkitWrapper around AmberTools version 22.0 <class 'openff.toolkit.utils.exceptions.ConformerGenerationError'> : RDKit conformer generation failed.

ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1bcc"" is not supported by the Built-in toolkit. Available charge methods are ['zeros', 'formal_charge']

The molecule(s) already has a conformer, so failing on partial charge generator because a conformer couldn't be generated is a little frustrating. Would it be possible to add some fallback option to use the molecule's existing conformer to assign partial charges?

mattwthompson commented 9 months ago

The toolkit lies at the center of the Russian doll; the layers radiating outward are Interchange, openmmforcefields, and then whatever calls SmirnoffTemplateGenerator. The most straightforward reproduction would be something like this

from unittest import mock

from openff.toolkit import Molecule

molecule = Molecule.from_smiles("CC")

with mock.patch('rdkit.Chem.AllChem.EmbedMultipleConfs', return_value=0) as m:
    molecule.assign_partial_charges(partial_charge_method="am1bcc")

As I understand it, the toolkit throwing away conformers is intentional. This happens here, just before calling sqm. There is the backdoor of use_conformers if interacting with the toolkit API directly; this works locally:

molecule = Molecule.from_smiles("CC")
molecule.generate_conformers()

with mock.patch('rdkit.Chem.AllChem.EmbedMultipleConfs', return_value=0) as m:
    try:
        molecule.assign_partial_charges(partial_charge_method="am1bcc")
    except ValueError:
        molecule.assign_partial_charges(partial_charge_method="am1bcc", use_conformers=molecule.conformers)

assert molecule.partial_charges is not None

I agree it's a surprising error. I'd be more than a little frustrating to see a conformer generation failure crash a pipeline when my molecule(s) already have conformers. Until just now, I haven't pieced together that this backdoor could be made into default behavior. I don't read the SMIRNOFF specification to have any guidance on what should happen here:

This tag calculates partial charges using the default settings of the highest-priority cheminformatics toolkit that can perform AM1-BCC charge assignment. Currently, if the OpenEye toolkit is licensed and available, this will use QuacPac configured to generate charges using AM1-BCC ELF10 for each unique molecule in the topology. Otherwise RDKit will be used for initial conformer generation and the AmberTools antechamber/sqm software will be used for charge calculation.

If this tag is specified for a force field, conformer generation will be performed regardless of whether conformations of the input molecule were provided. If RDKit/AmberTools are used as the toolkit backend for this calculation, only the first conformer is used for AM1-BCC calculation.

It's easy to see a strong case for falling back to user-generated conformers by default. This would be a major behavior change ... for what I believe to be a tiny subset of inputs. The implementation should be straightforward[^1]. I'll need to think about this a little more before committing to it ... I really wish the standard didn't force me to make these decisions. In the mean time, I'll give @j-wags the opportunity to provide a hard veto.

[^1]: Provided a molecule that reliably fails conformer generation but doesn't have other insane things going on - IME this has been hard for me to come by.

j-wags commented 9 months ago

Yeah, I agree with Matt's response. The root of this is in the Toolkit so I'll move it over to that tracker.

The "making a new conformer for charge assignment" behavior is a deliberate feature. It keeps us closer to our goal of assigning parameters based only on the molecular graph. This philosophy isn't perfectly implemented - OE and RDKit+AmberTools are different implementations and so their conformer generation and charge assignment can vary - but it should at least give the same charges if a user uses the same molecular graph twice in one day. Which wouldn't be the case if we used the input geometry.

I'd prefer not to silently fall back to user conformers here, and I think that making the users specify use_conformers for the charge assignment is an appropriate acknowledgement that the force field is being applied in a nonstandard way.

I'm surprised that there's a molecule suitable for AM1BCC that RDKit can't make a conformer for, and I wonder if this is really a sanitization issue in disguise. Could you share the input that causes this error?

richardjgowers commented 9 months ago

@j-wags can't share the input, it's got a propellane so I imagine that might trip up conformer generation. In general I think conformer generation is good but never foolproof.

Tracing my error backwards, I'm using a SMIRNOFFTemplateGenerator here:

https://github.com/openmm/openmmforcefields/blob/main/openmmforcefields/generators/template_generators.py#L1395

Which calls through to ForceField here:

https://github.com/openforcefield/openff-toolkit/blob/main/openff/toolkit/typing/engines/smirnoff/forcefield.py#L1127

Which is finally hitting interchange here:

https://github.com/openforcefield/openff-toolkit/blob/main/openff/toolkit/typing/engines/smirnoff/forcefield.py#L1223

Which then later goes back through toolkit's amber/rdkit interfaces. tl;dr this is sounding like a 3 repository fix :/

From this, there's not really a way for me to specify use_conformers as I'm not directly using the toolkit here. Is the final answer to generate partial charges manually (potentially falling back to my existing conformer) and pass these through as "user charges" to this call chain? Works for me, but I also know this is affecting Flare for the same systems and who knows who else.

mattwthompson commented 8 months ago

Either of these would be one-package fixes for the whole thing, though neither are permitted so long as the OpenFF stack remains a reference implementation of SMIRNOFF. (Of course, the spec does not come from Mount Sinai and it can be changed.)

diff --git a/openff/interchange/smirnoff/_nonbonded.py b/openff/interchange/smirnoff/_nonbonded.py
index 2309b66c..d821c85e 100644
--- a/openff/interchange/smirnoff/_nonbonded.py
+++ b/openff/interchange/smirnoff/_nonbonded.py
@@ -618,6 +618,8 @@ class SMIRNOFFElectrostaticsCollection(ElectrostaticsCollection, SMIRNOFFCollect
         dict[PotentialKey, Potential],
     ]:
         """Construct a slot and potential map for a charge model based parameter handler."""
+        from openff.toolkit.utils.exceptions import ConformerGenerationError
+
         unique_molecule = copy.deepcopy(unique_molecule)
         reference_smiles = unique_molecule.to_smiles(
             isomeric=True,
@@ -644,10 +646,18 @@ class SMIRNOFFElectrostaticsCollection(ElectrostaticsCollection, SMIRNOFFCollect
                 "ToolkitAM1BCCHandler or ChargeIncrementModelHandler are expected.",
             )

-        partial_charges = cls._compute_partial_charges(
-            unique_molecule,
-            method=partial_charge_method,
-        )
+        try:
+            partial_charges = cls._compute_partial_charges(
+                unique_molecule,
+                method=partial_charge_method,
+            )
+        except ConformerGenerationError as error:
+            partial_charges = cls._compute_partial_charges(
+                unique_molecule,
+                method=partial_charge_method,
+                use_conformers=unique_molecule.conformers,
+            )
+

         matches = {}
         potentials = {}
diff --git a/openff/toolkit/utils/ambertools_wrapper.py b/openff/toolkit/utils/ambertools_wrapper.py
index 2a3ee1e5..d0bdacd0 100644
--- a/openff/toolkit/utils/ambertools_wrapper.py
+++ b/openff/toolkit/utils/ambertools_wrapper.py
@@ -177,13 +177,19 @@ class AmberToolsToolkitWrapper(base_wrapper.ToolkitWrapper):
             if charge_method["rec_confs"] == 0:
                 mol_copy._conformers = None
             else:
-                mol_copy.generate_conformers(
-                    n_conformers=charge_method["rec_confs"],
-                    rms_cutoff=0.25 * unit.angstrom,
-                    toolkit_registry=rdkit_wrapper.RDKitToolkitWrapper(),
-                )
-            # TODO: What's a "best practice" RMS cutoff to use here?
-        else:
+                try:
+                    mol_copy.generate_conformers(
+                        n_conformers=charge_method["rec_confs"],
+                        rms_cutoff=0.25 * unit.angstrom,
+                        toolkit_registry=rdkit_wrapper.RDKitToolkitWrapper(),
+                    )
+                except ConformerGenerationError as error:
+                    if mol_copy.conformers is None:
+                        raise error
+                    else:
+                        use_conformers = mol_copy.conformers
+        
+        if use_conformers is not None:
             mol_copy._conformers = None
             for conformer in use_conformers:
                 mol_copy._add_conformer(conformer)

Downstream, things are already quite funky - apparently any molecule that already has partial charges does not have partial charges assigned with ToolkitAM1BCCHandler and, by extension, does not need conformer generation to succeed while assigning partial charges. This behavior unambiguously contradicts the SMIRNOFF spec.

j-wags commented 8 months ago

Is the final answer to generate partial charges manually (potentially falling back to my existing conformer) and pass these through as "user charges" to this call chain?

Yes, I think this is the right thing to do in the short run. Our API when accessed directly provides workarounds for this, and it looks like OpenMMForceFields also allows externally-derived charges in.

One thing we could do in the medium term is add more fallbacks to our conf gen calls to RDKitToolkitWrapper. This doesn't dent our "reproducibility" goals (to the extent that we were ever achieving them) because molecules that succeed in conf gen will still get the same results. But if you can provide a setting for any conformer generator in RDKit that works for the mystery molecules, we can add that as a third fallback.

mark-mackey-cresset commented 7 months ago

If people are looking for inputs that trigger this bug, then https://github.com/openforcefield/openff-toolkit/issues/1560 provides an example.