Closed mattwthompson closed 1 week ago
+1 but I think we should make a clear decision about how to recommend users handle organic salts - is the metal ion its own molecule or should it be bonded to the anion?
This is a good idea but it will
I'm leaning toward "do it anyway", but maybe we start with emitting a warning for a while so that folks who will be affected can know to downpin/work with us on migrating their use cases.
break user workflows that rely on multi-molecule Molecules
I'd argue these should be broken, since the class does not appear to be designed to handle multi-molecule structures, isn't tested for this use case and the various permutations (here or downstream!), and this causes many sharp edges that are easy to demonstrate, i.e.
In [1]: from openff.toolkit import Molecule, ForceField
In [2]: Molecule.from_smiles("C.C").to_topology().n_molecules
Out[2]: 1
In [3]: ForceField("openff-2.2.0.offxml").create_interchange(
...: Molecule.from_smiles("C.C").to_topology()
...: )
/Users/mattthompson/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/toolkit/typing/engines/smirnoff/forcefield.py:101: SyntaxWarning: invalid escape sequence '\*'
"""
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 ForceField("openff-2.2.0.offxml").create_interchange(
2 Molecule.from_smiles("C.C").to_topology()
3 )
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/utilities/utilities.py:80, in requires_package.<locals>.inner_decorator.<locals>.wrapper(*args, **kwargs)
77 except Exception as e:
78 raise e
---> 80 return function(*args, **kwargs)
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/toolkit/typing/engines/smirnoff/forcefield.py:1252, in ForceField.create_interchange(self, topology, toolkit_registry, charge_from_molecules, partial_bond_orders_from_molecules, allow_nonintegral_charges)
1249 used_registry = GLOBAL_TOOLKIT_REGISTRY
1251 with toolkit_registry_manager(used_registry):
-> 1252 return Interchange.from_smirnoff(
1253 force_field=self,
1254 topology=topology,
1255 charge_from_molecules=charge_from_molecules,
1256 partial_bond_orders_from_molecules=partial_bond_orders_from_molecules,
1257 allow_nonintegral_charges=allow_nonintegral_charges,
1258 )
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/components/interchange.py:268, in Interchange.from_smirnoff(cls, force_field, topology, box, positions, charge_from_molecules, partial_bond_orders_from_molecules, allow_nonintegral_charges)
219 """
220 Create a new object by parameterizing a topology with a SMIRNOFF force field.
221
(...)
264
265 """
266 from openff.interchange.smirnoff._create import _create_interchange
--> 268 return _create_interchange(
269 force_field=force_field,
270 topology=topology,
271 box=box,
272 positions=positions,
273 charge_from_molecules=charge_from_molecules,
274 partial_bond_orders_from_molecules=partial_bond_orders_from_molecules,
275 allow_nonintegral_charges=allow_nonintegral_charges,
276 )
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_create.py:112, in _create_interchange(force_field, topology, box, positions, charge_from_molecules, partial_bond_orders_from_molecules, allow_nonintegral_charges)
109 _impropers(interchange, force_field, _topology)
111 _vdw(interchange, force_field, _topology)
--> 112 _electrostatics(
113 interchange,
114 force_field,
115 _topology,
116 charge_from_molecules,
117 allow_nonintegral_charges,
118 )
119 _plugins(interchange, force_field, _topology)
121 _virtual_sites(interchange, force_field, _topology)
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_create.py:267, in _electrostatics(interchange, force_field, topology, charge_from_molecules, allow_nonintegral_charges)
262 else:
263 return
265 interchange.collections.update(
266 {
--> 267 "Electrostatics": SMIRNOFFElectrostaticsCollection.create(
268 parameter_handler=[
269 handler
270 for handler in [
271 force_field._parameter_handlers.get(name, None)
272 for name in [
273 "Electrostatics",
274 "ChargeIncrementModel",
275 "ToolkitAM1BCC",
276 "LibraryCharges",
277 ]
278 ]
279 if handler is not None
280 ],
281 topology=topology,
282 charge_from_molecules=charge_from_molecules,
283 allow_nonintegral_charges=allow_nonintegral_charges,
284 ),
285 },
286 )
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py:452, in SMIRNOFFElectrostaticsCollection.create(cls, parameter_handler, topology, charge_from_molecules, allow_nonintegral_charges)
436 toolkit_handler_with_metadata = [
437 p for p in parameter_handlers if type(p) is ElectrostaticsHandler
438 ][0]
440 handler = cls(
441 type=toolkit_handler_with_metadata.TAGNAME,
442 scale_12=toolkit_handler_with_metadata.scale12,
(...)
449 exception_potential=toolkit_handler_with_metadata.exception_potential,
450 )
--> 452 handler.store_matches(
453 parameter_handlers,
454 topology,
455 charge_from_molecules=charge_from_molecules,
456 allow_nonintegral_charges=allow_nonintegral_charges,
457 )
458 handler._charges = dict()
460 return handler
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py:863, in SMIRNOFFElectrostaticsCollection.store_matches(self, parameter_handler, topology, charge_from_molecules, allow_nonintegral_charges)
858 # TODO: Here is where the toolkit calls self.check_charges_assigned(). Do we skip this
859 # entirely given that we are not accepting `charge_from_molecules`?
861 if not flag:
862 # TODO: Rename this method to something like `_find_matches`
--> 863 matches, potentials = self._find_reference_matches(
864 parameter_handlers,
865 unique_molecule,
866 )
868 self.potentials.update(potentials)
870 for unique_molecule_atom in unique_molecule.atoms:
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py:721, in SMIRNOFFElectrostaticsCollection._find_reference_matches(cls, parameter_handlers, unique_molecule)
711 slot_matches, slot_potentials = cls._find_slot_matches(
712 parameter_handler,
713 unique_molecule,
714 )
716 if handler_type in ["ToolkitAM1BCC", "ChargeIncrementModel"]:
717 (
718 partial_charge_method,
719 am1_matches,
720 am1_potentials,
--> 721 ) = cls._find_charge_model_matches(
722 parameter_handler,
723 unique_molecule,
724 )
726 if slot_matches is None and am1_matches is None:
727 raise NotImplementedError()
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py:657, in SMIRNOFFElectrostaticsCollection._find_charge_model_matches(cls, parameter_handler, unique_molecule)
651 else:
652 raise InvalidParameterHandlerError(
653 f"Encountered unknown handler of type {type(parameter_handler)} where only "
654 "ToolkitAM1BCCHandler or ChargeIncrementModelHandler are expected.",
655 )
--> 657 partial_charges = cls._compute_partial_charges(
658 unique_molecule,
659 unique_molecule.to_smiles(
660 isomeric=True,
661 explicit_hydrogens=True,
662 mapped=True,
663 ),
664 method=partial_charge_method,
665 )
667 matches = {}
668 potentials = {}
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/interchange/smirnoff/_nonbonded.py:472, in SMIRNOFFElectrostaticsCollection._compute_partial_charges(cls, molecule, mapped_smiles, method)
470 """Call out to the toolkit's toolkit wrappers to generate partial charges."""
471 molecule = copy.deepcopy(molecule)
--> 472 molecule.assign_partial_charges(method)
474 return molecule.partial_charges
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/toolkit/topology/molecule.py:2671, in FrozenMolecule.assign_partial_charges(self, partial_charge_method, strict_n_conformers, use_conformers, toolkit_registry, normalize_partial_charges)
2657 warnings.warn(
2658 f"Warning! Partial charge method '{partial_charge_method}' is not designed "
2659 "for use on large (i.e. > 150 atoms) molecules and may crash or take hours to "
(...)
2663 stacklevel=2,
2664 )
2666 if isinstance(toolkit_registry, ToolkitRegistry):
2667 # We may need to try several toolkitwrappers to find one
2668 # that supports the desired partial charge method, so we
2669 # tell the ToolkitRegistry to continue trying ToolkitWrappers
2670 # if one raises an error (raise_exception_types=[])
-> 2671 toolkit_registry.call(
2672 "assign_partial_charges",
2673 molecule=self,
2674 partial_charge_method=partial_charge_method,
2675 use_conformers=use_conformers,
2676 strict_n_conformers=strict_n_conformers,
2677 normalize_partial_charges=normalize_partial_charges,
2678 raise_exception_types=[],
2679 _cls=self.__class__,
2680 )
2681 elif isinstance(toolkit_registry, ToolkitWrapper):
2682 toolkit_wrapper: ToolkitWrapper = toolkit_registry
File ~/micromamba/envs/yammbs-dev/lib/python3.12/site-packages/openff/toolkit/utils/toolkit_registry.py:384, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
382 for toolkit, error in errors:
383 msg += f" {toolkit} {type(error)} : {error}\n"
--> 384 raise ValueError(msg)
ValueError: No registered toolkits can provide the capability "assign_partial_charges" for args "()" and kwargs "{'molecule': Molecule with name '' and SMILES '[H]C([H])([H])[H].[H]C([H])([H])[H]', 'partial_charge_method': 'am1bccelf10', 'use_conformers': None, 'strict_n_conformers': False, 'normalize_partial_charges': True, '_cls': <class 'openff.toolkit.topology.molecule.Molecule'>}"
Available toolkits are: [ToolkitWrapper around OpenEye Toolkit version 2024.1.1, ToolkitWrapper around The RDKit version 2024.03.5, ToolkitWrapper around AmberTools version 22.0, ToolkitWrapper around Built-in Toolkit version None]
ToolkitWrapper around OpenEye Toolkit version 2024.1.1 <class 'openff.toolkit.utils.exceptions.InvalidConformerError'> : molecule.add_conformer given input of the wrong shape: Given (5, 3), expected (10, 3)
ToolkitWrapper around The RDKit version 2024.03.5 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1bccelf10' is not available from RDKitToolkitWrapper. Available charge methods are {'mmff94': {}, 'gasteiger': {}}
ToolkitWrapper around AmberTools version 22.0 <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : partial_charge_method 'am1bccelf10' is not available from AmberToolsToolkitWrapper. Available charge methods are {'am1bcc': {'antechamber_keyword': 'bcc', 'min_confs': 1, 'max_confs': 1, 'rec_confs': 1}, 'am1-mulliken': {'antechamber_keyword': 'mul', 'min_confs': 1, 'max_confs': 1, 'rec_confs': 1}, 'gasteiger': {'antechamber_keyword': 'gas', 'min_confs': 0, 'max_confs': 0, 'rec_confs': 0}}
ToolkitWrapper around Built-in Toolkit version None <class 'openff.toolkit.utils.exceptions.ChargeMethodUnavailableError'> : Partial charge method "am1bccelf10"" is not supported by the Built-in toolkit. Available charge methods are {'zeros': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}, 'formal_charge': {'rec_confs': 0, 'min_confs': 0, 'max_confs': 0}}
Is the right/simplest thing when passed something that is multimolecule to recognize that on read in, and split it into two molecules, so that it returns a list of molecules, rather a single molecule?
+1 but I think we should make a clear decision about how to recommend users handle organic salts - is the metal ion its own molecule or should it be bonded to the anion?
I'm pretty sure it should be it's own molecule? That makes things much simpler overall.
Is the right/simplest thing when passed something that is multimolecule to recognize that on read in, and split it into two molecules, so that it returns a list of molecules, rather a single molecule?
This is what happens with Molecule.from_file
when it's a list of SMILES, so there's precedent for Molecule.from_x
returning a list of molecules rather than a single one.
I'm leaning toward "do it anyway", but maybe we start with emitting a warning for a while so that folks who will be affected can know to downpin/work with us on migrating their use cases.
Warnings either way would be handy -- if the ultimate decision winds up being "do nothing" because it would break too many workflows, a warning that multimolecule Molecules are not fully supported would still be helpful.
I think warnings are the right way to go here. Continuing to do nothing gives us a weak object model where developers who try to build off of our core objects get burned by unforeseen possibilities (like what happened here). But stopping support for this altogether will blow up SPICE and probably trip up people trying to do any early work with metals.
(Molecule.from_file
DOES have the behavior of sometimes returning a Molecule and other times returning a list of Molecules and I consider that to be a big mistake - From what I've heard lots of people have been burned by that and nobody has told me they appreciate the behavior or thought it was a good design decision)
Metal representations will be a whole additional question to solve (SMARTS-based parameter assignment will depend heavily on how that's handled), so let's cross that bridge once we get to it.
Is your feature request related to a problem? Please describe.
The
Molecule
class is intended to represent a molecule - defined as a single connected graph of atoms - but has little or no handling for inputs that contain multiple molecules.This gives me the impression that I can safely feed the toolkit an arbitrary PDB file, which I didn't think was the case?
This also leads to dangerous behavior when feeding things through to other parts of the API that may or may not handle the case of multi-molecule
Molecule
s, i.e.Describe the solution you'd like
Raise an exception when parsing inputs containing multiple disconnected molecular graphs.
Describe alternatives you've considered
An alternative involves adding a vast number of tests involving passing a multi-molecule
Molecule
. I don't really want to do that.Additional context
I raised a similar suggestion years ago, but didn't get much traction.