openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
302 stars 88 forks source link

`Topology.__init__` cannot handle `_SimpleMolecules`s #1783

Closed mattwthompson closed 5 months ago

mattwthompson commented 7 months ago

Describe the bug

Topology uses some hard-coded calls to Molecule in places that prevent it from re-initializing a copy of itself when _SimpleMolecules are present. This state is not forbidden, but fails when Topology.__init__ is called, which includes some important points.

To Reproduce

In [1]: from openff.toolkit.topology._mm_molecule import _SimpleMolecule

In [2]: from openff.toolkit import Molecule, Topology

In [3]: topology = Topology.from_molecules(
   ...:     [_SimpleMolecule.from_molecule(Molecule.from_smiles("CCO"))]
   ...: )

In [4]: topology, [*topology.molecules], [*topology.atoms]
Out[4]:
(<openff.toolkit.topology.topology.Topology at 0x12fcf2af0>,
 [<openff.toolkit.topology._mm_molecule._SimpleMolecule at 0x12ff84fa0>],
 [<openff.toolkit.topology._mm_molecule._SimpleAtom at 0x10591faf0>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x1058ced60>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x1058ced90>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x105932100>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x12ff949d0>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x12ff948e0>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x12ff79c10>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x12ff79fd0>,
  <openff.toolkit.topology._mm_molecule._SimpleAtom at 0x12ff791f0>])

In [5]: Topology(topology)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 Topology(topology)

File ~/software/openff-toolkit/openff/toolkit/topology/topology.py:430, in Topology.__init__(self, other)
    428 # TODO: Try to construct Topology copy from `other` if specified
    429 if isinstance(other, Topology):
--> 430     self.copy_initializer(other)
    431 elif isinstance(other, FrozenMolecule):
    432     self.from_molecules([other])

File ~/software/openff-toolkit/openff/toolkit/topology/topology.py:1212, in Topology.copy_initializer(self, other)
   1210 def copy_initializer(self, other):
   1211     other_dict = deepcopy(other.to_dict())
-> 1212     self._initialize_from_dict(other_dict)

File ~/software/openff-toolkit/openff/toolkit/topology/topology.py:1283, in Topology._initialize_from_dict(self, topology_dict)
   1280     self.box_vectors = Quantity(box_vectors_unitless, box_vectors_unit)
   1282 for molecule_dict in topology_dict["molecules"]:
-> 1283     new_mol = Molecule.from_dict(molecule_dict)
   1284     self._add_molecule_keep_cache(new_mol)
   1285 self._invalidate_cached_properties()

File ~/software/openff-toolkit/openff/toolkit/topology/molecule.py:1224, in FrozenMolecule.from_dict(cls, molecule_dict)
   1222 # This implementation is a compromise to let this remain as a classmethod
   1223 mol = cls()
-> 1224 mol._initialize_from_dict(molecule_dict)
   1225 return mol

File ~/software/openff-toolkit/openff/toolkit/topology/molecule.py:1241, in FrozenMolecule._initialize_from_dict(self, molecule_dict)
   1239 self.name = molecule_dict["name"]
   1240 for atom_dict in molecule_dict["atoms"]:
-> 1241     self._add_atom(**atom_dict, invalidate_cache=False)
   1243 for bond_dict in molecule_dict["bonds"]:
   1244     bond_dict["atom1"] = int(bond_dict["atom1"])

TypeError: _add_atom() got an unexpected keyword argument 'meatadata'

The last line of the traceback is not useful, but scroll up a couple steps and you'll see it comes from new_mol = Molecule.from_dict(molecule_dict) - it's assumed that the dict must be transformed into a Molecule and no other class (_SimpleMolecule or even another subclass of FrozenMolecule

Additional context

This prevents i.e. using the existing Topology.to_openmm and the things it enables.

j-wags commented 7 months ago

Ah, yeah, we should fix this. I don't have a good sense for how. The main two routes would seem to be:

  1. Have serialized mols contain a new type field. This would break existing mol serialization and would likely require us to implement a versioning scheme for serialized molecules, with all the forward- and reverse-compatibility fun that we already have with SMIRNOFF files. (for example, if someone makes a custom molecule subclass, how would the deserializer know how to recognize and deserialize it? My first thought is that this would send us down the road of making a Molecule subclass registry that maps type to mol subclasses so the right deserializer/initializer can be pulled up))
  2. Have the Topology copy_initializer not use to_dict and from_dict and instead handle live in-memory object. This way, during Molecule copying it has actual instances of the exact Molecule subclasses it needs to copy so their copy constructors can be called directly.

Neither option is great. Given current workloads I don't think we can do an adequate job of 1, so on that basis alone I'm in favor of 2.

Yoshanuikabundi commented 5 months ago

@j-wags @mattwthompson This is blocking the RNA + Sage vignette (similar to https://github.com/openforcefield/openff-interchange/issues/788) - Topology(old_top) is used in a few places in Interchange to copy a Topology. I'm able to combine topologies successfully by patching _combine_topologies not to use it:

def _combine_topologies(topology1: Topology, topology2: Topology) -> Topology:
    topology1_ = Topology(other=topology1)
    # No need to copy topology2, as add_molecule makes a copy

    for molecule in topology2.molecules:
        topology1_.add_molecule(molecule)

    return topology1_

But then exporting to OpenMM also calls the __init__ method:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[44], line 1
----> 1 combined_interchange.minimize()

File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:357, in Interchange.minimize(self, engine, force_tolerance, max_iterations)
    354 if engine == "openmm":
    355     from openff.interchange.operations.minimize.openmm import minimize_openmm
--> 357     minimized_positions = minimize_openmm(
    358         self,
    359         tolerance=force_tolerance,
    360         max_iterations=max_iterations,
    361     )
    362     self.positions = minimized_positions
    363 else:

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/utilities/utilities.py:80, in requires_package.<locals>.inner_decorator.<locals>.wrapper(*args, **kwargs)
     77 except Exception as e:
     78     raise e
---> 80 return function(*args, **kwargs)

File ~/Documents/openff/interchange/openff/interchange/operations/minimize/openmm.py:25, in minimize_openmm(interchange, tolerance, max_iterations)
     22 import openmm.unit
     23 from openff.units.openmm import from_openmm
---> 25 simulation = interchange.to_openmm_simulation(
     26     openmm.LangevinMiddleIntegrator(
     27         293.15 * openmm.unit.kelvin,
     28         1.0 / openmm.unit.picosecond,
     29         2.0 * openmm.unit.femtosecond,
     30     ),
     31 )
     33 simulation.context.computeVirtualSites()
     35 try:

File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:529, in Interchange.to_openmm_simulation(self, integrator, combine_nonbonded_forces, add_constrained_forces, **kwargs)
    524 import openmm.app
    526 from openff.interchange.interop.openmm._positions import to_openmm_positions
    528 simulation = openmm.app.Simulation(
--> 529     topology=self.to_openmm_topology(),
    530     system=self.to_openmm(
    531         combine_nonbonded_forces=combine_nonbonded_forces,
    532         add_constrained_forces=add_constrained_forces,
    533     ),
    534     integrator=integrator,
    535     **kwargs,
    536 )
    538 # If the system contains virtual sites, the positions must, so no obvious case in which
    539 # include_virtual_sites could possibly be False
    540 if self.positions is not None:

File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:456, in Interchange.to_openmm_topology(self, ensure_unique_atom_names)
    453 """Export components of this Interchange to an OpenMM Topology."""
    454 from openff.interchange.interop.openmm._topology import to_openmm_topology
--> 456 return to_openmm_topology(
    457     self,
    458     ensure_unique_atom_names=ensure_unique_atom_names,
    459 )

File ~/Documents/openff/interchange/openff/interchange/interop/openmm/_topology.py:34, in to_openmm_topology(interchange, ensure_unique_atom_names)
     29 from openff.interchange.interop._virtual_sites import (
     30     _virtual_site_parent_molecule_mapping,
     31 )
     33 # Copy topology to avoid modifying input (eg, when generating atom names)
---> 34 topology = Topology(interchange.topology)
     36 virtual_site_molecule_map = _virtual_site_parent_molecule_mapping(interchange)
     38 molecule_virtual_site_map = defaultdict(list)

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:429, in Topology.__init__(self, other)
    427 # TODO: Try to construct Topology copy from `other` if specified
    428 if isinstance(other, Topology):
--> 429     self.copy_initializer(other)
    430 elif isinstance(other, FrozenMolecule):
    431     self.from_molecules([other])

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:1208, in Topology.copy_initializer(self, other)
   1206 def copy_initializer(self, other):
   1207     other_dict = deepcopy(other.to_dict())
-> 1208     self._initialize_from_dict(other_dict)

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:1279, in Topology._initialize_from_dict(self, topology_dict)
   1276     self.box_vectors = Quantity(box_vectors_unitless, box_vectors_unit)
   1278 for molecule_dict in topology_dict["molecules"]:
-> 1279     new_mol = Molecule.from_dict(molecule_dict)
   1280     self._add_molecule_keep_cache(new_mol)
   1281 self._invalidate_cached_properties()

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:1224, in FrozenMolecule.from_dict(cls, molecule_dict)
   1222 # This implementation is a compromise to let this remain as a classmethod
   1223 mol = cls()
-> 1224 mol._initialize_from_dict(molecule_dict)
   1225 return mol

File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:1239, in FrozenMolecule._initialize_from_dict(self, molecule_dict)
   1236 # TODO: Provide useful exception messages if there are any failures
   1238 self._initialize()
-> 1239 self.name = molecule_dict["name"]
   1240 for atom_dict in molecule_dict["atoms"]:
   1241     self._add_atom(**atom_dict, invalidate_cache=False)

KeyError: 'name'

And this doesn't seem to have as easy a fix.

Here's a notebook that generates an RNA molecule, parametrizes it with OpenMM's Amber14 RNA FF distribution, and then combines it with a SMIRNOFF-parametrized topology: rna_from_rdkit.zip

Is there a reason Topology(top) can't just be implemented as deepcopy(top)? We could even follow up with some cleanup code like invalidating the caches just to make sure its a "good" topology.

mattwthompson commented 5 months ago

I forget if Topology.__deepcopy__ is unimplemented because nobody has tried it yet or because refactoring the class is not permitted - but it's one of those reasons

Yoshanuikabundi commented 5 months ago

I guess my point was more, does Topology even have anything that requires bespoke code for the deep copy operation - can't we just rely on the default deepcopy implementation?

j-wags commented 5 months ago

It's complicated and I need to chestersons-fence it a bit to recall whether there's a good reason we're not using deepcopy. But resolving this is a high priority and I'm working on it now.

j-wags commented 5 months ago

As a followup, I entered a meditative trance for a while trying to think of reasons not to use deepcopy and found none, so #1818 moves the topology copy constructor over to a deepcopy-based approach.

mattwthompson commented 5 months ago

Fixed with 0.15.2


   ...: In [5]: Topology(topology)
Out[2]: <openff.toolkit.topology.topology.Topology at 0x168bd69d0>