Closed mattwthompson closed 5 months ago
Ah, yeah, we should fix this. I don't have a good sense for how. The main two routes would seem to be:
type
field. This would break existing mol serialization and would likely require us to implement a versioning scheme for serialized molecules, with all the forward- and reverse-compatibility fun that we already have with SMIRNOFF files. (for example, if someone makes a custom molecule subclass, how would the deserializer know how to recognize and deserialize it? My first thought is that this would send us down the road of making a Molecule subclass registry that maps type
to mol subclasses so the right deserializer/initializer can be pulled up))copy_initializer
not use to_dict
and from_dict
and instead handle live in-memory object. This way, during Molecule copying it has actual instances of the exact Molecule subclasses it needs to copy so their copy constructors can be called directly.Neither option is great. Given current workloads I don't think we can do an adequate job of 1, so on that basis alone I'm in favor of 2.
@j-wags @mattwthompson This is blocking the RNA + Sage vignette (similar to https://github.com/openforcefield/openff-interchange/issues/788) - Topology(old_top)
is used in a few places in Interchange to copy a Topology. I'm able to combine topologies successfully by patching _combine_topologies
not to use it:
def _combine_topologies(topology1: Topology, topology2: Topology) -> Topology:
topology1_ = Topology(other=topology1)
# No need to copy topology2, as add_molecule makes a copy
for molecule in topology2.molecules:
topology1_.add_molecule(molecule)
return topology1_
But then exporting to OpenMM also calls the __init__
method:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Cell In[44], line 1
----> 1 combined_interchange.minimize()
File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:357, in Interchange.minimize(self, engine, force_tolerance, max_iterations)
354 if engine == "openmm":
355 from openff.interchange.operations.minimize.openmm import minimize_openmm
--> 357 minimized_positions = minimize_openmm(
358 self,
359 tolerance=force_tolerance,
360 max_iterations=max_iterations,
361 )
362 self.positions = minimized_positions
363 else:
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/utilities/utilities.py:80, in requires_package.<locals>.inner_decorator.<locals>.wrapper(*args, **kwargs)
77 except Exception as e:
78 raise e
---> 80 return function(*args, **kwargs)
File ~/Documents/openff/interchange/openff/interchange/operations/minimize/openmm.py:25, in minimize_openmm(interchange, tolerance, max_iterations)
22 import openmm.unit
23 from openff.units.openmm import from_openmm
---> 25 simulation = interchange.to_openmm_simulation(
26 openmm.LangevinMiddleIntegrator(
27 293.15 * openmm.unit.kelvin,
28 1.0 / openmm.unit.picosecond,
29 2.0 * openmm.unit.femtosecond,
30 ),
31 )
33 simulation.context.computeVirtualSites()
35 try:
File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:529, in Interchange.to_openmm_simulation(self, integrator, combine_nonbonded_forces, add_constrained_forces, **kwargs)
524 import openmm.app
526 from openff.interchange.interop.openmm._positions import to_openmm_positions
528 simulation = openmm.app.Simulation(
--> 529 topology=self.to_openmm_topology(),
530 system=self.to_openmm(
531 combine_nonbonded_forces=combine_nonbonded_forces,
532 add_constrained_forces=add_constrained_forces,
533 ),
534 integrator=integrator,
535 **kwargs,
536 )
538 # If the system contains virtual sites, the positions must, so no obvious case in which
539 # include_virtual_sites could possibly be False
540 if self.positions is not None:
File ~/Documents/openff/interchange/openff/interchange/components/interchange.py:456, in Interchange.to_openmm_topology(self, ensure_unique_atom_names)
453 """Export components of this Interchange to an OpenMM Topology."""
454 from openff.interchange.interop.openmm._topology import to_openmm_topology
--> 456 return to_openmm_topology(
457 self,
458 ensure_unique_atom_names=ensure_unique_atom_names,
459 )
File ~/Documents/openff/interchange/openff/interchange/interop/openmm/_topology.py:34, in to_openmm_topology(interchange, ensure_unique_atom_names)
29 from openff.interchange.interop._virtual_sites import (
30 _virtual_site_parent_molecule_mapping,
31 )
33 # Copy topology to avoid modifying input (eg, when generating atom names)
---> 34 topology = Topology(interchange.topology)
36 virtual_site_molecule_map = _virtual_site_parent_molecule_mapping(interchange)
38 molecule_virtual_site_map = defaultdict(list)
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:429, in Topology.__init__(self, other)
427 # TODO: Try to construct Topology copy from `other` if specified
428 if isinstance(other, Topology):
--> 429 self.copy_initializer(other)
430 elif isinstance(other, FrozenMolecule):
431 self.from_molecules([other])
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:1208, in Topology.copy_initializer(self, other)
1206 def copy_initializer(self, other):
1207 other_dict = deepcopy(other.to_dict())
-> 1208 self._initialize_from_dict(other_dict)
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/topology.py:1279, in Topology._initialize_from_dict(self, topology_dict)
1276 self.box_vectors = Quantity(box_vectors_unitless, box_vectors_unit)
1278 for molecule_dict in topology_dict["molecules"]:
-> 1279 new_mol = Molecule.from_dict(molecule_dict)
1280 self._add_molecule_keep_cache(new_mol)
1281 self._invalidate_cached_properties()
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:1224, in FrozenMolecule.from_dict(cls, molecule_dict)
1222 # This implementation is a compromise to let this remain as a classmethod
1223 mol = cls()
-> 1224 mol._initialize_from_dict(molecule_dict)
1225 return mol
File ~/Documents/openff/interchange/.soap/gromacs/lib/python3.11/site-packages/openff/toolkit/topology/molecule.py:1239, in FrozenMolecule._initialize_from_dict(self, molecule_dict)
1236 # TODO: Provide useful exception messages if there are any failures
1238 self._initialize()
-> 1239 self.name = molecule_dict["name"]
1240 for atom_dict in molecule_dict["atoms"]:
1241 self._add_atom(**atom_dict, invalidate_cache=False)
KeyError: 'name'
And this doesn't seem to have as easy a fix.
Here's a notebook that generates an RNA molecule, parametrizes it with OpenMM's Amber14 RNA FF distribution, and then combines it with a SMIRNOFF-parametrized topology: rna_from_rdkit.zip
Is there a reason Topology(top)
can't just be implemented as deepcopy(top)
? We could even follow up with some cleanup code like invalidating the caches just to make sure its a "good" topology.
I forget if Topology.__deepcopy__
is unimplemented because nobody has tried it yet or because refactoring the class is not permitted - but it's one of those reasons
I guess my point was more, does Topology
even have anything that requires bespoke code for the deep copy operation - can't we just rely on the default deepcopy
implementation?
It's complicated and I need to chestersons-fence it a bit to recall whether there's a good reason we're not using deepcopy. But resolving this is a high priority and I'm working on it now.
As a followup, I entered a meditative trance for a while trying to think of reasons not to use deepcopy and found none, so #1818 moves the topology copy constructor over to a deepcopy-based approach.
Fixed with 0.15.2
...: In [5]: Topology(topology)
Out[2]: <openff.toolkit.topology.topology.Topology at 0x168bd69d0>
Describe the bug
Topology
uses some hard-coded calls toMolecule
in places that prevent it from re-initializing a copy of itself when_SimpleMolecule
s are present. This state is not forbidden, but fails whenTopology.__init__
is called, which includes some important points.To Reproduce
The last line of the traceback is not useful, but scroll up a couple steps and you'll see it comes from
new_mol = Molecule.from_dict(molecule_dict)
- it's assumed that the dict must be transformed into aMolecule
and no other class (_SimpleMolecule
or even another subclass ofFrozenMolecule
Additional context
This prevents i.e. using the existing
Topology.to_openmm
and the things it enables.