openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
313 stars 92 forks source link

Preserve Molecule core properties through RDMol/OEMol conversions #135

Open j-wags opened 5 years ago

j-wags commented 5 years ago

The goals in this issue are

1) Make sure that core Molecule properties survive conversion to/from OEMol/RDMol.

2) Determine whether the core properties survive filo I/O in supported file formats.

3) Determine whether secondary properties survive conversion to/from OEMol/RDMol.

4) Determine whether secondary properties survive filo I/O in supported file formats.

Core properties of Molecule include:

Note that core properties do not include VirtualSites

Secondary properties are anything serializeable that users add to the molecule._properties dict.

j-wags commented 5 years ago

Also, test specifically that charges are read from MOL2's

j-wags commented 5 years ago

I'm thinking that the strict test of "successful round trips" is to pass:

off_mol1 = (any valid openforcefield.topology.molecule.Molecule object with just core properties)
toolkit_mol = toolkit_wrapper.to_openeye(off_mol) # (or any toolkit molecule object)
off_mol2 = toolkit_wrapper.from_object(toolkit_mol)
assert off_mol1 == off_mol2

In this round-trip framework, we need to think of how each property of an OFFMol can map in and out of a toolkit molecule representation. For example, I hit a problem with OEMols, where oemol.SetTitle only works on strings, while OFFMols can have their name be a string or None. There's no safe mapping for None in and out of an OEMol. We could always stash these core properties into arbitrary property dictionaries (both OEMols and RDMols have these), but it would be weird to convert a OFFMol to an OEmol, and not have it's name go into the OEMol's Title field.

Many fields are natively supported, so fields like "atom element" and "integer bond orders" are no problem.

Generally, not knowing what interfaces we'll need to comply with in the future, I think this means we want to limit the OFFMol's property space when possible. For example, I'm removing None as a valid OFFMol name, and strictly requiring strings ("" is the default).

Goal 1 has a few implications.

I'll update this post with more interface notes as they are implemented.

j-wags commented 5 years ago

Molecule Core Property Mapping Spec

(in progress, this should go somewhere official eventually)

OFFMol OEMol RDMol
No Name molecule.name = "" oemol.SetTitle("") rdmol.SetProp('name','')
No partial charges molecule._partial_charges = None each oeAtom.GetPartialCharge() returns 0.0 each rdatom.HasProp('partial_charge' returns False
No conformers molecule._conformers = None oemol.GetConformers() yields a dictionary mapping all atom indices to (0,0,0)'s rdmol.GetConformers() returns ()
j-wags commented 5 years ago

Atom Core Property Mapping Spec

Note: We only care about round trips from OFFMol --> ToolkitMol --> OFFMol, therefore if OFFMol doesn't a permit a blank value for a field, then we don't have to worry about the toolkit representation of the blank value. (?)

OFFAtom OEAtom RDAtom
No name Atom.name = "" oeatom.SetName("") rdatom.SetProp("name","")
No element Not permitted N/A N/A
No formal_charge Not permitted N/A N/A
No is_aromatic Not permitted N/A N/A
No stereochemistry None oechem.OEPerceiveCIPStereo(oemol, oeatom) returns oechem.OECIPAtomStereo_NotStereo, and oeatom.SetStereo() is not called on atom during construction (might open a low-priority issue to inspect closer) rdatom.GetChiralTag() returns rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED OR returns something other than CHI_TETRAHEDRAL_CCW or CHI_TETRAHEDRAL_CW (Are there other possible return values? Need to inspect)
j-wags commented 5 years ago

Bond core property mapping spec

OFFBond OEBond RDBond
No atom1 Not permitted N/A N/A
No atom2 Not permitted N/A N/A
No bond_order Not permitted N/A N/A
No fractional_bond_order None oebond.HasData('fractional_bond_order') returns False rdbond.HasProp("fractional_bond_order") returns 0 or False
No is_aromatic Not permitted N/A N/A
No stereochemistry None oebond.HasStereoSpecified() returns False OR oechem.OECIPBondStereo_NotStereo (This will require further inspection) rdb.GetStereo() returns none of [Chem.BondStereo.STEREOCIS, Chem.BondStereo.STEREOZ, Chem.BondStereo.STEREOTRANS, Chem.BondStereo.STEREOE] OR running Chem.AssignStereochemistry(rdmol, cleanIt=True, force=True) then rdmol.GetProp('_CIPCode') returns... ? (Incomplete)
j-wags commented 5 years ago

Meeting notes, also posted to Slack:

2018_11_07 Jeff Caitlin Chat

j-wags commented 5 years ago

First pass implemented in 0d29212c5173033a905813935a58346c4323dcbe 7651ecdc3dfdaa0c0d8909d68543d307f8dc670c and 6a03e2604ffade3bcbb13bb74731db6f4bad0558

Will wait for review to close issue.

j-wags commented 5 years ago

Note - the name of the atom field that we send partial charges to will be changed as a result of work in #250. Update the table in this issue when changes are merged.

mattwthompson commented 1 year ago

Is there anything left to do here? Is roundtripping through formats and object models we don't control an absolutely essential behavior?

If there's nothing to act on here we should close this and consider moving the notes to somewhere like developer docs.