Open j-wags opened 5 years ago
Also, test specifically that charges are read from MOL2's
I'm thinking that the strict test of "successful round trips" is to pass:
off_mol1 = (any valid openforcefield.topology.molecule.Molecule object with just core properties)
toolkit_mol = toolkit_wrapper.to_openeye(off_mol) # (or any toolkit molecule object)
off_mol2 = toolkit_wrapper.from_object(toolkit_mol)
assert off_mol1 == off_mol2
In this round-trip framework, we need to think of how each property of an OFFMol can map in and out of a toolkit molecule representation. For example, I hit a problem with OEMols, where oemol.SetTitle only works on strings, while OFFMols can have their name be a string or None. There's no safe mapping for None in and out of an OEMol. We could always stash these core properties into arbitrary property dictionaries (both OEMols and RDMols have these), but it would be weird to convert a OFFMol to an OEmol, and not have it's name go into the OEMol's Title field.
Many fields are natively supported, so fields like "atom element" and "integer bond orders" are no problem.
Generally, not knowing what interfaces we'll need to comply with in the future, I think this means we want to limit the OFFMol's property space when possible. For example, I'm removing None as a valid OFFMol name, and strictly requiring strings ("" is the default).
Goal 1 has a few implications.
I'll update this post with more interface notes as they are implemented.
(in progress, this should go somewhere official eventually)
OFFMol | OEMol | RDMol | |
---|---|---|---|
No Name | molecule.name = "" |
oemol.SetTitle("") |
rdmol.SetProp('name','') |
No partial charges | molecule._partial_charges = None |
each oeAtom.GetPartialCharge() returns 0.0 |
each rdatom.HasProp('partial_charge' returns False |
No conformers | molecule._conformers = None |
oemol.GetConformers() yields a dictionary mapping all atom indices to (0,0,0) 's |
rdmol.GetConformers() returns () |
Note: We only care about round trips from OFFMol --> ToolkitMol --> OFFMol, therefore if OFFMol doesn't a permit a blank value for a field, then we don't have to worry about the toolkit representation of the blank value. (?)
OFFAtom | OEAtom | RDAtom | |
---|---|---|---|
No name | Atom.name = "" | oeatom.SetName("") |
rdatom.SetProp("name","") |
No element | Not permitted | N/A | N/A |
No formal_charge | Not permitted | N/A | N/A |
No is_aromatic | Not permitted | N/A | N/A |
No stereochemistry | None | oechem.OEPerceiveCIPStereo(oemol, oeatom) returns oechem.OECIPAtomStereo_NotStereo , and oeatom.SetStereo() is not called on atom during construction (might open a low-priority issue to inspect closer) |
rdatom.GetChiralTag() returns rdkit.Chem.rdchem.ChiralType.CHI_UNSPECIFIED OR returns something other than CHI_TETRAHEDRAL_CCW or CHI_TETRAHEDRAL_CW (Are there other possible return values? Need to inspect) |
OFFBond | OEBond | RDBond | |
---|---|---|---|
No atom1 | Not permitted | N/A | N/A |
No atom2 | Not permitted | N/A | N/A |
No bond_order | Not permitted | N/A | N/A |
No fractional_bond_order | None | oebond.HasData('fractional_bond_order') returns False |
rdbond.HasProp("fractional_bond_order") returns 0 or False |
No is_aromatic | Not permitted | N/A | N/A |
No stereochemistry | None | oebond.HasStereoSpecified() returns False OR oechem.OECIPBondStereo_NotStereo (This will require further inspection) |
rdb.GetStereo() returns none of [Chem.BondStereo.STEREOCIS, Chem.BondStereo.STEREOZ, Chem.BondStereo.STEREOTRANS, Chem.BondStereo.STEREOE] OR running Chem.AssignStereochemistry(rdmol, cleanIt=True, force=True) then rdmol.GetProp('_CIPCode') returns... ? (Incomplete) |
Meeting notes, also posted to Slack:
2018_11_07 Jeff Caitlin Chat
First pass implemented in 0d29212c5173033a905813935a58346c4323dcbe 7651ecdc3dfdaa0c0d8909d68543d307f8dc670c and 6a03e2604ffade3bcbb13bb74731db6f4bad0558
Will wait for review to close issue.
Note - the name of the atom field that we send partial charges to will be changed as a result of work in #250. Update the table in this issue when changes are merged.
Is there anything left to do here? Is roundtripping through formats and object models we don't control an absolutely essential behavior?
If there's nothing to act on here we should close this and consider moving the notes to somewhere like developer docs.
The goals in this issue are
1) Make sure that core Molecule properties survive conversion to/from OEMol/RDMol.
2) Determine whether the core properties survive filo I/O in supported file formats.
3) Determine whether secondary properties survive conversion to/from OEMol/RDMol.
4) Determine whether secondary properties survive filo I/O in supported file formats.
Core properties of Molecule include:
Note that core properties do not include VirtualSites
Secondary properties are anything serializeable that users add to the molecule._properties dict.