Closed ChayaSt closed 5 years ago
Be sure to loop in @j-wags who is working on our Open Forcefield serialized representations of molecules and topologies!
QCSchema is always in Bohr without exception. If it is not in Bohr it is an error in the input.
Update on this:
I was not able to get OpenEye to perceive connectivity and stereochemistry from JSON geometry. It does perceive it if it gets read in from a file but we don't want to do that.
RDKit does a find job perceiving stereochemistry from the JSON geometry but it also need the connectivity table to generate a molecular graph. So currently cmiles
uses RDKit and the connectivity table to create a molecule.
Also, cmiles
assumes the QCSchema so all units are assumed to be in Bohr. @j-wags, once the Open Forcefield serialized representation is complete, we should add the ability to generate molecular id's from that representation. We will need a way to keep track of units then.
Thanks for the ping, @ChayaSt. Right now, all quantities in the SMIRNOFF Molecule
class are unit-wrapped numpy arrays (simtk.unit.Quantity
objects). This should make unit conversion straightforward when we get there.
I was not able to get OpenEye to perceive connectivity and stereochemistry from JSON geometry. It does perceive it if it gets read in from a file but we don't want to do that.
I was wrestling with OE and stereochemistry two weeks ago. One helpful function had been oechem.OE3DToInternalStereo(oemol)
, though I mucked around with a bunch of other functions. If you're up for it, we could do a screenshare today to look into this.
I was wrestling with OE and stereochemistry two weeks ago. One helpful function had been oechem.OE3DToInternalStereo(oemol), though I mucked around with a bunch of other functions. If you're up for it, we could do a screenshare today to look into this.
Thanks! I first wasn't even able to get Openeye to perceive connectivity from coordinates and found the function oemol.SetDimension(3)
that then allows oechem.OEDetermineConnectivity(oemol)
to work. I then called a bunch of functions on the molecule:
oechem.OEFindRingAtomsAndBonds(molecule)
oechem.OEPerceiveBondOrders(molecule)
oechem.OEAssignImplicitHydrogens(molecule)
oechem.OEAssignFormalCharges(molecule)
oechem.OEAssignAromaticFlags(molecule)
These functions are always called when openeye reads a molecule from an xyz file. Once I did this the molecule has the stereochemistry information and I don't need to perceive it.
That's great. Thanks for looking into this!
We had talked about finding undefined stereochemistry the other day, and I wanted to document that somewhere: Basically, I couldn't find a built-in OE function to check for undefined stereochemistry. Here's what I'm using currently in SMIRNOFF:
from openeye import oechem
from openforcefield.topology.molecule import Molecule
oechem.OEPerceiveChiral(oemol)
# Check that all stereo is specified
unspec_chiral = False
unspec_db = False
problematic_atoms = list()
problematic_bonds = list()
for oeatom in oemol.GetAtoms():
if oeatom.IsChiral():
if not (oeatom.HasStereoSpecified()):
unspec_chiral = True
problematic_atoms.append(oeatom)
for oebond in oemol.GetBonds():
if oebond.IsChiral():
if not (oebond.HasStereoSpecified()):
unspec_db = True
problematic_bonds.append(oebond)
Thank you!
This is really helpful!
@j-wags, turns out the code snippet above also finds spurious missing steroechemistry.
Example SMILES: N[C@@H](CCCNC(N)=N)C(O)=O
It finds this bond to be missing stereochemistry: [(7, 'C', 9, 'N', 2)]
(that's C=N) which is a terminal bond so it's not chiral.
Turns out, atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral)
and bond.HasStereoSepcified(oechem.OEBondStereo_CisTrans)
will be true for chiral bonds and atoms even though the specific chirality is not defined. You have to check what the handedness is and check if that is undefined.
Here is the code I'm using now.
def is_stereochemistry_defined(molecule):
unspec_chiral = False
unspec_db = False
problematic_atoms = list()
problematic_bonds = list()
for atom in molecule.GetAtoms():
if atom.IsChiral() or atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral):
# Check if handness is specified
v = []
for nbr in atom.GetAtoms():
v.append(nbr)
stereo = atom.GetStereo(v, oechem.OEAtomStereo_Tetrahedral)
if stereo == oechem.OEAtomStereo_Undefined:
unspec_chiral = True
problematic_atoms.append((atom.GetIdx(), oechem.OEGetAtomicSymbol(atom.GetAtomicNum())))
for bond in molecule.GetBonds():
if bond.IsChiral() or bond.HasStereoSpecified(oechem.OEBondStereo_CisTrans):
v = []
for neigh in bond.GetBgn().GetAtoms():
if neigh != bond.GetEnd():
v.append(neigh)
break
for neigh in bond.GetEnd().GetAtoms():
if neigh != bond.GetBgn():
v.append(neigh)
break
stereo = bond.GetStereo(v, oechem.OEBondStereo_CisTrans)
if stereo == oechem.OEBondStereo_Undefined:
unspec_db = True
a1 = bond.GetBgn()
a2 = bond.GetEnd()
a1_idx = a1.GetIdx()
a2_idx = a2.GetIdx()
a1_s = oechem.OEGetAtomicSymbol(a1.GetAtomicNum())
a2_s = oechem.OEGetAtomicSymbol(a2.GetAtomicNum())
bond_order = bond.GetOrder()
problematic_bonds.append((a1_idx, a1_s, a2_idx, a2_s, bond_order))
if unspec_chiral or unspec_db:
raise ValueError("Stereochemistry is unspecified. Problematic atoms {}, problematic bonds {}".format(problematic_atoms,
problematic_bonds))
else:
return True
edit: fixed code snippet.
@j-wags, not sure if you are using that snippet here https://github.com/openforcefield/openforcefield/issues/141#issuecomment-442148411 but this might be picking up spurious problems.
Very interesting! I was using that snippet there. Thanks for making the connection, I'll make sure it gets passed on to the other PR.
Thanks for doing the research above. I've got a few things to finish up with Topology
right now, but I'll try out your code soon.
Addressed by #13
Add the ability to generate identifiers from QCSchema.
The minimum fields required to generate a molecular graph are different for RDKit and OpenEye. For OpenEye, the minimum is either symbols and connectivity or symbols and geometry because OpenEye can perceive bonds and bond types from distance. However, the units need to be in Angstroms. @dgasmith, are the units in QCSchema always Bohr? Is there a way to ensure that? If geometry exists we want to use it to make sure we get the stereochemistry right.
For RDKit, the connectivity table is needed. RDKit cannot perceive bonds from distance.