openforcefield / cmiles

Generate canonical molecule identifiers for quantum chemistry database
https://cmiles.readthedocs.io
MIT License
23 stars 7 forks source link

Hydrogens move and stereochemistry flip when reading qcschema molecule #36

Open ChayaSt opened 4 years ago

ChayaSt commented 4 years ago

@vtlim pointed out that the cmiles SMILES changed for this molecule.

client = ptl.FractalClient()

ds = client.get_collection('OptimizationDataset', 'OpenFF Full Optimization Benchmark 1')
entry = ds.get_entry(name='CO/N=C/1\C[N@](C[C@H]1C[NH3+])c2c(cc3c(=O)c(cn(c3n2)C4CC4)C(=O)[O-])F-3')
record = ds.get_record('CO/N=C/1\C[N@](C[C@H]1C[NH3+])c2c(cc3c(=O)c(cn(c3n2)C4CC4)C(=O)[O-])F-3', specification='default')
qc_mol = record.get_final_molecule()
qcjson_mol = qc_mol.dict(encoding='json')
oemol = cmiles.utils.load_molecule(qcjson_mol)

print("cmiles SMILES in database:")
print(entry.attributes['canonical_isomeric_smiles'])
print('cmiles SMILES after reading qcschema molecule')
print(cmiles.utils.mol_to_smiles(oemol, mapped=False, explicit_hydrogen=False))

Output:

cmiles SMILES in database:
CO/N=C/1\C[N@](C[C@H]1C[NH3+])c2c(cc3c(=O)c(cn(c3n2)C4CC4)C(=O)[O-])F
cmiles SMILES after reading qcschema molecule
CO/N=C/1\C[N@@H+](C[C@H]1CN)c2c(cc3c(=O)c(cn(c3n2)C4CC4)C(=O)[O-])F

There are 2 issues:

  1. A hydrogen moved from the NH3+ to the pyramidal nitrogen
  2. Stereochemistry changed for pyramidal nitrogen.

The first issue is fixed when this is commented out. It's not needed because cmiles already has the connectivity from the connectivity map. However, the bond is very long so openeye's perception is probably correct. These issues should be flagged with checking WBOs to see if the chemical graph changed. After investigating the second issue, it turns out that the input molecule had the pyramidal nitrogen in a planar conformation and after optimization the pyramidal nitrogen had a specified stereochemistry which was apparently different than the input SMILES.

Input molecule:

Screen Shot 2019-11-27 at 10 44 04 AM

Optimized molecule:

Screen Shot 2019-11-27 at 10 44 08 AM

@j-wags, I think this means that cmiles should not require defined stereochemistry for pyramidal nitrogen given that they can flip.

ChayaSt commented 4 years ago

When loading the qcschema molecule with rdkit, the negatively charged O gets a H to make it neutral.