openforcefield / cmiles

Generate canonical molecule identifiers for quantum chemistry database
https://cmiles.readthedocs.io
MIT License

Add atom map to molecule #26

Closed ChayaSt closed 5 years ago

ChayaSt commented 5 years ago

Description

Add canonical atom map to an existing molecule.

Status

ChayaSt commented 5 years ago

The failing tests are due to rdkit version update 2019.03.1 -> 2019.03.2.

jchodera commented 5 years ago

The failing tests are due to rdkit version update 2019.03.1 -> 2019.03.2.

What happened in that update? Is this something we need to worry about more broadly for the toolkit?

ChayaSt commented 5 years ago

What happened in that update? Is this something we need to worry about more broadly for the toolkit?

The tests flag changes in the canonical SMILES and InChI. It seems that the current release removes overdefined stereo for bridged rings, whereas the previous release kept it. OpenEye does not remove it. Below is an example for the molecule in the image:

import cmiles
from openeye import oechem
from rdkit import Chem
import rdkit
print(rdkit.__version__)

original_smiles = 'CC1(C)[C@H]2CC[C@]1(C)CC2'
oemol = oechem.OEMol()
oechem.OESmilesToMol(oemol, original_smiles)
print(cmiles.utils.has_stereo_defined(oemol))
print(oechem.OEMolToSmiles(oemol))

# Take it through rdkit
rdmol = Chem.MolFromSmiles(original_smiles)
print(cmiles.utils.has_stereo_defined(rdmol))
rd_smiles = Chem.MolToSmiles(rdmol)
print(rd_smiles)

# Create mol from rd_smiles
oemol = oechem.OEMol()
oechem.OESmilesToMol(oemol, rd_smiles)
print(cmiles.utils.has_stereo_defined(oemol))
print(oechem.OEMolToSmiles(oemol))

rdmol = Chem.MolFromSmiles(rd_smiles)
print(cmiles.utils.has_stereo_defined(rdmol))

output for rdkit 2019.03.1

2019.03.1
True
C[C@@]12C([C@@H](CC1)CC2)(C)C
True
CC1(C)[C@H]2CC[C@]1(C)CC2
True
C[C@@]12C([C@@H](CC1)CC2)(C)C
True

output for rdkit 2019.03.2

2019.03.2
True
C[C@@]12C([C@@H](CC1)CC2)(C)C
True
CC12CCC(CC1)C2(C)C
True
CC1(C2CCC1(CC2)C)C
True

Currently cmiles has no way to detect overdefined stereo; it only detects whether stereo is missing. It would be nice if it could detect both.
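One rough way to notice this kind of change in a test is to compare how many chirality tags survive a SMILES round trip. The sketch below is only a string-level heuristic, not a cmiles or RDKit API: the function names `count_chiral_tags` and `stereo_tags_lost` are hypothetical, the regex does not parse SMILES, and double-bond stereo is ignored. The inputs are the SMILES strings quoted in this thread.

```python
import re

# Matches a bracket atom that carries a tetrahedral chirality tag,
# e.g. [C@H], [C@@], [N@@+]. Heuristic only: this is not a SMILES parser
# and it ignores double-bond (/ and \) stereo.
_CHIRAL_ATOM = re.compile(r"\[[^\]]*@[^\]]*\]")


def count_chiral_tags(smiles):
    """Return the number of bracket atoms with an @ or @@ tag (hypothetical helper)."""
    return len(_CHIRAL_ATOM.findall(smiles))


def stereo_tags_lost(before, after):
    """Return how many chirality tags disappeared between two SMILES strings."""
    return max(count_chiral_tags(before) - count_chiral_tags(after), 0)


original = 'CC1(C)[C@H]2CC[C@]1(C)CC2'
roundtrip_old = 'CC1(C)[C@H]2CC[C@]1(C)CC2'  # rdkit 2019.03.1 output quoted in this thread
roundtrip_new = 'CC12CCC(CC1)C2(C)C'         # rdkit 2019.03.2 output quoted in this thread

print(stereo_tags_lost(original, roundtrip_old))
print(stereo_tags_lost(original, roundtrip_new))
```

A nonzero result only says that tags were dropped; it cannot tell whether the dropped stereo was genuinely overdefined, so it would serve as a flag for manual review rather than a fix.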

ETA: However, it seems that InChI does expect the stereo information: when the molecule built from the stereo-free SMILES is passed to Chem.MolToInchi(rdmol), it raises the warning: [08:56:51] WARNING: Omitted undefined stereo

For the record, below are the 6 SMILES whose molecules now raise the warning when passed to Chem.MolToInchi(mol).

'CC(N1CCN(CC1)c2ncc(cc2)C(F)(F)F)(C(=O)N[C@H]3[C@@H]4C[C@@]5(CC(C4)C[C@H]3C5)C(=O)N)C',
'C[C@@]12OO[C@@](CCC(O)=O)(C3=CC=CC=C13)C1=C2C=CC=C1',
'[H][C@]12C(=O)N(C(=O)[C@@]1([H])[C@@]1(CC[C@@]2([H])CC1)NC(=O)OCC(O)=O)C1=CC=C(NC(C)=O)C=C1',
'CC1(C)[C@H]2CC[C@]1(C)CC2',
'C[C@@]12CC[C@@H](CC1)C(C)(C)O2',
'C[N@@+]12C[C@@H]([C@@H](CC1)CC2)OC(=O)C(O)(c3ccccc3)c4ccccc4'
codecov-io commented 5 years ago

Codecov Report

Merging #26 into master will increase coverage by 0.03%. The diff coverage is 93.75%.
