rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

Double bond geometry loss on calling removeHs #754

Closed sroughley closed 8 years ago

sroughley commented 8 years ago

SMILES strings that contain explicit hydrogens and specify double bond geometry lose that geometry when they are imported into an RDKit ROMol object with default sanitization.

For example, if we use the following, where we wish to keep explicit hydrogens:

RWMol mol = RWMol.MolFromSmiles(smiles, 0, false);
mol.findSSSR();
String canonicalSmiles = mol.MolToSmiles(true);

then the value of canonicalSmiles is as expected.

But, if we use the following, where we would like to remove hydrogens:

RWMol mol = RWMol.MolFromSmiles(smiles);
mol.findSSSR();
String canonicalSmiles = mol.MolToSmiles(true);

then the resulting SMILES has lost the supplied stereochemistry, for example:

[1*]/C(Cl)=C(/[H])C([H])([H])[H] --> [1*]C(Cl)=CC
[1*]C([H])([H])/C([H])=C(\[H])C([H])([H])[H] --> [1*]CC=CC
[1*]/C(Cl)=C(\[H])C([H])([H])[H] --> [1*]C(Cl)=CC

(In each case, the SMILES on the left was generated by a previous ROMol#MolToSmiles(true) call.)
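A quick way to flag this kind of loss in bulk, without depending on any toolkit, is to check whether the directional bond markers (`/` and `\`) present in the input SMILES are missing from the output. The sketch below is a coarse, string-level check only: it does not confirm that the geometry is unchanged, only that some directional marker survived. The helper names `has_bond_direction` and `lost_bond_direction` are hypothetical and are not part of RDKit.

```python
import re

# Directional bond markers in SMILES are '/' and '\'.
_DIRECTION = re.compile(r"[/\\]")


def has_bond_direction(smiles: str) -> bool:
    """Return True if the SMILES string contains any '/' or '\\' bond marker."""
    return _DIRECTION.search(smiles) is not None


def lost_bond_direction(before: str, after: str) -> bool:
    """Return True if 'before' had directional markers and 'after' has none."""
    return has_bond_direction(before) and not has_bond_direction(after)


# The three input/output pairs reported in this issue.
pairs = [
    (r"[1*]/C(Cl)=C(/[H])C([H])([H])[H]", "[1*]C(Cl)=CC"),
    (r"[1*]C([H])([H])/C([H])=C(\[H])C([H])([H])[H]", "[1*]CC=CC"),
    (r"[1*]/C(Cl)=C(\[H])C([H])([H])[H]", "[1*]C(Cl)=CC"),
]

flags = [lost_bond_direction(before, after) for before, after in pairs]

for (before, after), lost in zip(pairs, flags):
    print(f"{before} --> {after}  lost direction: {lost}")
```

All three reported pairs are flagged, because every output string has dropped its `/` and `\` markers.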

Steve

bp-kelley commented 8 years ago

In all your cases, the double bond stereochemistry is specified relative to a hydrogen. As a test, I specified it relative to the heavy atom instead, with the expected result:

from rdkit import Chem
# raw string so the backslash in the SMILES is not treated as an escape
m = Chem.MolFromSmiles(r"[1*]/C(Cl)=C\C")
Chem.MolToSmiles(m, isomericSmiles=True)
'[1*]/C(Cl)=C\\C'

So the hydrogen issue smells like a bug to me. I expect that removing the hydrogens is unexpectedly removing the stereo information as well. We can perhaps validate this here:

m = Chem.MolFromSmiles("[1*]/C(Cl)=C(/[H])C([H])([H])[H]", sanitize=False)
Chem.SanitizeMol(m, Chem.SanitizeFlags.SANITIZE_ALL)
Chem.MolToSmiles(m, isomericSmiles=True)
# so far so good
'[1*]/C(Cl)=C(/[H])C([H])([H])[H]'
mh = Chem.RemoveHs(m)
Chem.MolToSmiles(mh, isomericSmiles=True)
# double bond stereo is gone
'[1*]C(Cl)=CC'

So, indeed, this is a bug. Hopefully the Python snippet above will give you a workaround!

greglandrum commented 8 years ago

@sroughley : The bug is fixed and the fix is queued for review. It's likely to be on master in the next day or so.