Closed. sroughley closed this issue 8 years ago.
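In all your cases, you are defining the double bond stereochemistry relative to a hydrogen. As a test, I defined it relative to the heavy atom instead and got the expected result:
from rdkit import Chem
# the backslash is escaped so Python passes a single '\' to the SMILES parser
m = Chem.MolFromSmiles("[1*]/C(Cl)=C\\C")
Chem.MolToSmiles(m, isomericSmiles=True)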
'[1*]/C(Cl)=C\\C'
So the hydrogen issue smells like a bug to me. I expect that removing the hydrogens also removes the double bond stereo information, which it should not. We can perhaps validate this here:
m = Chem.MolFromSmiles("[1*]/C(Cl)=C(/[H])C([H])([H])[H]", sanitize=False)
Chem.SanitizeMol(m, Chem.SanitizeFlags.SANITIZE_ALL)
Chem.MolToSmiles(m, isomericSmiles=True)
# so far so good
'[1*]/C(Cl)=C(/[H])C([H])([H])[H]'
mh = Chem.RemoveHs(m)
Chem.MolToSmiles(mh, isomericSmiles=True)
# the double bond stereochemistry is gone
'[1*]C(Cl)=CC'
So, indeed, this is a bug. Hopefully the Python snippet above will give you a workaround!
@sroughley: The bug is fixed and the fix is queued for review. It's likely to be on master in the next day or so.
SMILES strings that contain explicit hydrogens and specify double bond geometry lose that geometry when imported into an RDKit ROMol object with default sanitization.
For example, if we use the following, where we wish to keep explicit hydrogens:
RWMol mol = RWMol.MolFromSmiles(smiles, 0, false);
mol.findSSSR();
String canonicalSmiles = mol.MolToSmiles(true);
then the value of canonicalSmiles is as expected. But if we use the following, where we would like to remove hydrogens:
RWMol mol = RWMol.MolFromSmiles(smiles);
mol.findSSSR();
String canonicalSmiles = mol.MolToSmiles(true);
then the resulting SMILES has lost the supplied stereochemistry, for example:
[1*]/C(Cl)=C(/[H])C([H])([H])[H] --> [1*]C(Cl)=CC
[1*]C([H])([H])/C([H])=C(\[H])C([H])([H])[H] --> [1*]CC=CC
[1*]/C(Cl)=C(\[H])C([H])([H])[H] --> [1*]C(Cl)=CC
(In each case, the SMILES on the left was generated by a previous ROMol#MolToSmiles(true) call.)
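A quick way to flag this kind of loss across many round-tripped SMILES is a purely textual check for the `/` and `\` bond markers. The sketch below is only that: the helper names `has_bond_stereo` and `lost_bond_stereo` are hypothetical and not part of RDKit, and it does not parse SMILES or verify that the geometry is chemically equivalent. It only reports whether directional bond markers were present before and are absent after. The pairs are the three listed above.

```python
import re

# Hypothetical helpers for illustration; a textual check, not a SMILES parser.
_DIRECTIONAL_BOND = re.compile(r"[/\\]")


def has_bond_stereo(smiles: str) -> bool:
    """Return True if the SMILES text contains a '/' or '\\' bond marker."""
    return _DIRECTIONAL_BOND.search(smiles) is not None


def lost_bond_stereo(before: str, after: str) -> bool:
    """Return True if `before` has directional bond markers and `after` has none."""
    return has_bond_stereo(before) and not has_bond_stereo(after)


# Input/output pairs copied from the report (raw strings keep '\' literal).
REPORTED_PAIRS = [
    (r"[1*]/C(Cl)=C(/[H])C([H])([H])[H]", "[1*]C(Cl)=CC"),
    (r"[1*]C([H])([H])/C([H])=C(\[H])C([H])([H])[H]", "[1*]CC=CC"),
    (r"[1*]/C(Cl)=C(\[H])C([H])([H])[H]", "[1*]C(Cl)=CC"),
]

if __name__ == "__main__":
    for before, after in REPORTED_PAIRS:
        print(before, "-->", after, "| stereo lost:", lost_bond_stereo(before, after))
```

A check like this could run over a batch of input/output SMILES to find affected records. It will not catch cases where the markers survive but the geometry is flipped.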
Steve