EnumerateStereoisomers generates different isomers with legacy and new stereo code #6977

Open bp-kelley opened 12 months ago

bp-kelley commented 12 months ago

Example of the issue:

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

# New (non-legacy) stereo perception
Chem.SetUseLegacyStereoPerception(False)
smi = "NC(F)C(I)C(F)N"
print("SMI:", smi)
m = Chem.MolFromSmiles(smi)
print(f"stereo: {[x.type for x in Chem.FindPotentialStereo(m)]}")
isomers = list(EnumerateStereoisomers(m))
print("new stereo num isomers", len(isomers))
for ism in isomers:
    print(Chem.MolToSmiles(ism))
# Legacy stereo perception
print()
Chem.SetUseLegacyStereoPerception(True)
m = Chem.MolFromSmiles(smi)
print(f"stereo: {[x.type for x in Chem.FindPotentialStereo(m)]}")
isomers = list(EnumerateStereoisomers(m))
print("old stereo num isomers:", len(isomers))
for ism in isomers:
    print(Chem.MolToSmiles(ism))

This outputs 6 isomers with the new (non-legacy) stereo perception, which is incorrect, and 3 isomers with the legacy stereo perception, which is correct:

SMI: NC(F)C(I)C(F)N
stereo: [rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral, rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral, rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral]
new stereo num isomers 6
N[C@H](F)[C@H](I)[C@@H](N)F
N[C@@H](F)[C@H](I)[C@@H](N)F
N[C@H](F)[C@@H](I)[C@@H](N)F
N[C@@H](F)[C@@H](I)[C@@H](N)F
N[C@H](F)[C@H](I)[C@H](N)F
N[C@H](F)[C@@H](I)[C@H](N)F

stereo: [rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral, rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral, rdkit.Chem.rdchem.StereoType.Atom_Tetrahedral]
old stereo num isomers: 3
N[C@H](F)C(I)[C@@H](N)F
N[C@@H](F)C(I)[C@@H](N)F
N[C@H](F)C(I)[C@H](N)F

I believe this is caused by a change in the order of steps. The legacy stereo perception also assigned CIP codes, so the central carbon was recognized as not being chiral. The newer code only does this later, in the sanitization step.

Adding a sanitization call to EnumerateStereoisomers (line 340) fixes the behavior:

    Chem.AssignStereochemistry(isomer, cleanIt=True, force=True, flagPossibleStereoCenters=True)
    Chem.SanitizeMol(isomer)

bp-kelley commented 12 months ago

This was noted in discussion #6881.

greglandrum commented 12 months ago

Here's a more compact form of this (generated using v2023.09.3):

from rdkit import Chem
from rdkit.Chem.EnumerateStereoisomers import EnumerateStereoisomers

smi = "NC(F)C(I)C(F)N"
Chem.SetUseLegacyStereoPerception(False)
m = Chem.MolFromSmiles(smi)
print('\n'.join([Chem.MolToSmiles(x) for x in EnumerateStereoisomers(m)]))
Chem.SetUseLegacyStereoPerception(True)
m = Chem.MolFromSmiles(smi)
print('\n'.join([Chem.MolToSmiles(x) for x in EnumerateStereoisomers(m)]))

Output is:

N[C@H](F)[C@H](I)[C@@H](N)F
N[C@@H](F)[C@H](I)[C@@H](N)F
N[C@H](F)[C@@H](I)[C@@H](N)F
N[C@@H](F)[C@@H](I)[C@@H](N)F
N[C@H](F)[C@H](I)[C@H](N)F
N[C@H](F)[C@@H](I)[C@H](N)F

and

N[C@H](F)C(I)[C@@H](N)F
N[C@@H](F)C(I)[C@@H](N)F
N[C@H](F)C(I)[C@H](N)F

greglandrum commented 12 months ago

Accepting for the moment that we don't correctly deal with meso compounds, I believe that the correct answer here is:

N[C@H](F)[C@H](I)[C@@H](N)F
N[C@H](F)[C@@H](I)[C@@H](N)F
N[C@@H](F)C(I)[C@@H](N)F
N[C@H](F)C(I)[C@H](N)F

So neither of them is actually providing the correct answer.
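
The count of four can be checked with a small symmetry argument. The sketch below is plain Python, not RDKit code, and rests on stated assumptions: each end carbon is given an abstract label ("R" or "S"), the central carbon is given the tag it would carry when the chain is written left to right, and writing the same molecule from the other end swaps the two end labels and flips the central tag (because the central atom's two carbon neighbors trade places). The names `canonical` and `distinct_isomers` are made up for this illustration.

```python
from itertools import product

# Writing the chain from the other end swaps the two carbon neighbors of the
# central atom, which is an odd permutation, so the written tag flips.
FLIP = {"@": "@@", "@@": "@"}


def canonical(assignment):
    """Pick one representative for an assignment and its end-to-end reversal."""
    left, center, right = assignment
    reversed_form = (right, FLIP[center], left)
    return min(assignment, reversed_form)


def distinct_isomers():
    """Group all 2**3 raw assignments into classes that describe one molecule."""
    classes = {}
    for assignment in product("RS", ("@", "@@"), "RS"):
        classes.setdefault(canonical(assignment), set()).add(assignment)
    return classes


classes = distinct_isomers()
for rep in sorted(classes):
    left, _, right = rep
    # When both ends carry the same label, both central tags fall into one
    # class, so the central tag carries no information in this model.
    note = "central tag matters" if left != right else "central tag is redundant"
    print(rep, note)
print("distinct isomers:", len(classes))
```

Under these assumptions the eight raw assignments collapse into four classes: two where the ends differ and the central tag distinguishes the isomers, and two where the ends match and the central tag is redundant. That lines up with the four SMILES listed above, two with the central carbon specified and two without.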

bp-kelley commented 12 months ago

I didn't think we supported the last two forms of chirality yet.

greglandrum commented 11 months ago

> I didn't think we supported the last two forms of chirality yet.

I'm not sure what you mean by this. Which last two forms?

bp-kelley commented 11 months ago

I was talking about meso compounds (the first two).

Ignoring the central carbon, I think these are all unique:

N[C@H](F)C(I)[C@@H](N)F
N[C@@H](F)C(I)[C@@H](N)F
N[C@H](F)C(I)[C@H](N)F

Visual inspection seems to agree, but I'll need to break out the model kit to be sure...
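
As a stand-in for the model kit, the two end tags can be compared by permutation parity. This is a hand-rolled sketch in plain Python, not RDKit's implementation. It assumes the usual SMILES convention that a chiral atom's neighbors are ordered as written, with the implicit H placed right after the preceding atom, so the left end `N[C@H](F)C` has neighbor order (N, H, F, C) and the right end `C[C@@H](N)F` has (C, H, N, F). The helper names are hypothetical.

```python
def permutation_is_odd(order, reference):
    """Return True if turning `order` into `reference` takes an odd number of swaps."""
    current = list(order)
    swaps = 0
    for i, wanted in enumerate(reference):
        j = current.index(wanted)
        if j != i:
            current[i], current[j] = current[j], current[i]
            swaps += 1
    return swaps % 2 == 1


def tag_in_reference_order(neighbors, tag, reference=("N", "H", "F", "C")):
    """Re-express a chirality tag relative to a shared neighbor order."""
    flip = {"@": "@@", "@@": "@"}
    return flip[tag] if permutation_is_odd(neighbors, reference) else tag


LEFT = ("N", "H", "F", "C")   # neighbor order in N[C@H](F)C...
RIGHT = ("C", "H", "N", "F")  # neighbor order in ...C[C@@H](N)F


def ends_match(left_tag, right_tag):
    """True if both ends have the same configuration once put in a common order."""
    return (tag_in_reference_order(LEFT, left_tag)
            == tag_in_reference_order(RIGHT, right_tag))


# Written end tags of the three SMILES above, in the order they are listed.
for left_tag, right_tag in [("@", "@@"), ("@@", "@@"), ("@", "@")]:
    relation = "same" if ends_match(left_tag, right_tag) else "opposite"
    print(left_tag, right_tag, "->", relation, "configuration at the two ends")
```

With that neighbor ordering, the first SMILES has opposite configurations at its two ends, while the second and third each have matching ends and are mirror images of each other. So, ignoring the central carbon, the three do appear to be distinct.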

github-actions[bot] commented 1 month ago

This issue was marked as stale because it has been open for 90 days with no activity.