openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License
309 stars 90 forks source link

Spec for undefined stereochemistry #146

Open j-wags opened 5 years ago

j-wags commented 5 years ago

Purpose

Because it's a complicated topic, and to maintain the ability to restart this discussion in the future, I think it's important to record our design decisions with regard to stereochemistry in this repository.

Background

Stereochemical differences are important in the SMIRNOFF spec, as they can lead to different partial charges or partial bond orders on a molecule, and (if the project goes in that direction), different assignment of library parameters using stereochemistry-dependent SMARTS.

However, the definition of meaningful stereochemistry can change depending on experimental conditions and timescale. For example a tertiary amine will invert 10^3 to 10^5 times per second in solution. Therefore, on a wet-lab-experiment timescale, the amine does not have meaningful stereochemistry. However, if one is running a QM calculation on the tertiary amine, there may be meaningful differences in the results depending on its stereochemistry. Thus, whether or not stereochemistry is required in a tertiary amine depends on your use case.

Our definition of stereochemistry will ideally cover the fact that MD simulations can run for micro- and milli-seconds, therefore molecular features that are likely to change stereochemistry on those timescales should not be assigned stereochemistry-specific properties. It is unclear currently whether this decision should be made at the point of SMIRNOFF molecule definition (ie, we enforce that short-lived stereochemical features do not distinguish one OFF molecule from another, thereby preventing our engine from assigning them different simulation parameters), or if the SMARTS that define our parameter libraries should be policed moving forward, to ensure that no two parameters are differentiated only by low-barrier stereochemistry (eg, we will never create a SMARTS that relies on the stereochemistry of a tertiary amine).

Current implementation plans

Read below for details, but I think a good plan is to let the toolkits use their own (slightly different!) standards in choosing how to accept SMILES. While they mostly agree with what "fully specified stereochemistry" is, they differ on some borderline cases.

While this isn't ideal, ensuring the same stereochemistry standard across toolkits will require a lot of engineering, and will restrict us to the union of the most restrictive stereochemistry standards of all of our toolkits (eg. OE omega complains if bond stereo isn't defined for CC=NH, which is a bit much).

3D molecule input

We haven't run into too many problems with sdf and mol2 input. OE and RDK currently use their default stereochemistry detection in these cases and seem to rarely have problems. I haven't dug very deeply into these, so I will update this post as I look further into it.

Potentially relevant to 3D stereochemistry detection: #141, #137

SMILES input

"Common sense" atom and double bond stereochemistry is required, as is implemented by default in OE and RDK. There are, however, a few cases where disagreement occurs, both between OE and RDK, and between a toolkit and our intuition.

This table records the cases of obvious and border-case stereochemistry we've found so far, whether we think molecules should labeled as having undefined stereochemistry (on the MD timescale), and whether the toolkits define the SMILES as having undefined stereochemistry. If applicable, experimental barriers/interconversion rates from literature should be added.


SMILES Description Humans say it has undefined stereochem? OE says it has undefined stereochem? RDK says it has undefined stereochem? barrier conversion rate
C(Cl)(F)(Br)C Undefined tetrahedral center Yes Yes Yes
C(F)=C(Cl) Undefined C=C double bond Yes Yes Yes
C\C(F)=C(/F)CC(C)(Cl)Br Undefined stereocenter, defined stereobond Yes Yes Yes
CC(F)=C(F)C[C@@](C)(Cl)Br Defined stereocenter, undefined stereobond Yes Yes Yes
C\C(F)=C(/F)C[C@@](C)(Cl)Br defined stereocenter, defined stereobond No No No
CN=C(F)C Undefined imine Yes Yes Yes > 40 kcal/mol
N=C(F)C Undefined primary imine No Yes No
N(Cl)(F)C Undefined pyrimidal nitrogen No Optional [1] No 10^3 - 10^5 sec^-1
CP(O)(=O)N tetrahedral phosphate with one protonated oxygen Yes Yes Yes
CNC(C)=O keto-form of amide bond No No No
CN=C(C)O enol-form of amide bond [2] Yes Yes
C(=[O+][H])C oximium ? Yes No
N(C)(CC)(CCC)=O ? Yes? Yes No
C[C@H]1CN1C, may have tested using different method trivalent N in 3-membered ring Yes, as of 2019.3 Yes Yes
C[C@H]1CN1, may have tested using different method trivalent N in 3-membered ring (1 hydrogen) ? No? Yes
[H]c1c(c(c(c(c1[H])[H])P2C(=C3C(=C2c4c(c(c(c(n4)[H])[H])[H])[H])C(C(C(C3([H])[H])([H])[H])([H])[H])([H])[H])c5c(c(c(c(n5)[H])[H])[H])[H])[H])[H] trivalent phosphorous ? No Yes considered stereogenic starting in RDK 2020.3 release https://github.com/rdkit/rdkit/issues/2949
CCS(=O)C thione ? ? ? ? ?
COP(=O)([O-])OCC deprotonated NA phosphate backbone N ? ? ? deprotonated at physiological pH (pKa=2ish)

[1] OE says "Yes" when using method A, but when "No" when method B is given enum_nitrogens=False. [2] I'm not certain what the rotation timescale for amide bonds is, but this might just be in molecular dynamics range. Letting the enol form of the amide get stereochemistry might give users the choice of whether they want it defined.

Code to reproduce the above

from rdkit import Chem 
from rdkit.Chem import EnumerateStereoisomers

# Define the SMILESes, 
smiles_ambiguous = [
                    ('C(Cl)(F)(Br)C', True), # Undefined tetrahedral center
                    ('C(F)=C(Cl)', True), # Undefined RC=CR 
                    ('C(F)(Br)=C(Cl)(I)', True), # Undefined RC=CR 
                    ("C\C(F)=C(/F)CC(C)(Cl)Br", True), # Undefined stereocenter, defined stereobond
                    ("CC(F)=C(F)C[C@@](C)(Cl)Br", True), # Defined stereocenter, undefined stereobond
                    ("C\C(F)=C(/F)C[C@@](C)(Cl)Br", False), # defined stereocenter, defined stereobond
                    ('CN=C(F)C', True), # RC=NR
                    ('N=C(F)C', False), # RC=NH
                    ('N(Cl)(F)C', False), #Pyrimidal nitrogen center
                    ('CP(O)(=O)N', True), #phosphate
                    ('CNC(C)=O', False), # keto form of amide
                    ('CN=C(C)O', False), #enol form of peptide
                    ('C(=[O+][H])C', False), # charged oxygen
                    ('N(C)(CC)(CCC)=O', True), # ?
                  ]

def rdkit_unspec_stereo(smiles):
    from rdkit import Chem
    from rdkit.Chem import EnumerateStereoisomers

    rdmol = Chem.MolFromSmiles(smiles)

    # Adding H's can hide undefined bond stereochemistry, so we have to test for undefined stereo here
    unspec_stereo = False
    rdmol_copy = Chem.Mol(rdmol)
    enumsi_opt = EnumerateStereoisomers.StereoEnumerationOptions(maxIsomers=2, onlyUnassigned=True)
    stereoisomers = [isomer for isomer in Chem.EnumerateStereoisomers.EnumerateStereoisomers(rdmol_copy, enumsi_opt)]
    if len(stereoisomers) != 1:
        unspec_stereo = True

    if unspec_stereo:
        return True
    return False

def oe_unspec_stereo_A(smiles):
    molecule = oechem.OEMol()
    oechem.OESmilesToMol(molecule, smiles)
    atoms_to_highlight = list()
    bonds_to_highlight = list()
    for atom in molecule.GetAtoms():
        if atom.IsChiral() and not atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral):
            # Check if handness is specified
            v = []
            for nbr in atom.GetAtoms():
                v.append(nbr)
            stereo = atom.GetStereo(v, oechem.OEAtomStereo_Tetrahedral)
            if stereo == oechem.OEAtomStereo_Undefined:
                atoms_to_highlight.append(atom)
    for bond in molecule.GetBonds():
        if bond.IsChiral() and not bond.HasStereoSpecified(oechem.OEBondStereo_CisTrans):
            v = []
            for neigh in bond.GetBgn().GetAtoms():
                if neigh != bond.GetEnd():
                    v.append(neigh)
                    break
            for neigh in bond.GetEnd().GetAtoms():
                if neigh != bond.GetBgn():
                    v.append(neigh)
                    break
            stereo = bond.GetStereo(v, oechem.OEBondStereo_CisTrans)

            if stereo == oechem.OEBondStereo_Undefined:
                bonds_to_highlight.append(bond)
    if len(atoms_to_highlight+bonds_to_highlight) == 0:
        return False
    return True

def oe_unspec_stereo_B(smiles):
    from openeye import oechem
    from openeye import oeomega
    maxcenters = 5
    force_flip = False
    enum_nitrogens = False
    mol = oechem.OEMol()
    if not oechem.OESmilesToMol(mol, smiles):
        print(s, "Not Parsed")
    n_isomers = 0
    for isomer in oeomega.OEFlipper(mol, maxcenters, force_flip, enum_nitrogens):
        n_isomers += 1
    if n_isomers == 1:
        return False
    elif n_isomers > 1:
        return True

method_list = [rdkit_unspec_stereo, oe_unspec_stereo_A, oe_unspec_stereo_B]
for smiles, should_be_ambiguous in smiles_ambiguous:
    method_results = [method(smiles) for method in method_list]
    print(smiles, should_be_ambiguous, method_results[0], method_results[1], method_results[2])
ChayaSt commented 5 years ago

Thank you @j-wags for this write-up. I found that if Hydrogens are added to a primary imine bond, the code with RDKit that I was using to flag bonds that are missing stereo information finds an undefined stereobond.

def flag_unspec_stereo(mol):
    unspec = False
    Chem.FindPotentialStereoBonds(mol)
    for bond in mol.GetBonds():
        if bond.GetStereo() == Chem.BondStereo.STEREOANY:
            print(bond.GetBeginAtom().GetSymbol(), bond.GetSmarts(), bond.GetEndAtom().GetSymbol())
            unspec = True
    return unspec

mol = Chem.MolFromSmiles('CN(C)C(=N)NC(=N)N')
mol_h = Chem.AddHs(mol)
print(flag_unspec_stereo(mol))
print(flag_unspec_stereo(mol_h))

output:

False
C = N
C = N
True

The code you provided above does not find unspecified stereochemistry even if I add Hs to the molecule.

def rdkit_unspec_stereo(smiles):
    from rdkit import Chem
    from rdkit.Chem import EnumerateStereoisomers

    rdmol = Chem.MolFromSmiles(smiles)
    # Adding H's can hide undefined bond stereochemistry, so we have to test for undefined stereo here
    unspec_stereo = False
    rdmol_copy = Chem.Mol(rdmol)
    rdmol_copy = Chem.AddHs(rdmol)
    Chem.FindPotentialStereoBonds(rdmol_copy)
    enumsi_opt = EnumerateStereoisomers.StereoEnumerationOptions(maxIsomers=2, onlyUnassigned=True)
    stereoisomers = [isomer for isomer in Chem.EnumerateStereoisomers.EnumerateStereoisomers(rdmol_copy, enumsi_opt)]
    if len(stereoisomers) != 1:
        unspec_stereo = True

    if unspec_stereo:
        return True
    return False

print(rdkit_unspec_stereo('CN(C)C(=N)NC(=N)N'))
False

It's interesting because if you check the flags in each bond, these bonds are flagged as undefined stereo but when enumerating stereoisomers they are ignored.

jchodera commented 5 years ago

Thanks for putting together this thoughtful issue! Perhaps we should add a stereochemistry discussion to our in-person get-together agenda in Jan?

Philosophically, I think these ideas should guide our thinking:

j-wags commented 5 years ago

@jchodera. Agreed -- In the interest of time, I need to table this issue for now, and the default toolkit behavior isn't that bad. I don't think this requires time at the January in-person meeting, though maybe we can plan an online meeting for stereochemistry enthusiasts later in January.

@ChayaSt Yes -- One thing I'm coming to terms with on this issue is that stereochemistry is complicated. We want a certain behavior when processing a molecule for stereochemistry (for example, we want RC=NH to not require stereochemistry), but as it stands, each toolkit has decided what their built-in stereochemistry model does with that moiety (As you mentioned, it doesn't help that RDKit's stereo expectations change depending on whether H's are added).

For that reason, and kind of to build toward John's "compatibility" idea, I think we should treat this as a test-driven problem. That is, I'm planning on having a data set of SMILESes for which we (as CMILES/SMIRNOFF developers) say what behavior we want for each SMILES string, and then we create (potentially complicated) functions written using the different toolkits, until we get the behavior we want for all the SMILES strings in our data set. (OpenEye just wrote back suggesting we do this)

ChayaSt commented 5 years ago

There are two more issues that I found that we should be keeping record of:

  1. Loss of stereo information between toolkits. I found that for certain cases, when rdkit reads an isomeric SMILES, it will not adhere to the specification and the SMILES written from the rdkit molecule will be missing the stereochemistry. Examples: A. Pyramidal nitrogen, bismuth and arsenic. Only the pyramidal nitrogen is relevant right now. B. Sulfoxide
  2. Inconsistency with missing stereo flagging between rdkit and openeye I found all these cases in DrugBank. A. Metals B. Bridged systems C. Larger symmetric R groups. Here is a file of all cases in DrugBank where rdkit and openeye flagged the molecules inconsistently. Since these are from DrugBank, not all molecules are currently relevant. But since I test cmiles against all of DrugBank, I'm keeping this file for the record.

This notebook has code showing these (and some more corner) issues.

ChayaSt commented 5 years ago

Something I found with fragmenter and cmiles, if a molecule has spurious isomeric information (a fragment that lost the stereochemistry from parent but still has that information), oechem.OEMolToSmiles() will write out a SMILES string with that stereo information.

Example:

from openeye import oechem
mol = oechem.OEMol()
oechem.OESmilesToMol(mol, '[H][C@@](C1CCCC1)(C)[H]')
print(oechem.OEMolToSmiles(mol))

The output: '[H][C@@](C1CCCC1)(C)[H]'

However, this molecule does not have a chiral center: image

It will be great if we can check for that, but I'm not sure how do that.

j-wags commented 5 years ago

Thanks for the update. I hadn't thought about what the behavior should be if we have overdefined stereochemistry. Not sure that I have an answer right now, but this is a good place to park this info.

I also haven't checked if our particular molecule-loading pipeline (eg. OFFMol.from_oemol()) would catch this. I know that, in some cases, we force the toolkit to recompute stereochemistry labels so we can get CIP codes (instead of the OETK's internal CW/CCW definitions), so we might coincidentally smooth this out in the process.

proteneer commented 5 years ago

This is an awesome thread. Pinging @greglandrum @bpkelley and @coleb just so they're at least aware of these toolkit differences.

ChayaSt commented 5 years ago

Another behavioral difference we should be aware of between rdkit and openeye.

When explicit hydrogens have map indices, RDKit considers them nonequivalent and it will flag even a methyl as a potential chiral center. Openeye does not flag these as potential chiral centers.

See example below:

from rdkit import Chem
import warnings

def has_stereo_defined(molecule):
    """

    Parameters
    ----------
    molecule

    Returns
    -------

    """

    unspec_chiral = False
    unspec_db = False
    problematic_atoms = list()
    problematic_bonds = list()
    chiral_centers = Chem.FindMolChiralCenters(molecule, includeUnassigned=True)
    for center in chiral_centers:
        atom_id = center[0]
        if center[-1] == '?':
            unspec_chiral = True
            problematic_atoms.append((atom_id, molecule.GetAtomWithIdx(atom_id).GetSmarts()))

    # Find potential stereo bonds that are unspecified
    Chem.FindPotentialStereoBonds(molecule)
    for bond in molecule.GetBonds():
        if bond.GetStereo() == Chem.BondStereo.STEREOANY:
            unspec_db = True
            problematic_bonds.append((bond.GetBeginAtom().GetSmarts(), bond.GetSmarts(),
                                                bond.GetEndAtom().GetSmarts()))
    if unspec_chiral or unspec_db:
        warnings.warn("Stereochemistry is unspecified. Problematic atoms {}, problematic bonds {}".format(
                problematic_atoms, problematic_bonds))
        return False
    else:
        return True

a = Chem.rdmolfiles.SmilesParserParams()
a.removeHs = False
m = Chem.MolFromSmiles('[H]C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]', a)
print(has_stereo_defined(m))
m = Chem.MolFromSmiles('[H][C:1]([H])([H])C([H])([H])C([H])([H])[C:2]([H])([H])[H]', a)
print(has_stereo_defined(m))
m = Chem.MolFromSmiles('[H:3][C:1]([H:4])([H:5])C([H:6])([H:7])C([H:8])([H:9])[C:2]([H:10])([H:11])[H:12]', a)
print(has_stereo_defined(m))

The output is:

True
True
False
/Users/sternc1/anaconda3/envs/my-rdkit-env/lib/python3.6/site-packages/ipykernel_launcher.py:33: UserWarning: Stereochemistry is unspecified. Problematic atoms [(1, '[C:1]'), (4, 'C'), (7, 'C'), (10, '[C:2]')], problematic bonds []

For Openeye

def has_stereo_defined(molecule):
    """
    Check if any stereochemistry in undefined.
    Parameters
    ----------
    molecule: OEMol

    Returns
    -------
    bool: True if all stereo chemistry is defined.
        If any stereochemsitry is undefined, raise and exception.

    """

    # perceive stereochemistry
    oechem.OEPerceiveChiral(molecule)

    unspec_chiral = False
    unspec_db = False
    problematic_atoms = list()
    problematic_bonds = list()
    for atom in molecule.GetAtoms():
        if atom.IsChiral() and not atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral):
            # Check if handness is specified
            v = []
            for nbr in atom.GetAtoms():
                v.append(nbr)
            stereo = atom.GetStereo(v, oechem.OEAtomStereo_Tetrahedral)
            if stereo == oechem.OEAtomStereo_Undefined:
                unspec_chiral = True
                problematic_atoms.append((atom.GetIdx(), oechem.OEGetAtomicSymbol(atom.GetAtomicNum())))
    for bond in molecule.GetBonds():
        if bond.IsChiral() and not bond.HasStereoSpecified(oechem.OEBondStereo_CisTrans):
            v = []
            for neigh in bond.GetBgn().GetAtoms():
                if neigh != bond.GetEnd():
                    v.append(neigh)
                    break
            for neigh in bond.GetEnd().GetAtoms():
                if neigh != bond.GetBgn():
                    v.append(neigh)
                    break
            stereo = bond.GetStereo(v, oechem.OEBondStereo_CisTrans)

            if stereo == oechem.OEBondStereo_Undefined:
                unspec_db = True
                a1 = bond.GetBgn()
                a2 = bond.GetEnd()
                a1_idx = a1.GetIdx()
                a2_idx = a2.GetIdx()
                a1_s = oechem.OEGetAtomicSymbol(a1.GetAtomicNum())
                a2_s = oechem.OEGetAtomicSymbol(a2.GetAtomicNum())
                bond_order = bond.GetOrder()
                problematic_bonds.append((a1_idx, a1_s, a2_idx, a2_s, bond_order))
    if unspec_chiral or unspec_db:
        warnings.warn("Stereochemistry is unspecified. Problematic atoms {}, problematic bonds {}, SMILES: {}".format(
                problematic_atoms,
                problematic_bonds, oechem.OEMolToSmiles(molecule)))
        return False
    else:
        return True

mol = oechem.OEMol()
oechem.OESmilesToMol(mol, '[H]C([H])([H])C([H])([H])C([H])([H])C([H])([H])[H]')
print(has_stereo_defined(mol))
mol = oechem.OEMol()
oechem.OESmilesToMol(mol, '[H][C:1]([H])([H])C([H])([H])C([H])([H])[C:2]([H])([H])[H]')
print(has_stereo_defined(mol))
mol = oechem.OEMol()
oechem.OESmilesToMol(mol, '[H:3][C:1]([H:4])([H:5])C([H:6])([H:7])C([H:8])([H:9])[C:2]([H:10])([H:11])[H:12]')
print(has_stereo_defined(mol))

The output:

True
True
True
coleb commented 5 years ago

If I recall correctly from a presentation Brian Kelley did previously this is an intentional feature of RDKit as it allows R-group decomposition that maintains the stereochemistry of the attachment point.

greglandrum commented 5 years ago

Yes, this is absolutely intended behavior. The RDKit also uses the atom mapping numbers in canonicalization of SMILES. The general idea is that if the user has included that information, it should be used.

ChayaSt commented 5 years ago

The RDKit also uses the atom mapping numbers in canonicalization of SMILES.

Yes, I noticed that. I understand why it's the intended behavior. But here I don't want to use that information for stereochemistry. I tried creating a copy of the molecule and removed the map indices. However, __computed_props still has '_ChiralityPossible': 1. Is there a way to recompute the properties on the molecule besides creating a new molecule from a SMILES without the map indices?

greglandrum commented 5 years ago

Sure:

In [3]: m = Chem.MolFromSmiles('C[C@H]([F:1])[F:2]')                                                                                          

In [4]: m.GetAtomWithIdx(1).GetPropsAsDict()                                                                                                  
Out[4]: 
{'__computedProps': <rdkit.rdBase._vectNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at 0x7fdb2183e030>,
 '_CIPRank': 1,
 '_ChiralityPossible': 1,
 '_CIPCode': 'R'}

In [6]: for at in m.GetAtoms(): at.SetAtomMapNum(0)                                                                                           

In [8]: Chem.AssignStereochemistry(m,force=True,cleanIt=True,flagPossibleStereoCenters=True)                                                  

In [9]: m.GetAtomWithIdx(1).GetPropsAsDict()                                                                                                  
Out[9]: 
{'__computedProps': <rdkit.rdBase._vectNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at 0x7fdb2183e630>,
 '_CIPRank': 1}
greglandrum commented 5 years ago

If you just want to clear the computed properties on the molecule and all atoms and bonds, you can do:

In [10]: m.ClearComputedProps()                                                                                                               

In [11]: m.GetAtomWithIdx(1).GetPropsAsDict()                                                                                                 
Out[11]: {'__computedProps': <rdkit.rdBase._vectNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE at 0x7fdb12ff6450>}
ChayaSt commented 5 years ago

@j-wags, the latest release of rdkit now recognizes a N in a three membered ring as a potential stereo center. The relevant rdkit issue is here: https://github.com/rdkit/rdkit/issues/2268

What I found is that an N in a 3 membered ring that has an H is only flagged as a potential chiral center with rdkit. OpenEye does not consider it stereogenic. Below is the code to demonstrate:

from rdkit import Chem
from openeye import oechem
def has_all_chiral_defined_rd(smiles):
    undefined_atoms = []
    unspec_chiral = False
    mol = Chem.MolFromSmiles(smiles)
    mol = Chem.AddHs(mol)
    chiral_centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    for center in chiral_centers:
        atom_id = center[0]
        if center[-1] == '?':
            unspec_chiral = True
            undefined_atoms.append((atom_id, mol.GetAtomWithIdx(atom_id).GetSmarts()))
    if unspec_chiral:
        print(undefined_atoms)
    return unspec_chiral

def has_all_chiral_defined_oe(smiles):
    oemol = oechem.OEMol()
    oechem.OESmilesToMol(oemol, smiles)
    unspec_chiral = False
    undefined_atoms = []
    oechem.OEAddExplicitHydrogens(oemol)
    oechem.OEPerceiveChiral(oemol)
    for atom in oemol.GetAtoms():
        if atom.IsChiral() and not atom.HasStereoSpecified(oechem.OEAtomStereo_Tetrahedral):
            #Check handness
            v = []
            for nbr in atom.GetAtoms():
                v.append(nbr)
            stereo = atom.GetStereo(v, oechem.OEAtomStereo_Tetrahedral)
            if stereo == oechem.OEAtomStereo_Undefined:
                unspec_chiral = True
                undefined_atoms.append((atom.GetIdx(), oechem.OEGetAtomicSymbol(atom.GetAtomicNum())))
    if unspec_chiral:
        print(undefined_atoms)
    return unspec_chiral

smiles = ['C[C@H]1CN1','C[C@H]1CN1C' ]
for sm in smiles:
    print(sm)
    print('openeye')
    print(has_all_chiral_defined_oe(sm))
    print('rdkit')
    print(has_all_chiral_defined_rd(sm))

The output:

C[C@H]1CN1
openeye
False
rdkit
[(3, 'N')]
True
C[C@H]1CN1C
openeye
[(4, 'N')]
True
rdkit
[(3, 'N')]
True
j-wags commented 5 years ago

@ChayaSt Thanks for the writeup. I'm adding these cases to the table at the top of the page.

I'm a little surprised that OE doesn't flag the first case as being possibly-stereogenic. In my experience, OE has been more aggressive about identifying atoms as possible stereocenters. This is the first case I've seen where RDKit says yes and OE says no.

j-wags commented 4 years ago

Updated table with trivalent phosphorous, which InChI consideres stereogenic, and RDKit detects as of 2020.3 release. https://github.com/rdkit/rdkit/issues/2949