rdkit / rdkit

The official sources for the RDKit library
BSD 3-Clause "New" or "Revised" License

Substructure matching for SMARTS vs SMILES with chirality #5354

Open jasondbiggs opened 2 years ago

jasondbiggs commented 2 years ago

Often with this kind of issue, the culprit is my misunderstanding of the differences between SMILES and SMARTS grammar. Maybe that is the case here, but I don't see the reason.

from rdkit import Chem

molecule = Chem.MolFromSmiles('[C@H](Cl)(Br)I')
shouldMatch = Chem.MolFromSmarts('[C@H](Cl)(Br)I')
shouldNotMatch = Chem.MolFromSmarts('[C@@H](Cl)(Br)I')
molecule.HasSubstructMatch(shouldMatch, useChirality=True)
# False
molecule.HasSubstructMatch(shouldNotMatch, useChirality=True)
# True

So should a molecule created from the SMILES "[C@H](Cl)(Br)I" match a query molecule created from the SMARTS "[C@H](Cl)(Br)I"?

bp-kelley commented 2 years ago

It looks like a bug to me.

Interestingly, the molecule does match its own canonical SMILES when that string is parsed as SMARTS:

>>> Chem.MolToSmiles(molecule)
'Cl[C@@H](Br)I'
>>> molecule.HasSubstructMatch(Chem.MolFromSmarts('Cl[C@@H](Br)I'), useChirality=True)
True
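
The canonical SMILES above carries `@@` while the input carried `@`, yet both describe the same centre, because the tag is read relative to the order in which the neighbours are written. The following is a minimal pure-Python sketch of that bookkeeping, assuming the usual SMILES convention that an implicit H counts as the "from" atom when the chiral atom comes first, and otherwise as the first neighbour after the preceding atom. The helper names are hypothetical and are not part of RDKit. The sketch only illustrates the SMILES side; it does not explain why the SMARTS query fails to match.

```python
def permutation_parity(src, dst):
    """Return 0 if dst is an even permutation of src, 1 if it is odd."""
    if len(set(src)) != len(src) or sorted(src) != sorted(dst):
        raise ValueError("dst must be a permutation of distinct labels in src")
    position = {label: i for i, label in enumerate(src)}
    perm = [position[label] for label in dst]
    # Count inversions; their parity equals the parity of the permutation.
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2


def same_handedness(order_a, tag_a, order_b, tag_b):
    """Compare two tetrahedral descriptions given as (neighbour order, '@' or '@@')."""
    # An odd reordering of the neighbours flips the meaning of the tag.
    flipped = permutation_parity(order_a, order_b) == 1
    return (tag_a == tag_b) != flipped


# '[C@H](Cl)(Br)I': chiral atom is first, so the implicit H leads the order.
input_order = ["H", "Cl", "Br", "I"]
# 'Cl[C@@H](Br)I': Cl precedes the centre, and the implicit H follows it.
canonical_order = ["Cl", "H", "Br", "I"]

print(same_handedness(input_order, "@", canonical_order, "@@"))  # same centre
print(same_handedness(input_order, "@", input_order, "@@"))      # mirror image
```

Under these assumptions the first comparison reports the same handedness (one neighbour swap plus one tag flip cancel out), and the second reports opposite handedness.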

jangerit commented 4 months ago

Hi, I ran into a similar issue:

>>> mol = Chem.MolFromSmiles("C=C(C)[C@H]1CC=C(C)C(=O)C1")
>>> substr = ["[CD3@@H;R]-[CD3;R0]", "[CD3@H;R]-[CD3;R0]", "[C@@]", "[C@]"]
>>> for s in substr:
...     print(mol.HasSubstructMatch(Chem.MolFromSmarts(s), useChirality=True))
...     print(mol.GetSubstructMatch(Chem.MolFromSmarts(s), useChirality=True))
True
(3, 1)
True
(3, 1)
True
(3,)
True
(3,)

Any updates on this? Thanks a lot for your help.