Open kovasap opened 6 years ago
Yeah, this looks like a bug in the way the SMARTS is generated. Thanks for reporting it!
Here's a more minimal example:
In [19]: s3 = '[NH3+:8](-[C@@H:10]2-[O:11]-[C@H:12](-[CH2:13]-[OH:14])-[C@@H:41](-[OH:42])-[C@H:43]-2-[OH:44])'
In [20]: m3 = Chem.MolFromSmiles(s3)
In [21]: print(Chem.MolToSmiles(m3))
[NH3+:8][C@@H:10]1[O:11][C@H:12]([CH2:13][OH:14])[C@@H:41]([OH:42])[C@H:43]1[OH:44]
In [22]: print(Chem.MolToSmarts(m3))
[#7H3+:8]-[#6@H:10]1-[#8:11]-[#6@H:12](-[#6H2:13]-[#8H:14])-[#6@@H:41](-[#8H:42])-[#6@@H:43]-1-[#8H:44]
And to confirm that it isn't connected to the atom map info (which it shouldn't be):
In [31]: for at in m3.GetAtoms(): at.SetAtomMapNum(0)
In [32]: print(Chem.MolToSmiles(m3))
[NH3+][C@@H]1O[C@H](CO)[C@@H](O)[C@H]1O
In [33]: print(Chem.MolToSmarts(m3))
[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]
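To pin down which centers differ, here is a minimal sketch that compares the chirality tokens in the mapped SMILES and SMARTS printed in the first half of the example. It is plain string matching with the standard library, not an RDKit call, and the helper name `chirality_by_map_number` is made up for illustration. Comparing `@`/`@@` token by token is only meaningful here because both strings list the atoms and their neighbors in the same order; it is not a general stereo check.

```python
import re

# Matches the inside of one bracket atom, e.g. "C@@H:10" or "#6@H:10".
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")


def chirality_by_map_number(pattern):
    """Return {atom map number: '@', '@@' or ''} for every mapped bracket atom.

    Hypothetical helper for this thread only: it reads the tokens as text
    and does not interpret the stereochemistry.
    """
    tags = {}
    for body in BRACKET_ATOM.findall(pattern):
        map_match = re.search(r":(\d+)$", body)
        if map_match is None:
            continue  # unmapped atoms are skipped in this sketch
        chir = re.search(r"@{1,2}", body)
        tags[int(map_match.group(1))] = chir.group(0) if chir else ""
    return tags


# Output of MolToSmiles and MolToSmarts, copied from the session above.
smiles = "[NH3+:8][C@@H:10]1[O:11][C@H:12]([CH2:13][OH:14])[C@@H:41]([OH:42])[C@H:43]1[OH:44]"
smarts = "[#7H3+:8]-[#6@H:10]1-[#8:11]-[#6@H:12](-[#6H2:13]-[#8H:14])-[#6@@H:41](-[#8H:42])-[#6@@H:43]-1-[#8H:44]"

before = chirality_by_map_number(smiles)
after = chirality_by_map_number(smarts)

# Keep only the atoms whose token differs between the two strings.
flipped = {n: (before[n], after.get(n, "")) for n in before if before[n] != after.get(n, "")}
print(flipped)  # {10: ('@@', '@'), 43: ('@', '@@')}
```

In this reduced example the tokens on atoms 10 and 43 differ, while atoms 12 and 41 keep theirs.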
I'm running into this too.
There seems to be a related bug. Taking the SMARTS from the example above:
"[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]"
If we round-trip it repeatedly through MolFromSmarts and MolToSmarts, the SMARTS changes on every pass:
print(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))
print(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))))
print(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts(AllChem.MolToSmarts(AllChem.MolFromSmarts("[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]")))))))
We obtain (after one, two, and three round trips, respectively):
"[#7&H3&+]-[#6@@&*&H1]1-[#8]-[#6@&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&H1](-[#8&H1])-[#6@&*&H1]-1-[#8&H1]"
"[#7&H3&+]-[#6@&*&*&H1]1-[#8]-[#6@&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&H1](-[#8&H1])-[#6@@&*&*&H1]-1-[#8&H1]"
"[#7&H3&+]-[#6@@&*&*&*&H1]1-[#8]-[#6@&*&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&*&H1](-[#8&H1])-[#6@&*&*&*&H1]-1-[#8&H1]"
I think this (the flipped chiralities) gets fixed with #2570. The extra "&*" terms that get added are #2595 :)
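The two effects in the round-trip outputs above are easier to see side by side. The sketch below only tabulates the strings already posted in this thread, using the standard library; it does not call RDKit, and `chiral_tags` is a hypothetical helper name. For each string it lists the chirality token of each stereocenter in written order and counts the "&*" terms.

```python
import re

# The input SMARTS followed by the three printed results, copied from the thread.
rounds = [
    ("input",
     "[#7H3+]-[#6@H]1-[#8]-[#6@H](-[#6H2]-[#8H])-[#6@@H](-[#8H])-[#6@@H]-1-[#8H]"),
    ("1 round trip",
     "[#7&H3&+]-[#6@@&*&H1]1-[#8]-[#6@&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&H1](-[#8&H1])-[#6@&*&H1]-1-[#8&H1]"),
    ("2 round trips",
     "[#7&H3&+]-[#6@&*&*&H1]1-[#8]-[#6@&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&H1](-[#8&H1])-[#6@@&*&*&H1]-1-[#8&H1]"),
    ("3 round trips",
     "[#7&H3&+]-[#6@@&*&*&*&H1]1-[#8]-[#6@&*&*&*&H1](-[#6&H2]-[#8&H1])-[#6@@&*&*&*&H1](-[#8&H1])-[#6@&*&*&*&H1]-1-[#8&H1]"),
]


def chiral_tags(smarts):
    """Chirality token of every bracket atom that carries one, in written order."""
    return re.findall(r"\[#\d+(@{1,2})", smarts)


# One row per string: label, chirality tokens, number of "&*" terms.
summary = [(label, chiral_tags(s), s.count("&*")) for label, s in rounds]
for label, tags, wildcards in summary:
    print(f"{label:>13}: tags={tags}  '&*' terms={wildcards}")
```

Read this way, the tokens on the first and last stereocenters alternate on every pass, and the number of "&*" terms grows by four per pass, one for each stereocenter.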
When I try to convert a molecule's SMILES string to a SMARTS string, some of the stereocenters in my molecule change. Example code:
Produces output:
Unless I'm missing something, this means that atoms 24, 29, 10, and 43 change their stereochemistry during the conversion. This looks like a bug to me, unless bond rotation or ring flipping can explain it?