greglandrum / rdkit-blog-fastpages


R-group decomposition and molzip | RDKit blog #14

utterances-bot opened this issue 2 years ago

utterances-bot commented 2 years ago

R-group decomposition and molzip | RDKit blog

Generating molecules from all possible combinations of R groups

https://greglandrum.github.io/rdkit-blog/tutorial/rgd/2022/03/14/rgd-and-molzip.html
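As a rough illustration of the idea in the title, the snippet below enumerates every combination of two R-group lists around a core and joins each set with molzip. This is a minimal sketch, not code from the post. The core and R-group SMILES are made up, and the sketch assumes the dummy atoms are labelled with atom map numbers, which is how molzip pairs them by default.

```python
from itertools import product
from rdkit import Chem

# Hypothetical core with two labelled attachment points (atom map numbers 1 and 2)
core = Chem.MolFromSmiles("c1ccc([*:1])cc1[*:2]")
# Hypothetical R-group lists; each dummy carries the map number of the core site it attaches to
r1_groups = [Chem.MolFromSmiles(s) for s in ("[*:1]C", "[*:1]OC")]
r2_groups = [Chem.MolFromSmiles(s) for s in ("[*:2]F", "[*:2]Cl", "[*:2]N")]

products = []
for r1, r2 in product(r1_groups, r2_groups):
    # Put core and R groups into one (disconnected) molecule, then let molzip
    # bond together the atoms attached to dummies that share a map number
    combined = Chem.CombineMols(Chem.CombineMols(core, r1), r2)
    zipped = Chem.molzip(combined)
    products.append(Chem.MolToSmiles(zipped))

print(len(products))  # 2 x 3 combinations -> 6
```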

hollisullivan27 commented 2 years ago

Thank you! I have been looking for a solution for Markush enumerations. Does this new implementation allow for multiple attachment points? For example, when the R group is a linker?

greglandrum commented 2 years ago

@hollisullivan27, I don't understand your question; both examples in the blog post have a fragment which has multiple attachment points. Can you be specific, ideally including molecules, what you want to do?
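To make "multiple attachment points" concrete: a linker is just a fragment with two labelled dummy atoms, and molzip joins each of them to the partner carrying the same label. A minimal sketch with made-up fragments, assuming atom-map labels:

```python
from rdkit import Chem

# Hypothetical fragments: two end groups and a linker with two attachment points
left = Chem.MolFromSmiles("c1ccccc1[*:1]")
linker = Chem.MolFromSmiles("[*:1]CCO[*:2]")
right = Chem.MolFromSmiles("[*:2]C(=O)C")

# One disconnected molecule in, one joined molecule out
combined = Chem.CombineMols(Chem.CombineMols(left, linker), right)
joined = Chem.molzip(combined)
smi = Chem.MolToSmiles(joined)
print(smi)  # phenethyl acetate
```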

xescape commented 2 years ago

Hello,

I'm trying to replicate the steps in this tutorial with my own molecule and ran into an error. The molecule has two R groups attached to the same atom (a nitrogen). RGroupDecompose puts both of them into a single R group (R1), but molzip can't put them back together. I'd appreciate any insight on this. Thank you!

```python
from rdkit.Chem import AllChem, RWMol, molzip, rdRGroupDecomposition as rgd

mol = AllChem.MolFromSmiles("CC1(C)CC(N(CCc2ccccc2)Cc2ccccc2)CCO1")
core = AllChem.MolFromSmiles("NCc1ccccc1")

rgs = rgd.RGroupDecompose([core], [mol])
core_fragment = rgs[0][0]['Core']
r1 = rgs[0][0]['R1']  # holds both N substituents, i.e. two dummy atoms

product = RWMol(core_fragment)
product.InsertMol(r1)
molzip(product)  # raises RuntimeError on RDKit 2021.09.5 (traceback below)
```

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
testing.ipynb Cell 11' in <cell line: 10>()
      8 product = Chem.RWMol(core_fragment)
      9 product.InsertMol(r1)
---> 10 Chem.molzip(product)

RuntimeError: Invariant Violation
    molzip: bond info already exists for end atom with label:1
    Violation occurred on line 907 in file Code/GraphMol/ChemTransforms/MolFragmenter.cpp
    Failed Expression: !bond.b
    RDKIT: 2021.09.5
    BOOST: 1_67
```

The molecule: [image]

greglandrum commented 2 years ago

Hi @xescape. This is exactly why the blog post includes the step "Remove any R groups which have more than one dummy atom. This happens if an R group is attached to the core at multiple points and it may mess up the rest of the analysis." molzip just doesn't support this at the moment.
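The filtering step quoted above could look something like this. It is a sketch under assumptions: it relies on the default row-wise output of RGroupDecompose (a list of dicts keyed by 'Core', 'R1', ...), and the helper names count_dummies and drop_multi_attachment_rgroups are hypothetical, not taken from the post.

```python
from rdkit import Chem
from rdkit.Chem import rdRGroupDecomposition as rgd

def count_dummies(mol):
    """Number of dummy atoms (atomic number 0), i.e. attachment points."""
    return sum(1 for atom in mol.GetAtoms() if atom.GetAtomicNum() == 0)

def drop_multi_attachment_rgroups(rows):
    """Copy RGD rows, leaving out any R group with more than one dummy atom."""
    return [
        {label: m for label, m in row.items() if label == "Core" or count_dummies(m) <= 1}
        for row in rows
    ]

mol = Chem.MolFromSmiles("CC1(C)CC(N(CCc2ccccc2)Cc2ccccc2)CCO1")
core = Chem.MolFromSmiles("NCc1ccccc1")
rows, unmatched = rgd.RGroupDecompose([core], [mol])
cleaned = drop_multi_attachment_rgroups(rows)
print([sorted(row) for row in cleaned])
```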

The easiest solution is to add explicit dummy atoms to your core on atoms which can have more than one substituent:

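```python
from rdkit import Chem
from rdkit.Chem import RWMol, molzip, rdRGroupDecomposition as rgd

mol = Chem.MolFromSmiles("CC1(C)CC(N(CCc2ccccc2)Cc2ccccc2)CCO1")
# Label both attachment points on the nitrogen explicitly
core = Chem.MolFromSmiles("[*:1]N([*:2])Cc1ccccc1")

rgs = rgd.RGroupDecompose([core], [mol])
core_fragment = rgs[0][0]['Core']
r1 = rgs[0][0]['R1']
r2 = rgs[0][0]['R2']
# Put the core and both R groups into one molecule, then zip matching dummies together
product = RWMol(core_fragment)
product.InsertMol(r1)
product.InsertMol(r2)
p = molzip(product)
print(Chem.MolToSmiles(p))
```
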
xescape commented 2 years ago

Ah, sorry I missed that. Thanks for your help!

On Thu, Mar 24, 2022 at 12:33 AM Greg Landrum wrote:


DavidACosgrove commented 2 years ago

From my experiments and poking about in the code, it appears that molzip uses the atom map numbers on the dummy atoms by default, not the isotope numbers as the blog post says. That's certainly consistent with the [*:1] notation in the SMILES you show; isotopes would show as [1*]. Could you amend the blog? It's the first hit when I searched for "rdkit molzip", and it sent me off on a bit of a wild goose chase. FragmentOnBonds uses isotopes to label the dummy atoms at the fragmentation points, which means the two don't play well together, but that's a separate issue.
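For reference, the labelling scheme can be chosen through MolzipParams. A minimal sketch with made-up molecules, showing the default atom-map pairing, isotope pairing, and the FragmentOnBonds mode, which should let molzip reassemble FragmentOnBonds output directly:

```python
from rdkit import Chem

# Default: dummies are paired by atom map number ([*:1] with [*:1])
by_map = Chem.molzip(Chem.MolFromSmiles("[*:1]CC.[*:1]c1ccccc1"))

# Pair dummies by isotope instead ([1*] with [1*])
iso_params = Chem.MolzipParams()
iso_params.label = Chem.MolzipLabel.Isotope
by_isotope = Chem.molzip(Chem.MolFromSmiles("[1*]CC.[1*]c1ccccc1"), iso_params)

# FragmentOnBonds writes atom indices into the dummy isotopes, so the two dummies
# on a broken bond carry different labels; the FragmentOnBonds mode pairs them
mol = Chem.MolFromSmiles("CCOc1ccccc1")
bond = mol.GetBondBetweenAtoms(2, 3)  # the O-c(aromatic) bond
frags = Chem.FragmentOnBonds(mol, [bond.GetIdx()])
fob_params = Chem.MolzipParams()
fob_params.label = Chem.MolzipLabel.FragmentOnBonds
rejoined = Chem.molzip(frags, fob_params)

print(Chem.MolToSmiles(by_map), Chem.MolToSmiles(by_isotope), Chem.MolToSmiles(rejoined))
```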

ErikCVik commented 2 years ago

Running this in PyCharm, I get an error:

```
conf = core.GetConformer()
ValueError: Bad Conformer Id
```

rdkit version = 2022.03.5, installed from conda
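One plausible cause, offered as an assumption since the failing script isn't shown here: a molecule built from SMILES has no conformer, so GetConformer() raises ValueError: Bad Conformer Id until coordinates are generated. A minimal check with a hypothetical core:

```python
from rdkit import Chem
from rdkit.Chem import rdDepictor

# Hypothetical core; molecules parsed from SMILES carry no coordinates
core = Chem.MolFromSmiles("c1ccc([*:1])cc1[*:2]")
print(core.GetNumConformers())  # 0, so core.GetConformer() would raise here

# Generate 2D coordinates first, then the conformer lookup succeeds
if core.GetNumConformers() == 0:
    rdDepictor.Compute2DCoords(core)
conf = core.GetConformer()
```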