rdkit / mmpdb

A package to identify matched molecular pairs and use them to predict property changes.

Incorrect atom mappings for the new generated molecules via transformation #13

chengthefang closed 5 years ago

chengthefang commented 5 years ago

Hi developers,

I have been trying out the mmpdb "transform" function. I found that for 2-cut and 3-cut transformations, the newly generated SMILES do not respect the atom mappings in the transform rule.

For example,

| Original SMILES | New SMILES | from_smiles | to_smiles |
| --- | --- | --- | --- |
| CC(C)c1nc(C(=O)NCc2ccccc2)no1 | CC(C)CNC(=O)N1CCC(c2ccccc2)C1 | [:1]CNC(=O)c1noc([:2])n1 | [:1]CNC(=O)N1CCC([:2])C1 |
| CC(C)c1nc(C(=O)NCc2ccccc2)no1 | CC(C)CNC(=O)c1cc(-c2ccccc2)no1 | [:1]CNC(=O)c1noc([:2])n1 | [:1]CNC(=O)c1cc([:2])no1 |

I expect the transformed linker (i.e. "to_smiles") to connect to the two unchanged fragments at the same attachment points (1 and 2) as the old linker (i.e. "from_smiles"). However, in the newly generated molecules (i.e. "New SMILES") the transformed linker is flipped over. In the first row, for instance, the isopropyl group ends up on the CH2 at attachment point 1, where the phenyl ring was attached in the original molecule. In other words, the atom mappings from "from_smiles" to "to_smiles" are correct, but the attachment points are swapped in the newly generated whole molecule.

Would you mind taking a look at this issue?

Thanks, Cheng

KramerChristian commented 5 years ago

Hi Cheng,

I can take a look at this. To do so, can you send me a test case that has the molecules you use to build up the relevant part of the mmpdb database and the commands you used for fragmentation, indexing, and transform?

Thanks, Christian

KramerChristian commented 5 years ago

As a starter, could you post the output from the transform call using --explain?

Thanks, Christian

chengthefang commented 5 years ago

@KramerChristian Hi Christian, thank you very much for looking into this! I have sent you an email with my test files, since the files are large.

Thanks, Cheng

KramerChristian commented 5 years ago

Hi Cheng,

I received your files, and I was able to see the error you mean. It will probably take me a while to dig into where this comes from. I will work on it as soon as I find the time for it.

Bests, Christian

chengthefang commented 5 years ago

@KramerChristian Hi Christian, thank you very much for taking the time to look into this issue. I suspect it has something to do with how the transformed SMILES and the constant SMILES are combined into the new molecules during enumeration. I tried different datasets and came across the same issue.

Thanks, Cheng

KramerChristian commented 5 years ago

Hi Cheng,

I believe I found and fixed the bug. The problem was that mmpdb uses an attachment order to map the constant fragments to the variable part. This was simply not taken into account during enumeration of the transform products, and constant fragments were added in the RDKit-canonical order. Most of the time, this order was coincidentally the correct one, so the bug only showed up in the cases where the constant fragments have to be permuted.
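To illustrate the idea only (this is not the actual mmpdb patch), here is a minimal Python sketch of how an attachment order can be applied when pairing constant fragments with the labeled attachment points of the variable part. The function name `assign_constants`, the list representation of the attachment order, the direction of the mapping, and the assumed order of the two constant fragments are all assumptions made for this example.

```python
def assign_constants(constant_fragments, attachment_order=None):
    """Map attachment labels (1, 2, ...) to constant fragments.

    Assumption for this sketch: attachment_order[i] is the index of the
    constant fragment that belongs at label i + 1. Passing None keeps the
    fragments in their given order, which mimics the behavior described
    in the bug report.
    """
    n = len(constant_fragments)
    if attachment_order is None:
        attachment_order = list(range(n))
    if sorted(attachment_order) != list(range(n)):
        raise ValueError("attachment_order must be a permutation of the fragment indices")
    return {
        label: constant_fragments[index]
        for label, index in enumerate(attachment_order, start=1)
    }


# Constant fragments of CC(C)c1nc(C(=O)NCc2ccccc2)no1, listed in an assumed
# fixed order (isopropyl first, phenyl second).
constants = ["*C(C)C", "*c1ccccc1"]

# Ignoring the attachment order puts isopropyl at [:1], as in the wrong product.
buggy = assign_constants(constants)

# Applying the (assumed) attachment order [1, 0] puts phenyl at [:1].
fixed = assign_constants(constants, [1, 0])

print(buggy)
print(fixed)
```

When the attachment order happens to be the identity permutation, both calls return the same mapping, which is consistent with the bug staying hidden in most cases.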

I added a few lines to fix this and pushed the fix directly to the master branch. With the fix, the wrong compounds from your example no longer show up. Could you please test it on your in-house datasets and let me know whether you still find erroneous compounds?

Thank you, Christian

chengthefang commented 5 years ago

@KramerChristian Hi Christian. I very much appreciate your time and help. I will try it on my end and update you with the results of the new tests.

Thanks, Cheng

chengthefang commented 5 years ago

@KramerChristian Hi Christian. I updated to your new code and tried other datasets. It works as expected now! Thank you for your help. I think this issue has been solved. Feel free to close the ticket now.

Thank you, Cheng

KramerChristian commented 5 years ago

@chf42 Thanks a lot for your fast feedback and the error reporting and testing!