reymond-group / map4

The MinHashed Atom Pair fingerprint of radius 2
MIT License

isomericSmiles=False #16

Closed wmhcqw closed 3 years ago

wmhcqw commented 3 years ago

First, thank you for your awesome work!

I'm wondering why you set isomericSmiles=False in file map4.py at Line 15.

From my perspective, stereochemistry information is quite important in molecular representation. RDKit, for example, has an optional argument called useChirality when computing Morgan fingerprints.

So, does the code still work if I manually set isomericSmiles to True?

Test Code Below

from rdkit import Chem  # needed for Chem.MolFromSmiles below
import tmap as tm
from map4 import MAP4Calculator

MAP4 = MAP4Calculator(is_folded=True)

smiles_a = 'n1nn2N=C(C=C(c2n1)N)C(=O)Nc3c(cc(c(c3)C#C)F)C(=O)O[C@H]4[C@H](O)CSSC4'
mol_a = Chem.MolFromSmiles(smiles_a)
map4_a = MAP4.calculate(mol_a)

smiles_b = 'n1nn2N=C(C=C(c2n1)N)C(=O)Nc3c(cc(c(c3)C#C)F)C(=O)O[C@@H]4[C@H](O)CSSC4'  # '@' -> '@@'
mol_b = Chem.MolFromSmiles(smiles_b)
map4_b = MAP4.calculate(mol_b)

print(sum(map4_a == map4_b), MAP4.dimensions)

# [isomericSmiles=False output]: (1024, 1024)
# [isomericSmiles=True output]: (921, 1024)
alicecapecchi commented 3 years ago

Hi, yes, it is possible to set "usechirality" to True. The code should still work, but I haven't tested it. However, the reason why I did not consider this option is that in this way two shingles representing the same substructure with different chirality would be considered as 100% different, which I am not sure is necessarily a good idea. Probably implementing a second set of shingles that would account only for chirality (e.g. "R|distance in bonds|S") to which one could give a desired weight would be a better option. If you experiment with MAP4 and chirality, keep me updated, I am definitely curious. Cheers, Alice
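To make the weighting idea above concrete, here is a minimal sketch in plain Python. It is not MAP4's implementation: it uses exact Jaccard similarity on sets instead of MinHash, and the shingle strings, the function names (jaccard, combined_similarity) and the chiral_weight parameter are all hypothetical, chosen only for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def combined_similarity(struct_a, struct_b, chiral_a, chiral_b, chiral_weight=0.2):
    """Blend structure-only and chirality-only similarity with a tunable weight."""
    structural = jaccard(struct_a, struct_b)
    chiral = jaccard(chiral_a, chiral_b)
    return (1.0 - chiral_weight) * structural + chiral_weight * chiral


# Hypothetical shingles for two stereoisomers (not real MAP4 output):
# the structure-only shingles are identical, the chirality-only shingles
# of the form "label|distance in bonds|label" differ.
struct_a = {"C(O)C|1|C(O)C", "C(O)C|4|CSS", "CSS|2|CSC"}
struct_b = set(struct_a)
chiral_a = {"R|1|S"}
chiral_b = {"S|1|S"}

sim = combined_similarity(struct_a, struct_b, chiral_a, chiral_b, chiral_weight=0.2)
sim_ignoring_chirality = combined_similarity(
    struct_a, struct_b, chiral_a, chiral_b, chiral_weight=0.0
)
```

With chiral_weight=0.0 the two stereoisomers are indistinguishable, and raising the weight lowers their similarity by at most that weight, so the penalty for a chirality mismatch is bounded instead of invalidating every shingle that touches the stereocenter.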

wmhcqw commented 3 years ago

Thank you for your reply! Your code still works when I set isomericSmiles to True in my test cases.

However, the reason why I did not consider this option is that in this way two shingles representing the same substructure with different chirality would be considered as 100% different, which I am not sure is necessarily a good idea.

Indeed, I ran into this problem during my experiments. I do want to distinguish molecules with different chirality, but they end up too far apart in the representation (as mentioned above, 103 of the 1024 dimensions differ when only one stereocenter is changed), which is unexpected.
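For reference, the 103 figure is just 1024 - 921 from the test output above. A small sketch of the same count on synthetic vectors (the helper name count_differences and the toy fingerprints are made up for illustration, they are not real MAP4 fingerprints):

```python
def count_differences(fp_a, fp_b):
    """Number of positions at which two equal-length folded fingerprints differ."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(fp_a, fp_b))


dims = 1024
# Synthetic stand-ins for two folded fingerprints: identical except that
# the first 103 positions are flipped in the second one.
fp_a = [0] * dims
fp_b = [1] * 103 + [0] * (dims - 103)

differing = count_differences(fp_a, fp_b)
matching = dims - differing       # corresponds to sum(map4_a == map4_b)
fraction = differing / dims       # share of dimensions that changed
```

Here fraction comes out at roughly 0.10, i.e. about a tenth of the folded dimensions change for a single flipped stereocenter.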

...Probably implementing a second set of shingles that would account only for chirality (e.g. "R|distance in bonds|S") to which one could give a desired weight would be a better option.

This is a good idea! I will try it in the future.

If you experiment with MAP4 and chirality, keep me updated, I am definitely curious.

Of course! You can close this issue for now. I may reopen it one day if I make some progress. 🍻