Closed Mryangkaitong closed 5 years ago
Providing access to the seed is a nice feature, but we don't want to just set the seed to some privileged number by default, for two reasons: 1) that silently changes the behavior of the conformer generator, and 2) it doesn't give the user access to the parameter. A better approach is to add a confgen param for the seed that is passed to rdkit.Chem.AllChem.EmbedMultipleConfs, with a global default of -1 (the current default value in RDKit). Then users such as yourself can set it when you want reproducible conformers.
Doing this will require some changes to the internals of E3FP, so I'll go ahead and commit a fix myself. Thanks for taking initiative on this!
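The design described above (a seed parameter with a global default of -1 that callers can override) could look roughly like the sketch below. This is only an illustration in plain Python, not E3FP's actual code: `DEFAULT_SEED`, `generate_conformers` and the random "coordinates" are hypothetical stand-ins. In the real change the seed would instead be forwarded as the `randomSeed` argument of rdkit.Chem.AllChem.EmbedMultipleConfs.

```python
import random

# Hypothetical global default; -1 mirrors RDKit's "no fixed seed" convention.
DEFAULT_SEED = -1


def generate_conformers(num_conf=3, num_atoms=4, seed=DEFAULT_SEED):
    """Toy stand-in for a conformer generator (not E3FP's real API).

    Returns `num_conf` lists of fake (x, y, z) coordinates. With seed == -1
    the generator is left unseeded, so every call gives different output;
    any other seed makes the output reproducible.
    """
    rng = random.Random() if seed == -1 else random.Random(seed)
    return [
        [(rng.random(), rng.random(), rng.random()) for _ in range(num_atoms)]
        for _ in range(num_conf)
    ]


# Default behaviour is unchanged: two unseeded calls will almost surely differ.
unseeded_a = generate_conformers()
unseeded_b = generate_conformers()

# A user who wants reproducible conformers passes an explicit seed.
seeded_a = generate_conformers(seed=42)
seeded_b = generate_conformers(seed=42)
print(seeded_a == seeded_b)
```

Keeping -1 as the default means existing callers see no change in behavior, while anyone who needs reproducibility opts in explicitly.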
This feature was added in 098934d.
running result: [Fingerprint(indices=array([17, 71, 188, 195, 206, 224, 239, 288, 322, 324, 349, 356, 390, 401, 424, 473, 489, 503, 504, 561, 562, 621, 652, 666, 714, 745, 763, 778, 805, 816, 836, 914, 938, 981, 999]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([71, 188, 239, 282, 300, 322, 349, 356, 358, 390, 401, 415, 473, 489, 503, 504, 532, 561, 562, 621, 652, 666, 714, 745, 763, 778, 795, 836, 853, 914, 938, 981, 982]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([17, 71, 109, 188, 224, 239, 250, 322, 349, 355, 356, 390, 401, 473, 489, 503, 504, 539, 561, 562, 621, 649, 652, 666, 714, 745, 763, 778, 853, 914, 916, 938, 981, 988]), level=5, bits=1024, name=ritalin_2)]
[Fingerprint(indices=array([8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401, 407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652, 656, 666, 703, 714, 745, 763, 778, 914, 930, 981]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([15, 71, 188, 206, 224, 239, 268, 270, 322, 349, 356, 390, 401, 431, 473, 489, 503, 504, 560, 561, 562, 621, 652, 666, 707, 714, 745, 762, 763, 778, 826, 914, 930, 981]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([17, 71, 76, 188, 206, 224, 239, 322, 349, 356, 390, 393, 401, 402, 473, 489, 503, 504, 561, 562, 621, 652, 666, 680, 714, 745, 763, 778, 826, 886, 914, 930, 981, 987, 1012]), level=5, bits=1024, name=ritalin_2)]
[Fingerprint(indices=array([8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401, 407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652, 656, 666, 703, 714, 745, 763, 778, 914, 930, 981]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([71, 112, 179, 188, 224, 239, 322, 349, 356, 363, 390, 401, 473, 489, 503, 504, 532, 561, 562, 601, 621, 652, 666, 714, 745, 763, 778, 805, 809, 836, 853, 914, 938, 981, 1011]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([15, 71, 188, 206, 224, 239, 268, 270, 322, 349, 356, 390, 401, 431, 473, 489, 503, 504, 560, 561, 562, 621, 652, 666, 707, 714, 745, 762, 763, 778, 826, 914, 930, 981]), level=5, bits=1024, name=ritalin_2)]
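To put a number on how much the runs above differ, the Tanimoto similarity of two folded fingerprints can be computed directly from their on-bit indices as |A ∩ B| / |A ∪ B|. The sketch below does this in plain Python for the `ritalin_0` indices copied from the first two runs. `tanimoto_from_indices` is a hypothetical helper written for illustration, not E3FP's own `tanimoto` function.

```python
def tanimoto_from_indices(a, b):
    """Tanimoto (Jaccard) similarity of two collections of on-bit indices."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: two empty fingerprints count as dissimilar
    return len(set_a & set_b) / len(union)


# On-bit indices of ritalin_0 from the first and second run above.
run1_ritalin_0 = [17, 71, 188, 195, 206, 224, 239, 288, 322, 324, 349, 356,
                  390, 401, 424, 473, 489, 503, 504, 561, 562, 621, 652, 666,
                  714, 745, 763, 778, 805, 816, 836, 914, 938, 981, 999]
run2_ritalin_0 = [8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401,
                  407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652,
                  656, 666, 703, 714, 745, 763, 778, 914, 930, 981]

# A fingerprint compared with itself is fully similar.
print(tanimoto_from_indices(run1_ritalin_0, run1_ritalin_0))

# The same conformer name across two unseeded runs is only partially similar.
print(tanimoto_from_indices(run1_ritalin_0, run2_ritalin_0))
```

With a fixed seed, the second comparison would be expected to match the first, since the same conformers (and therefore the same fingerprints) would be regenerated on every run.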
As you can see, the same molecule gets different 3D fingerprints from run to run. How similar are they? Here I tried 10 SMILES strings (some of them repeated) and compared their fingerprints using Tanimoto similarity:
0    CC12CCC(=O)C=C1NCC1C2CCC2(C)C(C(=O)N3CCCCC3)CCC12
1    COc1c2c(cc3c1-c1ccc(OC)c(=O)cc1C(NC(C)=O)CC3)OCO2
2    COc1ccc2c(c1)c(/C=C1\C(=O)Nc3ccc(S(N)(=O)=O)cc31)cn2C
3    COc1ccc2c(c1)c(/C=C1\C(=O)Nc3ccc(S(N)(=O)=O)cc31)cn2C
4    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
5    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
6    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
7    O=C(O)CCCSCCN1C(=O)CCCC1/C=C/C(O)CCC1CCC1
8    c1cc2c(cc1OCCCN1CCCCC1)CCN(CC1CCCCC1)CC2
9    c1cc2c(cc1OCCCN1CCCCC1)CCN(CC1CCCCC1)CC2
Name: Smiles, dtype: object
from e3fp.fingerprint.metrics.array_metrics import tanimoto
running result:
[[1.         0.13190184 0.08284024 0.08630952 0.09580838 0.11246201 0.0969697  0.10423453 0.07920792 0.09246575]
 [0.13190184 1.         0.12146893 0.11235955 0.09366391 0.09065934 0.09470752 0.09467456 0.11875    0.11821086]
 [0.08284024 0.12146893 1.         0.61728395 0.11614731 0.11299435 0.11111111 0.07309942 0.109375   0.1086262 ]
 [0.08630952 0.11235955 0.61728395 1.         0.11965812 0.11965812 0.11461318 0.06705539 0.09937888 0.0984127 ]
 [0.09580838 0.09366391 0.11614731 0.11965812 1.         0.51538462 0.56       0.09552239 0.06606607 0.07098765]
 [0.11246201 0.09065934 0.11299435 0.11965812 0.51538462 1.         0.50579151 0.09552239 0.06927711 0.07430341]
 [0.0969697  0.09470752 0.11111111 0.11461318 0.56       0.50579151
Making sure a molecule gets the only definitive 3D fingerprint using randomSeed