keiserlab / e3fp

3D molecular fingerprints
GNU Lesser General Public License v3.0
Update generator.py #33

Closed Mryangkaitong closed 5 years ago

Mryangkaitong commented 5 years ago
from e3fp.pipeline import fprints_from_smiles
fprint_params = {'bits': 1024, 'radius_multiplier': 1.5, 'rdkit_invariants': True}
confgen_params = {'max_energy_diff': 20.0, 'first': 3}
smiles = "COC(=O)C(C1CCCCN1)C2=CC=CC=C2"

fprints1 = fprints_from_smiles(smiles, "ritalin", confgen_params=confgen_params, fprint_params=fprint_params)
fprints2 = fprints_from_smiles(smiles, "ritalin", confgen_params=confgen_params, fprint_params=fprint_params)
fprints3 = fprints_from_smiles(smiles, "ritalin", confgen_params=confgen_params, fprint_params=fprint_params)

print(fprints1)
print('***************************')
print(fprints2)
print('***************************')
print(fprints3)

Running result:

[Fingerprint(indices=array([17, 71, 188, 195, 206, 224, 239, 288, 322, 324, 349, 356, 390, 401, 424, 473, 489, 503, 504, 561, 562, 621, 652, 666, 714, 745, 763, 778, 805, 816, 836, 914, 938, 981, 999]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([71, 188, 239, 282, 300, 322, 349, 356, 358, 390, 401, 415, 473, 489, 503, 504, 532, 561, 562, 621, 652, 666, 714, 745, 763, 778, 795, 836, 853, 914, 938, 981, 982]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([17, 71, 109, 188, 224, 239, 250, 322, 349, 355, 356, 390, 401, 473, 489, 503, 504, 539, 561, 562, 621, 649, 652, 666, 714, 745, 763, 778, 853, 914, 916, 938, 981, 988]), level=5, bits=1024, name=ritalin_2)]


[Fingerprint(indices=array([8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401, 407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652, 656, 666, 703, 714, 745, 763, 778, 914, 930, 981]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([15, 71, 188, 206, 224, 239, 268, 270, 322, 349, 356, 390, 401, 431, 473, 489, 503, 504, 560, 561, 562, 621, 652, 666, 707, 714, 745, 762, 763, 778, 826, 914, 930, 981]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([17, 71, 76, 188, 206, 224, 239, 322, 349, 356, 390, 393, 401, 402, 473, 489, 503, 504, 561, 562, 621, 652, 666, 680, 714, 745, 763, 778, 826, 886, 914, 930, 981, 987, 1012]), level=5, bits=1024, name=ritalin_2)]


[Fingerprint(indices=array([8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401, 407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652, 656, 666, 703, 714, 745, 763, 778, 914, 930, 981]), level=5, bits=1024, name=ritalin_0), Fingerprint(indices=array([71, 112, 179, 188, 224, 239, 322, 349, 356, 363, 390, 401, 473, 489, 503, 504, 532, 561, 562, 601, 621, 652, 666, 714, 745, 763, 778, 805, 809, 836, 853, 914, 938, 981, 1011]), level=5, bits=1024, name=ritalin_1), Fingerprint(indices=array([15, 71, 188, 206, 224, 239, 268, 270, 322, 349, 356, 390, 401, 431, 473, 489, 503, 504, 560, 561, 562, 621, 652, 666, 707, 714, 745, 762, 763, 778, 826, 914, 930, 981]), level=5, bits=1024, name=ritalin_2)]

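As the output shows, the same molecule yields different 3D fingerprints on every run. How similar are they to each other? To check, I fingerprinted 10 SMILES strings, several of which are repeats of the same molecule, and compared them using Tanimoto similarity: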

0    CC12CCC(=O)C=C1NCC1C2CCC2(C)C(C(=O)N3CCCCC3)CCC12
1    COc1c2c(cc3c1-c1ccc(OC)c(=O)cc1C(NC(C)=O)CC3)OCO2
2    COc1ccc2c(c1)c(/C=C1\C(=O)Nc3ccc(S(N)(=O)=O)cc31)cn2C
3    COc1ccc2c(c1)c(/C=C1\C(=O)Nc3ccc(S(N)(=O)=O)cc31)cn2C
4    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
5    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
6    Cc1ccsc1/C=N/NC(=O)c1ccc(Cn2cc(Br)cn2)o1
7    O=C(O)CCCSCCN1C(=O)CCCC1/C=C/C(O)CCC1CCC1
8    c1cc2c(cc1OCCCN1CCCCC1)CCN(CC1CCCCC1)CC2
9    c1cc2c(cc1OCCCN1CCCCC1)CCN(CC1CCCCC1)CC2
Name: Smiles, dtype: object

from e3fp.fingerprint.metrics.array_metrics import tanimoto

Running result:

[[1.         0.13190184 0.08284024 0.08630952 0.09580838 0.11246201 0.0969697  0.10423453 0.07920792 0.09246575]
 [0.13190184 1.         0.12146893 0.11235955 0.09366391 0.09065934 0.09470752 0.09467456 0.11875    0.11821086]
 [0.08284024 0.12146893 1.         0.61728395 0.11614731 0.11299435 0.11111111 0.07309942 0.109375   0.1086262 ]
 [0.08630952 0.11235955 0.61728395 1.         0.11965812 0.11965812 0.11461318 0.06705539 0.09937888 0.0984127 ]
 [0.09580838 0.09366391 0.11614731 0.11965812 1.         0.51538462 0.56       0.09552239 0.06606607 0.07098765]
 [0.11246201 0.09065934 0.11299435 0.11965812 0.51538462 1.         0.50579151 0.09552239 0.06927711 0.07430341]
 [0.0969697  0.09470752 0.11111111 0.11461318 0.56       0.50579151 1.         0.09009009 0.07012195 0.071875  ]
 [0.10423453 0.09467456 0.07309942 0.06705539 0.09552239 0.09552239 0.09009009 1.         0.08609272 0.08474576]
 [0.07920792 0.11875    0.109375   0.09937888 0.06606607 0.06927711 0.07012195 0.08609272 1.         0.28870293]
 [0.09246575 0.11821086 0.1086262  0.0984127  0.07098765 0.07430341 0.071875   0.08474576 0.28870293 1.        ]]
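
For reference, the Tanimoto similarity between two folded fingerprints is the number of on-bits they share divided by the number of on-bits set in either one. Below is a minimal pure-Python sketch of that calculation, applied to the two ritalin_0 index arrays from the first and second runs printed earlier. The helper name `tanimoto_from_indices` is made up for illustration and is not part of e3fp, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto_from_indices(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as on-bit indices."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # assumed convention: two empty fingerprints score 0.0
        return 0.0
    return len(a & b) / len(union)


# On-bit indices of ritalin_0 copied from the first and second runs above.
run1_ritalin_0 = [17, 71, 188, 195, 206, 224, 239, 288, 322, 324, 349, 356,
                  390, 401, 424, 473, 489, 503, 504, 561, 562, 621, 652, 666,
                  714, 745, 763, 778, 805, 816, 836, 914, 938, 981, 999]
run2_ritalin_0 = [8, 71, 169, 175, 188, 224, 239, 322, 349, 356, 390, 401,
                  407, 420, 473, 489, 503, 504, 532, 556, 561, 562, 621, 652,
                  656, 666, 703, 714, 745, 763, 778, 914, 930, 981]

similarity = tanimoto_from_indices(run1_ritalin_0, run2_ritalin_0)
print(f"Tanimoto(run 1, run 2) for ritalin_0: {similarity:.3f}")
```

A value below 1.0 here means the two runs produced different first conformers for the same SMILES string, which is the behavior this PR is about.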

This PR makes sure a molecule always gets one definitive 3D fingerprint by setting randomSeed.

coveralls commented 5 years ago

Coverage remained the same at 58.372% when pulling be428292dc21a44b9f3181132e6fe4cd10a9715a on Mryangkaitong:master into 5a3c683f54edbe9ec52ccd28df24908fdc9df8fc on keiserlab:master.

sethaxen commented 5 years ago

Providing access to the seed is a nice feature, but we don't want to just set the seed to some privileged number by default, for two reasons: 1) it silently changes the behavior of the conformer generator, and 2) it doesn't give the user access to the parameter. A better approach is to add a confgen param for the seed that is passed to rdkit.Chem.AllChem.EmbedMultipleConfs, with a global default of -1 (the current default value in RDKit). Then users such as yourself can set it when you want reproducible conformers.

Doing this will require some changes to the internals of E3FP, so I'll go ahead and commit a fix myself. Thanks for taking initiative on this!
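
The approach described above could look roughly like the sketch below: a user-facing `seed` entry in the conformer-generation parameters, defaulting to -1, that is forwarded to the `randomSeed` argument of `EmbedMultipleConfs`. This is only an illustration under stated assumptions, not the code that was committed to E3FP. The names `DEFAULT_SEED`, `embed_kwargs`, `generate_conformers`, and the `num_conf` key are hypothetical, while `numConfs` and `randomSeed` are existing RDKit keyword arguments.

```python
# Hypothetical names throughout; only numConfs/randomSeed come from RDKit.
DEFAULT_SEED = -1  # RDKit's default: no fixed seed, so results vary per call


def embed_kwargs(confgen_params=None):
    """Translate user-facing confgen params into EmbedMultipleConfs kwargs."""
    params = {"num_conf": 3, "seed": DEFAULT_SEED}
    params.update(confgen_params or {})
    return {"numConfs": params["num_conf"], "randomSeed": params["seed"]}


def generate_conformers(smiles, confgen_params=None):
    """Embed conformers for a SMILES string, seeded only if the user asks."""
    # Imported here so the parameter handling above works without RDKit.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMultipleConfs(mol, **embed_kwargs(confgen_params))
    return mol
```

With this shape, `generate_conformers(smiles, {"seed": 42})` requests the same embedding seed on every call, while omitting `seed` leaves RDKit's default behavior untouched.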

sethaxen commented 5 years ago

This feature was added in 098934d.