Closed by beyondpie 8 years ago
If you want similarities between compounds in an SDF file, I would recommend generating fingerprints and calculating similarities locally using RDKit (or OpenBabel, CDK, etc.). Something like:
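from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Read the molecules from the SDF file, then compare the first two
mols = Chem.SDMolSupplier('myfile.sdf')
fp1 = AllChem.GetMorganFingerprint(mols[0], 2)  # Morgan fingerprint, radius 2
fp2 = AllChem.GetMorganFingerprint(mols[1], 2)
DataStructs.TanimotoSimilarity(fp1, fp2)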
But if you specifically want to use PubChem fingerprints you can do something like this with PubChemPy:
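def tanimoto(compound1, compound2):
    # compound.fingerprint is a hex string; convert it to an integer bit field
    fp1 = int(compound1.fingerprint, 16)
    fp2 = int(compound2.fingerprint, 16)
    # Count the bits set in each fingerprint and the bits set in both
    fp1_count = bin(fp1).count('1')
    fp2_count = bin(fp2).count('1')
    both_count = bin(fp1 & fp2).count('1')
    return float(both_count) / (fp1_count + fp2_count - both_count)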
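To make the arithmetic concrete, here is a rough sketch of the same bit-counting calculation applied directly to hex strings, plus a loop over every pair in a small set. The helper names tanimoto_hex and pairwise_tanimoto are made up for illustration and are not part of PubChemPy, and the hex values are short toy strings, not real PubChem fingerprints. It also returns 0.0 when both fingerprints are empty, which is an assumption on my part to avoid dividing by zero.

```python
from itertools import combinations


def tanimoto_hex(hex1, hex2):
    """Tanimoto coefficient of two fingerprints given as hex strings."""
    fp1 = int(hex1, 16)
    fp2 = int(hex2, 16)
    both_count = bin(fp1 & fp2).count('1')
    union_count = bin(fp1).count('1') + bin(fp2).count('1') - both_count
    if union_count == 0:
        # Assumption: treat two empty fingerprints as having similarity 0.0
        return 0.0
    return both_count / union_count


def pairwise_tanimoto(fingerprints):
    """Map each pair of names to its Tanimoto coefficient.

    `fingerprints` is a dict of name -> hex string (hypothetical layout).
    """
    return {
        (a, b): tanimoto_hex(fingerprints[a], fingerprints[b])
        for a, b in combinations(sorted(fingerprints), 2)
    }


# Toy data: 0x0f = 00001111, 0x03 = 00000011, 0xf0 = 11110000
toy = {'a': '0f', 'b': '03', 'c': 'f0'}
scores = pairwise_tanimoto(toy)
for pair, score in scores.items():
    print(pair, score)
```

For 'a' and 'b', two bits are shared and four bits are set in at least one of them, so the coefficient is 2 / (4 + 2 - 2) = 0.5. With real compounds you would pass each compound.fingerprint string in place of the toy values.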
I added a more complete example here: https://github.com/mcs07/PubChemPy/blob/master/examples/Chemical%20fingerprints%20and%20similarity.ipynb
Great! Yes, I also use RDKit. For this part, I only want the PubChem similarities. Now I see: with compound.fingerprint in your package, I can get not only the similarities but also the PubChem fingerprints. Thanks a lot! Songpeng
Hi, nice to find this package. I currently have multiple compounds in SDF format (they are in fact from the ZINC database). Is it possible to use their SDF files to get their PubChem similarities? Thanks, Songpeng