ostrokach / cinfony

Automatically exported from code.google.com/p/cinfony

Cross-toolkit fingerprint comparison #8

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I was playing a bit with fingerprints in cinfony 1.0 (as packaged in Debian) and
tried the session below, with interesting results:

Python 2.7.2+ (default, Nov 30 2011, 19:22:03) 
[GCC 4.6.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from cinfony import webel, rdk, pybel
[14:15:50] WARNING: The AvailDescriptors module is deprecated. Please switch to using the Descriptors module.
>>> mol = pybel.readstring('smi','CCCC')
>>> obmol = mol
>>> rdkmol = rdk.Molecule(mol)
>>> webmol = webel.Molecule(mol)
>>> obfp = obmol.calcfp('MACCS')
>>> obfp.bits
[114, 115, 118, 147, 149, 155, 160]
>>> rdkfp = rdkmol.calcfp('maccs')
>>> rdkfp.bits
[114, 115, 118, 147, 149, 155, 160]
>>> webfp = webmol.calcfp('maccs')
>>> webfp.bits
[113, 114, 117, 146, 148, 154, 159]
>>> obfp | obfp
1.0
>>> obfp | rdkfp
-1.0
>>> obfp | webfp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/cinfony/pybel.py", line 612, in __or__
    return ob.OBFingerprint.Tanimoto(self.fp, other.fp)
TypeError: in method 'OBFingerprint_Tanimoto', argument 2 of type 'std::vector< unsigned int,std::allocator< unsigned int > > const &'
>>> webfp | obfp
0.07692307692307693
>>> webfp | rdkfp
0.07692307692307693
>>> webfp | webfp
1.0
>>> rdkfp | rdkfp
1.0
>>> rdkfp | webfp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/cinfony/rdk.py", line 578, in __or__
    return rdkit.DataStructs.FingerprintSimilarity(self.fp, other.fp)
  File "/usr/lib/pymodules/python2.7/rdkit/DataStructs/__init__.py", line 36, in FingerprintSimilarity
    sz2 = fp2.GetNumBits()
AttributeError: 'str' object has no attribute 'GetNumBits'
>>> rdkfp | obfp
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/cinfony/rdk.py", line 578, in __or__
    return rdkit.DataStructs.FingerprintSimilarity(self.fp, other.fp)
  File "/usr/lib/pymodules/python2.7/rdkit/DataStructs/__init__.py", line 36, in FingerprintSimilarity
    sz2 = fp2.GetNumBits()
AttributeError: 'vectorUnsignedInt' object has no attribute 'GetNumBits'

I can see 3 issues here:

1 - The MACCS fingerprint is called 'MACCS' in pybel but 'maccs' in rdk and 
webel. This complicates things if I want to compute MACCS fingerprints with 
different backends. Making fingerprint names case insensitive could do the 
trick.

2 - Fingerprints generated by different toolkits either cannot be compared 
(exception) or give weird results (-1.0?). I don't know if this could be fixed, 
but I would find it useful. Maybe all Fingerprint implementations could use the 
same algorithm that webel.py uses for __or__?

3 - It seems that webel disagrees with Open Babel and RDKit regarding MACCS 
fingerprints. This probably has nothing to do with cinfony.
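
To illustrate point 1, here is a minimal sketch of case-insensitive name matching. `resolve_fp_name` is a hypothetical helper, not part of cinfony, and the name lists are placeholders that only contain the two MACCS spellings seen in the session above.

```python
# Hypothetical helper, not part of cinfony: map a requested fingerprint
# name onto the spelling a given backend expects, ignoring case.
def resolve_fp_name(requested, available):
    """Return the entry of `available` matching `requested` case-insensitively."""
    lookup = dict((name.lower(), name) for name in available)
    try:
        return lookup[requested.lower()]
    except KeyError:
        raise ValueError("%r is not one of %s" % (requested, sorted(available)))


# Placeholder lists; a real caller would pass each backend's own name list.
pybel_names = ["MACCS"]
rdk_names = ["maccs"]

print(resolve_fp_name("maccs", pybel_names))  # MACCS
print(resolve_fp_name("MACCS", rdk_names))    # maccs
```

With something like this in front of `calcfp`, the same string could be used for every backend regardless of how that backend spells the name.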

Original issue reported on code.google.com by ssorga...@gmail.com on 5 Jan 2012 at 1:42

GoogleCodeExporter commented 9 years ago
I just noticed that I can use 'MACCS' as the fingerprint type in both webel and 
rdk, so I can use the same fingerprint type for all of them. Maybe pybel should 
be case insensitive too, anyway.

Original comment by ssorga...@gmail.com on 5 Jan 2012 at 1:55

GoogleCodeExporter commented 9 years ago
These are all good points. 

1 - (After reading Comment 1) I'll look into this for pybel.

2 - That is an interesting suggestion. But I think it only applies to MACCS, 
right? In all other cases, it doesn't make sense to compare fingerprints (maybe 
I should raise an error message). For MACCS keys, you can compare 
set(myfp.bits) and set(myotherfp.bits). 

3 - The Webel webservice uses the CDK for the fingerprint.
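
As a sketch of the set-based comparison mentioned in point 2, the Tanimoto coefficient can be computed directly from the `bits` lists. The function name `tanimoto` and the choice to return 0.0 for two empty fingerprints are assumptions made for this example; the bit lists are copied from the session in the original report.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as lists of set bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / float(len(union))


# Bit lists copied from the session in the original report.
ob_bits = [114, 115, 118, 147, 149, 155, 160]
web_bits = [113, 114, 117, 146, 148, 154, 159]

print(tanimoto(ob_bits, ob_bits))   # 1.0
print(tanimoto(ob_bits, web_bits))  # 1/13, roughly 0.0769
```

The second value (one shared bit out of thirteen distinct bits) equals the 0.0769... that `webfp | obfp` returned in the report, which suggests, but does not prove, that webel already compares fingerprints this way.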

Original comment by baoille...@gmail.com on 5 Jan 2012 at 2:20

GoogleCodeExporter commented 9 years ago
Thank you.

I've ended up wrapping the fingerprints in a new fingerprint class, keeping the 
fp and bits attributes but using webel.Fingerprint's __or__ implementation, 
which so far seems to return the same results as the original fingerprint 
classes, but can now also compare fingerprints of the same kind generated by 
different toolkits (which, AFAIK, currently only applies to MACCS, or are there 
any other fingerprints implemented by more than one toolkit?).

About point 3: it looks like someone has a bug. I read somewhere that the CDK 
MACCS implementation was based on RDKit's, so CDK's is probably the buggy one. 
I'll upgrade the CDK on my system (the version I have, 1.2.8, does not seem to 
support MACCS), look into it, and report bugs where appropriate.
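
One thing that may be worth checking first: in the session from the original report, every webel bit is exactly one lower than the corresponding pybel/rdk bit. The sketch below only verifies that arithmetic on the reported lists. Whether this reflects a 0-based versus 1-based numbering convention is a guess based on a single molecule, not something confirmed by any of the toolkits.

```python
# Bit lists copied from the session in the original report.
ob_bits = [114, 115, 118, 147, 149, 155, 160]
web_bits = [113, 114, 117, 146, 148, 154, 159]

# Shift every webel bit up by one and compare with the pybel/rdk result.
shifted = [bit + 1 for bit in web_bits]
print(shifted == ob_bits)  # True
```

If the same offset holds for other molecules, the two sets of keys would agree after renumbering, and a cross-toolkit comparison would need to apply that shift before computing a similarity.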

Original comment by ssorga...@gmail.com on 6 Jan 2012 at 7:03