openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

Chemical Structure Search: Similarity - Issue with relevance for tautomers #16

Open leeharland opened 10 years ago

leeharland commented 10 years ago

From Stefan Senger: The following two URIs relate to tautomers of Sildenafil: http://ops.rsc.org/OPS1213082 and http://ops.rsc.org/OPS1794066

If I take the SMILES string CCCC1=NN(C2=C1N=C(NC2=O)C3=C(C=CC(=C3)S(=O)(=O)N4CCN(CC4)C)OCC)C for OPS1213082 and perform a Tanimoto similarity search, the other tautomer (http://ops.rsc.org/OPS1794066) only appears among the hits when I use a threshold below 0.9, because the relevance for this tautomer is 0.89. Considering that OPS1794066 is a tautomer of the query, I would expect it to have a relevance of 1 (or at least very close to 1). When we calculated the Tanimoto similarity with ChemAxon, the similarity index was indeed 1.0. Chemists would definitely find it counterintuitive for tautomers to have such a low relevance.
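For readers less familiar with the metric: the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. The sketch below only illustrates that arithmetic and how a 0.9 cut-off drops a hit scoring 0.89. The function name `tanimoto` and both bit sets are made up for illustration; they are not real Indigo or ChemAxon fingerprints, and this is not a claim about how either toolkit builds its fingerprints.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


# Hypothetical fingerprints (assumption): 100 on-bits each, 94 of them shared.
query_bits = set(range(0, 100))
tautomer_bits = set(range(6, 106))

score = tanimoto(query_bits, tautomer_bits)  # 94 shared bits / 106 bits in the union
print(f"score = {score:.2f}")

# The same hit is dropped at a 0.9 threshold but kept at 0.85.
for threshold in (0.9, 0.85):
    print(threshold, "hit" if score >= threshold else "miss")
```

If a fingerprint is derived from the structure exactly as drawn, two tautomers can set somewhat different bits, which would be one possible explanation for a score below 1.0 unless the structures are standardised before fingerprinting.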

Is there anything that can be done so that Indigo produces a similarity index for tautomers that is at least closer to what one would expect (ideally 1.0)?

Is there some documentation that explains how the fingerprints are calculated?

Note that I haven't tried other tautomers; I am just assuming that the behaviour would be similar.

karapetk commented 10 years ago

I can confirm the issue. It has been submitted to the GGA forum: https://groups.google.com/forum/#!topic/indigo-bugs/3QcPVrTSKMw

StefanSenger commented 10 years ago

Just adding this comment so that it's easier for me (@StefanSenger) to watch this issue.

karapetk commented 10 years ago

Still the same: no answer from GGA.

@valt