ncats / bard

Sources for the BioAssay Research Database

Empty results from the following URL #38

Closed jasiedu closed 11 years ago

jasiedu commented 11 years ago

http://bard.nih.gov/api/v12/search/compounds/?q=O%3DS%28*C%29%28Cc1ccc2ncc%28CCNC%29c2c1%29%3DO&skip=0&top=2&expand=true

http://bard.nih.gov/api/v12/search/projects/?q=zinc+ion+binding&filter=fq(gomf_term:zinc+ion+binding),&skip=0&top=10&expand=true

These used to work with v10.
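For reference, the q value in the first URL is just the percent-encoded SMILES quoted later in the thread, and the v10 and v12 URLs differ only in the version path segment. Below is a minimal sketch using Python's standard urllib.parse that rebuilds both URLs. compound_search_url is a hypothetical helper, not part of any BARD client, and the parameters are copied from the URLs above.

```python
from urllib.parse import quote_plus, unquote

# Base URL taken from this issue; only the version segment (v10 vs v12) changes.
BASE = "http://bard.nih.gov/api/{version}/search/compounds/"

def compound_search_url(version: str, q: str, skip: int = 0, top: int = 2) -> str:
    """Hypothetical helper: build a compound search URL like the ones in this issue."""
    # safe="*" leaves the asterisk unescaped, as in the original URL;
    # "=", "(" and ")" become %3D, %28 and %29.
    encoded_q = quote_plus(q, safe="*")
    return f"{BASE.format(version=version)}?q={encoded_q}&skip={skip}&top={top}&expand=true"

smiles = "O=S(*C)(Cc1ccc2ncc(CCNC)c2c1)=O"
v12 = compound_search_url("v12", smiles)
v10 = compound_search_url("v10", smiles)

# Decoding the q value recovers the original SMILES string.
decoded = unquote("O%3DS%28*C%29%28Cc1ccc2ncc%28CCNC%29c2c1%29%3DO")
print(v12)
print(v10)
```

The request itself is identical across versions, so the empty docs node in v12 comes from how the server handles the query, not from how it is encoded.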

rajarshi commented 11 years ago

The second link works if you quote the q argument.

Need to look into the first one.
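The quoting fix for the second link might look like the sketch below. This assumes "quote the q argument" means wrapping the search term in double quotes, which Lucene/Solr query syntax treats as a phrase. project_search_url is a hypothetical helper. The filter value is copied from the original URL with its '+' signs written as spaces, because urlencode turns spaces back into '+'. Its parentheses, colon and comma get percent-encoded, which a standard server should decode back to the original characters.

```python
from urllib.parse import urlencode

def project_search_url(version: str, q: str, fq_filter: str, skip: int = 0, top: int = 10) -> str:
    """Hypothetical helper: build a project search URL with a phrase-quoted q."""
    params = {
        # Double quotes make the words match as an adjacent phrase
        # rather than as independent terms.
        "q": f'"{q}"',
        "filter": fq_filter,
        "skip": skip,
        "top": top,
        "expand": "true",
    }
    # urlencode applies quote_plus to each value: spaces -> '+', '"' -> %22.
    return f"http://bard.nih.gov/api/{version}/search/projects/?{urlencode(params)}"

url = project_search_url("v12", "zinc ion binding", "fq(gomf_term:zinc ion binding),")
print(url)
```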

caodac commented 11 years ago

Why would you expect something to be returned for the first one? Are you indexing SMARTS/SMILES strings in Lucene?

jasiedu commented 11 years ago

Here is the SMILES we are trying to use: "O=S(*C)(Cc1ccc2ncc(CCNC)c2c1)=O". Note that this worked in earlier versions of the API, so this is nothing new.

jasiedu commented 11 years ago

Also note that it does return a response, but the docs node is empty.

caodac commented 11 years ago

While you can (and should be able to) search on any string, I'm a bit perplexed as to why this particular query should return anything, since all SMILES are stored and indexed in their kekulized forms. Moreover, depending on whether SMILES are analyzed and/or normalized during indexing, I'm not sure what to make of the resulting matches. What are we really matching here? Regardless, there are certainly some differences in how SMILES are indexed compared to previous versions.

jasiedu commented 11 years ago

This older version works.

http://bard.nih.gov/api/v10/search/compounds/?q=O%3DS%28*C%29%28Cc1ccc2ncc%28CCNC%29c2c1%29%3DO&skip=0&top=2&expand=true

rajarshi commented 11 years ago

It turns out that SMILES are not tokenized/analyzed during indexing and storage. However, the new query parser we are using (edismax) does analyze the SMILES query: it ignores case, breaks the string up and, in general, generates a nonsensical query. Furthermore, the structure in the Solr index is stored in kekulized form, so even if the query parser did not break up the query, an exact match would never return anything.
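To illustrate what "ignores case, breaks it up" does to a SMILES string, here is a rough stand-in for a text analyzer that lowercases the input and splits on every non-alphanumeric character. This is an assumption for illustration only: it is not Solr's actual analysis chain, and naive_analyze is a hypothetical name.

```python
import re

def naive_analyze(text: str) -> list:
    """Hypothetical approximation of a text analyzer (NOT Solr's real one):
    lowercase the input, then keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())

smiles = "O=S(*C)(Cc1ccc2ncc(CCNC)c2c1)=O"
tokens = naive_analyze(smiles)
print(tokens)  # ['o', 's', 'c', 'cc1ccc2ncc', 'ccnc', 'c2c1', 'o']
```

Lowercasing alone erases the distinction between aliphatic C and aromatic c, and the bonds, branches and the * atom disappear entirely, so the resulting tokens no longer describe the structure.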

The fact that you got 30393 hits in v10 for this structure is a little suspicious; it suggests that even in v10 the query was being analysed and incorrectly matched.
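One possible way a broken-up query produces spurious hits: if the analyzed tokens are combined with OR semantics, any document sharing even one very common token such as o or c counts as a match. That OR behaviour is an assumption, since the thread does not state which query operator v10 used. The documents below are invented toy values, not BARD data, and the same hypothetical naive_analyze approximation is reused.

```python
import re

def naive_analyze(text):
    # Hypothetical approximation: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())

# Invented toy "indexed" values, for illustration only.
docs = {
    "doc-1": "CCO",
    "doc-2": "C1=CC=CC=C1",
    "doc-3": "O=C(O)C",
    "doc-4": "N#N",
}

query_tokens = set(naive_analyze("O=S(*C)(Cc1ccc2ncc(CCNC)c2c1)=O"))

def or_match(doc_text):
    # Assumed OR semantics: a single shared token is enough for a hit.
    return bool(query_tokens & set(naive_analyze(doc_text)))

hits = [doc_id for doc_id, text in docs.items() if or_match(text)]
print(hits)  # ['doc-3']
```

Here doc-3, a structure unrelated to the query, matches only because it shares the single-letter tokens o and c.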

In general, searching for SMILES via text search does not seem to be reliable.

jasiedu commented 11 years ago

Our intent was not to do a SMILES search. The fact that it used to return something and it no longer does was our only concern.


Jacob K Asiedu
Principal Software Engineer, Chemical Biology Informatics Platform
Broad Institute of Harvard & MIT, 7 Cambridge Center, Cambridge, MA 02142
Office: 617-714-7383

jasiedu commented 11 years ago

I am closing this issue.