openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

API doesn't return the expected result in Exact Search #198

Closed lucaxbartek closed 9 years ago

lucaxbartek commented 9 years ago

When I search for a compound, in this case Erythrose, using the SMILES string obtained from OpenPHACTS (http://ops.rsc.org/OPS1782071), I get the error message "Internal server error" with HTTP status code 500. @StefanSenger

StefanSenger commented 9 years ago

@valt @pshenichnov

danidi commented 9 years ago

The 500 error comes from an "HTTP Error 400. The request is badly formed." response from ChemSpider. I think there is a formatting issue in the SMILES on the ops.rsc page. If you type the same SMILES by hand, you get a 404 instead (which is still not correct, though). The hand-typed SMILES (leading to the 404) encodes to OCC%40%40HC%40%40HC%3DO, while the one causing the 500 error is OCC%40%40HC%40%40H%C2%ADC%3DO, which contains an extra %C2%AD (the UTF-8 encoding of U+00AD, an invisible soft hyphen).
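A minimal Python sketch of the difference described above. It assumes the copied text is the hand-typed Erythrose SMILES with two U+00AD SOFT HYPHEN characters inserted (the same positions as in the copied SMILES quoted further down this thread). Both strings look identical on screen, but they are not equal, and the extra character percent-encodes to %C2%AD:

```python
from urllib.parse import quote

typed = "OC[C@@H](O)[C@@H](O)C=O"
# Assumption: the copied text is the same SMILES plus two U+00AD SOFT HYPHENs
copied = "OC[C@@H](O\u00ad)[C@@H](O)\u00adC=O"

print(typed == copied)                        # False, although both look the same
print(len(copied) - len(typed))               # 2
print(quote("\u00ad"))                        # %C2%AD (UTF-8 bytes C2 AD)
print(copied.replace("\u00ad", "") == typed)  # True
```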

danidi commented 9 years ago

Also, copying the SMILES from the ops.rsc page and searching for it in ChemSpider finds nothing, whereas typing the exact same SMILES, or copying it from somewhere else, finds the molecule.

OC[C@@H](O­)[C@@H](O)­C=O
danidi commented 9 years ago

Using the "corrected" SMILES gives a result in the Exact Structure search, but only when the search option is set to return all tautomers: "primaryTopic": { "_about": "http://www.openphacts.org/api/ChemicalStructureSearch", "result": "http://ops.rsc.org/OPS1782071", "MatchType": "1", "Molecule": "OC[C@@H](O)[C@@H](O)C=O", "type": "http://www.openphacts.org/api/ExactStructureSearch",

But this then returns the same SMILES again, not a tautomeric structure I think.

On an unrelated note: for future issues, I think it would be best to post them as new tickets on http://support.openphacts.org. We will then investigate the issue first and post it to GitHub with additional technical information.

lucaxbartek commented 9 years ago

My bad. Originally, I used the SMILES from ChemSpider (C([C@H]([C@H](C=O)O)O)O), got the corresponding OpenPHACTS URI from the tautomer search, and wondered why it does not come up in the exact search, even though the compound is actually in the system! Sorry I didn't explain it clearly.

karapetk commented 9 years ago

We identified an issue in the SMILES-to-molfile translation. In this particular case, the OpenEye SMILES-to-molfile conversion, for some reason, does not set the chiral flag in the molfile, and that produces a different non-standard InChI (which we use for exact search). The issue has been fixed, but the fix is not in production yet.
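For readers unfamiliar with the chiral flag: in a V2000 molfile it is the fifth three-character field of the counts line, which is the fourth line of the file. The helper below is a hypothetical sketch, not RSC's or OpenEye's code, that reads the flag so the output of two conversions can be compared; it only needs the header lines.

```python
def v2000_chiral_flag(molfile: str) -> int:
    """Return the chiral flag (0 or 1) from the counts line of a V2000 molfile.

    The counts line is the 4th line; its fields are 3 characters wide:
    atoms, bonds, atom lists, an obsolete field, then the chiral flag.
    """
    counts = molfile.splitlines()[3]
    if not counts.rstrip().endswith("V2000"):
        raise ValueError("not a V2000 counts line")
    field = counts[12:15].strip()
    return int(field) if field else 0


# Header lines only (name, program line, comment, counts line) for Erythrose:
# 8 heavy atoms, 7 bonds; the two headers differ only in the chiral flag.
with_flag = "erythrose\n\n\n  8  7  0  0  1  0  0  0  0  0999 V2000\n"
without_flag = "erythrose\n\n\n  8  7  0  0  0  0  0  0  0  0999 V2000\n"

print(v2000_chiral_flag(with_flag), v2000_chiral_flag(without_flag))  # 1 0
```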

We also spotted another issue: the copy button on OPS web pages such as http://ops.rsc.org/OPS1782071 does not copy the SMILES string correctly (you will notice it if you paste it into Notepad).

danidi commented 9 years ago

Hi Luca, no reason to apologize; it is indeed not the behaviour I would expect (regardless of whether the SMILES originally came from a tautomer search). Also, the SMILES from the Compound information section (which is probably the one you used originally) doesn't find the molecule either. So hopefully this issue will be solved by the fix. @karapetk Thank you for looking into it! As for the copy issue, it is not only the copy button: copying and pasting the text directly doesn't work either.

StefanSenger commented 9 years ago

Hi all,

I just came across a similar behaviour when looking at Rivaroxaban.

Rivaroxaban -> http://www.conceptwiki.org/concept/index/4fe41c1c-c265-41f1-a767-8ec496a0a158 -> http://ops.rsc.org/OPS1557895 -> O=C1COCCN1C1C=CC(=CC=1)N1C[C@H](CNC(=O)C2=CC=C(Cl)-S2)OC1=O

When I use this 'OPS-derived SMILES string' to search for the compound the search fails:

http://ops.rsc.org/JSON.ashx?op=ExactStructureSearch&searchOptions.Molecule=O=C1COCCN1C1C=CC(=CC=1)N1C[C@H](CNC(=O)C2=CC=C(Cl)-S2)OC1=O

http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=1d5a23f0-b9ae-427f-b38a-75afe412e516 -> []

Does this fail for the same reason as the Erythrose example?

@karapetk: Could you please check if this issue is also resolved by the fix that has been put in place? That would be great.

karapetk commented 9 years ago

That SMILES string was incorrectly copied from the referenced OPS web page for OPS1557895. The SMILES should be: O=C1COCCN1C1C=CC(=CC=1)N1C[C@H](CNC(=O)C2=CC=C(Cl)S2)OC1=O

Note the difference between my smiles above and yours below: O=C1COCCN1­C1C=CC(=CC­=1)N1CC@H­OC1=O Haa. it disappears when I paste it here..

One should not trust SMILES text copied from the OPS website until we fix the issue. If you paste the copied SMILES text into Notepad (not into a browser text field like the one on this forum), you will see the subtle difference.

StefanSenger commented 9 years ago

Hi Ken,

Let's try again. I took the Rivaroxaban SMILES string from http://ops.rsc.org/OPS1557895 and cleaned it manually (i.e. removed the '-' characters). I then checked that the SMILES string works by performing a search on the ChemSpider web page, and it did. Here is the SMILES string:

O=C1COCCN1C1C=CC(=CC=1)N1C[C@H](CNC(=O)C2=CC=C(Cl)S2)OC1=O

I hope it's OK this time around. If I now use this SMILES string to perform an ExactStructureSearch with MatchType=0, NO result is found (at least when I do it). However, if I use MatchType=1 (AllTautomers), the search does find Rivaroxaban. Something is not quite right here. Would you mind having another look, please?

Just to say that this does not appear to be happening only for Rivaroxaban. Without great difficulty I was able to find three more examples (see below) that show the same behaviour. All of the examples contain stereocentres. This might be coincidence or ...

- Dapagliflozin (http://ops.rsc.org/OPS101970): CCOC1C=CC(CC2=CC(=CC=C2Cl)[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)=CC=1
- Dolutegravir (http://ops.rsc.org/OPS561448): C[C@@H]1CCO[C@H]2CN3C=C(C(=O)NCC4C=CC(F)=CC=4F)C(=O)C(O)=C3C(=O)N21
- Tofacitinib (http://ops.rsc.org/OPS118596): CN([C@H]1CN(CC[C@H]1C)C(=O)CC#N)C1=NC=NC2NC=CC=21

StefanSenger commented 9 years ago

P.S. I have no idea what is going on here, but GitHub just 'truncates' the SMILES string when I paste it into the comment window. What you see above is not what I pasted in. I will send you the SMILES string via email.

lucaxbartek commented 9 years ago

Hi! Here are another three examples showing the same behaviour:

- L-(+)-lactic acid: C[C@@H](C(=O)O)O
- D-(+)-glucose: C(C@HO)O
- α-D-glucopyranose: C([C@@H]1C@HO)O

Nothing comes up with the "ExactMatch" parameter; however, with "AllTautomers" or any other setting, the original molecule is also in the results, so these compounds are definitely in the system.

The case of L-(+)-lactic acid is especially interesting: when I tried the SMILES with the stereo information removed, CC(C(=O)O)O, the search did return a result. This could mean that @StefanSenger's suspicion that the problem is caused by stereocentres is in fact correct!

lucaxbartek commented 9 years ago

And of course the smiles are all messed up again.... Hopefully this will work! http://txs.io/uHrb

danidi commented 9 years ago

You can put the SMILES on a new line (with a blank line before it) and indent it with four spaces. GitHub then interprets it as code and shows it correctly (you can always check the preview to see whether it shows up correctly).

 OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O
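A likely, but unconfirmed, explanation for the vanishing characters: a SMILES fragment such as [C@@H](O) has the same shape as a Markdown inline link, [text](destination), so outside a code block only the bracketed text is rendered and the parenthesised part is dropped. The regex below is a deliberately simplified stand-in for the Markdown link rule (it ignores nested parentheses), not GitHub's actual renderer; it would also explain why a stereo-free SMILES such as CC(C(=O)O)O comes through intact.

```python
import re

# Simplified stand-in for Markdown's inline-link syntax "[text](destination)":
# it only matches destinations without nested parentheses.
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^()]*)\)")


def render_like_markdown(text: str) -> str:
    """Keep only the link text of every match, as rendered Markdown would."""
    return INLINE_LINK.sub(r"\1", text)


print(render_like_markdown("OC[C@@H](O)[C@@H](O)C=O"))  # OCC@@HC@@HC=O
print(render_like_markdown("CC(C(=O)O)O"))               # CC(C(=O)O)O
```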

It seems the stereo-containing SMILES work in "Chemical Structure Conversion: SMILES to URL", but not in the exact search. However, the issue seems consistent with the problem @karapetk described for the chiral flag in the molfile conversion, so it should hopefully be solved by the fix.

StefanSenger commented 9 years ago

Can I just follow up on the comment made by @danidi: "It seems the stereo containing SMILES work in "Chemical Structure Conversion: SMILES to URL", but not in the exact search." Can someone please tell me what exactly happens when a "Chemical Structure Conversion: SMILES to URL" is performed? I had assumed that it was a 'Chemical Structure Search: Exact', but based on what @danidi observed, that can't be the case.

karapetk commented 9 years ago

Alex just rolled out the bug fix to the production test environment at ops2.rsc.org. Stefan's SMILES seem to be working as intended.

Please test.

lucaxbartek commented 9 years ago

I cannot test the search for the SMILES, as the SMILES-to-structure converter doesn't seem to work for me at ops2.rsc.org. Whatever SMILES I enter, I get the following error message:

No Transport : 0 Error in procedure convertTo has happened. undefined

danidi commented 9 years ago

You can use the API there in the following way: First perform the exact search with the SMILES of your choice with the following call (you can add the same parameters as with the Open PHACTS API there).

http://ops2.rsc.org/JSON.ashx?op=ExactStructureSearch&searchOptions.MatchType=0&searchOptions.Molecule=OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O

This will give you a code, which you can use to retrieve the result, e.g. with:

http://ops2.rsc.org/JSON.ashx?op=GetSearchResult&rid=9e0881ec-954a-4ed2-8101-f31b767118c0

This will return you the OPS-ID of a compound, which you can add to http://ops.rsc.org/Compounds/Get/ to see the resulting structure. For my example it seems to work fine now.
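The same two-step flow as a minimal Python sketch. It only builds the request URLs (sending them, e.g. with urllib.request, is left out), and it percent-encodes the SMILES instead of pasting it raw. The host and parameter names are the ones used in the URLs above; the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE = "http://ops2.rsc.org/JSON.ashx"  # test host named in this thread


def exact_search_url(smiles: str, match_type: int = 0) -> str:
    """Step 1: start an ExactStructureSearch; the response is a request id."""
    params = {
        "op": "ExactStructureSearch",
        "searchOptions.MatchType": match_type,
        "searchOptions.Molecule": smiles,
    }
    # quote_via=quote (with urlencode's default safe="") percent-encodes [ ] @ = ( )
    return BASE + "?" + urlencode(params, quote_via=quote)


def search_result_url(rid: str) -> str:
    """Step 2: fetch the result list for the request id returned by step 1."""
    return BASE + "?" + urlencode({"op": "GetSearchResult", "rid": rid}, quote_via=quote)


print(exact_search_url("OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O"))
print(search_result_url("9e0881ec-954a-4ed2-8101-f31b767118c0"))
```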

It seems the search portal doesn't work for SMILES at all, I raised this as another ticket last week: https://github.com/openphacts/GLOBAL/issues/199

karapetk commented 9 years ago

Works for me. Regarding #199 I added comment there.

karapetk commented 9 years ago

The fix has been pushed to production. Please close the ticket

danidi commented 9 years ago

Great! The search for the SMILES with stereochemistry works for the examples I tested. Maybe Luca can have a look at her list again to validate this? But I would like to leave the ticket open, as the first issue (the copy/paste issue with the SMILES from the rsc page) is still not solved.

pshenichnov-rsc commented 9 years ago

The copy/paste issue is fixed as well. You have to use the "Copy" button next to each item you want to copy. If you try to do this by selecting text on the page and pressing Ctrl+C, it won't work. The reason is that the text contains special characters that allow the browser to wrap it properly; without them, the browser can't do this.

Copy buttons were added to resolve the issue and to provide a way to copy the text to the clipboard. If the Copy buttons don't work, please try refreshing the page; maybe some scripts were cached on your side.
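If text still has to be copied by hand, a defensive option on the client side is to strip invisible formatting characters before searching. A minimal sketch, assuming the "special characters" are Unicode format characters (category Cf) such as U+00AD SOFT HYPHEN, which the %C2%AD sequence reported earlier in this thread points to; clean_copied_smiles is a hypothetical helper, not part of any OPS API.

```python
import unicodedata


def clean_copied_smiles(text: str) -> str:
    """Drop invisible Unicode format characters (category Cf, e.g. U+00AD
    SOFT HYPHEN or U+200B ZERO WIDTH SPACE) and surrounding whitespace."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf").strip()


pasted = "OC[C@@H](O\u00ad)[C@@H](O)\u00adC=O \n"
print(clean_copied_smiles(pasted))  # OC[C@@H](O)[C@@H](O)C=O
```

This only removes invisible characters; it does not catch visually similar substitutes (for example a Unicode minus sign in place of "-").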

lucaxbartek commented 9 years ago

Maybe it's just me doing something wrong, but the Erythrose example is still not working for me. The other 3 (glucoses and lactic acid) have been resolved. The copying issue is indeed solved; it's working for me as well. This is the Erythrose SMILES I was trying to use:

OC[C@@H](O)[C@@H](O)C=O    
danidi commented 9 years ago

It doesn't work for me either, not even with MatchType=1, so maybe this is a slightly different issue? SMILES to URL works.

lucaxbartek commented 9 years ago

When I first tested them, it worked with MatchType=1 so maybe this fix changed something.

karapetk commented 9 years ago

@lucaxbartek The SMILES you mentioned works for me when I go via the SMILES conversion and then an "exact strict" search. Please use ops.rsc.org.

@danidi Please provide URL for the search

lucaxbartek commented 9 years ago

@karapetk On ops.rsc.org I cannot get the SMILES conversion to work either: "No Transport : 0 Error in procedure convertTo has happened. undefined". I don't even get to the "search" part; this happens as soon as I click "OK" after entering the SMILES.

danidi commented 9 years ago

I used the same SMILES as Luca. With the Open PHACTS API it doesn't work. However, http://ops.rsc.org/JSON.ashx?op=ExactStructureSearch&searchOptions.MatchType=0&searchOptions.Molecule=OC[C@@H]%28O%29[C@@H]%28O%29C=O retrieves http://ops.rsc.org/Compounds/Get/1782071.

karapetk commented 9 years ago

I don't understand. Luca's Smiles is working for me


lucaxbartek commented 9 years ago

I'm doing the same thing. Do you think this could be a computer issue?

Daniela was saying that it's not working through the API which is the same case for me.


danidi commented 9 years ago

This interface works well for me as well. So maybe it is an issue of the Open PHACTS API rather than the search API, but given that the SMILES doesn't have any special characters, I'm wondering which error possibilities are left.

karapetk commented 9 years ago

@lucaxbartek What browser are you using?

lucaxbartek commented 9 years ago

I was using IE 9. I now tried on Chrome and it seems to work. It should be addressed though as for example here, the use of IE is encouraged (and required for some things).

danidi commented 9 years ago

I think that will depend on the result of the discussion here: https://github.com/openphacts/GLOBAL/issues/199

lucaxbartek commented 9 years ago

I was wondering if there are any news regarding the Erythrose question. All the other issues that weren't working seem to have been resolved, however the original issue:

 C([C@H]([C@H](C=O)O)O)O

Even though it works on ops.rsc.org, it is still not working on the OpenPHACTS API page (https://dev.openphacts.org). @danidi

karapetk commented 9 years ago

As you mentioned, the search API on ops.rsc.org works fine; however, we (RSC'ers) do not have access to, or knowledge of, how the API works on dev.openphacts.org.

leeharland commented 9 years ago

@antonisloizou - looks like there's still an outstanding issue - copied below - any ideas?

I was wondering if there are any news regarding the Erythrose question. All the other issues that weren't working seem to have been resolved, however the original issue:

 C([C@H]([C@H](C=O)O)O)O

Even though it works on ops.rsc.org, it is still not working on the OpenPHACTS API page (https://dev.openphacts.org). @danidi

ChristineChichester commented 9 years ago

I tried using Luca's SMILES copied from the email, C([C@H]([C@H](C=O)O)O)O:

https://beta.openphacts.org/1.4/structure?app_id=0186459a&app_key=5b7fc7c1d69f1f4af2c2671174d6d7d1&smiles=C([C@H]([C@H](C=O)O)O)O

and it worked for me:

    {
      "format": "linked-data-api",
      "version": "1.4",
      "result": {
        "_about": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O",
        "definition": "https://beta.openphacts.org/api-config",
        "extendedMetadataVersion": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O&_metadata=all%2Cviews%2Cformats%2Cexecution%2Cbindings%2Csite",
        "primaryTopic": {
          "_about": "http://ops.rsc.org/OPS1782071",
          "smiles": "C([C@H]([C@H](C=O)O)O)O",
          "isPrimaryTopicOf": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O"
        }
      }
    }

(Reply sent by email to Lee Harland's comment of Oct 6, 2014, 3:42 PM.)

StefanSenger commented 9 years ago

I think @ChristineChichester used 'Chemical Structure Conversion: SMILES to URL' (which is working) instead of 'Chemical Structure Search: Exact' (which isn't working). This nicely illustrates to me that we really, really need to know what the difference is between these two calls. @danidi, I know you asked @antonisloizou this question. Did you get an answer? Once we have established what the difference is, we should capture this on the support portal so that we never have to ask this question again :-).

danidi commented 9 years ago

No, he was away last week. We will follow this up as soon as possible.

antonisloizou commented 9 years ago

So there seem to be a few different things here:

  1. Difference between 'Chemical Structure Conversion: SMILES to URL' and 'Chemical Structure Search: Exact'

'Chemical Structure Search: Exact' calls:

[1] http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&scopeOptions.DataSources%5B0%5D=DrugBank&scopeOptions.DataSources%5B1%5D=ChEMBL&scopeOptions.DataSources%5B2%5D=PDB&searchOptions.Molecule={searchOptions.Molecule}

'Chemical Structure Conversion: SMILES to URL' calls first:

[2] http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=Smiles2InChi&convertOptions.Text={smiles}

and then:

[3] http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=InChi2CSID&convertOptions.Text={inchi}

I have no idea what the precise difference between the two services is. @karapetk ?

  2. The SMILES for Erythrose (http://ops.rsc.org/OPS1782071) does not return results.

This issue is about URL encoding. The encoding of

OC[C@@H](O)[C@@H](O)C=O

is:

OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
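The encoded form above leaves the parentheses unescaped, which is what JavaScript's encodeURIComponent does (that the string was produced that way is a guess). A Python sketch that reproduces it, and that also shows the fully escaped %28/%29 variant decodes to the same SMILES, so both forms should mean the same thing to a server that decodes the query string properly:

```python
from urllib.parse import quote, unquote


def encode_like_js(value: str) -> str:
    """Percent-encode the way JavaScript's encodeURIComponent does:
    letters, digits and - _ . ! ~ * ' ( ) are left as they are."""
    return quote(value, safe="!*'()")


smiles = "OC[C@@H](O)[C@@H](O)C=O"

js_style = encode_like_js(smiles)  # OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
strict = quote(smiles, safe="")    # parentheses become %28 and %29 as well

print(js_style)
print(strict)
print(unquote(js_style) == unquote(strict) == smiles)  # True
```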

Now, if I use the encoded string with [1]:

http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&scopeOptions.DataSources%5B0%5D=DrugBank&scopeOptions.DataSources%5B1%5D=ChEMBL&scopeOptions.DataSources%5B2%5D=PDB&searchOptions.Molecule=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO

I get nothing back.

However, using the same string with [2]:

http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=Smiles2InChi&convertOptions.Text=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO

I get the InChI:

InChI=1S/C4H8O4/c5-1-3(7)4(8)2-6/h1,3-4,6-8H,2H2/t3-,4+/m0/s1

In turn, the URL encoding of this InChI is:

InChI%3D1S%2FC4H8O4%2Fc5-1-3(7)4(8)2-6%2Fh1%2C3-4%2C6-8H%2C2H2%2Ft3-%2C4%2B%2Fm0%2Fs1

Finally, supplying the URL-encoded InChI to [3]:

http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=InChi2CSID&convertOptions.Text=InChI%3D1S%2FC4H8O4%2Fc5-1-3(7)4(8)2-6%2Fh1%2C3-4%2C6-8H%2C2H2%2Ft3-%2C4%2B%2Fm0%2Fs1

I get back the ChemSpider ID 84990, which corresponds to Erythrose (OPS1782071).

Now this explains why the 'Chemical Structure Conversion: SMILES to URL' returns:

http://ops.rsc.org/OPS84990

which is NOT Erythrose (http://ops.rsc.org/OPS1782071). From our end we take whatever ID the OCRS gives and prepend http://ops.rsc.org/OPS to it.
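The 'SMILES to URL' chain described above, as a minimal Python sketch that only builds the two ConvertTo request URLs and the final URI (no network calls). It makes the pitfall explicit: the ID returned by [3] is a ChemSpider ID, and prefixing it with http://ops.rsc.org/OPS yields OPS84990 rather than OPS1782071. The helper names are hypothetical; the URL templates are the ones quoted above.

```python
from urllib.parse import quote

API = "http://ops.rsc.org/api/v1/JSON.ashx"
OPS_PREFIX = "http://ops.rsc.org/OPS"


def convert_to_url(direction: str, text: str) -> str:
    """ConvertTo request URL for [2] (Smiles2InChi) or [3] (InChi2CSID).
    Parentheses are left unescaped, as in the URLs quoted above."""
    return (f"{API}?op=ConvertTo&convertOptions.Direction={direction}"
            f"&convertOptions.Text={quote(text, safe='()')}")


def naive_ops_uri(returned_id: str) -> str:
    """What 'SMILES to URL' does with the ID from [3]: just prefix it."""
    return OPS_PREFIX + returned_id


smiles = "OC[C@@H](O)[C@@H](O)C=O"
inchi = "InChI=1S/C4H8O4/c5-1-3(7)4(8)2-6/h1,3-4,6-8H,2H2/t3-,4+/m0/s1"  # result of [2]

print(convert_to_url("Smiles2InChi", smiles))
print(convert_to_url("InChi2CSID", inchi))
print(naive_ops_uri("84990"))  # http://ops.rsc.org/OPS84990, not OPS1782071
```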

In any case, we have so far established that [2] and [3] can work with the URL-encoded Erythrose SMILES and InChI strings, while [1] does not appear to be able to handle the URL encoding of this particular SMILES.

Now, as you've seen already, the unencoded SMILES:

OC[C@@H](O)[C@@H](O)C=O

used with [1]

http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&scopeOptions.DataSources%5B0%5D=DrugBank&scopeOptions.DataSources%5B1%5D=ChEMBL&scopeOptions.DataSources%5B2%5D=PDB&searchOptions.Molecule=OC[C@@H](O)[C@@H](O)C=O

Produces the correct result, i.e.

http://ops.rsc.org/OPS1782071
antonisloizou commented 9 years ago

So, in terms of actions I guess it would go something like this:

First issue:

Figure out what is so special about the URL encoding of:

OC[C@@H](O)[C@@H](O)C=O

OR

Second issue:

OR

OR

karapetk commented 9 years ago

@antonisloizou Why are you limiting the search in query [1] to those 3 data sources? If you remove them, you would get the resulting OPS ID that came from ChEBI.

antonisloizou commented 9 years ago

@karapetk If I recall correctly at some point we had an issue about returning molecules that were not in the system from the similarity searches.

Confirmed: [1] without the data sources works with the encoded string:

http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&searchOptions.Molecule=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO

Should I remove the restrictions ?

karapetk commented 9 years ago

Regarding similarity searches: you are probably referring to similarity searches returning virtual records (parents). To filter the results and get only real compounds, please use the RealOnly property of CSCSearchScopeOptions:

http://ops.rsc.org/JSON.ashx#CommonSearchOptions
antonisloizou commented 9 years ago

A first test without the data sources works :

https://ops2.few.vu.nl/structure/exact?app_id=8e2a0ab3&app_key=28ae5ffdbd76d409b59d89e0c0c8341a&searchOptions.Molecule=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO

antonisloizou commented 9 years ago

So you are saying [1] should be:

http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&searchOptions.Molecule={searchOptions.Molecule}&CSCSearchScopeOptions.RealOnly=true

??

antonisloizou commented 9 years ago

The same data source parameters are also currently used in 'Chemical Structure Search: Similarity' and 'Chemical Structure Search: Substructure'

Should they also be removed ? Should 'CSCSearchScopeOptions.RealOnly=true' be added also to those requests ?

karapetk commented 9 years ago

Yes, I would use

CSCSearchScopeOptions.RealOnly=true

anytime you show the results to end user. However, you might want to get virtual compound sometimes in some workflows as an intermediate step.

antonisloizou commented 9 years ago

OK - Here are the new templates for the 3 methods:

Chemical Structure Search: Similarity http://ops.rsc.org/api/v1/JSON.ashx?op=SimilaritySearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}

Chemical Structure Search: Substructure http://ops.rsc.org/api/v1/JSON.ashx?op=SubStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}

Chemical Structure Search: Exact http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}

@danidi , @ChristineChichester can we get some testing on this (ops2 or https://dev.openphacts.org/docs/develop ) before deploying to OL as a hot fix ?
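A minimal sketch of filling these three templates from Python, e.g. for a quick test script. It substitutes the {searchOptions.Molecule} placeholder with the percent-encoded SMILES using str.replace rather than str.format, because str.format would treat the dotted placeholder as attribute access. The template strings are copied from the comment above; fill_template is a hypothetical name.

```python
from urllib.parse import quote

BASE = "http://ops.rsc.org/api/v1/JSON.ashx"
PLACEHOLDER = "{searchOptions.Molecule}"
TEMPLATES = {
    "similarity": BASE + "?op=SimilaritySearch&CSCSearchScopeOptions.RealOnly=true"
                         "&searchOptions.Molecule=" + PLACEHOLDER,
    "substructure": BASE + "?op=SubStructureSearch&CSCSearchScopeOptions.RealOnly=true"
                           "&searchOptions.Molecule=" + PLACEHOLDER,
    "exact": BASE + "?op=ExactStructureSearch&CSCSearchScopeOptions.RealOnly=true"
                    "&searchOptions.Molecule=" + PLACEHOLDER,
}


def fill_template(kind: str, smiles: str) -> str:
    """Insert the fully percent-encoded SMILES into the chosen template."""
    return TEMPLATES[kind].replace(PLACEHOLDER, quote(smiles, safe=""))


print(fill_template("exact", "OC[C@@H](O)[C@@H](O)C=O"))
```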