lucaxbartek closed this issue 9 years ago
@valt @pshenichnov
The 500 error is caused by "HTTP Error 400. The request is badly formed." from ChemSpider. I think there is some formatting issue in the SMILES on the ops.rsc page. If you type the same SMILES by hand, you get a 404 (which is still not correct, though). The pure-text SMILES (leading to the 404) encodes to OCC%40%40HC%40%40HC%3DO, while the one with the 500 error is OCC%40%40HC%40%40H%C2%ADC%3DO, which adds %C2%AD (the UTF-8 encoding of the invisible soft hyphen, U+00AD) to the SMILES.
Also, copying the SMILES from the ops.rsc page and searching for it in ChemSpider does not find anything. Typing the exact same SMILES by hand, or copying it from somewhere else, finds the molecule.
OC[C@@H](O)[C@@H](O)C=O
Using the "corrected" SMILES gives a result in the Exact Structure search, but only when the search option is set to get all tautomers: "primaryTopic": { "_about": "http://www.openphacts.org/api/ChemicalStructureSearch", "result": "http://ops.rsc.org/OPS1782071", "MatchType": "1", "Molecule": "OC[C@@H](O)[C@@H](O)C=O", "type": "http://www.openphacts.org/api/ExactStructureSearch",
But this then returns the same SMILES again, not a tautomeric structure I think.
On an unrelated note: for future issues, I think it would be best to post them as a new ticket on http://support.openphacts.org. We'll then investigate the issue first and post it to GitHub with additional technical information.
My bad. Originally, I used the SMILES from ChemSpider, C([C@H]([C@H](C=O)O)O)O. I got the corresponding Open PHACTS URI from the tautomer search, and I wondered why it does not come up in the Exact search, even though the compound is actually in the system! Sorry I didn't explain it clearly.
An issue was identified in the SMILES-to-molfile translation. In this particular case, the OpenEye SMILES-to-molfile conversion for some reason doesn't add a chiral flag to the molfile, and that results in a different non-standard InChI (which we use for the exact search). The issue has been fixed, but the fix is not in production yet.
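The thread doesn't show the molfiles involved, but for orientation: in a V2000 molfile the chiral flag is a fixed-width field in the counts line (columns 13-15 in the CTfile V2000 layout). The minimal Python sketch below reads and sets that field. The helper names are hypothetical, and nothing here reproduces OpenEye's conversion or the InChI options used for the exact search.

```python
# Minimal sketch (hypothetical helper names): the chiral flag in a V2000
# molfile counts line. Field layout per the CTfile V2000 format:
#   aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
# aaa = atoms, bbb = bonds, ccc = chiral flag (0 = not chiral, 1 = chiral).

CHIRAL_FIELD = slice(12, 15)  # columns 13-15, 1-based


def chiral_flag(counts_line: str) -> int:
    """Return the chiral flag stored in a V2000 counts line."""
    return int(counts_line[CHIRAL_FIELD])


def with_chiral_flag(counts_line: str, flag: int) -> str:
    """Return a copy of the counts line with the chiral flag set to 0 or 1."""
    if flag not in (0, 1):
        raise ValueError("chiral flag must be 0 or 1")
    return counts_line[:12] + f"{flag:3d}" + counts_line[15:]


# Counts line for erythrose (8 heavy atoms, 7 bonds) without the flag.
line = "  8  7  0  0  0  0  0  0  0  0999 V2000"
print(chiral_flag(line))
print(with_chiral_flag(line, 1))
```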
Another issue has been spotted: using the copy button on OPS web pages like http://ops.rsc.org/OPS1782071 doesn't correctly copy the SMILES string (you will notice it if you paste it into Notepad).
Hi Luca, no reason to apologize; it is indeed not the behaviour I would expect (regardless of whether the SMILES originally came from a tautomer search). Also, the SMILES from the Compound information (which is probably the one you used originally) doesn't find the molecule either. So hopefully this issue will be solved with the fix. @karapetk Thank you for looking into it! As for the copy issue, it is not only the copy button: copy-pasting the text directly doesn't work either.
Hi all,
I just came across a similar behaviour when looking at Rivaroxaban.
Rivaroxaban -> http://www.conceptwiki.org/concept/index/4fe41c1c-c265-41f1-a767-8ec496a0a158 -> http://ops.rsc.org/OPS1557895 -> O=C1COCCN1C1C=CC(=CC=1)N1CC@HOC1=O
When I use this 'OPS-derived SMILES string' to search for the compound the search fails:
http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=1d5a23f0-b9ae-427f-b38a-75afe412e516 -> []
Does this fail for the same reason as the Erythrose example?
@karapetk: Could you please check if this issue is also resolved by the fix that has been put in place? That would be great.
That SMILES string was incorrectly copied from the referenced OPS web page for OPS1557895. The SMILES should be: O=C1COCCN1C1C=CC(=CC=1)N1CC@HOC1=O
Note the difference between my SMILES above and yours below: O=C1COCCN1C1C=CC(=CC=1)N1CC@HOC1=O Ha, it disappears when I paste it here...
One should not trust SMILES text copied from the OPS web site until we fix the issue. If you paste the copied SMILES text into Notepad (not into a browser text field like the one on this forum), you will see the subtle difference.
Hi Ken,
Let's try again. I took the Rivaroxaban SMILES string from http://ops.rsc.org/OPS1557895 and 'cleaned it manually' (i.e. removed the '-' characters). After that, I checked that the SMILES string works by performing a search on the ChemSpider web page, and it worked. Here is the SMILES string: O=C1COCCN1C1C=CC(=CC=1)N1CC@HOC1=O I hope it's OK this time around. If I now use this SMILES string to perform an ExactStructureSearch with MatchType=0, NO result is found (at least when I do it). However, if I use MatchType=1 (AllTautomers), the search does find Rivaroxaban. Something is not quite right here. Would you mind having another look, please?
Just to say that this does not appear to be happening only for Rivaroxaban. Without much difficulty, I was able to find three more examples (see below) that show the same behaviour. All of the examples contain stereocentres. This might be a coincidence or ....
Dapagliflozin (http://ops.rsc.org/OPS101970): CCOC1C=CC(CC2=CC(=CC=C2Cl)[C@@H]2O[C@H](CO)[C@@H](O)[C@H](O)[C@H]2O)=CC=1
Dolutegravir (http://ops.rsc.org/OPS561448): C[C@@H]1CCO[C@H]2CN3C=C(C(=O)NCC4C=CC(F)=CC=4F)C(=O)C(O)=C3C(=O)N21
Tofacitinib (http://ops.rsc.org/OPS118596): CN([C@H]1CN(CC[C@H]1C)C(=O)CC#N)C1=NC=NC2NC=CC=21
P.S. I have no idea what is going on here, but GitHub just 'truncates' the SMILES strings when I paste them into the comment window. What you see above is not what I pasted in. I will send you the SMILES strings via email.
Hi! Another three examples showing the same behaviour:
L-(+)-lactic acid: CC@@HO
D-(+)-Glucose: C(C@HO)O
α-D-Glucopyranose: C([C@@H]1C@HO)O
Nothing comes up with the "ExactMatch" parameter, however with "AllTautomers" or any other setting, the original molecule is also in the result, so they are definitely in the system.
The case of L-(+)-lactic acid is especially interesting: I tried the SMILES with the stereo information removed, CC(C(=O)O)O, and in that case the search brought up a result. This could mean that @StefanSenger's suspicion that the problem is caused by stereocentres is in fact correct!
And of course the SMILES are all messed up again... Hopefully this link will work: http://txs.io/uHrb
You can try writing the SMILES on a new line (with an empty line before it) and adding four spaces in front of it. GitHub will then interpret it as code and show it correctly (you can always check the preview to see whether it shows up correctly).
OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O
It seems the stereo containing SMILES work in "Chemical Structure Conversion: SMILES to URL", but not in the exact search. The issue then seems consistent with the problem @karapetk described for the chiral flag in the molfile conversion, so it should hopefully be solved with the fix.
Can I just follow up on the comment made by @danidi: "It seems the stereo containing SMILES work in "Chemical Structure Conversion: SMILES to URL", but not in the exact search." Can someone please tell me what exactly happens when a "Chemical Structure Conversion: SMILES to URL" is performed? I just assumed that it was a 'Chemical Structure Search: Exact', but based on what @danidi observed, that can't be the case.
Alex just rolled out the bug fix to the production test environment at ops2.rsc.org. Stefan's SMILES seem to be working as intended.
Please test.
I cannot test the search for the SMILES, as the SMILES-to-structure converter doesn't seem to be working for me at ops2.rsc.org. Whatever SMILES I enter, I get the following error message:
No Transport : 0 Error in procedure convertTo has happened. undefined
You can use the API there as follows. First, perform the exact search with the SMILES of your choice using the following call (you can add the same parameters as with the Open PHACTS API).
http://ops2.rsc.org/JSON.ashx?op=ExactStructureSearch&searchOptions.MatchType=0&searchOptions.Molecule=OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O
This will give you a code, which you can use to retrieve the result, eg. with:
http://ops2.rsc.org/JSON.ashx?op=GetSearchResult&rid=9e0881ec-954a-4ed2-8101-f31b767118c0
This will return the OPS ID of a compound, which you can append to http://ops.rsc.org/Compounds/Get/ to see the resulting structure. For my example, it seems to work fine now.
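The two-step workflow above can be sketched in Python as follows. It only builds the URLs and parses response bodies; it makes no network calls. The helper names are hypothetical, the ops2 endpoint is the 2014 test server from this thread, and the response shapes (a JSON string holding the request id, then a JSON list of OPS IDs, [] when nothing matched) are assumptions based on the examples quoted here. urlencode escapes ( and ) as %28 and %29, as in the working ops.rsc.org URL quoted later in the thread.

```python
import json
from urllib.parse import urlencode

# Hypothetical helpers. BASE is the 2014 ops2 test endpoint from this thread.
BASE = "http://ops2.rsc.org/JSON.ashx"


def exact_search_url(smiles: str, match_type: int = 0) -> str:
    """Step 1: start an ExactStructureSearch; the service answers with a request id."""
    params = {
        "op": "ExactStructureSearch",
        "searchOptions.MatchType": match_type,
        "searchOptions.Molecule": smiles,
    }
    return f"{BASE}?{urlencode(params)}"


def search_result_url(rid: str) -> str:
    """Step 2: fetch the results for a request id."""
    return f"{BASE}?{urlencode({'op': 'GetSearchResult', 'rid': rid})}"


def parse_rid(body: str) -> str:
    """Assumption: step 1 returns the request id as a JSON string."""
    return json.loads(body)


def parse_ops_ids(body: str) -> list[int]:
    """Assumption: step 2 returns a JSON list of OPS IDs ([] if nothing matched)."""
    return [int(x) for x in json.loads(body)]


def compound_page(ops_id: int) -> str:
    """Page showing the resulting structure for an OPS ID."""
    return f"http://ops.rsc.org/Compounds/Get/{ops_id}"


print(exact_search_url("OC[C@@H]1OC(O)[C@@H](O)[C@H](O)[C@H]1O"))
print(search_result_url("9e0881ec-954a-4ed2-8101-f31b767118c0"))
```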
It seems the search portal doesn't work for SMILES at all, I raised this as another ticket last week: https://github.com/openphacts/GLOBAL/issues/199
Works for me. Regarding #199, I added a comment there.
The fix has been pushed to production. Please close the ticket.
Great! The search for the SMILES with stereochemistry works for the examples I tested. Maybe Luca can have a look at her list again to validate this? But I would like to leave the ticket open, as the first issue (the copy/paste issue with the SMILES from the rsc page) is still not solved.
The copy/paste issue is fixed as well. You have to use the "Copy" button next to each item you want to copy. If you try to do this by selecting text on the page and pressing Ctrl+C, it won't work. The reason is that the text contains special characters that allow the browser to wrap the text properly; without them, the browser can't do this.
Copy buttons were added to resolve the issue and make it possible to copy the text to the clipboard. If the Copy buttons don't work, please try refreshing the page; maybe some scripts were cached on your side.
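For SMILES that were already copied by selecting the text, a cleanup step can be sketched in Python as below. It assumes the wrap hints are invisible Unicode format characters (category Cf), such as the soft hyphen U+00AD, whose UTF-8 encoding is the %C2%AD sequence seen in the failing URL at the start of this thread. The function name and the position of the soft hyphen in the sample string are illustrative only.

```python
import unicodedata
from urllib.parse import quote


def clean_copied_smiles(text: str) -> str:
    """Remove invisible format characters (Unicode category Cf), such as the
    soft hyphen U+00AD, plus surrounding whitespace, from a copied SMILES."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf").strip()


# Illustrative copied text: erythrose with a soft hyphen inserted.
copied = "OC[C@@H](O)[C@@H](O)\u00adC=O"
print(quote(copied, safe="()"))     # the soft hyphen shows up as %C2%AD
print(clean_copied_smiles(copied))  # OC[C@@H](O)[C@@H](O)C=O
```

SMILES themselves are plain ASCII, so dropping every Cf character cannot remove anything meaningful from a valid string.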
Maybe it's just me doing something wrong, but the erythrose example is still not working for me. The other 3 (glucoses and lactic acid) have been resolved. The copying issue is indeed solved; it's working for me as well. The SMILES of erythrose I was trying to use:
OC[C@@H](O)[C@@H](O)C=O
It doesn't work for me either, not even with MatchType=1, so maybe this is a slightly different issue? SMILES to URL works.
When I first tested them, it worked with MatchType=1, so maybe this fix changed something.
@lucaxbartek The SMILES you mentioned works for me when going via the SMILES conversion and then the "exact strict" search. Please use ops.rsc.org.
@danidi Please provide URL for the search
@karapetk On ops.rsc.org I cannot get the SMILES conversion to work either: "No Transport : 0 Error in procedure convertTo has happened. undefined". I don't get to the "search" part; this happens just when clicking "OK" after entering the SMILES.
I used the same SMILES as Luca. With the Open PHACTS API it doesn't work. However, http://ops.rsc.org/JSON.ashx?op=ExactStructureSearch&searchOptions.MatchType=0&searchOptions.Molecule=OC[C@@H]%28O%29[C@@H]%28O%29C=O retrieves http://ops.rsc.org/Compounds/Get/1782071.
I don't understand. Luca's SMILES is working for me.
I'm doing the same thing. Do you think this could be a computer issue?
Daniela was saying that it's not working through the API, which is also the case for me.
This interface works well for me as well. So maybe it is an issue of the Open PHACTS API rather than the search API, but given that the SMILES doesn't have any special characters, I'm wondering which error possibilities are left.
@lucaxbartek What browser are you using?
I was using IE 9. I now tried on Chrome and it seems to work. It should be addressed though as for example here, the use of IE is encouraged (and required for some things).
I think that will depend on the result of the discussion here: https://github.com/openphacts/GLOBAL/issues/199
I was wondering if there are any news regarding the Erythrose question. All the other issues that weren't working seem to have been resolved, however the original issue:
C([C@H]([C@H](C=O)O)O)O
Even though it works on ops.rsc.org, it is still not working on the OpenPHACTS API page (https://dev.openphacts.org). @danidi
As you mentioned, the search API on ops.rsc.org works fine; however, we (RSC'ers) do not have access to, or knowledge of, how the API works on dev.openphacts.org.
@antonisloizou - looks like there's still an outstanding issue - copied below - any ideas?
I was wondering if there are any news regarding the Erythrose question. All the other issues that weren't working seem to have been resolved, however the original issue:
C([C@H]([C@H](C=O)O)O)O
Even though it works on ops.rsc.org, it is still not working on the OpenPHACTS API page (https://dev.openphacts.org). @danidi
I tried using Luca's SMILES copied from the email, C([C@H]([C@H](C=O)O)O)O
and it worked for me:
{
  "format": "linked-data-api",
  "version": "1.4",
  "result": {
    "_about": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O",
    "definition": "https://beta.openphacts.org/api-config",
    "extendedMetadataVersion": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O&_metadata=all%2Cviews%2Cformats%2Cexecution%2Cbindings%2Csite",
    "primaryTopic": {
      "_about": "http://ops.rsc.org/OPS1782071",
      "smiles": "C([C@H]([C@H](C=O)O)O)O",
      "isPrimaryTopicOf": "https://beta.openphacts.org/1.4/structure?app_id=a8d62f99&app_key=9f9836c3762afd27b6711646c9b2b47a&smiles=C(%5BC%40H%5D(%5BC%40H%5D(C%3DO)O)O)O"
    }
  }
}
(Email reply to Lee Harland's comment of Oct 6, 2014, 3:42 PM, quoted above.)
I think @ChristineChichester used 'Chemical Structure Conversion: SMILES to URL' (which is working) instead of 'Chemical Structure Search: Exact' (which isn't working). This nicely illustrates that we really, really need to know what the difference between these two calls is. @danidi, I know you asked @antonisloizou this question. Did you get an answer? Once we have established the difference, we should capture it on the support portal so that we never have to ask this question again :-).
No, he was away last week. We will follow this up as soon as possible.
So there seem to be a few different things here:
'Chemical Structure Search: Exact' calls :
[1] http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&scopeOptions.DataSources%5B0%5D=DrugBank&scopeOptions.DataSources%5B1%5D=ChEMBL&scopeOptions.DataSources%5B2%5D=PDB&searchOptions.Molecule={searchOptions.Molecule}
'Chemical Structure Conversion: SMILES to URL' calls first:
[2] http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=Smiles2InChi&convertOptions.Text={smiles}
and then :
[3] http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=InChi2CSID&convertOptions.Text={inchi}
I have no idea what the precise difference between the two services is. @karapetk ?
This issue is about URL encoding. The encoding of
OC[C@@H](O)[C@@H](O)C=O
is:
OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
Now, if I use the encoded string with [1]:
I get nothing back.
However, using the same string with [2]:
http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=Smiles2InChi&convertOptions.Text=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
I get the InChI :
InChI=1S/C4H8O4/c5-1-3(7)4(8)2-6/h1,3-4,6-8H,2H2/t3-,4+/m0/s1
In turn, the URL encoding of this InChI is :
InChI%3D1S%2FC4H8O4%2Fc5-1-3(7)4(8)2-6%2Fh1%2C3-4%2C6-8H%2C2H2%2Ft3-%2C4%2B%2Fm0%2Fs1
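The two encoded strings above can be reproduced with Python's urllib.parse.quote by leaving ( and ) unescaped and percent-encoding every other reserved character. This is close to JavaScript's encodeURIComponent, which additionally leaves ! * ' unescaped; which encoder each service actually used is not stated in the thread, and the helper name is hypothetical.

```python
from urllib.parse import quote


def encode_component(text: str) -> str:
    """Percent-encode everything reserved except "(" and ")"."""
    return quote(text, safe="()")


smiles = "OC[C@@H](O)[C@@H](O)C=O"
inchi = "InChI=1S/C4H8O4/c5-1-3(7)4(8)2-6/h1,3-4,6-8H,2H2/t3-,4+/m0/s1"

print(encode_component(smiles))  # OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
print(encode_component(inchi))
```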
Finally, supplying the URL encoded InChI to [3] :
http://ops.rsc.org/api/v1/JSON.ashx?op=ConvertTo&convertOptions.Direction=InChi2CSID&convertOptions.Text=InChI%3D1S%2FC4H8O4%2Fc5-1-3(7)4(8)2-6%2Fh1%2C3-4%2C6-8H%2C2H2%2Ft3-%2C4%2B%2Fm0%2Fs1
I get back the ChemSpider ID 84990, which corresponds to Erythrose (OPS1782071).
Now this explains why the 'Chemical Structure Conversion: SMILES to URL' returns:
http://ops.rsc.org/OPS84990
which is NOT Erythrose (http://ops.rsc.org/OPS1782071). On our end, we take whatever ID the OCRS returns and prepend http://ops.rsc.org/OPS to it, so the ChemSpider ID is treated as if it were an OPS ID.
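A minimal Python sketch of the SMILES-to-URL chain ([2], then [3], then the prefixing step) shows where the mix-up happens. It only builds URLs and makes no network calls; the helper names are hypothetical, and the base URL and ConvertTo parameters are copied from the calls quoted above.

```python
from urllib.parse import quote

API = "http://ops.rsc.org/api/v1/JSON.ashx"


def convert_to_url(direction: str, text: str) -> str:
    """Build a ConvertTo call, as in [2] (Smiles2InChi) and [3] (InChi2CSID)."""
    encoded = quote(text, safe="()")
    return (f"{API}?op=ConvertTo&convertOptions.Direction={direction}"
            f"&convertOptions.Text={encoded}")


def ops_uri_from_csid(csid: int) -> str:
    """Prefixing step: the number from [3] is a ChemSpider ID, not an OPS ID,
    so the resulting URI points at the wrong record."""
    return f"http://ops.rsc.org/OPS{csid}"


smiles = "OC[C@@H](O)[C@@H](O)C=O"
inchi = "InChI=1S/C4H8O4/c5-1-3(7)4(8)2-6/h1,3-4,6-8H,2H2/t3-,4+/m0/s1"
print(convert_to_url("Smiles2InChi", smiles))  # step [2]
print(convert_to_url("InChi2CSID", inchi))     # step [3]
print(ops_uri_from_csid(84990))  # http://ops.rsc.org/OPS84990, not OPS1782071
```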
In any case, we have so far established that [2] and [3] can work with the URL-encoded Erythrose SMILES and InChI strings, while [1] does not appear to be able to handle the URL encoding of this particular SMILES.
Now, as you've seen already, the unencoded SMILES:
OC[C@@H](O)[C@@H](O)C=O
used with [1]
http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&scopeOptions.DataSources%5B0%5D=DrugBank&scopeOptions.DataSources%5B1%5D=ChEMBL&scopeOptions.DataSources%5B2%5D=PDB&searchOptions.Molecule=OC[C@@H](O)[C@@H](O)C=O
produces the correct result, i.e.
http://ops.rsc.org/OPS1782071
So, in terms of actions I guess it would go something like this:
First issue:
Figure out what is so special about the URL encoding of :
OC[C@@H](O)[C@@H](O)C=O
OR
Second issue:
OR
OR
@antonisloizou Why are you limiting the search in query [1] to those 3 data sources? If you remove them, you get the resulting OPS ID, which came from ChEBI.
@karapetk If I recall correctly at some point we had an issue about returning molecules that were not in the system from the similarity searches.
Confirmed: [1] without the data sources works with the encoded string:
http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&searchOptions.Molecule=OC%5BC%40%40H%5D(O)%5BC%40%40H%5D(O)C%3DO
Should I remove the restrictions?
Regarding similarity searches: you are probably referring to similarity searches returning virtual records (parents). To filter the results and get only real compounds, please use the RealOnly property of CSCSearchScopeOptions:
http://ops.rsc.org/JSON.ashx#CommonSearchOptions
A first test without the data sources works:
So you are saying [1] should be:
http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&searchOptions.Molecule={searchOptions.Molecule}&CSCSearchScopeOptions.RealOnly=true
??
The same data source parameters are also currently used in 'Chemical Structure Search: Similarity' and 'Chemical Structure Search: Substructure'.
Should they also be removed? Should 'CSCSearchScopeOptions.RealOnly=true' also be added to those requests?
Yes, I would use
CSCSearchScopeOptions.RealOnly=true
any time you show the results to the end user. However, you might want to get virtual compounds in some workflows as an intermediate step.
OK, here are the new templates for the 3 methods:
Chemical Structure Search: Similarity http://ops.rsc.org/api/v1/JSON.ashx?op=SimilaritySearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}
Chemical Structure Search: Substructure http://ops.rsc.org/api/v1/JSON.ashx?op=SubStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}
Chemical Structure Search: Exact http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule={searchOptions.Molecule}
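Filling these templates can be sketched in Python as below. The template strings are copied from above; the helper name is hypothetical, and URL-encoding the SMILES before substitution is an assumption (the thread showed encoded SMILES working once the data-source restrictions were dropped, but the RealOnly variants are untested here). str.replace is used rather than str.format because format would read {searchOptions.Molecule} as attribute access on a field named searchOptions.

```python
from urllib.parse import quote

PLACEHOLDER = "{searchOptions.Molecule}"

TEMPLATES = {
    "Similarity": "http://ops.rsc.org/api/v1/JSON.ashx?op=SimilaritySearch"
                  "&CSCSearchScopeOptions.RealOnly=true"
                  "&searchOptions.Molecule={searchOptions.Molecule}",
    "Substructure": "http://ops.rsc.org/api/v1/JSON.ashx?op=SubStructureSearch"
                    "&CSCSearchScopeOptions.RealOnly=true"
                    "&searchOptions.Molecule={searchOptions.Molecule}",
    "Exact": "http://ops.rsc.org/api/v1/JSON.ashx?op=ExactStructureSearch"
             "&CSCSearchScopeOptions.RealOnly=true"
             "&searchOptions.Molecule={searchOptions.Molecule}",
}


def fill_template(template: str, smiles: str) -> str:
    """Substitute the URL-encoded SMILES for the placeholder (hypothetical helper)."""
    return template.replace(PLACEHOLDER, quote(smiles, safe="()"))


url = fill_template(TEMPLATES["Exact"], "OC[C@@H](O)[C@@H](O)C=O")
print(url)
```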
@danidi, @ChristineChichester, can we get some testing on this (ops2 or https://dev.openphacts.org/docs/develop) before deploying to OL as a hot fix?
When searching for a compound, in this case Erythrose, using the SMILES string obtained from Open PHACTS (http://ops.rsc.org/OPS1782071), the search returns the error message "Internal server error" with error code 500. @StefanSenger