Open StefanSenger opened 10 years ago
On Jul 17, 2014, at 9:46 AM, StefanSenger notifications@github.com wrote:
When performing a substructure search with small substructures (e.g. just a pyridine ring), a 504 error is returned EVEN IF a maximum number of hits of only 10, for example, is specified. Here is an example:
Yes, for small structures the problem occurs when there is no threshold set. As I understand it, even if the number of hits is set to 10, the process still tries to retrieve all of them and then return the top 10, which ends up giving the error.
I have performed the same search on the RSC chemistry server and everything seems to work fine, so I can only conclude that the issue is caused by the way the Open PHACTS API call is implemented.
I think that the RSC has a threshold already set in the background. http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40 => [167,69,22,65,17,10,7,4,30,27]
I didn't test it but it is pretty likely that the same problem occurs when performing a similarity search.
Yes, the same thing happens in the similarity search with small molecules such as benzene.
We have a github issue for this, please see: https://github.com/openphacts/GLOBAL/issues/64
An attendee at the community workshop ran into this problem and couldn't understand why his substructure search wasn't working. Since people who are new to the API will start by trying simple searches, it really is crucial that this works.
Adding @antonisloizou
The Open PHACTS API is giving an error because it calls ops.rsc.org, which also gives an error (see below). This has been discussed and Ken is looking into the fix.
http://ops.rsc.org/JSON.ashx?op=SubStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule=c1cccnc1
http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=e6fab95a-86ed-472b-bfb7-3352f3cbd36a
After performing any search using the RSC API, one has to periodically check the search status: http://ops.rsc.org/JSON.ashx#GetSearchStatus
Once the request status is "ResultReady", continue on to get the actual results: http://ops.rsc.org/JSON.ashx#ERequestStatus
If you skip the status check and try to get the results, you may get a server error on our side.
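The submit, poll, fetch sequence described above could be sketched roughly as follows. This is a minimal illustration and not the Open PHACTS implementation: the helper names (`build_url`, `http_fetch_json`, `wait_for_result`), the polling interval, and the numeric `ready_status` value are all assumptions. The one finished search quoted in this thread reported `"Status":6`, but whether 6 is the code for "ResultReady" is not confirmed here. The 900-second default mirrors the 15-minute polling window mentioned in this thread.

```python
import json
import time
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://ops.rsc.org/JSON.ashx"  # endpoint as quoted in this thread


def build_url(op, **params):
    """Build a JSON.ashx URL for the given operation (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'op': op, **params})}"


def http_fetch_json(url):
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def wait_for_result(fetch_json, rid, ready_status,
                    poll_interval=5.0, timeout=900.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll GetSearchStatus until ready, then call GetSearchResult.

    ready_status is an assumption supplied by the caller; the result is
    never requested before the status check has succeeded.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_json(build_url("GetSearchStatus", rid=rid))
        if status.get("Status") == ready_status:
            return fetch_json(build_url("GetSearchResult", rid=rid))
        if clock() >= deadline:
            raise TimeoutError(f"search {rid} not ready after {timeout} s")
        sleep(poll_interval)
```

A caller would first submit the search to obtain the request id, then call `wait_for_result(http_fetch_json, rid, ready_status=...)`. Passing the fetch function in makes the loop easy to exercise without touching the server.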
@antonisloizou Are you periodically checking status before pulling results?
Yes, the status is polled for a maximum of 15 minutes, before attempting to get results. Are you saying that eventually you get back results for the Erythrose example? After how long?
Reply message: from Karen Karapetyan, subject "[GLOBAL] 'Chemical Structure Search: Substructure' timeout when searching for small substructures (#176)", date Wed, Oct 15, 2014 15:17.
Erythrose works fine for similarity search (with a cutoff of 0.9) and in the substructure search. The exact search works on develop only (compare https://github.com/openphacts/GLOBAL/issues/198).
The example Christine gave in her last comment returns an empty set [] after some time, which shouldn't be the case I think.
When performing a substructure search with small substructures (e.g. just a pyridine ring), a 504 error is returned EVEN IF a maximum number of hits of only 10, for example, is specified. Here is an example:
curl -v -X GET "https://beta.openphacts.org/1.3/structure/substructure?app_id=1853f6fb&app_key=a43c21c1f0b61e99ae5b3d49348f54ae&searchOptions.Molecule=c1cccnc1&searchOptions.MolType=0&resultOptions.Count=10"
I have performed the same search on the RSC chemistry server and everything seems to work fine, so I can only conclude that the issue is caused by the way the Open PHACTS API call is implemented.
Here is what I got performing the search directly:
1) http://ops.rsc.org/JSON.ashx?op=SubstructureSearch&searchOptions.Molecule=c1cccnc1&resultOptions.Limit=10 => 08bf1a0e-b5df-48b2-8daf-e94f60af3a40
2) http://ops.rsc.org/JSON.ashx?op=GetSearchStatus&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40 => {"Count":10,"Elapsed":"PT12M50.843S","Message":"Finished","Progress":1,"Status":6}
3) http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40 => [167,69,22,65,17,10,7,4,30,27]
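To put the `Elapsed` value from the status response above in context, here is a small sketch that converts it to seconds. It assumes the field is an ISO 8601 duration restricted to hours, minutes, and seconds, which is what the string looks like but is not confirmed in this thread. The function name `elapsed_seconds` is hypothetical.

```python
import re

# Matches durations of the form PT[nH][nM][nS], e.g. "PT12M50.843S".
# Assumption: the server only emits the time part of an ISO 8601 duration.
_DURATION = re.compile(
    r"^PT(?:(?P<h>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<m>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<s>\d+(?:\.\d+)?)S)?$"
)


def elapsed_seconds(value):
    """Convert a PT..H..M..S duration string to seconds."""
    match = _DURATION.match(value)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours = float(match.group("h") or 0)
    minutes = float(match.group("m") or 0)
    seconds = float(match.group("s") or 0)
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    # 12 min 50.843 s, i.e. roughly 771 seconds
    print(elapsed_seconds("PT12M50.843S"))
```

Under that reading, the pyridine search took about 771 seconds on the RSC side, which is well over twelve minutes and close to a 15-minute (900-second) client-side wait.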
I didn't test it but it is pretty likely that the same problem occurs when performing a similarity search.
An attendee at the community workshop ran into this problem and couldn't understand why his substructure search wasn't working. Since people who are new to the API will start by trying simple searches, it really is crucial that this works.