openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

'Chemical Structure Search: Substructure' timeout when searching for small substructures #176

Open StefanSenger opened 10 years ago

StefanSenger commented 10 years ago

When performing a substructure search with a small substructure (e.g. just a pyridine ring), a 504 error is returned EVEN IF a maximum number of hits of only 10, for example, is specified. Here is an example:

curl -v -X GET "https://beta.openphacts.org/1.3/structure/substructure?app_id=1853f6fb&app_key=a43c21c1f0b61e99ae5b3d49348f54ae&searchOptions.Molecule=c1cccnc1&searchOptions.MolType=0&resultOptions.Count=10"

I have performed the same search on the RSC chemistry server and everything seems to work fine, so I can only conclude that the issue is caused by the way the Open PHACTS API call is implemented.

Here is what I got performing the search directly:

1) http://ops.rsc.org/JSON.ashx?op=SubstructureSearch&searchOptions.Molecule=c1cccnc1&resultOptions.Limit=10
   => 08bf1a0e-b5df-48b2-8daf-e94f60af3a40

2) http://ops.rsc.org/JSON.ashx?op=GetSearchStatus&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40
   => {"Count":10,"Elapsed":"PT12M50.843S","Message":"Finished","Progress":1,"Status":6}

3) http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40
   => [167,69,22,65,17,10,7,4,30,27]
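For anyone who wants to script the three calls above, here is a minimal Python sketch. Only the base URL, the `op` names and the parameter names come from the requests quoted in this issue; `op_url`, `substructure_search` and the caller-supplied `fetch_json` function are hypothetical names introduced for illustration. The sketch checks the status only once, as in the manual run above; it is not how the Open PHACTS API itself is implemented.

```python
from urllib.parse import urlencode

# Base URL taken from the requests quoted in this issue.
BASE = "http://ops.rsc.org/JSON.ashx"


def op_url(op, **params):
    """Build a JSON.ashx URL for one operation (hypothetical helper)."""
    return f"{BASE}?{urlencode({'op': op, **params})}"


def substructure_search(fetch_json, smiles, limit):
    """Submit a search, read its status once, then read the result.

    fetch_json is an assumption: any callable that takes a URL and
    returns the decoded JSON body (a request ID string, a status
    object, or a list of hit IDs, as in the responses quoted above).
    """
    rid = fetch_json(op_url(
        "SubstructureSearch",
        **{"searchOptions.Molecule": smiles, "resultOptions.Limit": limit},
    ))
    status = fetch_json(op_url("GetSearchStatus", rid=rid))
    hits = fetch_json(op_url("GetSearchResult", rid=rid))
    return status, hits
```

Passing `fetch_json` in from outside keeps the sketch independent of any particular HTTP library and makes it easy to try against canned responses.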

I didn't test it but it is pretty likely that the same problem occurs when performing a similarity search.

An attendee at the community workshop ran into this problem and couldn't understand why his substructure search wasn't working. Since people who are new to the API will start by trying simple searches, it really is crucial that this works.

ChristineChichester commented 10 years ago

On Jul 17, 2014, at 9:46 AM, StefanSenger notifications@github.com wrote:

> When performing a substructure search with a small substructure (e.g. just a pyridine ring), a 504 error is returned EVEN IF a maximum number of hits of only 10, for example, is specified. Here is an example:

Yes, for small structures the problem occurs when there is no threshold set. As I understand it, even if the number of hits is set to 10, the process still tries to retrieve all hits and then return the top 10, which ends up giving the error.
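To illustrate the difference between "retrieve everything, then cut to 10" and "stop after 10", here is a small Python sketch. It is purely illustrative: `matches` is a hypothetical stand-in for a matcher that yields hits one at a time, and nothing here reflects the actual Open PHACTS or RSC code.

```python
from itertools import islice


def matches(n_total, counter):
    """Hypothetical matcher: yields hit IDs one by one and counts the work done."""
    for hit_id in range(n_total):
        counter["examined"] += 1
        yield hit_id


def top_n_after_full_retrieval(n_total, n, counter):
    """Collect every hit first, then keep the first n (work grows with n_total)."""
    return list(matches(n_total, counter))[:n]


def top_n_with_early_stop(n_total, n, counter):
    """Stop pulling hits as soon as n have been collected (work grows with n)."""
    return list(islice(matches(n_total, counter), n))
```

Both functions return the same hits, but with a very common substructure the first one does work proportional to the total number of matches, while the second does work proportional to the requested count only.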

> I have performed the same search on the RSC chemistry server and everything seems to work fine, so I can only conclude that the issue is caused by the way the Open PHACTS API call is implemented.

I think that the RSC has a threshold already set in the background. http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=08bf1a0e-b5df-48b2-8daf-e94f60af3a40 => [167,69,22,65,17,10,7,4,30,27]

> I didn't test it but it is pretty likely that the same problem occurs when performing a similarity search.

Yes, the same thing happens with small molecules, like benzene, in the similarity search.

We have a github issue for this, please see: https://github.com/openphacts/GLOBAL/issues/64

> An attendee at the community workshop ran into this problem and couldn't understand why his substructure search wasn't working. Since people who are new to the API will start by trying simple searches, it really is crucial that this works.


karapetk commented 9 years ago

Adding @antonisloizou

ChristineChichester commented 9 years ago

The Open PHACTS API is giving an error because it calls ops.rsc.org, which also gives an error (see below). This has been discussed and Ken is looking into the fix.

http://ops.rsc.org/JSON.ashx?op=SubStructureSearch&CSCSearchScopeOptions.RealOnly=true&searchOptions.Molecule=c1cccnc1
http://ops.rsc.org/JSON.ashx?op=GetSearchResult&rid=e6fab95a-86ed-472b-bfb7-3352f3cbd36a

karapetk commented 9 years ago

After starting any search with the RSC API, one has to periodically check the search status: http://ops.rsc.org/JSON.ashx#GetSearchStatus

Once the request status is "ResultReady", continue to get the actual results: http://ops.rsc.org/JSON.ashx#ERequestStatus

If you skip the status check and try to get results straight away, you may get a server error on our side.
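The check-status-then-fetch pattern described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `wait_for_result`, `get_status` and `is_ready` are hypothetical names, the caller decides what "ready" means, and the timeout and polling interval are left to the caller because this thread does not specify them for the RSC service.

```python
import time


def wait_for_result(get_status, is_ready, timeout_s, interval_s,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until is_ready(status) is true or timeout_s elapses.

    get_status: callable returning the decoded GetSearchStatus response
                (assumption: the caller performs the HTTP request).
    is_ready:   callable deciding whether results may be fetched.
    Returns the last status on success; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if is_ready(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("search did not become ready in time")
        sleep(interval_s)
```

A caller would only request GetSearchResult after this function returns. For the status object quoted earlier in this issue, a predicate such as `lambda s: s.get("Message") == "Finished"` would match that particular response; how the numeric `Status` field maps to names like "ResultReady" is not stated in this thread, so check ERequestStatus before relying on it.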

karapetk commented 9 years ago

@antonisloizou Are you periodically checking status before pulling results?

antonisloizou commented 9 years ago

Yes, the status is polled for a maximum of 15 minutes, before attempting to get results. Are you saying that eventually you get back results for the Erythrose example? After how long?
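For reference, the pyridine search quoted at the top of this issue reported an `Elapsed` value of "PT12M50.843S". Assuming that field is an ISO 8601 duration (it looks like one, but the thread does not say so), a short Python sketch can compare it with the 15-minute polling window. `iso_duration_seconds` is a hypothetical helper that handles only the hours/minutes/seconds part of such a duration.

```python
import re


def iso_duration_seconds(text):
    """Convert a 'PT#H#M#S' duration (time part only) to seconds."""
    m = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?", text)
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)


elapsed = iso_duration_seconds("PT12M50.843S")  # value quoted in this issue
poll_window = 15 * 60                           # 15 minutes, in seconds

print(f"{elapsed:.3f}")       # prints 770.843
print(elapsed < poll_window)  # prints True
```

Under that assumption, that particular search took about 12 minutes 51 seconds, which is inside the 15-minute window by roughly two minutes.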


danidi commented 9 years ago

Erythrose works fine for similarity search (with a cutoff of 0.9) and in the substructure search. The exact search works on develop only (compare https://github.com/openphacts/GLOBAL/issues/198).

The example Christine gave in her last comment returns an empty set [] after some time, which I think shouldn't be the case.

valt commented 8 years ago

https://rsc-solutions.atlassian.net/browse/OPS-124

danidi commented 8 years ago

This is still an open issue for 2.0: https://beta.openphacts.org/2.0/structure/substructure?app_id=1853f6fb&app_key=a43c21c1f0b61e99ae5b3d49348f54ae&searchOptions.Molecule=c1cccnc1&searchOptions.MolType=0&resultOptions.Count=10