Open leeharland opened 10 years ago
valery confirmed OCR fine, assigning to open link as next step
OK, thanks to some great detective work: this is because URLs like https://beta.openphacts.org/1.3/structure/similarity?app_id=y&app_key=x&searchOptions.Molecule=CC(%3DO)Oc1ccccc1C(%3DO)O&searchOptions.SimilarityType=0
don't specify any thresholds or limits, so they return very large result sets. Investigating next steps, perhaps with defaults.
would be good to get some info from open link on where this died - presumably too much data or a timeout?
On 24/04/2014 11:40, Lee Harland wrote:
would be good to get some info from open link on where this died - presumably too much data or a timeout?
I'll check if there's something in the logs to indicate what happened in a bit.
Yrjänä
Yrjänä Rankka (ghard@zonk.net)
@ghard - any update?
Queries that return too much data still seem to cause problems, but with limits the search works (at least in my tests). This works: https://beta.openphacts.org/1.4/structure/similarity?app_id=18983b12&app_key=c99cf43da48a1a2f9069651fe6be7c06&searchOptions.Molecule=C1%3DCC(%3DCC%3DC1C%5BC%40H%5D(C(%3DO)N%5BC%40H%5D(CCCN%3DC(N)N)C(%3DO)O)N)O&searchOptions.SimilarityType=0&searchOptions.Threshold=0.75 but changing the threshold to 0.50 doesn't.
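For anyone reproducing this, here is a minimal Python sketch of building the similarity URL with an explicit threshold. The parameter names are the ones in the URLs above; the helper name `similarity_url` and the placeholder credentials are made up for illustration, and `urlencode` may percent-encode a few more characters than the hand-written URLs do.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint taken from the working example in this thread.
BASE_URL = "https://beta.openphacts.org/1.4/structure/similarity"


def similarity_url(smiles, threshold, app_id, app_key, similarity_type=0):
    """Build a similarity query that always carries an explicit threshold."""
    params = {
        "app_id": app_id,
        "app_key": app_key,
        "searchOptions.Molecule": smiles,
        "searchOptions.SimilarityType": similarity_type,
        "searchOptions.Threshold": threshold,
    }
    return BASE_URL + "?" + urlencode(params)


# Placeholder credentials (hypothetical), aspirin SMILES from the first report.
url = similarity_url("CC(=O)Oc1ccccc1C(=O)O", 0.75, "YOUR_APP_ID", "YOUR_APP_KEY")
print(url)
```

Leaving `searchOptions.Threshold` out of `params` reproduces the unbounded query from the original report.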
Antonis confirms we could add threshold limits on the similarity search, for instance >99, 98, 90, 80, 70, and 50%, and return 400 when any other value is given. This should help, but it does not guarantee that some result sets will not still cause problems.
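A rough sketch of that whitelist check, in Python rather than in the actual API code (which is not shown in this thread). The allowed values are the ones listed above expressed as fractions; the function name and the `(status, payload)` return shape are assumptions for illustration only.

```python
# Thresholds from the suggestion above, as fractions of 1.
ALLOWED_THRESHOLDS = (0.99, 0.98, 0.90, 0.80, 0.70, 0.50)


def check_threshold(raw):
    """Return (200, value) for an allowed threshold, else (400, message)."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return 400, "threshold must be a number"
    for allowed in ALLOWED_THRESHOLDS:
        # Compare with a small tolerance to avoid float representation issues.
        if abs(value - allowed) < 1e-9:
            return 200, allowed
    return 400, "threshold must be one of " + ", ".join(
        str(t) for t in ALLOWED_THRESHOLDS
    )


print(check_threshold("0.8"))
print(check_threshold("0.75"))
```

With this list, the 0.75 query that currently works would be rejected, so the whitelist would need to be agreed before rollout.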
Maybe the easiest would be to give a default value of 0.8? So if people forget to specify a threshold they still get data back.
Hoping @ghard or @antonisloizou can give us some insight into why it's failing.
@ChemConnector - what do you think for the default??
A default threshold of 0.8 would be acceptable for sure. A threshold of 0.5 or less makes little sense to me and while the value could be dropped to 0.7 or lower I have always found compounds of interest at >0.8 on ChemSpider, a much larger collection than the OPS-CRS for sure.
Antonis commented about the similarity search, there are 3 options:
Option 2 is the most problematic as it needs to be hardcoded inside the swagger generation script
Apologies for being slow today. Can someone remind me why we can't just allow any value, with a default of 0.8 when no value is supplied, as Tony suggested (and maybe force 0.5 if the value is <0.5)? Thanks.
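The "allow any value, default to 0.8, floor at 0.5" idea could look roughly like this in Python. This is only a sketch of the policy being discussed, not existing API code; the function name is hypothetical, and capping at 1.0 is an extra assumption not raised in the thread.

```python
DEFAULT_THRESHOLD = 0.8  # used when the caller supplies nothing
MIN_THRESHOLD = 0.5      # lower values are forced up to this floor
MAX_THRESHOLD = 1.0      # assumption: similarity cannot exceed 1.0


def effective_threshold(raw=None):
    """Apply the default and clamp the requested threshold into range."""
    if raw is None or raw == "":
        return DEFAULT_THRESHOLD
    value = float(raw)  # raises ValueError on non-numeric input
    return min(MAX_THRESHOLD, max(MIN_THRESHOLD, value))


print(effective_threshold())        # no value supplied
print(effective_threshold("0.3"))   # below the floor
print(effective_threshold("0.75"))  # passed through unchanged
```

Changing `MIN_THRESHOLD` to 0.7 would give a stricter floor without touching the rest of the logic.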
Does the RSC API allow any value? The ChemSpider similarity search interface only offers certain values via a dropdown (>=99, 95, 90, 80, 70, 50). For me at least, using 0.50 from our side didn't work, but 0.75 did.
@ChemConnector @valt could you comment?
That sounds absolutely fine to me Lee. We simply block all searches with values below 0.7 and add a comment to the screen that that is the default (and minimum) value.
@ChemConnector @valt @karapetk
hi folks - hopefully you saw #176 and could we get an update on both of these? thanks
Often the same Euclidean search that works on ChemSpider fails on the API. Examples include:
C(=C/Cl)\Cl with 0.9 threshold
CCCC(C)C1(C(=O)NC(=NC1=O)[O-])CC.[Na+] with 0.8 threshold
The list goes on. My suspicion is that this is because CS has a default value of 100 for the search hits limit. When you change that to 1000, all calls fail. Would it be possible to limit the search results by default, as previously suggested? Or (also as previously mentioned) raise the timeout limit? Or perhaps there is some other solution available?
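As a sketch of what "limit the search results by default" could mean, here is a small Python helper that caps the number of hits unless the caller asks otherwise. The default of 100 mirrors the CS hit limit mentioned above; the function name is hypothetical, and a real fix would presumably apply the cap inside the search backend rather than after the results are produced.

```python
from itertools import islice

DEFAULT_HIT_LIMIT = 100  # mirrors the ChemSpider default mentioned above


def cap_hits(hits, limit=None):
    """Return at most `limit` hits, falling back to the default cap."""
    n = DEFAULT_HIT_LIMIT if limit is None else limit
    # islice stops consuming the iterable once n items have been taken.
    return list(islice(hits, n))


# Stand-in for a large result set (the size is just an example).
many_hits = range(30094)
print(len(cap_hits(many_hits)))
print(len(cap_hits(many_hits, limit=1000)))
```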
Here is what we see running both queries in batch mode:
Running similarity (Euclidian 0.9) on 2 SMILES in 8 threads using http://ops.rsc.org/api/v1/JSON.ashx
0: SMILES:CCCC(C)C1(C(=O)NC(=NC1=O)[O-])CC.[Na+]; Count: 8; Duration: 20.627287
1: SMILES:C(=C/Cl)\Cl; Count: 2438; Duration: 66.3834998
Total,2 Errors,0 Success,2 Total Time,66.4174926 sec.
Running similarity (Euclidian 0.8) on 2 SMILES in 8 threads using http://ops.rsc.org/api/v1/JSON.ashx
0: SMILES:CCCC(C)C1(C(=O)NC(=NC1=O)[O-])CC.[Na+]; Count: 276; Duration: 30.7441478
1: SMILES:C(=C/Cl)\Cl; Count: 30094; Duration: 320.8561992
Total,2 Errors,0 Success,2 Total Time,320.9128618 sec.
broken in production (1.3) right now.
[nb some discussion on https://github.com/openphacts/GLOBAL/issues/14]