phenoscape / phenoscape-kb-services

Web services application for the Phenoscape RDF knowledgebase.
https://kb.phenoscape.org/apidocs/#/
MIT License

Certain study IRIs cause a 500 internal server error when requesting the matrix #42

Closed (hlapp closed this issue 8 years ago)

hlapp commented 8 years ago

The /api/study/matrix API endpoint results in a 500 Internal Server Error on certain study IRIs, for example https://scholar.google.com/scholar?q=Análise+Filogenética+da+Fam%C3%ADlia+Heptapteridae+%28Teleostei%2C+Ostariophysi%2C+Siluriformes%29&btnG=&hl=en&as_sdt=0%2C42

$ curl -s --dump-header - -o out 'http://kb.phenoscape.org/api/study/matrix?iri=https%3A%2F%2Fscholar.google.com%2Fscholar%3Fq%3DAn%C3%A1lise%2BFilogen%C3%A9tica%2Bda%2BFam%25C3%25ADlia%2BHeptapteridae%2B%2528Teleostei%252C%2BOstariophysi%252C%2BSiluriformes%2529%26btnG%3D%26hl%3Den%26as_sdt%3D0%252C42'
HTTP/1.1 500 Internal Server Error
Date: Mon, 27 Jun 2016 17:22:32 GMT
Server: spray-can/1.3.3
Content-Type: text/plain; charset=UTF-8
Content-Length: 35
Connection: close

Not all Google Scholar IRIs fail this way; for example, https://scholar.google.com/scholar?q=hylogenetic+studies+of+the+amblycipitid+catfishes+%28Teleostei%2C+Siluriformes%29+with+species+accounts&btnG=&hl=en&as_sdt=0%2C42 works just fine. (BTW, why is this IRI lacking the leading 'p'?)

balhoff commented 8 years ago

The problem IRI has accented characters in it; it's being rejected at the level of the Spray HTTP toolkit. I'm not sure whether that is correct or not. These Google Scholar IRIs are just search results, presumably because the person who entered them could not find a good "canonical" IRI for the study. The missing "p" is probably the result of copy/paste when the curator was searching for the paper.

It would be great if someone could review study IRIs in Phenex files and see if better IRIs can be used. I could generate a list of them all to make it a bit easier.

hlapp commented 8 years ago

> The problem IRI has accented characters in it

Well, it's an IRI (in contrast to a URI).

> it's being rejected at the level of the Spray HTTP toolkit. I'm not sure whether that is correct or not.

How can that be correct if it's properly URL-encoded (which, as you can see, it is in the actual request)?
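For anyone who wants to check the encoding claim, here is a minimal Python sketch. It assumes the iri query parameter is simply the study IRI percent-encoded as UTF-8 with no characters left unescaped; the variable names are illustrative, and Python is used only for convenience (it is not part of the service).

```python
from urllib.parse import quote

# Study IRI from the report above; \u00e1 is "á" and \u00e9 is "é".
# Note that parts of it (e.g. %C3%AD, %28) are already percent-encoded.
study_iri = (
    "https://scholar.google.com/scholar?q=An\u00e1lise+Filogen\u00e9tica+da+"
    "Fam%C3%ADlia+Heptapteridae+%28Teleostei%2C+Ostariophysi%2C+"
    "Siluriformes%29&btnG=&hl=en&as_sdt=0%2C42"
)

# Value of the iri parameter, copied from the curl command above.
param_from_curl = (
    "https%3A%2F%2Fscholar.google.com%2Fscholar%3Fq%3DAn%C3%A1lise%2B"
    "Filogen%C3%A9tica%2Bda%2BFam%25C3%25ADlia%2BHeptapteridae%2B%2528"
    "Teleostei%252C%2BOstariophysi%252C%2BSiluriformes%2529%26btnG%3D%26"
    "hl%3Den%26as_sdt%3D0%252C42"
)

# safe="" escapes every reserved character, including "/", ":" and "%";
# non-ASCII characters are encoded as UTF-8 bytes by default.
encoded_iri = quote(study_iri, safe="")

matches_request = encoded_iri == param_from_curl
print(matches_request)
```

If the two strings are equal, the literal accented characters went out as UTF-8 percent-escapes (%C3%A1, %C3%A9) and the escapes already present in the IRI were escaped a second time (%C3%AD became %25C3%25AD), so the request itself contains only ASCII.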

balhoff commented 8 years ago

The accented characters aren't URL-encoded. I suspect they are valid as IRIs, but I am not certain.

balhoff commented 8 years ago

Oh, I see, in the request! Sorry, I was looking at your Google Scholar URL above.

balhoff commented 8 years ago

Okay, now when I submit that query directly to the web application, it works fine. So probably it is a problem with the Apache proxy? Getting a little more confusing. :-)

balhoff commented 8 years ago

Fixed by adding -Dfile.encoding=UTF-8 to the Java startup parameters for the web application. This was very confusing because it was working fine when run from within a build environment on the same server.
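To illustrate why a default-charset setting can matter here, below is a small Python analogy. It is a sketch under assumptions, not a trace of what Spray or the JVM actually did: it assumes that somewhere in the request path the percent-decoded bytes were turned into text using the platform default charset instead of UTF-8. The helper decode_component is hypothetical.

```python
from urllib.parse import unquote_to_bytes


def decode_component(encoded: str, charset: str) -> str:
    """Percent-decode to raw bytes, then interpret them in the given charset.

    Undecodable bytes become U+FFFD, which is similar to how the JVM's
    String constructor handles malformed input.
    """
    raw = unquote_to_bytes(encoded)
    return raw.decode(charset, errors="replace")


fragment = "An%C3%A1lise"  # "á" percent-encoded as UTF-8

as_utf8 = decode_component(fragment, "utf-8")      # intended reading
as_ascii = decode_component(fragment, "ascii")     # assumed non-UTF-8 default
as_latin1 = decode_component(fragment, "latin-1")  # another possible default

for label, text in [("utf-8", as_utf8), ("ascii", as_ascii), ("latin-1", as_latin1)]:
    print(label, repr(text))
```

Only the UTF-8 reading recovers the original word; the other two yield a different string, which would no longer match the IRI stored in the knowledgebase. A process started by a service manager can inherit a different locale than an interactive shell, which would be consistent with the application working from the build environment on the same server.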

hlapp commented 8 years ago

Yes, I can confirm it's working now. 👍