tarekmed / gbif-ecat

Automatically exported from code.google.com/p/gbif-ecat

ECat Webservices returning content with wrong encoding #61

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Query anything through the WebService with a request for UTF-8 and the
response may come back encoded as ISO-8859-1 instead:

See response headers:
 curl "http://ecat-dev.gbif.org/ws/usage/?pagesize=100&rkey=1&showVernaculars=all&page=1&q=basiliscus&showRanks=kpcofg&showIds=true" -H "Accept-Charset: utf-8" -I

See response:
 curl "http://ecat-dev.gbif.org/ws/usage/?pagesize=100&rkey=1&showVernaculars=all&page=1&q=basiliscus&showRanks=kpcofg&showIds=true" -H "Accept-Charset: utf-8" > original.txt

iconv -f iso-8859-1 -t utf-8 original.txt > converted.txt

The culprit in this case is the name Basiliscus galeritus Duméril, 1851. E9 is
the ISO-8859-1 character code for the letter é (latin small letter e with acute,
http://en.wikipedia.org/wiki/%C3%89)

hexdump original.txt | grep " e9"
00008d0 72 69 74 75 73 20 44 75 6d e9 72 69 6c 2c 20 31

The same line in the converted file:
hexdump converted.txt | grep "00008d0"
00008d0 72 69 74 75 73 20 44 75 6d c3 a9 72 69 6c 2c 20

The two lines are otherwise identical. The only difference is that e9 in the
original becomes c3 a9 in the converted file, which is the UTF-8 encoding of
that character:
http://graphemica.com/%C3%A9
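
The byte difference can also be reproduced without the service. The following is a minimal sketch, not code from ECat: the class name EncodingDemo and the helper hex are made up for illustration. It encodes the string "Duméril" in both charsets and prints the bytes in the same style as hexdump.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ECat.
public class EncodingDemo {

    // Render bytes as space-separated lowercase hex, similar to hexdump output.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Dum\u00e9ril"; // "Duméril", written with an escape to stay source-encoding safe

        // ISO-8859-1 uses the single byte e9 for é.
        System.out.println(hex(name.getBytes(StandardCharsets.ISO_8859_1)));
        // prints: 44 75 6d e9 72 69 6c

        // UTF-8 uses the two bytes c3 a9 for é.
        System.out.println(hex(name.getBytes(StandardCharsets.UTF_8)));
        // prints: 44 75 6d c3 a9 72 69 6c
    }
}
```

These are the same byte sequences that appear in the two hexdump lines above.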

It looks like the json-lib in use there completely ignores character encoding,
and the WebService itself doesn't do any processing either. On top of that, the
headers are wrong: the reply claims that its encoding is UTF-8.
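
To illustrate why such a mismatch breaks clients, here is a rough sketch under stated assumptions. It is not the ECat code, which is not shown in this report; the class ResponseEncodingSketch and its methods writeBody and readAsUtf8 are hypothetical. It writes a body through a Writer bound to a given charset, then decodes the bytes the way a client trusting a UTF-8 header would.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not part of ECat.
public class ResponseEncodingSketch {

    // Stand-in for the server side: encode the body with the given charset.
    public static byte[] writeBody(String body, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(body);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Stand-in for the client side: trust the header and decode as UTF-8.
    // Malformed input is replaced with U+FFFD by this String constructor.
    public static String readAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String body = "Dum\u00e9ril";

        // Body really is UTF-8: the client recovers the original text.
        String ok = readAsUtf8(writeBody(body, StandardCharsets.UTF_8));
        System.out.println(ok.equals(body));

        // Body is ISO-8859-1 but is decoded as UTF-8: the lone e9 byte is
        // malformed UTF-8, so the client no longer sees the original text.
        String broken = readAsUtf8(writeBody(body, StandardCharsets.ISO_8859_1));
        System.out.println(broken.equals(body));
    }
}
```

In a Servlet-based service, the usual place to pin the encoding down is a call to response.setCharacterEncoding("UTF-8") before getWriter() is first called. Whether that alone would fix this service is an assumption.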

Oliver found this because the new Portal currently only works for CLB responses
that don't include any conflicting characters (that is, characters outside
ASCII, which the two encodings represent differently).

Original issue reported on code.google.com by lars.fra...@gmail.com on 4 Aug 2011 at 3:37

GoogleCodeExporter commented 8 years ago
Have you tried using something like this?
@Produces(...,MediaType.APPLICATION_JSON + ";charset=UTF-8"...,)
myServiceMethod()...

Original comment by federic...@gmail.com on 8 Aug 2011 at 10:41

GoogleCodeExporter commented 8 years ago
The WebService is not using Jersey unfortunately. It's all custom Servlet stuff.

Original comment by lars.fra...@gmail.com on 8 Aug 2011 at 10:42

GoogleCodeExporter commented 8 years ago
As the services will be rewritten in Jersey anyway, is there still a need to
fix this?

Original comment by wixner@gmail.com on 5 Sep 2011 at 11:36

GoogleCodeExporter commented 8 years ago
I'll let Lars close it, but I think not. As this is a public-facing service and
the new one wouldn't be (at least at first), a fix might still be warranted,
but it's probably not worth the effort.

Original comment by oliver.m...@gmail.com on 5 Sep 2011 at 11:39

GoogleCodeExporter commented 8 years ago
If the WebServices will be rewritten and we won't fix the old one then I'm 
closing this as WontFix.

Original comment by lars.fra...@gmail.com on 6 Sep 2011 at 7:18