mathewdenis / json-smart

Automatically exported from code.google.com/p/json-smart

JSONValue.parse does not correctly decode UTF-8 bytes from an inputstream #48

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
// Note that testJsonString includes Unicode characters and is UTF-8 encoded

String testJsonString =
    "{\"balance\":1000.21,\"num\":100,\"nickname\":null,\"is_vip\":true,"
    + "\"Sinhalese\":\"සිංහල ජාතිය\",\"name\":\"foo\"}";

ByteArrayInputStream bis =
    new ByteArrayInputStream(testJsonString.getBytes(StandardCharsets.UTF_8));

JSONObject obj = JSONValue.parse(bis, JSONObject.class);

obj.get("Sinhalese"); // result is incorrect

What is the expected output? What do you see instead?

I would expect obj.get("Sinhalese") to return the same characters as the 
original string. Instead, the returned value is incorrect.

What version of the product are you using? On what operating system?

json-smart 2.0 with OpenJDK 7 on FreeBSD.

Please provide any additional information below.

Note that:

JSONValue.parse(ByteStreams.toByteArray(bis), JSONObject.class);

works correctly (ByteStreams is Guava's helper class), so decoding from a byte array is fine and the problem only shows up when parsing from an InputStream.

Original issue reported on code.google.com by patrick....@gmail.com on 13 Sep 2014 at 6:42

GoogleCodeExporter commented 9 years ago
I tried my hand at a fix for this in version 1 and created a pull request for 
it.
I'd be happy to try applying a similar fix to version 2, but I am less familiar 
with it.

Regardless of whether the pull request is accepted, a workaround could be to 
wrap the InputStream in an InputStreamReader 
(http://docs.oracle.com/javase/7/docs/api/java/io/InputStreamReader.html) with 
an explicit charset and use the parse methods that take a Reader instead, e.g.:

{{{
String testJsonString =
    "{\"balance\":1000.21,\"num\":100,\"nickname\":null,\"is_vip\":true,"
    + "\"Sinhalese\":\"සිංහල ජාතිය\",\"name\":\"foo\"}";

ByteArrayInputStream bis =
    new ByteArrayInputStream(testJsonString.getBytes(StandardCharsets.UTF_8));

JSONObject obj = JSONValue.parse(
    new InputStreamReader(bis, StandardCharsets.UTF_8), JSONObject.class);
}}}

I noticed this same issue whilst using V1 of the library and parsing JSON 
serially. The same workaround should work there too, e.g.:

{{{
JSONValue.SAXParse(new InputStreamReader(bis, StandardCharsets.UTF_8), someHandler);
}}}
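
To illustrate why wrapping the stream in an InputStreamReader helps, here is a minimal standalone sketch that uses only the JDK and does not touch json-smart. It assumes, without confirmation from the library source, that the failure on the InputStream path comes from each byte being widened to a char. The class name Utf8StreamDecoding and the method decodeBytePerChar are hypothetical stand-ins for that behaviour, not json-smart code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class Utf8StreamDecoding {

    // Hypothetical naive decoder (an assumption about the failure mode):
    // widens every byte to a char, which is only correct for single-byte input.
    public static String decodeBytePerChar(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            sb.append((char) b);
        }
        return sb.toString();
    }

    // Charset-aware decoder: InputStreamReader assembles multi-byte UTF-8
    // sequences into chars before they reach the caller.
    public static String decodeWithReader(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The first word of the Sinhalese value from the report, written with
        // escapes so the example does not depend on the source file encoding.
        String original = "\u0DC3\u0DD2\u0D82\u0DC4\u0DBD";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String naive = decodeBytePerChar(new ByteArrayInputStream(utf8));
        String decoded = decodeWithReader(new ByteArrayInputStream(utf8));

        System.out.println(original.equals(naive));   // false: 15 chars, one per byte
        System.out.println(original.equals(decoded)); // true: 5 chars, as in the original
    }
}
```

If that assumption holds, it would also explain why ASCII-only keys and values parse fine from a stream while the Sinhalese value does not: ASCII characters are a single byte each in UTF-8, so widening them byte by byte happens to give the right result.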

Original comment by toadm...@googlemail.com on 4 Feb 2015 at 4:45