rlalfo / google-http-java-client

Automatically exported from code.google.com/p/google-http-java-client

UTF-8 encoding isn't assumed for application/json unless charset=utf-8 is also present #221

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Version of google-http-java-client (e.g. 1.15.0-rc)?
1.14-beta

Java environment (e.g. Java 6, Android 2.3, App Engine)?
Android 4.0.4

Describe the problem.
A Django/Tastypie server returns only "Content-Type: application/json", with no 
charset parameter. According to the Tastypie bug report, its developers consider 
this correct as per RFC 4627 (JSON) 
(https://github.com/toastdriven/django-tastypie/issues/717).

Within google-http-java-client, HttpResponse.getContentCharset() assumes 
ISO-8859-1 encoding when the returned Content-Type has no charset parameter. 
For "application/json" it should instead assume UTF-8, which is the default 
encoding according to RFC 4627 (http://www.ietf.org/rfc/rfc4627.txt).

How would you expect it to be fixed?
If the MIME type and subtype are application and json respectively, then 
HttpResponse.getContentCharset() should return a UTF-8 charset instead of 
ISO-8859-1, as per the JSON RFC 4627.
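A minimal sketch of the rule proposed above, written as a standalone helper rather than as a patch to the library. The class and method names (`JsonDefaultCharset`, `contentCharset`) are hypothetical, and the Content-Type parsing is deliberately simplified: it splits on `;` and strips surrounding quotes, but does not handle every edge case of the HTTP header grammar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper illustrating the proposed behavior; not part of google-http-java-client.
final class JsonDefaultCharset {

  // Returns the charset named in a Content-Type header value, or a default when none is given:
  // UTF-8 for application/json (RFC 4627), ISO-8859-1 otherwise (the fallback described above).
  static Charset contentCharset(String contentType) {
    if (contentType == null) {
      return StandardCharsets.ISO_8859_1; // No Content-Type at all: keep the ISO-8859-1 fallback.
    }
    String[] parts = contentType.split(";");
    // An explicit charset parameter always wins over any default.
    for (int i = 1; i < parts.length; i++) {
      String param = parts[i].trim();
      int eq = param.indexOf('=');
      if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
        String value = param.substring(eq + 1).trim();
        // Strip optional surrounding quotes, e.g. charset="utf-8".
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
          value = value.substring(1, value.length() - 1);
        }
        // Throws IllegalCharsetNameException / UnsupportedCharsetException for bad names.
        return Charset.forName(value);
      }
    }
    // No charset parameter: choose the default from the media type (case-insensitive).
    String mediaType = parts[0].trim().toLowerCase(Locale.ROOT);
    if (mediaType.equals("application/json")) {
      return StandardCharsets.UTF_8;
    }
    return StandardCharsets.ISO_8859_1;
  }

  public static void main(String[] args) {
    System.out.println(contentCharset("application/json"));                 // UTF-8
    System.out.println(contentCharset("application/json; charset=UTF-16")); // UTF-16
    System.out.println(contentCharset("text/html"));                        // ISO-8859-1
  }
}
```

Checking for an explicit parameter before looking at the media type keeps the change narrow: only JSON responses that omit the charset behave differently from the current ISO-8859-1 fallback.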

Original issue reported on code.google.com by steve.s...@ziconix.com on 11 May 2013 at 4:19

GoogleCodeExporter commented 9 years ago
Thanks for the feedback.

Here's the only thing I can find regarding charset in the HTTP spec:

http://tools.ietf.org/html/rfc2616#section-3.7.1
   The "charset" parameter is used with some media types to define the
   character set (section 3.4) of the data. When no explicit charset
   parameter is provided by the sender, media subtypes of the "text"
   type are defined to have a default charset value of "ISO-8859-1" when
   received via HTTP. Data in character sets other than "ISO-8859-1" or
   its subsets MUST be labeled with an appropriate charset value. See
   section 3.4.1 for compatibility problems.

Our goal within HttpResponse.getContentCharset() is to implement this portion 
of the HTTP spec, not to provide logic specific to every application use case. 
Otherwise, this would become a long, complicated, and ever-growing piece of 
logic. Furthermore, the JavaDoc on getContentCharset() matches the actual 
behavior, so technically there is no bug.

That said, I agree with your interpretation of the JSON spec, and I agree that 
this behavior makes your use case more difficult to implement. So my 
recommendation is that you not use getContentCharset(). A better fit for your 
use case is to call getMediaType() (checking for null) and then 
getCharsetParameter(). That has the advantage of returning null for the charset 
parameter if it is not specified, which would allow you to implement the tricky 
charset detection algorithm in section 3 of RFC 4627.
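A sketch of the suggested workaround, assuming the declared charset has already been obtained via getMediaType() and getCharsetParameter() (null when the parameter is absent) and that the first four bytes of the body are available. The class and method names (`JsonCharsetDetector`, `jsonCharset`) are hypothetical; the byte patterns are the ones listed in section 3 of RFC 4627, and UTF-8 is the fallback because it is that RFC's default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of google-http-java-client.
final class JsonCharsetDetector {

  // `declared` is the Content-Type charset parameter, or null if absent
  // (e.g. response.getMediaType().getCharsetParameter() after a null check on the media type).
  // `head` holds the first bytes of the body; at least 4 are needed for detection.
  static Charset jsonCharset(Charset declared, byte[] head) {
    if (declared != null) {
      return declared; // An explicit charset parameter always wins.
    }
    if (head == null || head.length < 4) {
      return StandardCharsets.UTF_8; // Too short to inspect; use the JSON default.
    }
    // RFC 4627 section 3: the first two characters of a JSON text are ASCII,
    // so the pattern of zero bytes in the first four octets reveals the encoding.
    boolean z0 = head[0] == 0, z1 = head[1] == 0, z2 = head[2] == 0, z3 = head[3] == 0;
    if (z0 && z1 && z2 && !z3) {
      return Charset.forName("UTF-32BE"); // 00 00 00 xx
    }
    if (z0 && !z1 && z2 && !z3) {
      return StandardCharsets.UTF_16BE;   // 00 xx 00 xx
    }
    if (!z0 && z1 && z2 && z3) {
      return Charset.forName("UTF-32LE"); // xx 00 00 00
    }
    if (!z0 && z1 && !z2 && z3) {
      return StandardCharsets.UTF_16LE;   // xx 00 xx 00
    }
    return StandardCharsets.UTF_8;        // xx xx xx xx (or any other pattern)
  }

  public static void main(String[] args) {
    byte[] utf16le = "{\"a\":1}".getBytes(StandardCharsets.UTF_16LE);
    byte[] utf8 = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
    System.out.println(jsonCharset(null, utf16le));                     // UTF-16LE
    System.out.println(jsonCharset(null, utf8));                        // UTF-8
    System.out.println(jsonCharset(StandardCharsets.ISO_8859_1, utf8)); // ISO-8859-1
  }
}
```

In a real client the four bytes would have to be peeked from the stream returned by the response's getContent() without consuming them, for example by wrapping it in a java.io.BufferedInputStream and using mark()/reset(); that plumbing is left out here.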

Original comment by yan...@google.com on 15 Aug 2013 at 12:04