google / guava

Google core libraries for Java
Apache License 2.0

Presumably bad media type for JSON #915

Closed: gissuebot closed this issue 10 years ago

gissuebot commented 10 years ago

Original issue created by j...@nwsnet.de on 2012-03-01 at 05:04 PM


The media type for JSON is defined like this:

public static final MediaType JSON_UTF_8 = new MediaType(APPLICATION_TYPE, "json")
    .withCharset(UTF_8);

I.e. "application/json; charset=utf-8".

However, although JSON is a text format, it belongs to the "application" top-level type, whereas the "charset" parameter is (AFAIK) only defined for "text" types.

RFC 4627 says this in "6. IANA Considerations":

   The MIME media type for JSON text is application/json.

   Type name: application

   Subtype name: json

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32

  JSON may be represented using UTF-8, UTF-16, or UTF-32.  When JSON
  is written in UTF-8, JSON is 8bit compatible.  When JSON is
  written in UTF-16 or UTF-32, the binary content-transfer-encoding
  must be used.

Ergo, the charset parameter must be dropped from the constant.

The same might apply to "application/javascript" ("text/javascript" exists, but is considered obsolete), though I didn't check that.
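
For context on why RFC 4627 registers no charset parameter: section 3 of that RFC notes that the first two characters of a JSON text are always ASCII, so a receiver can tell UTF-8, UTF-16 and UTF-32 apart from the pattern of null bytes in the first four octets. The following is only a rough sketch of that rule. The class and method names are hypothetical, the fallback for inputs shorter than four bytes is an assumption, and the UTF-32 charsets are looked up by name on the assumption that the running JDK provides them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Guava or any JSON library. */
final class JsonEncodingSniffer {
    private JsonEncodingSniffer() {}

    /**
     * Guesses the encoding of a JSON text from the null-byte pattern of its
     * first four octets, as described in RFC 4627 section 3.
     */
    static Charset detect(byte[] json) {
        if (json.length < 4) {
            // Assumption: inputs too short to inspect are treated as UTF-8.
            return StandardCharsets.UTF_8;
        }
        boolean z0 = json[0] == 0;
        boolean z1 = json[1] == 0;
        boolean z2 = json[2] == 0;
        boolean z3 = json[3] == 0;

        // 00 00 00 xx -> UTF-32BE, xx 00 00 00 -> UTF-32LE
        // (UTF-32 charsets are resolved by name; assumed to be available in the JDK)
        if (z0 && z1 && z2 && !z3) return Charset.forName("UTF-32BE");
        if (!z0 && z1 && z2 && z3) return Charset.forName("UTF-32LE");
        // 00 xx 00 xx -> UTF-16BE, xx 00 xx 00 -> UTF-16LE
        if (z0 && !z1 && z2 && !z3) return StandardCharsets.UTF_16BE;
        if (!z0 && z1 && !z2 && z3) return StandardCharsets.UTF_16LE;
        // xx xx xx xx (and anything unexpected) -> UTF-8
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        byte[] body = "[1]".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(detect(body));
    }
}
```

This only works for receivers that know the payload is JSON, which is the point of contention in the rest of the thread.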

gissuebot commented 10 years ago

Original comment posted by gak@google.com on 2012-03-02 at 05:13 PM


I disagree with your assessment that it must be dropped. RFC 2046 states that "other media types than subtypes of "text" might choose to employ the charset parameter as defined here," which indicates that there is no restriction on the presence of the charset parameter on application types. Additionally, RFC 2045 states that "MIME implementations must ignore any parameters whose names they do not recognize." So, it is not reasonable to assume that there is any harm being done by its presence.

The charset parameter is there because browsers that attempt to sniff the charset when it's not present are vulnerable to certain types of exploits. So, we have defaulted to adding the charset to any media type that is likely to be served to and interpreted by a browser. Without any evidence that this is actually incompatible with existing code/services, I'm going to leave it alone.

Finally, if you truly do need that media type without the parameter, the withoutParameters() method should do the trick.
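
To make the effect of dropping the parameters concrete, here is a minimal plain-Java sketch that works on the header string alone. It is not Guava's implementation of withoutParameters(); the class and method below are hypothetical stand-ins that only mimic the visible result, turning "application/json; charset=utf-8" into "application/json".

```java
import java.util.Locale;

/** Hypothetical stand-in for illustration; this is not Guava's MediaType class. */
final class MediaTypeStrings {
    private MediaTypeStrings() {}

    /**
     * Returns only the "type/subtype" part of a media type string.
     * The type and subtype cannot contain ';', so cutting at the first
     * semicolon is safe even if a later parameter value is quoted.
     */
    static String withoutParameters(String mediaType) {
        int semicolon = mediaType.indexOf(';');
        String typeAndSubtype =
            semicolon < 0 ? mediaType : mediaType.substring(0, semicolon);
        // Type and subtype names are case-insensitive; normalize to lower case.
        return typeAndSubtype.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(withoutParameters("application/json; charset=utf-8"));
    }
}
```

With Guava itself, the equivalent would presumably be calling withoutParameters() on the JSON_UTF_8 constant, as suggested above.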


Status: WorkingAsIntended

gissuebot commented 10 years ago

Original comment posted by j...@nwsnet.de on 2012-03-05 at 09:30 AM


OK, your justification for adding the charset seems real-worldy enough for me after I found http://bugs.cometd.org/browse/COMETD-55 (via a similar issue I reported for i-jetty at http://code.google.com/p/i-jetty/issues/detail?id=52).

Still, I believe that implementations should be JSON-aware and use the correct default charset (UTF-8) instead of what is probably a global default.
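
A JSON-aware receiver along those lines might look roughly like the sketch below: honor an explicit charset parameter if one is sent, ignore parameters it does not recognize, and otherwise fall back to UTF-8 for application/json rather than to a container-wide default. All names are hypothetical, ISO-8859-1 is merely an assumed example of a global default, and the parsing is deliberately naive (it splits on ';' and does not handle quoted values that contain semicolons).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration; not taken from Guava, Jetty, or CometD. */
final class JsonAwareCharsets {
    private JsonAwareCharsets() {}

    // Assumption: stands in for whatever global default a server might apply.
    static final Charset GLOBAL_DEFAULT = StandardCharsets.ISO_8859_1;

    /** Chooses the charset to decode a body with, given its Content-Type value. */
    static Charset charsetFor(String contentType) {
        String[] parts = contentType.split(";");
        String type = parts[0].trim().toLowerCase(Locale.ROOT);

        for (int i = 1; i < parts.length; i++) {
            String[] nameAndValue = parts[i].split("=", 2);
            if (nameAndValue.length == 2
                    && nameAndValue[0].trim().equalsIgnoreCase("charset")) {
                // An explicit charset always wins. Charset.forName throws an
                // unchecked exception if the name is illegal or unsupported.
                String name = nameAndValue[1].trim().replace("\"", "");
                return Charset.forName(name);
            }
            // Any other parameter is simply ignored.
        }

        // No charset given: JSON defaults to UTF-8, everything else to the global default.
        return type.equals("application/json")
            ? StandardCharsets.UTF_8
            : GLOBAL_DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(charsetFor("application/json"));
        System.out.println(charsetFor("application/json; charset=utf-16"));
        System.out.println(charsetFor("text/plain"));
    }
}
```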