wenhao001 / rest-assured

Automatically exported from code.google.com/p/rest-assured

ContentType.JSON should automatically encode as UTF-8 #412

Open GoogleCodeExporter opened 8 years ago

What steps will reproduce the problem?

1. Set the contentType of a request to `ContentType.JSON`.
    - Do NOT set the character encoding.
2. Add a Unicode body containing characters outside ISO-8859-1.
3. Send the request.

What is the expected output? What do you see instead?

 - Expected: the body is sent as specified, with every character intact.

 - Actual: non-ISO-8859-1 characters are mangled.

What version of the product are you using? On what operating system?

   REST Assured 2.4.1 on OS X 10.9 and 10.10

Please provide any additional information below.

When sending JSON data, any non-ISO-8859-1 characters are mangled unless the
encoding is explicitly specified as UTF-8.
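
A minimal sketch of why the characters get mangled, using only the JDK and no REST Assured code. The class name `CharsetMangling` and the helper `roundTrip` are hypothetical names chosen for illustration. The sketch assumes the body ends up encoded as ISO-8859-1 when no charset is given, as the behaviour described above suggests.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of REST Assured.
final class CharsetMangling {

    // Encode the text with the given charset and decode it again with the same
    // charset, so the result shows which characters survive the encoding step.
    static String roundTrip(String text, Charset charset) {
        // String.getBytes replaces unmappable characters with the charset's
        // replacement byte ('?' for ISO-8859-1) instead of failing.
        byte[] wire = text.getBytes(charset);
        return new String(wire, charset);
    }

    public static void main(String[] args) {
        // Two CJK characters that cannot be represented in ISO-8859-1.
        String json = "{\"city\":\"\u6771\u4eac\"}";
        System.out.println(roundTrip(json, StandardCharsets.UTF_8).equals(json));      // true
        System.out.println(roundTrip(json, StandardCharsets.ISO_8859_1).equals(json)); // false
    }
}
```

With ISO-8859-1 the two characters are silently replaced by `?`, so the receiver cannot recover them no matter how it decodes the bytes.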

Issue #228 first raised this problem, but its solution, specifying a charset
on the content type, is unnecessary for JSON and arguably wrong, even though
it is useful for other content types.

From RFC 4627 (http://www.ietf.org/rfc/rfc4627.txt):

   JSON text SHALL be encoded in Unicode.  The default encoding is UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

Accordingly, the charset parameter is not allowed in the JSON MIME type,
specifically because the Unicode variant can be determined from the content:

   The MIME media type for JSON text is application/json.

   Type name: application

   Subtype name: json

   Required parameters: n/a

   Optional parameters: n/a

   Encoding considerations: 8bit if UTF-8; binary if UTF-16 or UTF-32

      JSON may be represented using UTF-8, UTF-16, or UTF-32.  When JSON
      is written in UTF-8, JSON is 8bit compatible.  When JSON is
      written in UTF-16 or UTF-32, the binary content-transfer-encoding
      must be used.
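
The null-pattern detection that the RFC excerpt describes can be sketched as follows. This is an illustration only: `JsonEncodingSniffer` and `detect` are hypothetical names, and the fallback to UTF-8 for inputs shorter than four octets is an assumption made here, not something the quoted text specifies.

```java
// Hypothetical helper; not part of REST Assured.
final class JsonEncodingSniffer {

    // Guess the Unicode encoding of a JSON text from the pattern of zero
    // octets in its first four bytes, following RFC 4627, section 3.
    static String detect(byte[] b) {
        if (b.length < 4) {
            return "UTF-8"; // assumption: too short to sniff, use the RFC default
        }
        boolean z0 = b[0] == 0, z1 = b[1] == 0, z2 = b[2] == 0, z3 = b[3] == 0;
        if (z0 && z1 && z2 && !z3) return "UTF-32BE";  // 00 00 00 xx
        if (z0 && !z1 && z2 && !z3) return "UTF-16BE"; // 00 xx 00 xx
        if (!z0 && z1 && z2 && z3) return "UTF-32LE";  // xx 00 00 00
        if (!z0 && z1 && !z2 && z3) return "UTF-16LE"; // xx 00 xx 00
        return "UTF-8";                                // xx xx xx xx
    }

    public static void main(String[] args) {
        byte[] utf16le = "{\"a\":1}".getBytes(java.nio.charset.StandardCharsets.UTF_16LE);
        System.out.println(detect(utf16le)); // UTF-16LE
    }
}
```

Because a receiver can make this decision from the bytes alone, a charset parameter on `application/json` carries no extra information.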

The preferred workaround should be to set the defaultContentCharset on the
encoderConfig() and avoid the unnecessary charset parameter:

    newConfig().encoderConfig(encoderConfig().defaultContentCharset("UTF-8"))

Original issue reported on code.google.com by mchen...@gmail.com on 25 Jun 2015 at 4:42