DefaultHttpProvider body encoding problem

microsoftgraph / msgraph-sdk-java

Microsoft Graph SDK for Java

https://docs.microsoft.com/en-us/graph/sdks/sdks-overview

MIT License

402 stars 134 forks source link

DefaultHttpProvider body encoding problem #95

Closed namjug-kim closed 6 years ago

namjug-kim commented 6 years ago

Expected behavior

in DefaultHttpProvider#sendRequestInternal line 257

logger.logDebug("Sending " + serializable.getClass().getName() + " as request body");
final String serializeObject = serializer.serializeObject(serializable);
bytesToWrite = serializeObject.getBytes();

change body String to byte using String#getBytes it depends on platform's default charset

I use utf-8 and it may not work properly depending on the platform environment.

It would be better if could set up in DefaultClientConfig.

deepak2016 commented 6 years ago

Thanks @finalnova for reporting the issue. We will get back on this.

deepak2016 commented 6 years ago

@finalnova If I am understanding correctly, this issue would be present in all the instances where we are converting string to bytes?

deepak2016 commented 6 years ago

@finalnova APIs are working just fine irrespective of the encoding. I tried below encodings while doing getbytes for making api calls:

US-ASCII | Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set ISO-8859-1 | ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1 UTF-8 | Eight-bit UCS Transformation Format UTF-16 | Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark

namjug-kim commented 6 years ago

@deepak2016 sorry. I guess I did not have enough explanation. if filename is written in Korean (or other 3~4byte language) when trying to rename(PATCH request) DriveItem.name field is not always working fine.

DriveItem renameItem = new DriveItem();
renameItem.name = "테스트문서";

DriveItem patch = graphClient.me()
        .drive()
        .items(itemId)
        .buildRequest()
        .patch(renameItem);

If run the above code in two different encodings(UTF-8, ISO-8859-1), the result is:

-Dfile.encoding=UTF-8
  - 200 OK

-Dfile.encoding=ISO-8859-1
  - 400 : Bad Request

deepak2016 commented 6 years ago

@namjug-kim I am able to reproduce the issue now. Thanks for putting up an example. Regarding the fix, I'm thinking if encoding should always be UTF-8 or is there any scenario where we would like it to be configurable?

namjug-kim commented 6 years ago

@deepak2016 In my case, UTF-8 worked fine in all situations. I think it's okay to fix UTF-8.

Korean - OK
English - OK
Emoji(2~4byte) -OK

davidmoten commented 6 years ago

I agree that the encoding should be specified to every call to getBytes in the codebase.

@deepak2016 We shouldn't be guessing what the right encoding is via a couple of tests but rather we should know.

From the JSON standard (https://www.ietf.org/rfc/rfc4627.txt)

3.  Encoding

   JSON text SHALL be encoded in Unicode.  The default encoding is
   UTF-8.

   Since the first two characters of a JSON text will always be ASCII
   characters [RFC0020], it is possible to determine whether an octet
   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
   at the pattern of nulls in the first four octets.

           00 00 00 xx  UTF-32BE
           00 xx 00 xx  UTF-16BE
           xx 00 00 00  UTF-32LE
           xx 00 xx 00  UTF-16LE
           xx xx xx xx  UTF-8

It is likely that UTF-8 is the right choice but the Graph API team should confirm this (I did not find it any doco with a brief look).

davidmoten commented 6 years ago

On this Graph documentation page it indicates the content type returned from a sample request that did not specify an encoding. Based on this fragment I think it is safe to assume the default encoding is UTF-8.

Request:

POST https://login.microsoftonline.com/common/oauth2/v2.0/token
Content-Type: application/x-www-form-urlencoded

{
  grant_type=authorization_code
  &code=AwABAAAA...cZZ6IgAA
  &redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F
  &resource=https%3A%2F%2Fgraph.microsoft.com%2F
  &scope=mail.read
  &client\_id=<app ID>
  &client\_secret=<app SECRET>
}

Response:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

darrelmiller commented 6 years ago

@davidmoten Since the IETF has got involved in the JSON spec it has changed a few times. The latest incarnation from late last year is even more restrictive on the use of encodings.

8.1. Character Encoding

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].

https://tools.ietf.org/html/rfc8259#section-8.1

Defaulting to UTF-8 seems like the right thing to do for Microsoft Graph.

davidmoten commented 6 years ago

@darrelmiller agreed. Thanks for the updated spec, that's interesting.

deepak2016 commented 6 years ago

This issue is fixed with PR: https://github.com/microsoftgraph/msgraph-sdk-java/pull/109