Closed namjug-kim closed 6 years ago
Thanks @finalnova for reporting the issue. We will get back on this.
@finalnova If I am understanding correctly, this issue would be present in all the instances where we are converting string to bytes?
@finalnova APIs are working just fine irrespective of the encoding. I tried below encodings while doing getbytes for making api calls:
US-ASCII | Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set ISO-8859-1 | ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1 UTF-8 | Eight-bit UCS Transformation Format UTF-16 | Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
@deepak2016 sorry. I guess I did not have enough explanation. if filename is written in Korean (or other 3~4byte language) when trying to rename(PATCH request) DriveItem.name field is not always working fine.
DriveItem renameItem = new DriveItem();
renameItem.name = "테스트문서";
DriveItem patch = graphClient.me()
.drive()
.items(itemId)
.buildRequest()
.patch(renameItem);
If run the above code in two different encodings(UTF-8, ISO-8859-1), the result is:
-Dfile.encoding=UTF-8
- 200 OK
-Dfile.encoding=ISO-8859-1
- 400 : Bad Request
@namjug-kim I am able to reproduce the issue now. Thanks for putting up an example. Regarding the fix, I'm thinking if encoding should always be UTF-8 or is there any scenario where we would like it to be configurable?
@deepak2016 In my case, UTF-8 worked fine in all situations. I think it's okay to fix UTF-8.
Korean - OK
English - OK
Emoji(2~4byte) -OK
I agree that the encoding should be specified to every call to getBytes
in the codebase.
@deepak2016 We shouldn't be guessing what the right encoding is via a couple of tests but rather we should know.
From the JSON standard (https://www.ietf.org/rfc/rfc4627.txt)
3. Encoding
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8
It is likely that UTF-8 is the right choice but the Graph API team should confirm this (I did not find it any doco with a brief look).
On this Graph documentation page it indicates the content type returned from a sample request that did not specify an encoding. Based on this fragment I think it is safe to assume the default encoding is UTF-8.
Request:
POST https://login.microsoftonline.com/common/oauth2/v2.0/token
Content-Type: application/x-www-form-urlencoded
{
grant_type=authorization_code
&code=AwABAAAA...cZZ6IgAA
&redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F
&resource=https%3A%2F%2Fgraph.microsoft.com%2F
&scope=mail.read
&client\_id=<app ID>
&client\_secret=<app SECRET>
}
Response:
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
@davidmoten Since the IETF has got involved in the JSON spec it has changed a few times. The latest incarnation from late last year is even more restrictive on the use of encodings.
8.1. Character Encoding
JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8 [RFC3629].
Defaulting to UTF-8 seems like the right thing to do for Microsoft Graph.
@darrelmiller agreed. Thanks for the updated spec, that's interesting.
This issue is fixed with PR: https://github.com/microsoftgraph/msgraph-sdk-java/pull/109
Expected behavior
in DefaultHttpProvider#sendRequestInternal line 257
change body String to byte using
String#getBytes
it depends onplatform's default charset
I use utf-8 and it may not work properly depending on the platform environment.
It would be better if could set up in DefaultClientConfig.