samuel closed this issue 11 years ago
Hm, I wonder if this changed in Go 1.1, as I'm pretty sure these all passed previously. Thanks for the heads up.
Verified that this passes in go1.0.2
I think I had grabbed the example from http://www.fileformat.info/info/unicode/char/1d11e/index.htm, which lists the C/C++/Java source code as "\uD834\uDD1E".
From http://en.wikipedia.org/wiki/UTF-8 "According to the UTF-8 definition (RFC 3629) the high and low surrogate halves used by UTF-16 (U+D800 through U+DFFF) are not legal Unicode values, and the UTF-8 encoding of them is an invalid byte sequence and thus should be treated as described above."
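For reference, a minimal sketch of what that means in Go 1.1 or later, using only the standard `unicode/utf8` and `unicode/utf16` packages. `combineSurrogates` is a hypothetical helper name used for illustration, not something from the package under test.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// combineSurrogates (hypothetical helper) joins a UTF-16 high/low surrogate
// pair into a single code point. It returns U+FFFD if the two values do not
// form a valid pair.
func combineSurrogates(hi, lo rune) rune {
	return utf16.DecodeRune(hi, lo)
}

func main() {
	hi, lo := rune(0xD834), rune(0xDD1E)

	// Neither half is a legal code point on its own.
	fmt.Println(utf8.ValidRune(hi), utf8.ValidRune(lo)) // false false

	// Converting a lone half to a string yields the replacement character.
	fmt.Println(string(hi) == "\uFFFD") // true

	// Together the halves denote U+1D11E, which takes 4 bytes in UTF-8.
	r := combineSurrogates(hi, lo)
	fmt.Println(r == 0x1D11E, utf8.RuneLen(r)) // true 4
}
```

So the two `\u` escapes only make sense as a UTF-16 pair, and anything that treats them as two independent code points ends up with two replacement characters.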
So I think the test case itself is incorrect and Go 1.1 probably got stricter. I'll remove the test and close this issue.
It seems that strconv.Unquote calls utf8.DecodeRuneInString, which returns RuneError ('\uFFFD') for each half of the surrogate pair, since it doesn't understand them. encoding/json handles this by not using strconv.Unquote and doing the unquoting itself. There's a comment in json/decode.go, "The rules are different than for Go, so cannot use strconv.Unquote", which may just be about Unicode support.
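A rough sketch of the two ways around this, assuming the input uses JSON-style `\uXXXX` escapes. `unquoteJSON`, `readU4`, and `unescapeU` are hypothetical names for illustration. `unescapeU` only understands `\u` escapes (no `\n`, `\\`, and so on) and is not the encoding/json implementation, just the same idea of pairing surrogates by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
	"unicode"
	"unicode/utf16"
)

// unquoteJSON lets encoding/json do the unquoting of a quoted JSON string.
func unquoteJSON(quoted string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(quoted), &s)
	return s, err
}

// readU4 parses a `\uXXXX` escape starting at s[i], if there is one.
func readU4(s string, i int) (rune, bool) {
	if i+6 > len(s) || s[i] != '\\' || s[i+1] != 'u' {
		return 0, false
	}
	v, err := strconv.ParseUint(s[i+2:i+6], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// unescapeU expands `\uXXXX` escapes, combining surrogate pairs into one
// code point and replacing unpaired halves with U+FFFD.
func unescapeU(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); {
		r, ok := readU4(s, i)
		if !ok {
			b.WriteByte(s[i]) // copy non-escape bytes unchanged
			i++
			continue
		}
		i += 6
		if utf16.IsSurrogate(r) {
			// Look ahead for the second half of the pair.
			if lo, ok := readU4(s, i); ok {
				if c := utf16.DecodeRune(r, lo); c != unicode.ReplacementChar {
					b.WriteRune(c)
					i += 6
					continue
				}
			}
			r = unicode.ReplacementChar // unpaired surrogate half
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	const escaped = `\uD834\uDD1E`

	viaJSON, err := unquoteJSON(`"` + escaped + `"`)
	fmt.Println(viaJSON == "\U0001D11E", err) // true <nil>

	fmt.Println(unescapeU(escaped) == "\U0001D11E") // true
}
```

If the input is already valid JSON, going through encoding/json is probably the simpler choice. The hand-rolled version is only worth it if the escape rules need to differ.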
I would look into fixing this, but I haven't run into it in the wild, just in the unit tests :)