Closed vshabanov closed 7 years ago
Can you please try bisecting to check if it is due to that change?
Yes. It was precisely due to commit https://github.com/bos/aeson/commit/2f24e555d86a36fdda6d4cad79976004b382ab3b. It turned out to be a simple off-by-one error. I've made a pull request that fixes it: https://github.com/bos/aeson/pull/477.
The previous aeson UTF-16 decoder didn't handle the \uFFFF
character (the only one that wasn't handled). The fixed decoder handles everything.
Released in v1.0.2.1!
Aeson can't decode some characters, for example U+1F3FF
I've hacked together a test to check which other valid characters are affected: https://gist.github.com/vshabanov/4653f07311fc61bc397cc53db98f2407. Here are the results:
There is a clear pattern here, and it becomes even clearer if you check every possible UTF-16 character (not only the Unicode 9 ones): https://gist.github.com/vshabanov/4653f07311fc61bc397cc53db98f2407#file-output2-txt
I suspect that it's related to https://github.com/bos/aeson/blob/master/cbits/unescape_string.c, but I don't completely understand what this code does or how to fix it.
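
For illustration, here is a minimal sketch in C of how a JSON `\uXXXX\uXXXX` surrogate pair combines into a code point, and how an off-by-one in a range check could reject U+1F3FF (which is encoded as `\uD83C\uDFFF`). This is not the aeson code: the function names (`is_high`, `is_low_correct`, `is_low_off_by_one`, `combine`, `check_examples`) are hypothetical, and the exclusive upper bound on the low surrogate is only an assumption about what such an off-by-one might look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* High (leading) surrogates occupy 0xD800..0xDBFF inclusive. */
static bool is_high(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }

/* Low (trailing) surrogates occupy 0xDC00..0xDFFF inclusive. */
static bool is_low_correct(uint16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical off-by-one: the upper bound is exclusive, so 0xDFFF is rejected. */
static bool is_low_off_by_one(uint16_t u) { return u >= 0xDC00 && u < 0xDFFF; }

/* Combine a surrogate pair into a code point in 0x10000..0x10FFFF.
   The low-surrogate check is passed in so both variants can be compared.
   Returns false (and leaves *out untouched) if the pair is rejected. */
static bool combine(uint16_t hi, uint16_t lo,
                    bool (*is_low)(uint16_t), uint32_t *out)
{
    if (!is_high(hi) || !is_low(lo))
        return false;
    *out = 0x10000u
         + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
    return true;
}

/* Small self-check: U+1F3FF decodes with the inclusive bound and is
   rejected with the exclusive one. */
void check_examples(void)
{
    uint32_t cp = 0;
    assert(combine(0xD83C, 0xDFFF, is_low_correct, &cp) && cp == 0x1F3FF);
    assert(!combine(0xD83C, 0xDFFF, is_low_off_by_one, &cp));
}
```

Under this assumption, every code point whose low surrogate is exactly `\uDFFF` fails while its neighbours (for example U+1F3FE, `\uD83C\uDFFE`) decode fine, which would produce a regular pattern of failures.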