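JSON unicode escapes (\uXXXX) carry only 2 bytes (four hex digits), i.e. a single UTF-16 code unit. Characters outside the Basic Multilingual Plane, which need 4 bytes in UTF-8, therefore have to be written as a surrogate pair of two escapes, and a decoder that handles each escape on its own ends up with lone surrogates ( see https://unicode-example-characters.glitch.me/ ). For example: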
🔥 can be encoded as "\ud83d\udd25".
🚸🚯 can be encoded as "\ud83d\udeb8\ud83d\udeaf".
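For reference, here is a minimal sketch of how the two escapes of a pair might be combined into one code point. It is standalone Rust using only the standard library. The name combine_surrogates is a hypothetical helper for illustration and is not part of this crate's API.

```rust
/// Combine a UTF-16 surrogate pair into a single Unicode scalar value.
/// Hypothetical helper: returns None when `high` is not a high surrogate
/// (0xD800..=0xDBFF) or `low` is not a low surrogate (0xDC00..=0xDFFF).
fn combine_surrogates(high: u16, low: u16) -> Option<char> {
    if !(0xD800..=0xDBFF).contains(&high) || !(0xDC00..=0xDFFF).contains(&low) {
        return None;
    }
    // Each surrogate contributes 10 bits; the result is offset by 0x10000.
    let code_point = 0x10000
        + (((high as u32) - 0xD800) << 10)
        + ((low as u32) - 0xDC00);
    char::from_u32(code_point)
}
```

With this, the pair d83d/dd25 maps to U+1F525, which takes 4 bytes in UTF-8. A high surrogate followed by anything other than a low surrogate yields None, and a decoder could report that as an error.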
Currently, attempting to decode either of these strings results in an error:
let json = b"\"\\ud83d\\udd25\"";
let mut reader = JsonReader::from_reader(Cursor::new(json));
let mut buffer = Vec::new();
reader.read_event(&mut buffer)?;
panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: InvalidData, error: "Invalid encoded unicode code point" }', src/main.rs:86:15
Thank you for this bug report! I should indeed have added support for surrogate pairs. I have just implemented them and released version v0.1.1. Your test now passes. Thank you again!