`PrettyPrintWriter` fails to serialize characters in the Unicode Supplementary Multilingual Plane in XML 1.0 mode and XML 1.1 mode

basil commented 1 year ago

PrettyPrintWriter fails to properly serialize characters in the Unicode Supplementary Multilingual Plane (SMP) in XML 1.0 mode and XML 1.1 mode (quirks mode works) with the following exception:

com.thoughtworks.xstream.io.StreamException: Invalid character 0xd83e in XML stream
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.writeText(PrettyPrintWriter.java:250)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.writeText(PrettyPrintWriter.java:205)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriter.setValue(PrettyPrintWriter.java:187)
        at com.thoughtworks.xstream.io.xml.PrettyPrintWriterTest.testSupportsSupplementaryMultilingualPlaneInXml1_0Mode(PrettyPrintWriterTest.java:310)

The root cause of the problem is incorrect iteration over Unicode code points. The current implementation iterates over the UTF-16 code units of the string (Java `char` values) rather than over each code point. Characters in the Supplementary Multilingual Plane are encoded in UTF-16 as two code units, a surrogate pair. For example, U+1F98A is encoded in UTF-16 as 0xD83E 0xDD8A, and the writer rejects the first half of the pair, 0xD83E, as an invalid character. Java provides a dedicated API to iterate over code points, but XStream makes the erroneous assumption that a code point and a `char` are equivalent, likely because it was never tested outside of quirks mode with characters in the Supplementary Multilingual Plane. This PR fixes the problem by using the Java API for iterating over code points, thus removing that faulty assumption.
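To illustrate the difference described above, here is a minimal standalone sketch using only the JDK, not the code from this PR. The class and helper names (`CodePointIterationDemo`, `codeUnits`, `codePoints`) are made up for the example, and `String.codePoints()` requires Java 8 or newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class; not part of XStream.
final class CodePointIterationDemo {

    // Walks the string one UTF-16 code unit (Java char) at a time,
    // which splits a supplementary character into its two surrogates.
    static List<Integer> codeUnits(String s) {
        List<Integer> units = new ArrayList<>();
        for (int i = 0; i < s.length(); i++) {
            units.add((int) s.charAt(i));
        }
        return units;
    }

    // Walks the string one Unicode code point at a time (Java 8+),
    // so a surrogate pair is reported as a single value.
    static List<Integer> codePoints(String s) {
        return s.codePoints().boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String fox = "\uD83E\uDD8A"; // U+1F98A as a surrogate pair

        for (int unit : codeUnits(fox)) {
            System.out.println("code unit:  0x" + Integer.toHexString(unit));
        }
        for (int cp : codePoints(fox)) {
            System.out.println("code point: 0x" + Integer.toHexString(cp));
        }
    }
}
```

Iterating by code unit yields 0xd83e and 0xdd8a separately, which matches the value reported in the exception above; iterating by code point yields the single value 0x1f98a.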

The new quirks mode test passes before and after the changes to PrettyPrintWriter. The new XML 1.0 mode and XML 1.1 mode tests fail before the changes to PrettyPrintWriter with the exception given above. The new XML 1.0 mode and XML 1.1 mode tests pass after the changes to PrettyPrintWriter.

Fixes #336

basil commented 1 year ago

Why is this not assigned to the 1.4 milestone? This is a critical bug fix that we want in 1.4.

joehni commented 1 year ago

Because 1.5.x is dropping compatibility with Java 1.4 through Java 10.

basil commented 1 year ago

I think it would make more sense either for the 1.4.x line to require Java 8 or newer, or to backport this fix to the 1.4.x line with a for-loop-based implementation that can run on Java 7 or earlier.
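For reference, a for-loop-based iteration of the kind suggested here could look roughly like the sketch below. It is an illustration only, not the backport itself: the class and method names are hypothetical, the validity check follows the `Char` production of the XML 1.0 specification rather than XStream's own per-mode rules, and `String.codePointAt` and `Character.charCount` exist only since Java 5, so this would not cover Java 1.4.

```java
// Hypothetical sketch; not part of XStream.
final class ForLoopCodePoints {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the char index of the first code point that XML 1.0 does not
    // allow, or -1 if every code point is allowed.
    static int firstInvalidIndex(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);   // combines a valid surrogate pair
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        return -1;
    }

    public static void main(String[] args) {
        // Well-formed surrogate pair (U+1F98A): accepted as one code point.
        System.out.println(firstInvalidIndex("fox \uD83E\uDD8A")); // prints -1
        // Unpaired high surrogate: still rejected.
        System.out.println(firstInvalidIndex("lone \uD83E"));      // prints 5
    }
}
```

Because `codePointAt` returns the surrogate value itself when a surrogate is unpaired, genuinely malformed input is still rejected, while well-formed supplementary characters pass.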

basil commented 2 weeks ago

Any plans to merge this PR and release version 1.5.x requiring Java 11 or newer?

joehni commented 2 weeks ago

Sorry for the long delay...
