Make the parser always successfully parse XML documents that contain character references to surrogate pairs.
A document that contains a numeric character reference resolving to a surrogate code point (U+D800 to U+DFFF), for example a `<doc>` element whose content is a pair of such references, is technically invalid according to the XML specification. However, replacing each such reference with U+FFFD is better for the consuming application, which then does not need to pre-process the document before parsing it. It also has precedent: the standard Go XML parser in its standard configuration replaces these with U+FFFD.
Fixes #187
Note: this is mutually exclusive with #189