netvl / xml-rs

An XML library in Rust
MIT License

Always allow surrogate pair entity references #188

Closed bryanburgers closed 4 years ago

bryanburgers commented 4 years ago

Make the parser always successfully parse XML documents that contain entity references for surrogate pairs.

A document that contains character references resolving to a surrogate pair (for example <doc>&#xd83d;&#xdd34;</doc>) is technically invalid according to the XML specification, because surrogate code points are not legal XML characters. However, replacing these with U+FFFD is better for the consuming application, which then does not need to pre-process the document before parsing it. It also has precedent: the standard Go XML parser, in its standard configuration, replaces these with U+FFFD.
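To illustrate the replacement described above, here is a minimal sketch in Rust. It is not the code from this pull request or from xml-rs. The helper name `resolve_char_ref` is hypothetical, and the sketch assumes that each surrogate reference is replaced individually, so a pair yields two U+FFFD characters.

```rust
/// Hypothetical helper (not part of xml-rs): resolve the body of a numeric
/// character reference, i.e. the text between `&#` and `;`, to a `char`.
///
/// Returns `None` when the body is not a number or is above U+10FFFF.
/// Note: the standard library parsers used here are slightly more permissive
/// than XML (for instance, they accept a leading `+`); a real parser would
/// validate the digits first.
fn resolve_char_ref(body: &str) -> Option<char> {
    // An `x` prefix marks a hexadecimal reference; otherwise it is decimal.
    let code = match body.strip_prefix('x') {
        Some(hex) => u32::from_str_radix(hex, 16).ok()?,
        None => body.parse::<u32>().ok()?,
    };

    match code {
        // Surrogate code points cannot be represented as a Rust `char`,
        // so substitute U+FFFD instead of reporting an error.
        0xD800..=0xDFFF => Some(char::REPLACEMENT_CHARACTER),
        // Everything else is converted directly; this yields `None`
        // for values above U+10FFFF.
        _ => char::from_u32(code),
    }
}
```

With this sketch, the two references in the example document, `xd83d` and `xdd34`, each resolve to U+FFFD, while an ordinary reference such as `x41` still resolves to `A`. A full parser would additionally reject other code points that XML forbids, such as U+0000, which this sketch passes through.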


Fixes #187

Note: this is mutually exclusive with #189

netvl commented 4 years ago

As I mentioned in the issue, the configuration approach seems to be more appropriate. Thank you!