PierreVDL opened this issue 7 years ago
See also https://en.wikipedia.org/wiki/Valid_characters_in_XML. The characters are valid in XML 1.1, but XML 1.0 does not allow them, not even as character references.
Please note that the underlying parser implements XML 1.0 fourth edition, not XML 1.1 or XML 1.0 fifth edition. Expat ticket https://github.com/libexpat/libexpat/issues/171 may be of interest.
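The XML 1.0 versus XML 1.1 distinction above can be made concrete. The following is a minimal Haskell sketch that does not use hexpat or expat at all: it parses a decimal (`&#65;`) or hexadecimal (`&#x41;`) character reference and checks the code point against the `Char` production of either XML 1.0 or XML 1.1. The function names `isXml10Char`, `isXml11Char`, `charRefCodePoint` and `decodeCharRef` are hypothetical, chosen only for illustration.

```haskell
import Data.Char (chr, isDigit, isHexDigit)
import Numeric (readDec, readHex)

-- XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
--   | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
isXml10Char :: Int -> Bool
isXml10Char c =
  c == 0x9 || c == 0xA || c == 0xD
    || (c >= 0x20 && c <= 0xD7FF)
    || (c >= 0xE000 && c <= 0xFFFD)
    || (c >= 0x10000 && c <= 0x10FFFF)

-- XML 1.1 Char production: [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF].
isXml11Char :: Int -> Bool
isXml11Char c =
  (c >= 0x1 && c <= 0xD7FF)
    || (c >= 0xE000 && c <= 0xFFFD)
    || (c >= 0x10000 && c <= 0x10FFFF)

-- Read the code point of "&#NNN;" (decimal) or "&#xHHH;" (hexadecimal).
-- Returns Nothing on malformed syntax or a value above #x10FFFF.
charRefCodePoint :: String -> Maybe Int
charRefCodePoint s = case s of
    '&' : '#' : 'x' : rest -> digits isHexDigit readHex rest
    '&' : '#' : rest -> digits isDigit readDec rest
    _ -> Nothing
  where
    digits :: (Char -> Bool) -> ReadS Integer -> String -> Maybe Int
    digits ok reader r = case span ok r of
      (ds@(_ : _), ";") -> case reader ds of
        [(n, "")] | n <= 0x10FFFF -> Just (fromInteger n)
        _ -> Nothing
      _ -> Nothing

-- Decode a character reference, accepting it only if its code point
-- matches the given Char production (isXml10Char or isXml11Char).
decodeCharRef :: (Int -> Bool) -> String -> Maybe Char
decodeCharRef legal s = do
  c <- charRefCodePoint s
  if legal c then Just (chr c) else Nothing
```

Under these rules, `decodeCharRef isXml10Char "&#x1;"` is `Nothing` while `decodeCharRef isXml11Char "&#x1;"` is `Just '\x1'`; a tab (`&#x9;`) is accepted by both.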
According to the XML standard (e.g. http://www.w3.org/TR/xml/#sec-references), one can refer to any character (including non-printable ones) using `&#\d+;` and `&#x\h+;`. However, hexpat seems to ignore these characters. For example, the HUnit test case
yields as output
The list is wrong since it contains no elements after the first `&#x\h+;` character reference.

Note: I know there is also an error in my test (it assumes a single Text element rather than a list of Text elements), but this is irrelevant to this problem!
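Regarding the single-element assumption in the test: character data is often delivered in several pieces (the expat documentation, for instance, says that one contiguous block of text may produce a sequence of calls to the character-data handler), so a test can normalize the children before comparing. Below is a minimal sketch on a hypothetical, simplified `Node` type, not hexpat's actual tree types; `Node` and `mergeText` are illustrative names. It only removes the single-element assumption from the test and does not recover the content that goes missing after the first character reference.

```haskell
-- Hypothetical, simplified tree type; hexpat's own node types differ.
data Node
  = Element String [Node]
  | Text String
  deriving (Eq, Show)

-- Concatenate every run of adjacent Text nodes, recursing into elements,
-- so that text delivered in several chunks ends up in one Text node.
mergeText :: [Node] -> [Node]
mergeText (Text a : Text b : rest) = mergeText (Text (a ++ b) : rest)
mergeText (Element name kids : rest) = Element name (mergeText kids) : mergeText rest
mergeText (node : rest) = node : mergeText rest
mergeText [] = []
```

For example, `mergeText [Text "a", Text "\t", Text "b"]` yields `[Text "a\tb"]`, so the test can compare against a single Text node regardless of how the parser chunked the text.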