Currently, this library follows the example of yxml and passes byte values >= 128 through without validating them. As a result, any ASCII-compatible encoding is accepted, but nothing checks that the input is well-formed in that encoding. However, other parts of the library (such as character references when reading to `Node`s) use UTF-8.
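To make the mismatch concrete, here is a minimal sketch in C (not this library's code) of what producing UTF-8 for a character reference involves. The helper name `utf8_encode` and its signature are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of this library: encode a Unicode scalar
 * value as UTF-8. Returns the number of bytes written (1 to 4), or 0 if
 * cp is a surrogate or lies beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, uint8_t out[4]) {
    assert(out != NULL);
    if (cp <= 0x7F) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not scalar values */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (uint8_t)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Assuming a document that is actually encoded in Latin-1, a literal `é` would be passed through as the single byte `0xE9`, while the reference `&#233;` would come out as the two bytes `0xC3 0xA9`, so the resulting text could mix two encodings.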
It is not entirely clear what the best solution is. Following `std.json`, the best initial step would probably be to validate UTF-8 as part of the state machine. Support for other encodings could then be added by translating to UTF-8 on the fly in a wrapper around a `Scanner`, but this is lower priority.
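As a rough illustration of what byte-at-a-time UTF-8 validation could look like inside a scanner-style state machine, here is a minimal sketch in C. It is not this library's implementation and not how `std.json` does it; the type `Utf8State` and the functions `utf8_feed`, `utf8_at_boundary`, and `utf8_valid` are hypothetical names. The sketch rejects overlong forms, surrogates, and code points above U+10FFFF by narrowing the allowed range of the first continuation byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator state, small enough to embed in a scanner. */
typedef struct {
    uint8_t remaining; /* continuation bytes still expected */
    uint8_t lo, hi;    /* allowed range for the next continuation byte */
} Utf8State;

static const Utf8State UTF8_INIT = {0, 0x80, 0xBF};

/* Feed one byte. Returns false as soon as the input cannot be valid UTF-8. */
static bool utf8_feed(Utf8State *s, uint8_t b) {
    assert(s != NULL);
    if (s->remaining > 0) {
        if (b < s->lo || b > s->hi) return false;
        s->remaining--;
        s->lo = 0x80; /* only the first continuation byte is restricted */
        s->hi = 0xBF;
        return true;
    }
    s->lo = 0x80;
    s->hi = 0xBF;
    if (b <= 0x7F) return true; /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) { s->remaining = 1; return true; }
    if (b >= 0xE0 && b <= 0xEF) {
        s->remaining = 2;
        if (b == 0xE0) s->lo = 0xA0; /* excludes overlong 3-byte forms */
        if (b == 0xED) s->hi = 0x9F; /* excludes surrogates U+D800..U+DFFF */
        return true;
    }
    if (b >= 0xF0 && b <= 0xF4) {
        s->remaining = 3;
        if (b == 0xF0) s->lo = 0x90; /* excludes overlong 4-byte forms */
        if (b == 0xF4) s->hi = 0x8F; /* excludes code points above U+10FFFF */
        return true;
    }
    return false; /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
}

/* True when no multi-byte sequence is left unfinished (check at end of input). */
static bool utf8_at_boundary(const Utf8State *s) {
    return s->remaining == 0;
}

/* Convenience wrapper: validate a whole buffer. */
static bool utf8_valid(const uint8_t *bytes, size_t len) {
    Utf8State s = UTF8_INIT;
    for (size_t i = 0; i < len; i++) {
        if (!utf8_feed(&s, bytes[i])) return false;
    }
    return utf8_at_boundary(&s);
}
```

Because the state is only three bytes and advances one input byte at a time, a check like this could in principle run alongside the existing scanner states without buffering, which is the property that makes in-scanner validation attractive compared with a separate validation pass.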