ianprime0509 / zig-xml

XML parser for Zig
http://ianjohnson.dev/zig-xml/
BSD Zero Clause License

Validate UTF-8 more strictly #7

Closed ianprime0509 closed 1 year ago

ianprime0509 commented 1 year ago

Currently, this library follows the example of yxml and passes byte values >= 128 through without validating them. This means any ASCII-compatible encoding is accepted, but nothing checks that the input is well-formed in that encoding. However, other parts of the library (such as character reference handling when reading to Nodes) assume UTF-8, so the output can end up mixing encodings.

It is not entirely clear what the best solution is. Following std.json, the best initial solution would probably be to validate UTF-8 as part of the state machine. Support for other encodings could be added later by translating to UTF-8 on the fly in a wrapper around a Scanner, but that is lower priority.
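
To make the "validate UTF-8 as part of the state machine" idea concrete, here is a minimal sketch of a byte-at-a-time validator. It is written in Python purely for illustration and is not zig-xml's code or API: the names `Utf8Validator`, `feed`, `at_boundary` and `is_valid_utf8` are all hypothetical. It assumes "strict" means rejecting overlong forms, surrogates (U+D800 to U+DFFF) and code points above U+10FFFF, which it does by narrowing the allowed range of the first continuation byte based on the lead byte.

```python
class Utf8Validator:
    """Hypothetical streaming UTF-8 validator (illustration only)."""

    def __init__(self) -> None:
        self.remaining = 0   # continuation bytes still expected
        self.lo = 0x80       # allowed range for the next continuation byte
        self.hi = 0xBF

    def feed(self, byte: int) -> bool:
        """Consume one byte; return False if it makes the input invalid."""
        if self.remaining == 0:
            if byte <= 0x7F:                 # ASCII
                return True
            if 0xC2 <= byte <= 0xDF:         # 2-byte lead (C0/C1 are overlong)
                self.remaining, self.lo, self.hi = 1, 0x80, 0xBF
            elif byte == 0xE0:               # 3-byte lead, exclude overlong
                self.remaining, self.lo, self.hi = 2, 0xA0, 0xBF
            elif byte == 0xED:               # 3-byte lead, exclude surrogates
                self.remaining, self.lo, self.hi = 2, 0x80, 0x9F
            elif 0xE1 <= byte <= 0xEF:       # other 3-byte leads
                self.remaining, self.lo, self.hi = 2, 0x80, 0xBF
            elif byte == 0xF0:               # 4-byte lead, exclude overlong
                self.remaining, self.lo, self.hi = 3, 0x90, 0xBF
            elif 0xF1 <= byte <= 0xF3:       # other 4-byte leads
                self.remaining, self.lo, self.hi = 3, 0x80, 0xBF
            elif byte == 0xF4:               # 4-byte lead, cap at U+10FFFF
                self.remaining, self.lo, self.hi = 3, 0x80, 0x8F
            else:                            # stray continuation or bad lead
                return False
            return True
        if not (self.lo <= byte <= self.hi):
            return False
        self.remaining -= 1
        self.lo, self.hi = 0x80, 0xBF        # later continuations: full range
        return True

    def at_boundary(self) -> bool:
        """True when no multi-byte sequence is left unfinished."""
        return self.remaining == 0


def is_valid_utf8(data: bytes) -> bool:
    """Feed every byte through the validator and require a clean end."""
    v = Utf8Validator()
    return all(v.feed(b) for b in data) and v.at_boundary()
```

In a scanner, the equivalent of `feed` would run on each byte >= 128 before it is passed through, and the `at_boundary` check would apply at end of input (and wherever markup is only allowed to start between characters). Note that this sketch only checks UTF-8 well-formedness; XML's own restrictions on which characters may appear in a document would be a separate check.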