Open dralley opened 2 years ago
Is this the effect of switching to `jetscii`?
Yes, and it only requires about 3 lines of change. I'm going to see if it can be improved any further and whether the occasional regressions can be eliminated.
Some reading material, not so much about escape routines specifically as about parsing XML (actually HTML) in general:
https://lemire.me/blog/2024/06/08/scan-html-faster-with-simd-instructions-chrome-edition/
Unlike the unescape routines, the routines for escaping text don't currently utilize any SIMD acceleration.
This should be possible to do via the `jetscii` crate. `memchr` is currently used by the unescape routines, but while it is supposed to be slightly faster than `jetscii`, it is also more limited: it can only search for up to 3 different bytes at a time, whereas `jetscii` can handle up to 16. Since escaping text requires searching for up to 5 characters (`<`, `>`, `&`, `"`, `'`), `memchr` is not an option but `jetscii` is.

`jetscii` also seems capable of searching for byte sequences as well as single bytes, so it could potentially be used with UTF-16 and other multibyte encodings in the future (but I don't think you can search for multiple byte-sequence patterns at the same time, so there are limitations to this).

Benchmark coverage needs to be added first: https://github.com/tafia/quick-xml/issues/404
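To make the escaping idea concrete, here is a minimal std-only sketch of the "find the next special byte, copy the clean run in bulk, emit the entity" loop. It is not quick-xml's actual implementation: `find_special` and `escape` are hypothetical names, and `find_special` is a scalar stand-in for the spot where a SIMD searcher (for example one built with `jetscii`'s `bytes!` macro, if I'm reading its docs right) would be plugged in.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for a SIMD searcher: returns the index of the
/// first byte that needs escaping. A vectorized 5-byte search would
/// replace this scalar scan.
fn find_special(haystack: &[u8]) -> Option<usize> {
    haystack
        .iter()
        .position(|b| matches!(*b, b'<' | b'>' | b'&' | b'"' | b'\''))
}

/// Escapes the five XML special characters. Borrows the input unchanged
/// when no special byte is found, so the common case does not allocate.
fn escape(raw: &str) -> Cow<'_, str> {
    let bytes = raw.as_bytes();
    let mut escaped: Option<String> = None;
    let mut pos = 0;

    while let Some(offset) = find_special(&bytes[pos..]) {
        let hit = pos + offset;
        let out = escaped.get_or_insert_with(|| String::with_capacity(raw.len() + 8));
        // Copy the run of untouched text in one go. Slicing is safe here
        // because every special character is a single ASCII byte.
        out.push_str(&raw[pos..hit]);
        out.push_str(match bytes[hit] {
            b'<' => "&lt;",
            b'>' => "&gt;",
            b'&' => "&amp;",
            b'"' => "&quot;",
            _ => "&apos;", // only b'\'' remains
        });
        pos = hit + 1;
    }

    match escaped {
        Some(mut out) => {
            out.push_str(&raw[pos..]); // trailing run after the last hit
            Cow::Owned(out)
        }
        None => Cow::Borrowed(raw),
    }
}
```

The point of structuring it this way is that all the time is spent in the search, so swapping the finder for a SIMD one should speed up the whole routine without touching the rest of the loop.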