kuroko-lang / kuroko

Dialect of Python with explicit variable declaration and block scoping, with a lightweight and easy-to-embed bytecode compiler and interpreter.
https://kuroko-lang.github.io/
MIT License

Fix an oversight in the UTF-32 endian sniffing. #18

Closed harjitmoe closed 3 years ago

harjitmoe commented 3 years ago

The file comments say that I use the heuristic that characters at the start of a plane are rare, but it turns out I never actually implemented it. I had only implemented the check that the high eight bits (which can be extended to eleven) must be zero, and that check does not imply the heuristic. This adds it.

This change improves handling of UTF-32 data that starts with a code point of the form U+xxxx00 (such as Ā, U+0100), is stored without a byte-order mark, and is passed to the UTF-32 codec without an explicit byte order.
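
To illustrate the ambiguity described above, here is a minimal Python sketch of this kind of sniffing. It is not the code from this PR: the function names (`sniff_utf32_byte_order`, `is_plausible`, `starts_plane`) are hypothetical, and the "start of a plane" test is assumed here to mean "low 16 bits are zero", which may differ from the exact condition the codec uses.

```python
MAX_CODE_POINT = 0x10FFFF  # anything larger has non-zero high eleven bits


def is_plausible(cp):
    """True if cp is in Unicode range and is not a surrogate."""
    return cp <= MAX_CODE_POINT and not (0xD800 <= cp <= 0xDFFF)


def starts_plane(cp):
    """Assumed form of the heuristic: U+xx0000 is the first code point of a plane."""
    return cp & 0xFFFF == 0


def sniff_utf32_byte_order(data):
    """Guess 'big' or 'little' from the first four bytes, or None if undecided."""
    if len(data) < 4:
        return None
    head = data[:4]
    big = int.from_bytes(head, "big")
    little = int.from_bytes(head, "little")

    # Step 1: the range check. On its own it cannot separate 00 01 00 00
    # (U+0100 little-endian) from U+10000 big-endian, since both are in range.
    big_ok, little_ok = is_plausible(big), is_plausible(little)
    if big_ok != little_ok:
        return "big" if big_ok else "little"
    if not big_ok:
        return None

    # Step 2: both readings are in range, so prefer the one that does not
    # land on the (rare) first code point of a plane.
    if starts_plane(big) != starts_plane(little):
        return "little" if starts_plane(big) else "big"
    return None


print(sniff_utf32_byte_order("Ā".encode("utf-32-le")))
print(sniff_utf32_byte_order("Ā".encode("utf-32-be")))
```

With only the range check, both calls would be undecided, because `00 01 00 00` is a valid code point in either byte order. The second step is what resolves them in this sketch.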