Open utterances-bot opened 1 month ago
From what I recall the deflate stream wasn't awkward to decode -- you define a 32-bit reservoir, load new content (2 bytes a time in this case) to unoccupied high bits when necessary and return low N bits (N<=15) to the reader.
It's not that the bits can't be shunted as efficiently. The problem is that for a Huffman code, a little-endian bitstream makes you navigate the tree starting from the least-significant bit of what you read in. This practically forces you to use a table lookup just to determine how long the code is, before you can shift it out and start on the next one.
With a big-endian stream you can use SIMD to do magnitude comparisons to figure out how long the code is. Or if you don't have SIMD and still want to use a table, it's a lot more efficient to generate those tables, because when you insert a new code into the table all the table entries for that code are contiguous, so you work through the table memory sequentially. And there's a lot more cache locality for frequently-used codes.
On the endianness of bitstreams | tīkōuka.dev
Contrasting the benefits of big-endian and little-endian packing in bitstreams and observing that one consequence shows up in Zstandard.
https://www.xn--tkuka-m3a3v.dev/bitstream-endianness/