google / wuffs

Wrangling Untrusted File Formats Safely

Best way to read 3-byte little endian number? #12

Closed mvdan closed 6 years ago

mvdan commented 6 years ago

This is just another question that has popped into my mind, not necessarily a bug report.

It's quite common to have to read numbers of arbitrary bit or byte widths, not just the power-of-two sizes like u8, u16, and u32.

For example, when implementing zstd, in a couple of places I need to read a three-byte little endian number. At the moment, I am doing something like:

// spaghetti code to read a three-byte little endian number
var block_lower base.u32[..0xFF] = in.src.read_u8?() as base.u32
var block_upper base.u32[..0xFFFF] = in.src.read_u16le?() as base.u32
var block_header base.u32[..0xFFFFFF] = (block_upper << 8) | block_lower

Is there a better way? Or rather, should there be a better way?

I realise that this isn't strictly necessary, and that languages like Go, with its encoding/binary package, don't support such a thing out of the box either. But since this language is meant precisely for encoders and decoders, I wonder whether such "arbitrary byte length" or even "arbitrary bit length" functionality would be possible.
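For comparison, here is a minimal Go sketch of the same byte-then-u16 composition. Note that readU24LE is a hypothetical helper written for illustration; encoding/binary only provides Uint16, Uint32 and Uint64 on its byte orders, so the 24-bit case has to be assembled by hand.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readU24LE is a hypothetical helper (not part of encoding/binary).
// It decodes the first three bytes of b as a little-endian unsigned
// integer, mirroring the "one byte, then a u16le" approach above.
// The boolean result is false if b holds fewer than three bytes.
func readU24LE(b []byte) (uint32, bool) {
	if len(b) < 3 {
		return 0, false
	}
	lower := uint32(b[0])                               // least significant byte
	upper := uint32(binary.LittleEndian.Uint16(b[1:3])) // upper 16 bits
	return upper<<8 | lower, true
}

func main() {
	v, ok := readU24LE([]byte{0x56, 0x34, 0x12})
	fmt.Printf("%#x %v\n", v, ok) // prints "0x123456 true"
}
```

The result always fits in 24 bits, which is what the base.u32[..0xFFFFFF] refinement expresses in the Wuffs version.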

Ideally, I'd instead write something like:

var block_header base.u32[..0xFFFFFF] = in.src.read_u24le?()

nigeltao commented 6 years ago

I'll consider a read_u24le method. It (and/or the big-endian variant) might be useful when reading RGB (red, green, blue) triples, not just for zstd...
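To illustrate the RGB case, here is a rough Go sketch of what a big-endian 24-bit read would produce for one pixel. The readU24BE name is hypothetical and exists only for this example; it is not an existing Wuffs or Go API.

```go
package main

import "fmt"

// readU24BE is a hypothetical helper for illustration only.
// It decodes the first three bytes of b as a big-endian unsigned
// integer, so an R, G, B byte triple becomes the value 0xRRGGBB.
// The boolean result is false if b holds fewer than three bytes.
func readU24BE(b []byte) (uint32, bool) {
	if len(b) < 3 {
		return 0, false
	}
	return uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]), true
}

func main() {
	// One orange-ish pixel: R=0xFF, G=0x80, B=0x00.
	rgb, ok := readU24BE([]byte{0xFF, 0x80, 0x00})
	fmt.Printf("%#x %v\n", rgb, ok) // prints "0xff8000 true"
}
```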

mvdan commented 6 years ago

Appreciation commit :) https://github.com/mvdan/zstd/commit/9f48d9aba0c5a6f71164d625e96e2b889d36354e