grain-lang / grain

The Grain compiler toolchain and CLI. Home of the modern web staple. 🌾
https://grain-lang.org/
GNU Lesser General Public License v3.0
3.22k stars 115 forks source link

Implement LEB128 stdlib #2098

Open furesoft opened 2 months ago

furesoft commented 2 months ago

Add Buffer.addNumber(Number) that appends the bytes of the number based on the varint algorithm.

Some file formats or protocols rely on that feature so it could be helpful to have this function

ospencer commented 2 months ago

Hey @furesoft, thanks for raising this.

Grain's Number type is a composition of many number subtypes: integers, bigints, floats, rationals. As such, they all have different byte layouts and we wouldn't be able to generically implement an setNumber as it wouldn't be meaningful or have some special unknown encoding that also wouldn't be very helpful. We could of course do it for the integer-specific types.

That said though, there are various varint algorithms and I don't think it makes sense to lock Buffer to just one. For the ones we'd want to implement, they should just be their own libraries. The one that makes the most sense for Grain (in the context of WebAssembly and such) is LEB128. It could have functions that encode to Bytes or directly into a Buffer. We would likely keep it to the fixed-width ints for now, as that would cover most people's use cases and explore an implementation for BigInt later, mostly because Grain bigints are not two's complement.

While not necessary for a basic implementation, we should really consider SIMD for this because naive LEB128 encoding/decoding is slow.

The algorithm to consider would be Vectorized VByte Decoding. It uses 16 SIMD instructions. 15 of them are directly implemented by WebAssembly, with pmovsxbd able to be implemented by a combination of i16x8.extend_low_i8x16_u and i32x4.extend_low_i16x8_u.

We can probably implement the naive version first, as it will take some time to get SIMD instructions into Grain.