muni-town / agentic-fediverse

An outline and resources for the "Agentic Fediverse".
14 stars 2 forks source link

Evaluate Varints and / or Simple Packing / Compression For Leaf Components #10

Open zicklag opened 2 months ago

zicklag commented 2 months ago

The only reason so far I'm not sure that Borsh is perfect for Leaf is that it doesn't have any sort of compression for small integers.

If we want to keep the canonical property, then adding any sort of varint encoding or packing would require making it mandatory, or else a new data type. I lean towards making it mandatory if we're going to use it, as an extra data type, just for the sake of encoding/decoding/storage efficiency doesn't match the semantic nature of Leaf schemas very well.

Looking into varint encodings, while they do take less space, they can have a negative impact on performance:

Note that varint encoding saves space on the wire but is actually relatively slow, because it requires a lot of branches to encode/decode. It's not as slow as text encoding, of course, but as binary formats go it's not so great. This is one reason why Cap'n Proto decided to reverse this decision and just put fixed-width ints on the wire. Cap'n Proto also includes an optional "packing" algorithm which compresses away zero-valued bytes, which produces similar message sizes to Protobuf but is generally faster because the algorithm uses less branching.

https://stackoverflow.com/questions/24614553/why-is-varint-an-efficient-data-representation

I also found the Cap'nProto packing algorithm: https://capnproto.org/encoding.html#packing

That could be useful. Looks like it's designed to be extremely efficient to implement in a native language, but not sure if it'd slow things down in JS or something like that.

For now we are more concerned with a working prototype, but this is something we should think about and carefully weigh the pros and cons of before a Leaf 1.0.

One advantage of not packing is that Borsh is very simple to implement right now. Packing adds an extra complication, but it's still quite simple, so I'm not too worried about it.

It seems like the storage savings we could get from packing might be worth it, when we want to work well on resource constrained and low bandwidth devices. It might save a lot of storage/network costs over the long term.

zicklag commented 3 days ago

Polkadot's SCALE encoding is really similar to Borsh, but it also has compact integers. I think we need to do some serious thinking about whether Borsh or SCALE makes more sense for leaf.

Edit: Credit to @aireshBhat for pointing out SCALE to me.

anderspitman commented 3 days ago

Another option to check out would be QUIC's int packing. It's very simple.

zicklag commented 3 days ago

Another option to check out would be QUIC's int packing. It's very simple.

Interesting!

Looking at the QUIC int packing algorithm seems great for speed and storage efficiency, but only allows encoding up to 62 bits, which is a problem if you need a full 64 bits unfortunately.


A little research later...

Ah! The SCALE encoding for compact integers is almost identical to the encoding for packed QUIC integers, except that it allows you to expand beyond 62 bits for very large integers. I'm pretty happy with that.

The only concerns with compact integers were performance and otherwise confusion with the schema system, but I think we'll be fine for both of those, especially if QUIC is doing something very similar for integer packing, and parity did design SCALE to be efficient for low-end devices, so we're probably fine.

It looks like there might be nicer TypeScript libraries with type inference for SCALE, too, which could be nice if we don't have to make our own.