opensourcerouting / c-capnproto

C library/compiler for the Cap'n Proto serialization/RPC protocol
MIT License

Implemented capn_size() for calculating buffer size #37

Closed: detly closed this 3 years ago

detly commented 4 years ago

This is a first pass at implementing what's requested in #26. It adds a function capn_size() that calculates the size required for a buffer passed to capn_write_mem().
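For the unpacked case, the required size follows directly from the stream framing described in the Cap'n Proto encoding docs: a 4-byte segment count (minus one), one 4-byte length per segment, padding to an 8-byte boundary, then the segments themselves. The sketch below is a standalone illustration of that arithmetic, not the capn_size() code from this PR. The name `unpacked_stream_size` and the `seg_words` array are hypothetical, and it assumes capn_write_mem() emits the standard framing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of c-capnproto): size in bytes of an
 * unpacked Cap'n Proto stream, given each segment's length in 8-byte words.
 * Overflow of size_t is not checked in this sketch.
 */
size_t unpacked_stream_size(const uint32_t *seg_words, size_t nsegs)
{
    if (nsegs == 0) {
        return 0; /* nothing to serialise */
    }

    /* Header: (segment count - 1), then one 32-bit length per segment. */
    size_t header = 4 + 4 * nsegs;

    /* Pad the header up to the next 8-byte word boundary. */
    header = (header + 7) & ~(size_t)7;

    /* Body: every segment, 8 bytes per word. */
    size_t body = 0;
    for (size_t i = 0; i < nsegs; i++) {
        body += (size_t)seg_words[i] * 8;
    }

    return header + body;
}
```

A single segment of 3 words needs an 8-byte header plus 24 bytes of data. With two segments the header grows to 12 bytes and is padded to 16.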

Notes:

zenhack commented 3 years ago

> Please confirm that my comment is correct, i.e. that packed serialisation will never require more memory than unpacked. Then you at least have the guarantee that a buffer of size capn_size() is suitable for both packed and unpacked calls.

This is not the case (and is in general impossible for any compression scheme to guarantee, because of the pigeonhole principle). Packing optimizes for the common case where there are many zero bytes, but may actually bloat the message a bit if there are very few zeros.

The docs (https://capnproto.org/encoding.html#packing) do put an upper bound on the overhead:

> the worst-case space overhead of packing is 2 bytes per 2 KiB of input

...though I would want to sanity check that the C implementation actually computes the optimal encoding before relying on that.
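To make the two cases concrete, here is a rough sketch of both bounds as arithmetic. The function names are hypothetical. The first bound only holds if the encoder really achieves the documented worst case. The second is looser and assumes only that each 8-byte word costs at most a tag byte, eight data bytes and one count byte.

```c
#include <stddef.h>

/*
 * Hypothetical helpers; n is the unpacked size in bytes and is assumed
 * to be a multiple of 8. Overflow of size_t is not checked.
 */

/* Documented bound: 2 extra bytes per (started) 2 KiB of input.
 * Only valid if the encoder emits maximal verbatim runs. */
size_t packed_bound_documented(size_t n)
{
    return n + 2 * ((n + 2047) / 2048);
}

/* Conservative bound: every word packs to at most 10 bytes
 * (tag + 8 data bytes + run-length byte), whatever run heuristic is used. */
size_t packed_bound_per_word(size_t n)
{
    return n + 2 * (n / 8);
}
```

For 2 KiB of input the first bound gives 2050 bytes and the second 2560 bytes. The gap between them is the price of not trusting the encoder's run heuristic.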

I haven't looked closely at the code portion of this, and probably won't find time to (and I'm not familiar with the C implementation in particular), but I stumbled over here from the mailing list and figured I could at least point this out. Happy hacking.

detly commented 3 years ago

> This is not the case (and is in general impossible for any compression scheme to guarantee, because of the pigeonhole principle).

Ah nuts, I actually knew this too but didn't make the connection.

For now I might just say that there is no function to compute the size of a packed buffer. At least this addresses part of the need. The packed case could be done the same way snprintf works when called with a NULL buffer, i.e. do a serialisation pass without writing anything, just to compute the size. Not super performant, but perhaps offset by the fact that packing is there for when your storage or bandwidth costs already outweigh your computational costs.
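A minimal sketch of that dry-run idea, assuming the packing rules from the Cap'n Proto encoding docs. `pack_words` and `emit` are hypothetical names and none of this is c-capnproto code. The function always returns the number of bytes the packed output needs and only stores bytes that fit, so calling it with a NULL buffer yields the size without writing anything. The verbatim-run heuristic here (extend only over words with no zero bytes) is deliberately simple and is not the optimal encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* Store one byte if it fits; always advance the running length. */
static void emit(uint8_t *out, size_t cap, size_t *pos, uint8_t b)
{
    if (out != NULL && *pos < cap) {
        out[*pos] = b;
    }
    (*pos)++;
}

/* Count the non-zero bytes in one 8-byte word. */
static int nonzero_bytes(const uint8_t *w)
{
    int n = 0;
    for (int b = 0; b < 8; b++) {
        n += (w[b] != 0);
    }
    return n;
}

/*
 * Hypothetical packer: in_len must be a multiple of 8. Returns the packed
 * length; pass out == NULL and cap == 0 for a size-only dry run.
 */
size_t pack_words(const uint8_t *in, size_t in_len, uint8_t *out, size_t cap)
{
    size_t pos = 0;
    size_t nwords = in_len / 8;
    size_t i = 0;

    while (i < nwords) {
        const uint8_t *w = in + 8 * i;
        uint8_t tag = 0;

        /* Tag bit b is set when byte b of the word is non-zero. */
        for (int b = 0; b < 8; b++) {
            if (w[b] != 0) {
                tag |= (uint8_t)(1u << b);
            }
        }
        emit(out, cap, &pos, tag);
        for (int b = 0; b < 8; b++) {
            if (w[b] != 0) {
                emit(out, cap, &pos, w[b]);
            }
        }
        i++;

        if (tag == 0x00) {
            /* Count byte: how many further all-zero words follow. */
            size_t run = 0;
            while (i < nwords && run < 255 && nonzero_bytes(in + 8 * i) == 0) {
                run++;
                i++;
            }
            emit(out, cap, &pos, (uint8_t)run);
        } else if (tag == 0xff) {
            /* Count byte: how many following words are copied verbatim. */
            size_t run = 0;
            while (i + run < nwords && run < 255 &&
                   nonzero_bytes(in + 8 * (i + run)) == 8) {
                run++;
            }
            emit(out, cap, &pos, (uint8_t)run);
            for (size_t k = 0; k < run * 8; k++) {
                emit(out, cap, &pos, in[8 * i + k]);
            }
            i += run;
        }
    }
    return pos;
}
```

The intended usage mirrors snprintf: call `pack_words(data, len, NULL, 0)` to get the size, allocate that many bytes, then call it again with the real buffer. The input is scanned twice, which is the performance cost mentioned above.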