zephyr-im / zephyr

An institutional/enterprise-scale distributed real-time messaging and notification system

Transition away from zcode #128

Open davidben opened 10 years ago

davidben commented 10 years ago

The length of a zcode encoding of a byte string is not purely a function of the length of its input. This is a problem for sharding messages because, depending on the bytes in the checksum and authenticator, the length of the header may change. Hence the requirement for terrible hacks like b9ec2cdc23b77fd86b69ba884c5513f3f71cf025.

We should use an encoding whose output length depends only on the input length, not on the actual bytes. Or, at least, one that has a tight maximum overhead. (If the overhead isn't constant, we can shard by pessimistically assuming the maximum overhead. But zcode's maximum overhead is 2x, which is too high for that.) Someone suggested Consistent Overhead Byte Stuffing, which seems pretty reasonable. The overhead isn't quite constant because of the 0xFF rule (a run of 254 non-zero bytes needs an extra code byte), but the encoders and decoders seem pretty simple and the bound is still pretty tight: roughly one extra byte per 254 bytes of input.

https://en.wikipedia.org/wiki/Consistent_Overhead_Byte_Stuffing
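
To make the length argument concrete, here is a minimal COBS sketch in Python. It is an illustration only, not proposed Zephyr code: the names `cobs_encode`, `cobs_decode`, `cobs_max_encoded_len`, and `max_payload_len` are hypothetical, and this simple variant always opens a new block after a full 254-byte run, so its worst case is `n + n // 254 + 1` bytes for an `n`-byte input (some implementations save one byte when the input ends exactly on a full run).

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode data so that the output contains no zero bytes."""
    out = bytearray([0])  # placeholder for the first code byte
    code_idx = 0          # index of the code byte for the current block
    code = 1              # 1 + number of non-zero bytes in the current block
    for b in data:
        if b == 0:
            # A zero ends the block; the code byte records its length.
            out[code_idx] = code
            code_idx = len(out)
            out.append(0)
            code = 1
        else:
            out.append(b)
            code += 1
            if code == 0xFF:
                # The 0xFF rule: 254 non-zero bytes, no implied zero.
                out[code_idx] = code
                code_idx = len(out)
                out.append(0)
                code = 1
    out[code_idx] = code
    return bytes(out)


def cobs_decode(enc: bytes) -> bytes:
    """Invert cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(enc):
        code = enc[i]
        block = enc[i + 1:i + code]
        if code == 0 or len(block) != code - 1 or 0 in block:
            raise ValueError("malformed COBS data")
        out += block
        i += code
        # Every block except a full 0xFF block or the last one implies a zero.
        if code != 0xFF and i < len(enc):
            out.append(0)
    return bytes(out)


def cobs_max_encoded_len(n: int) -> int:
    """Worst-case output length of cobs_encode for an n-byte input."""
    return n + n // 254 + 1


def max_payload_len(budget: int) -> int:
    """Largest input length whose worst-case encoding fits in budget bytes."""
    if budget < 1:
        raise ValueError("budget must be at least 1 byte")
    n = budget - 1
    while cobs_max_encoded_len(n) > budget:
        n -= 1
    return n
```

For example, `cobs_encode(b"\x11\x22\x00\x33").hex()` gives `"0311220233"`, and four `0x00` bytes and four `0xFF` bytes both encode to 5 bytes. A sharder could size each fragment with `max_payload_len`, at a pessimistic cost of under 0.5% instead of the 2x that zcode would force.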

We'll also want a story for the client learning that the server supports this before it needs to shard anything (i.e. before it sends out a batch of subs). The server learning that the client supports it is also handy, so that zcode can eventually be dropped altogether, but that is not as critical since the server doesn't shard.