problem

I believe that currently we store a full 512 bit type hash for each value that is serialized. When serializing large data structures with lots of repeated types, this could cause quite a bit of bloat, so we might want to find a more efficient way to serialize.

counterargument

One could argue that we shouldn't make our serialized form more complicated and that you should just run the output of Value.serialize through zlib.compress if you want a smaller serialized form.

potential solution

The first solution that comes to mind is to prepend the serialized values with an array of hashes and then use indexes into this array instead of hashes when serializing objects. The downside that comes to mind is that when you go to serialize a value you don't know how many different types it is going to reference, so you probably don't know how many bytes you need to represent an index. We could decide on an arbitrary "you probably won't have more than X types in a given argument to Value.serialize and always use this number of bytes, or we could try to do something fancier and minimize the number of bytes needed to represent indexes.

unisonweb / unison

serialization: efficient hash representation #3604

problem

counterargument

potential solution