I believe that currently we store a full 512 bit type hash for each value that is serialized. When serializing large data structures with lots of repeated types, this could cause quite a bit of bloat, so we might want to find a more efficient way to serialize.
counterargument
One could argue that we shouldn't make our serialized form more complicated and that you should just run the output of Value.serialize through zlib.compress if you want a smaller serialized form.
potential solution
The first solution that comes to mind is to prepend the serialized values with an array of hashes and then use indexes into this array instead of hashes when serializing objects. The downside that comes to mind is that when you go to serialize a value you don't know how many different types it is going to reference, so you probably don't know how many bytes you need to represent an index. We could decide on an arbitrary "you probably won't have more than X types in a given argument to Value.serialize and always use this number of bytes, or we could try to do something fancier and minimize the number of bytes needed to represent indexes.
Rather than a table up front, you could just have it be built up incrementally on encode/decode, with a way in to format to reference a previously encountered Reference.
problem
I believe that currently we store a full 512 bit type hash for each value that is serialized. When serializing large data structures with lots of repeated types, this could cause quite a bit of bloat, so we might want to find a more efficient way to serialize.
counterargument
One could argue that we shouldn't make our serialized form more complicated and that you should just run the output of
Value.serialize
throughzlib.compress
if you want a smaller serialized form.potential solution
The first solution that comes to mind is to prepend the serialized values with an array of hashes and then use indexes into this array instead of hashes when serializing objects. The downside that comes to mind is that when you go to serialize a value you don't know how many different types it is going to reference, so you probably don't know how many bytes you need to represent an index. We could decide on an arbitrary "you probably won't have more than X types in a given argument to
Value.serialize
and always use this number of bytes, or we could try to do something fancier and minimize the number of bytes needed to represent indexes.