Goal of this PR
This PR adds the dictionary-generator binary, which generates three dictionaries when run. Each one is specialized for compressing one kind of data: payloads, events, or transactions.
It starts by training a small dictionary on a limited number of samples, then doubles the size and trains another one. It repeats this process, benchmarking each dictionary, until the improvement in compression ratio becomes negligible.
The threshold for what counts as negligible, the initial dictionary size, and the other parameters are all configurable through CLI flags.
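For reviewers, here is a rough sketch of the size search described above. It is not the code in this PR: `trainFunc`, `findDictionarySize`, and the `tolerance` parameter are hypothetical names, and the training/benchmarking step is reduced to a callback that returns a compression ratio.

```go
package main

import "fmt"

// trainFunc is a hypothetical stand-in for "train a dictionary of the given
// size, benchmark it, and return the measured compression ratio".
type trainFunc func(size int) (ratio float64, err error)

// findDictionarySize doubles the dictionary size, starting at startSize, until
// the relative gain in compression ratio drops below tolerance or the next
// size would exceed maxSize. It returns the last size that still produced a
// meaningful improvement, along with its ratio.
func findDictionarySize(train trainFunc, startSize, maxSize int, tolerance float64) (int, float64, error) {
	if startSize <= 0 || maxSize < startSize {
		return 0, 0, fmt.Errorf("invalid size bounds: start=%d max=%d", startSize, maxSize)
	}

	bestSize := startSize
	bestRatio, err := train(startSize)
	if err != nil {
		return 0, 0, fmt.Errorf("could not train dictionary of size %d: %w", startSize, err)
	}
	if bestRatio <= 0 {
		return 0, 0, fmt.Errorf("invalid compression ratio %f for size %d", bestRatio, startSize)
	}

	for size := startSize * 2; size <= maxSize; size *= 2 {
		ratio, err := train(size)
		if err != nil {
			return 0, 0, fmt.Errorf("could not train dictionary of size %d: %w", size, err)
		}
		// Relative improvement over the best dictionary found so far.
		gain := (ratio - bestRatio) / bestRatio
		if gain < tolerance {
			break
		}
		bestSize, bestRatio = size, ratio
	}

	return bestSize, bestRatio, nil
}

func main() {
	// Toy benchmark with diminishing returns, used only for illustration.
	toy := func(size int) (float64, error) {
		return 4.0 - 1024.0/float64(size), nil
	}

	size, ratio, err := findDictionarySize(toy, 512, 1<<20, 0.01)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("selected size=%d ratio=%.4f\n", size, ratio)
}
```

With the toy benchmark, the loop stops once doubling the size yields less than a 1% relative gain. In the real binary the stopping threshold and the starting size come from the CLI flags.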
The binary outputs the dictionaries directly as Go files, ready to be used in our codec/zbor package.
Using the new specialized dictionaries improves performance by about 15% and reduces the final size of the index by about 28% (at least with localnet data).
Fixes #270
Additional Notes
This is going to be a tricky one to test, because it currently reads directly from the DB and executes the zstd command. I guess that since it's a utility binary and not a part of the Flow DPS itself, we're fine with not writing tests for it?
From what I've seen so far, indexes that store a uint64 at a given height take 8 times more space when compressed than when uncompressed. We might want to make an exception so that uint64 values are never compressed?
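As a possible shape for that exception, here is a small sketch that stores a uint64 as a fixed 8-byte value and bypasses the compressing codec entirely. The helper names `encodeUint64` and `decodeUint64` are hypothetical, and the choice of big-endian encoding is an assumption, not something this PR implements.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUint64 stores v as exactly 8 big-endian bytes, with no compression
// and no framing overhead.
func encodeUint64(v uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, v)
	return buf
}

// decodeUint64 reverses encodeUint64 and rejects values of the wrong length.
func decodeUint64(data []byte) (uint64, error) {
	if len(data) != 8 {
		return 0, fmt.Errorf("invalid uint64 length: got %d bytes, want 8", len(data))
	}
	return binary.BigEndian.Uint64(data), nil
}

func main() {
	encoded := encodeUint64(1234567)
	decoded, err := decodeUint64(encoded)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stored %d bytes, decoded %d\n", len(encoded), decoded)
}
```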
TODO
- [x] Generate actual dictionaries for payloads, events and transactions, and use them in the codec package.
- [ ] ~Find a cleaner way to create prefixes for the iterator that does not involve making storage constants public.~
- [x] Select event types for the event dictionaries.
- [x] Run the performance benchmark with the new dictionaries to confirm improvement.
Checklist