optakt / flow-dps

Flow Data Provisioning Service
Apache License 2.0

Add dictionary generator utility #495

Closed Ullaakut closed 2 years ago

Ullaakut commented 2 years ago

Goal of this PR

This PR adds the dictionary-generator binary, which generates three dictionaries when run. Each one is specialized in compressing one kind of data: payloads, events or transactions.

It starts by generating a small dictionary trained on a limited number of samples, then doubles the size and trains another one. It repeats this process, benchmarking each dictionary, until the improvement in compression ratio becomes negligible.

The threshold for what counts as negligible, the initial dictionary size, and the other parameters are fully customizable using CLI flags.
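For reviewers, here is a minimal sketch of the doubling loop described above. It is not the code from this PR: `findDictionarySize`, `benchmarkFunc` and the `tolerance` parameter are hypothetical names, and the stop criterion (relative ratio improvement below a tolerance) is an assumption about what "negligible" means. Training and benchmarking are hidden behind a callback so the sketch runs without a compression library.

```go
package main

import (
	"errors"
	"fmt"
)

// benchmarkFunc is a hypothetical hook: it trains a dictionary of the given
// size (in bytes) and returns the compression ratio measured with it.
type benchmarkFunc func(size int) (float64, error)

// findDictionarySize doubles the dictionary size, starting at startSize, and
// stops as soon as the relative improvement in compression ratio drops below
// tolerance or the next size would exceed maxSize. It returns the last size
// that was still worth it, together with its ratio.
func findDictionarySize(startSize, maxSize int, tolerance float64, bench benchmarkFunc) (int, float64, error) {
	if startSize <= 0 || maxSize < startSize {
		return 0, 0, errors.New("invalid size bounds")
	}

	size := startSize
	ratio, err := bench(size)
	if err != nil {
		return 0, 0, err
	}
	if ratio <= 0 {
		return 0, 0, errors.New("compression ratio must be positive")
	}

	for next := size * 2; next <= maxSize; next *= 2 {
		candidate, err := bench(next)
		if err != nil {
			return 0, 0, err
		}
		// Assumption: "negligible" means a relative gain below the tolerance.
		if (candidate-ratio)/ratio < tolerance {
			break
		}
		size, ratio = next, candidate
	}

	return size, ratio, nil
}

func main() {
	// Made-up measurements standing in for real training and benchmarking.
	measured := map[int]float64{1024: 2.0, 2048: 2.6, 4096: 3.0, 8192: 3.05, 16384: 3.06}
	bench := func(size int) (float64, error) { return measured[size], nil }

	size, ratio, err := findDictionarySize(1024, 16384, 0.05, bench)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("selected size=%d ratio=%.2f\n", size, ratio)
}
```

In this sketch the candidate that triggers the stop is discarded and the previous, smaller dictionary is kept. Whether the real binary keeps the last or the previous dictionary is not stated in this description.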

The binary outputs the dictionaries directly as Go files, ready to be used in our codec/zbor package.

Using the new specialized dictionaries results in a performance improvement of about 15% and reduces the final size of the index (at least with localnet data) by about 28%.

Fixes #270

Additional Notes

TODO

Checklist