klauspost / compress

Optimized Go Compression Packages

zstd: Reuse single encoder/decoder with many dictionaries #952

Closed coxley closed 6 months ago

coxley commented 6 months ago

Problem

I have a multi-tenant use case where we will keep hundreds (~500-1000) of dictionaries in memory at any given time. These are used to compress data before writing it to storage, sending it over gRPC, etc., and to decompress it on the way back.

The current API assumes that you know the entire set of dictionaries you'll use at setup time. There's no way to pass a dictionary to w.EncodeAll or r.DecodeAll per call, or even to register additional dictionaries on an existing *zstd.Decoder.
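One way to live with a setup-time dictionary API is to build one codec per dictionary on first use and cache it by dictionary ID. The following is a minimal sketch of that idea, not part of the library: codecCache, get, and evict are hypothetical names, and the builder is a stand-in so the example runs with the standard library only. With this package, the builder might wrap zstd.NewWriter(nil, zstd.WithEncoderDict(dict)).

```go
package main

import (
	"fmt"
	"sync"
)

// codecCache (hypothetical) lazily builds and caches one codec per
// dictionary ID. C stands in for a dictionary-bound codec type.
type codecCache[C any] struct {
	mu     sync.Mutex
	build  func(dict []byte) (C, error)
	codecs map[uint32]C
	builds int // how many times build ran; kept for illustration
}

func newCodecCache[C any](build func(dict []byte) (C, error)) *codecCache[C] {
	return &codecCache[C]{build: build, codecs: make(map[uint32]C)}
}

// get returns the codec for id, building it from dict on first use.
func (c *codecCache[C]) get(id uint32, dict []byte) (C, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if codec, ok := c.codecs[id]; ok {
		return codec, nil
	}
	codec, err := c.build(dict)
	if err != nil {
		var zero C
		return zero, err
	}
	c.builds++
	c.codecs[id] = codec
	return codec, nil
}

// evict drops the codec for id, for example when its dictionary is refreshed.
func (c *codecCache[C]) evict(id uint32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.codecs, id)
}

func main() {
	// Stand-in builder: it returns a description instead of a real encoder.
	cache := newCodecCache(func(dict []byte) (string, error) {
		return fmt.Sprintf("codec bound to %d-byte dictionary", len(dict)), nil
	})
	for _, id := range []uint32{7, 7, 9} {
		codec, err := cache.get(id, []byte("tenant dictionary"))
		if err != nil {
			panic(err)
		}
		fmt.Println(id, codec)
	}
	fmt.Println("builds:", cache.builds)
}
```

This keeps dictionary lifetime in the caller's hands, but every cached codec still carries its own state, so it does not remove the per-dictionary memory cost.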

Ideally, I would manage the lifetime of my own dictionaries: when to refresh, prefetch, cache locally, etc. At compression and decompression time, I can provide the correct []byte to use. Building a dictionary with WithEncoderDict or WithDecoderDicts also seems to copy all of the data given to it, which adds overhead.
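To make the requested call shape concrete, here is a rough sketch in which the caller owns the dictionaries and hands the right []byte to each call. It uses the standard library's compress/flate as a stand-in, because flate accepts a preset dictionary at construction time; it is not zstd and says nothing about how this package would implement such a feature. The encodeAll and decodeAll helpers and the tenant names are hypothetical.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// encodeAll compresses src with a dictionary supplied by the caller per call.
func encodeAll(src, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriterDict(&buf, flate.BestSpeed, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(src); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeAll decompresses data using the same caller-supplied dictionary.
func decodeAll(data, dict []byte) ([]byte, error) {
	r := flate.NewReaderDict(bytes.NewReader(data), dict)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	// The caller decides how dictionaries are fetched, cached, and refreshed.
	dicts := map[string][]byte{
		"tenant-a": []byte(`{"tenant":"a","event":"login","user":`),
		"tenant-b": []byte(`{"tenant":"b","event":"purchase","sku":`),
	}
	msg := []byte(`{"tenant":"a","event":"login","user":"coxley"}`)

	enc, err := encodeAll(msg, dicts["tenant-a"])
	if err != nil {
		panic(err)
	}
	dec, err := decodeAll(enc, dicts["tenant-a"])
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", bytes.Equal(dec, msg))
}
```

Note that this stand-in rebuilds compressor state on every call, which is exactly the cost a reusable encoder and decoder with swappable dictionaries would avoid.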

Upstream zstd has a function ZSTD_createCDict_byReference which avoids copying the input. Doing something similar would be a very nice addition as well.

Any thoughts?