spiraldb / vortex

A toolkit for working with compressed Arrow in-memory, on-disk, and over-the-wire. "The LLVM of file formats"
Apache License 2.0

Refactor Vortex compressor #291

Open gatesn opened 5 months ago

gatesn commented 5 months ago

The current compressor implements a single sampling-based strategy, which is a blunt instrument: each encoding has to decide on its own, in isolation, whether it should be included in the search space.

Instead, I think we want these broader compression strategies to be aware of the codecs they can run over. We could explicitly implement BtrBlocks, using their chosen set of encodings. Or we could implement a configurable statistical strategy, or anything else.

As part of this, we should pull the strategy implementations into a trait. The API for encodings to implement could be stripped down, or even removed entirely and left up to the strategy.
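To make the proposal concrete, here is a minimal sketch of what a strategy trait could look like. It is not the Vortex API: `Codec`, `CompressionStrategy`, `SmallestEstimate`, `Plain`, `RunLength`, and the byte-cost formulas are all hypothetical names and assumptions chosen for illustration, and a plain `&[i64]` slice stands in for a real array type. The point is only that the strategy owns its codec set and makes the selection, while each codec exposes a single stripped-down method.

```rust
/// Hypothetical, stripped-down codec interface: a codec only reports a size
/// estimate and leaves the decision about whether to use it to the strategy.
trait Codec {
    fn name(&self) -> &'static str;
    /// Estimated encoded size in bytes for `data` (illustrative cost model).
    fn estimate_size(&self, data: &[i64]) -> usize;
}

/// Stores every value as-is: 8 bytes per i64.
struct Plain;

impl Codec for Plain {
    fn name(&self) -> &'static str {
        "plain"
    }
    fn estimate_size(&self, data: &[i64]) -> usize {
        data.len() * 8
    }
}

/// Run-length encoding, assumed to cost 16 bytes per run (value + length).
struct RunLength;

impl Codec for RunLength {
    fn name(&self) -> &'static str {
        "run_length"
    }
    fn estimate_size(&self, data: &[i64]) -> usize {
        if data.is_empty() {
            return 0;
        }
        // One run to start with, plus one for every change between neighbours.
        let runs = 1 + data.windows(2).filter(|w| w[0] != w[1]).count();
        runs * 16
    }
}

/// Hypothetical strategy trait: the strategy knows which codecs it may use
/// and picks among them.
trait CompressionStrategy {
    /// Returns the chosen codec, or `None` if the strategy has no codecs.
    fn choose(&self, data: &[i64]) -> Option<&dyn Codec>;
}

/// One possible strategy: estimate every codec on a prefix sample and keep
/// the one with the smallest estimate.
struct SmallestEstimate {
    codecs: Vec<Box<dyn Codec>>,
    sample_len: usize,
}

impl CompressionStrategy for SmallestEstimate {
    fn choose(&self, data: &[i64]) -> Option<&dyn Codec> {
        let sample = &data[..data.len().min(self.sample_len)];
        self.codecs
            .iter()
            .min_by_key(|codec| codec.estimate_size(sample))
            .map(|codec| codec.as_ref())
    }
}
```

Under this shape, a BtrBlocks-style strategy would be another `CompressionStrategy` implementation with its own fixed codec list and selection rule, and callers would construct something like `SmallestEstimate { codecs: vec![Box::new(Plain), Box::new(RunLength)], sample_len: 1024 }` and call `choose` on a column.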

robert3005 commented 3 months ago

With #422 we can start building alternative compression strategies.

lwwmanning commented 2 weeks ago

Related: #128