spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0
996 stars 27 forks source link

fix: CompressionTrees diverge from the actual array children #1430

Closed lwwmanning closed 4 days ago

lwwmanning commented 4 days ago

Fixed the following misalignments:

Also improved the CompressionTree display impl to use TreeDisplay, which is much nicer & more useful for debugging. And I removed the panic_in_result_fn lint, since we ban panic, but assertions are in fact quite nice sometimes.

lwwmanning commented 4 days ago

this is split off from #1068

robert3005 commented 4 days ago

@lwwmanning need to look deeper but the whole idea of compression tree was that you could return a different array than the compressor. The compression tree would send you back to that path

lwwmanning commented 4 days ago

@lwwmanning need to look deeper but the whole idea of compression tree was that you could return a different array than the compressor. The compression tree would send you back to that path

we only did that in the FoR compressor for constant 0, and I'm not sure that's right (i.e., if it's constant 0, the Constant compressor or should win over the FoR compressor with Constant child in this new implementation, rather than saying that we should encode-like with the FoRCompressor)

lwwmanning commented 4 days ago

also, of the 5 things I changed, I'd argued all except the FoR one are clearly bugs

robert3005 commented 4 days ago

If we got to for compressor then that means array isn’t constant. Either stats are missing or it’s not constant. Not constant case is likely sparse 0s

lwwmanning commented 4 days ago

If we got to for compressor then that means array isn’t constant. Either stats are missing or it’s not constant. Not constant case is likely sparse 0s

Yep, and in that case we should do Frequency encoding, not FoR

gatesn commented 4 days ago

but assertions are in fact quite nice sometimes.

😱