spiraldb / vortex

An extensible, state-of-the-art columnar file format
https://vortex.dev
Apache License 2.0

Add cardinality estimate stat #913

Open a10y opened 2 months ago

a10y commented 2 months ago

A cardinality estimate would be useful for the compressor when deciding whether Dict compression is worthwhile.
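
A rough sketch of how such an estimate could feed that decision. The function names and the simple size model here are hypothetical, not Vortex's actual compressor logic: it assumes fixed-width values and bit-packed dictionary codes, and ignores any further compression of the dictionary or the codes.

```rust
/// Bits needed to index `distinct` dictionary entries (hypothetical helper).
fn bits_for_codes(distinct: u64) -> u32 {
    if distinct <= 1 {
        1
    } else {
        // ceil(log2(distinct)), computed on integers.
        64 - (distinct - 1).leading_zeros()
    }
}

/// Returns true if a dictionary plus bit-packed codes is estimated to be
/// smaller than storing all `len` values plainly (assumed size model).
fn dict_is_worthwhile(len: u64, estimated_distinct: u64, value_bytes: u64) -> bool {
    // Plain encoding: every value stored at full width.
    let plain_bits = len * value_bytes * 8;
    // Dict encoding: one full-width copy per distinct value, plus one code per row.
    let dict_bits = estimated_distinct * value_bytes * 8
        + len * bits_for_codes(estimated_distinct) as u64;
    dict_bits < plain_bits
}
```

Under this model, 1,000,000 eight-byte values with roughly 100 distinct values favour Dict (7-bit codes plus a small dictionary versus 64 bits per value), while a column where every value is distinct does not.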

There's already a Rust crate that implements cardinality estimation (HyperLogLog++): https://docs.rs/hyperloglogplus/latest/hyperloglogplus/struct.HyperLogLogPlus.html
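
For intuition about what this kind of estimator does, here is a minimal std-only sketch of plain HyperLogLog. It is not HLL++ and not the crate's API; `ToyHll`, the register count, and the use of `DefaultHasher` are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Precision: 2^P registers. P = 12 gives 4096 registers (assumed choice).
const P: u32 = 12;

/// Toy HyperLogLog estimator (illustrative only).
struct ToyHll {
    registers: Vec<u8>,
}

impl ToyHll {
    fn new() -> Self {
        ToyHll { registers: vec![0; 1 << P] }
    }

    fn insert<T: Hash>(&mut self, value: &T) {
        let mut hasher = DefaultHasher::new();
        value.hash(&mut hasher);
        let h = hasher.finish();
        // The top P bits pick a register.
        let idx = (h >> (64 - P)) as usize;
        // The remaining bits give the rank: 1-based position of the first 1-bit.
        let rest = h << P;
        let rank: u32 = if rest == 0 { 64 - P + 1 } else { rest.leading_zeros() + 1 };
        let rank = rank as u8;
        // Each register keeps the maximum rank it has seen.
        if rank > self.registers[idx] {
            self.registers[idx] = rank;
        }
    }

    fn estimate(&self) -> f64 {
        let m = self.registers.len() as f64;
        // Standard bias-correction constant for m >= 128.
        let alpha = 0.7213 / (1.0 + 1.079 / m);
        let sum: f64 = self.registers.iter().map(|&r| 2f64.powi(-(r as i32))).sum();
        let raw = alpha * m * m / sum;
        let zeros = self.registers.iter().filter(|&&r| r == 0).count();
        if raw <= 2.5 * m && zeros > 0 {
            // Small-range correction: fall back to linear counting.
            m * (m / zeros as f64).ln()
        } else {
            raw
        }
    }
}
```

Inserting the same value repeatedly never changes a register, so duplicates do not affect the estimate, and the sketch stays at a fixed 4096 bytes regardless of column length.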

Can be used:

lwwmanning commented 2 months ago

I think we want this instead of HLL++: https://www.cidrdb.org/cidr2019/papers/p23-freitag-cidr19.pdf

lwwmanning commented 2 months ago

(In particular, it gives good cardinality estimates for arbitrary combinations of attributes rather than just single attributes, which is cool / handy for compound join keys.)

lwwmanning commented 1 month ago

If we're taking something off the shelf, this crate looks potentially better: https://github.com/cloudflare/cardinality-estimator/tree/main

robert3005 commented 2 weeks ago

Previously mentioned in #85.