timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Enhancement]: Improve Numeric Compression #3962

Open bkief opened 2 years ago

bkief commented 2 years ago

What type of enhancement is this?

Performance

What subsystems and features will be improved?

Compression

What does the enhancement do?

Numeric values could be stored as scaled integers, with the scale and precision kept as column metadata. This would allow the more efficient delta-of-delta compression to be used instead of the lz-array compression currently used for numeric types. Numerics whose precision is too large to fit in a 64-bit integer could fall back to the lz-array compression.
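To make the idea concrete, here is a minimal Python sketch of the proposed encoding. It is not TimescaleDB code: the function names `scale_to_int`, `dod_encode`, and `dod_decode` are hypothetical, and the real compressor would additionally bit-pack the delta-of-delta output. It only illustrates that evenly spaced decimals turn into runs of small integers, and that a caller can detect when to fall back.

```python
from decimal import Decimal
from typing import List, Optional

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def scale_to_int(value: Decimal, scale: int) -> Optional[int]:
    """Return value * 10**scale as an exact int64, or None if it does not fit.

    None signals that the caller should fall back to the generic compressor
    (hypothetical behaviour, as suggested in the issue text).
    """
    sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):  # NaN or Infinity: not handled here
        return None
    coeff = int("".join(map(str, digits))) if digits else 0
    shift = exponent + scale
    if shift >= 0:
        n = coeff * 10**shift
    else:
        n, remainder = divmod(coeff, 10 ** (-shift))
        if remainder != 0:  # more fractional digits than the column scale
            return None
    n = -n if sign else n
    return n if INT64_MIN <= n <= INT64_MAX else None


def dod_encode(values: List[int]) -> List[int]:
    """Delta-of-delta transform, starting from an implicit 0 value and 0 delta."""
    out, prev, prev_delta = [], 0, 0
    for x in values:
        delta = x - prev
        out.append(delta - prev_delta)
        prev, prev_delta = x, delta
    return out


def dod_decode(encoded: List[int]) -> List[int]:
    """Inverse of dod_encode."""
    out, prev, prev_delta = [], 0, 0
    for dod in encoded:
        delta = prev_delta + dod
        prev = prev + delta
        out.append(prev)
        prev_delta = delta
    return out


prices = [Decimal("10.00"), Decimal("10.05"), Decimal("10.10"), Decimal("10.15")]
scaled = [scale_to_int(p, 2) for p in prices]   # [1000, 1005, 1010, 1015]
encoded = dod_encode(scaled)                    # [1000, -995, 0, 0]
```

After the first two entries, a steadily changing series encodes to zeros, which is what makes the integer path attractive. Reconstructing the numeric is then a matter of decoding and re-applying the stored scale.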

Implementation challenges

This is similar to how Parquet encodes its DECIMAL type, which could be used as a reference. The hardest part would likely be reconstructing the numeric type after decompression.

bkief commented 2 years ago

2ndQuadrant (now EDB) has implemented a similar datatype that may be helpful as reference material: https://github.com/2ndQuadrant/fixeddecimal

bkief commented 2 years ago

Int64 should allow for 18 significant digits of precision. NaN could be stored in the dead space above 10^18, for example at int64.max. It's also notable that PG15 will likely support numerics with a negative scale or a scale larger than the precision. This enhancement should preemptively support these numeric scales.
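A rough Python sketch of those points, under stated assumptions: `NAN_SENTINEL`, `encode`, and `decode` are hypothetical names, and using int64.max as the NaN marker is just the option floated above, not an existing TimescaleDB convention. The arithmetic shows why 18 digits is the safe limit and that negative scales, or scales larger than the precision, need no special casing.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1
MAX_DIGITS = 18                 # 10**18 - 1 fits in int64, 10**19 - 1 does not
NAN_SENTINEL = INT64_MAX        # hypothetical: lives in the unused range above 10**18


def encode(value: Decimal, scale: int) -> int:
    """Map a numeric to a scaled int64; raise ValueError if it cannot be stored."""
    if value.is_nan():
        return NAN_SENTINEL
    if not value.is_finite():
        raise ValueError("infinity is not representable in this sketch")
    scaled = value.scaleb(scale)            # value * 10**scale
    if scaled != scaled.to_integral_value():
        raise ValueError("value has more fractional digits than the scale")
    n = int(scaled)
    if abs(n) >= 10**MAX_DIGITS:
        raise ValueError("more than 18 significant digits, use the fallback")
    return n


def decode(stored: int, scale: int) -> Decimal:
    """Rebuild the numeric from the stored integer and the column's scale."""
    if stored == NAN_SENTINEL:
        return Decimal("NaN")
    return Decimal(stored).scaleb(-scale)


# Negative scale, e.g. values rounded to thousands: 5000 is stored as 5.
thousands = encode(Decimal("5000"), -3)
# Scale larger than precision, e.g. three digits at scale 5: 0.00123 is stored as 123.
tiny = encode(Decimal("0.00123"), 5)
```

Because every valid value has an absolute magnitude below 10^18, the sentinel can never collide with real data.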

ianthetechie commented 2 years ago

What is the current state of NUMERIC compression in timescale? I know it exists, but beyond that, what can we reasonably expect performance and space-wise when compressing columnar data? I'm guessing it's not quite as good as float since this issue exists?

bkief commented 6 months ago

@svenklemm @erimatnor - Is this feature more feasible with the recent compression API enhancements?