influxdata / influxdb

Scalable datastore for metrics, events, and real-time analytics
https://influxdata.com
Apache License 2.0

Implement Bonobo algorithm for influxdb, enabling 5 times better data compression #23081

Open JsBergbau opened 2 years ago

JsBergbau commented 2 years ago

Proposal: Improve the compression algorithm along the lines of the Bonobo timeseries data compression algorithm suggested here: https://github.com/JsBergbau/Bonobo-Timeseries-Data-Compression

Current behavior: Measurement data usually originates in the decimal system, but floating point numbers are stored in binary. Because of this mismatch, the stored data often changes a lot from one value to the next, which requires unnecessary disk space. Measurements showed that compression can be improved by a factor of 5.

Desired behavior: Bonobo timeseries data compression, as explained in https://github.com/JsBergbau/Bonobo-Timeseries-Data-Compression, is implemented.
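To illustrate the "current behavior" point, here is a minimal sketch of why decimal readings stored as binary floats change so much between samples. It counts how many bits differ between the IEEE 754 representations of consecutive readings, which is what an XOR-based float encoder (such as the Gorilla-style scheme described in InfluxDB's storage engine documentation) has to store. For comparison it also shows the delta after scaling to integers. The helper names and the sample readings are hypothetical, and the decimal-scaling step is only an assumed illustration; it is not taken from the linked repository and is not the Bonobo algorithm itself.

```python
import struct


def float_bits(x: float) -> int:
    """Return the IEEE 754 double-precision bit pattern of x as an unsigned 64-bit int."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def xor_changed_bits(a: float, b: float) -> int:
    """Count the bits that differ between the binary representations of a and b."""
    return bin(float_bits(a) ^ float_bits(b)).count("1")


def scaled_delta(a: float, b: float, decimals: int = 1) -> int:
    """Difference between a and b after scaling both to integers (assumed illustration)."""
    scale = 10 ** decimals
    return round(b * scale) - round(a * scale)


if __name__ == "__main__":
    # Hypothetical temperature readings with one decimal place.
    readings = [21.1, 21.2, 21.3, 21.4]
    for prev, cur in zip(readings, readings[1:]):
        print(
            f"{prev} -> {cur}: "
            f"{xor_changed_bits(prev, cur)} differing float bits, "
            f"scaled integer delta = {scaled_delta(prev, cur)}"
        )
```

Under these assumptions, a step of 0.1 flips many mantissa bits in the float representation, while the scaled integers differ by only 1. That gap is the kind of redundancy the proposal aims to exploit.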

Alternatives considered: Currently I don't know of any better solution that would yield this much compression improvement.

Use case: Why is this important (helps with prioritizing requests)? Reducing data size by a factor of 5 is a big improvement, especially on 32-bit systems, where database size is quite limited; see https://github.com/influxdata/influxdb/pull/12362

ypnos commented 2 years ago

The algorithm sounds great. Are there any performance implications? Are there specific data patterns where compression could degrade?

JsBergbau commented 2 years ago

The performance impact should be minimal, less than 1%. So far I have not found any pattern that would degrade compression compared to the current implementation.