timescale / docs

Timescale product documentation 📖
https://docs.timescale.com/

[Site Bug] Issue with the page: /use-timescale/latest/compression/compression-design/ #3170

Open arunkumar790 opened 2 months ago

arunkumar790 commented 2 months ago

Describe the bug

The descriptions of compression on the "About Compression" and "Compression design" pages conflict somewhat.

What do the docs say now?

The "About Compression" page says: "When you enable compression, the data in your hypertable is compressed chunk by chunk. When the chunk is compressed, multiple records are grouped into a single row. The columns of this row hold an array-like structure that stores all the data. This means that instead of using lots of rows to store the data, it stores the same data in a single row. Because a single row takes up less disk space than many rows, it decreases the amount of disk space required, and can also speed up your queries."

The "Compression design" page, under "Compressing data", says:

TimescaleDB is built on PostgreSQL which is, by nature, a row-based database. Because time-series data is accessed in order of time, when you enable compression, TimescaleDB converts many wide rows of data into a single row of data, called an array form. This means that each field of that new, wide row stores an ordered set of data comprising the entire column. For example, if you had a table with data that looked a bit like this:

[Raw table picture]

You can convert this to a single row in array form, like this:

[Compressed table UI picture]

Even before you compress any data, this format immediately saves storage by reducing the per-row overhead. PostgreSQL typically adds a small number of bytes of overhead per row. So even without any compression, the schema in this example is now smaller on disk than the previous format.

This format arranges the data so that similar data, such as timestamps, device IDs, or temperature readings, is stored contiguously. This means that you can then use type-specific compression algorithms to compress the data further, and each array is separately compressed.

What should the docs say?

My understanding is that compression happens in two stages:

1) Converting the many wide rows of data into a single row of data (array form)
2) Compressing the array data further, using type-specific compression algorithms

If my understanding is correct, both pages should be updated to describe these two stages consistently.
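To make the two stages concrete, here is a minimal Python sketch of that understanding. It is only an illustration: the sample rows, the function names, and the choice of delta and run-length encoding are assumptions made for this example, and they are not necessarily the algorithms TimescaleDB applies internally.

```python
from itertools import accumulate, groupby

# Hypothetical sample rows: (timestamp_seconds, device_id, temperature).
rows = [
    (1000, "dev-1", 20.5),
    (1010, "dev-1", 20.5),
    (1020, "dev-1", 20.6),
    (1030, "dev-1", 20.6),
]


def to_array_form(rows):
    """Stage 1: pivot many rows into a single 'row' of per-column arrays."""
    times, devices, temps = (list(column) for column in zip(*rows))
    return {"time": times, "device_id": devices, "temperature": temps}


def delta_encode(values):
    """Stage 2 (illustrative): keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    return list(accumulate(deltas))


def run_length_encode(values):
    """Stage 2 (illustrative): collapse runs of equal values into (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Stage 1: one wide row whose fields each hold an entire column.
batch = to_array_form(rows)

# Stage 2: each array is compressed separately with an encoding suited to its data.
compressed = {
    "time": delta_encode(batch["time"]),
    "device_id": run_length_encode(batch["device_id"]),
    "temperature": run_length_encode(batch["temperature"]),
}

print(batch)
print(compressed)
```

Stage 1 alone already reduces the row count from four to one, which is where the per-row overhead saving comes from. Stage 2 then shrinks each array independently, which is possible only because stage 1 placed similar values next to each other.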

Page affected

https://docs.timescale.com/use-timescale/latest/compression/about-compression/#about-compression

https://docs.timescale.com/use-timescale/latest/compression/compression-design/

Subject matter expert (SME)

Not sure

Screenshots

[Two screenshots attached]

jonatas commented 2 months ago

Very clear point @arunkumar790 👏

My guess is that the confusion arises because compression itself is built from multiple "compression steps", so this pipeline of transformations is not very clear. We can definitely make improvements here!