timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

Slow compression when running a compression job as compared to manually compressing #3086

Open dwalthour opened 3 years ago

dwalthour commented 3 years ago

Relevant system information:

Describe the bug
Compression running under the job scheduler is significantly slower than running the same compression commands manually.

To Reproduce
Steps to reproduce the behavior:

  1. Have a really large table with a significant number of chunks. In my case, the table holds all messages sent by the NASDAQ stock exchange for a given day: 71 columns by 505,060,260 rows for April 1, 2021, split across 326 chunks (each chunk covers 3 minutes of time). This table is called 'xngs_md_totalview_itch' in my database. My tables use a BIGINT for the time column, representing the number of nanoseconds since the epoch.

  2. Run the query: "select compress_chunk(i) from show_chunks('xngs_md_totalview_itch', newer_than=>1617235200000000000) i;" With timing turned on in postgres, this takes "Time: 4603992.554 ms (01:16:43.993)" or about 14.1 seconds per chunk, which is quite good.

  3. Turn on a compression policy in TimescaleDB for this same data (set up roughly as in the sketch after this list) and compression runs about 25 times slower: only one chunk every 360 seconds. The net result is that it takes 1.35 days to compress one day's worth of data with the background compression job, which is unacceptable, especially since point #2 shows the compression algorithm itself is not at fault.
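For context, enabling a compression policy on an integer-time hypertable looks roughly like the following. This is a hedged sketch, not the reporter's actual DDL: the function name unix_nanos_now, the chunk interval, and the compress_after threshold are assumptions.

```sql
-- Sketch of a setup like the one described above: a hypertable keyed on a
-- BIGINT time column holding nanoseconds since the epoch, with 3-minute chunks.
SELECT create_hypertable('xngs_md_totalview_itch', 'time',
       chunk_time_interval => 180000000000);   -- 3 minutes in nanoseconds

-- Integer time columns need an integer_now function so policies can tell
-- which chunks are old enough to act on.
CREATE OR REPLACE FUNCTION unix_nanos_now() RETURNS BIGINT
LANGUAGE SQL STABLE AS
$$ SELECT (extract(epoch FROM now()) * 1000000000)::BIGINT $$;
SELECT set_integer_now_func('xngs_md_totalview_itch', 'unix_nanos_now');

-- Enable compression and schedule the policy; compress_after is expressed
-- in the units of the time column (here, nanoseconds).
ALTER TABLE xngs_md_totalview_itch SET (timescaledb.compress);
SELECT add_compression_policy('xngs_md_totalview_itch',
       compress_after => BIGINT '86400000000000');  -- one day
```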

Expected behavior
I expect background compression to run at a pace not too different from manual compression.

Actual behavior
It runs about 25x slower than manual compression.

erimatnor commented 3 years ago

I believe the poor performance is because the policy compresses only one chunk each time it runs. There are certainly ways we can improve this, but there are also concerns about holding locks for a long time while compressing.

Until we can make the background compression more efficient, there's always the possibility of implementing a custom background job (user-defined action) that compresses multiple chunks per run.
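A minimal sketch of such an action, assuming the hypertable from the report and a one-day compression threshold; the procedure name compress_backlog and the one-hour schedule are illustrative, not part of TimescaleDB:

```sql
-- Hypothetical user-defined action: compress every uncompressed chunk older
-- than one day in a single run, rather than one chunk per policy invocation.
CREATE OR REPLACE PROCEDURE compress_backlog(job_id INT, config JSONB)
LANGUAGE PLPGSQL AS
$$
DECLARE
  chunk REGCLASS;
BEGIN
  FOR chunk IN
    SELECT show_chunks('xngs_md_totalview_itch',
           older_than => (extract(epoch FROM now()) * 1000000000)::BIGINT
                         - 86400000000000)  -- one day in nanoseconds
  LOOP
    PERFORM compress_chunk(chunk, if_not_compressed => true);
    COMMIT;  -- releases locks between chunks; if transaction control is not
             -- allowed in your job environment, drop this and the whole
             -- sweep runs in one transaction
  END LOOP;
END
$$;

-- Register the procedure with the generic job scheduler.
SELECT add_job('compress_backlog', schedule_interval => INTERVAL '1 hour');
```

Committing after each chunk keeps per-chunk lock hold times comparable to the manual loop in step 2, while still compressing the whole backlog in a single scheduled run.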