cscetbon opened this issue 4 months ago
Can confirm:
--- compressed:
HashAggregate  (cost=82315.04..82315.05 rows=1 width=28) (actual time=55.340..55.343 rows=5 loops=1)
  Group Key: t.block_id
  Batches: 1  Memory Usage: 24kB
  ->  Nested Loop  (cost=51296.47..51515.04 rows=3080000 width=16) (actual time=52.828..54.190 rows=13812 loops=1)
        ->  Limit  (cost=51295.49..51295.51 rows=5 width=12) (actual time=52.777..52.779 rows=5 loops=1)
              ->  Sort  (cost=51295.49..58995.49 rows=3080000 width=12) (actual time=52.776..52.777 rows=5 loops=1)
                    Sort Key: _hyper_39_22282_chunk."time" DESC
                    Sort Method: top-N heapsort  Memory: 25kB
                    ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk  (cost=0.04..137.80 rows=3080000 width=12) (actual time=0.022..52.712 rows=700 loops=1)
                          Filter: (is_coinbase IS TRUE)
                          Rows Removed by Filter: 2718385
                          ->  Seq Scan on compress_hyper_40_22283_chunk compress_hyper_40_22283_chunk_1  (cost=0.00..137.80 rows=3080 width=88) (actual time=0.003..0.158 rows=3080 loops=1)
        ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk t  (cost=0.97..3.90 rows=4000 width=16) (actual time=0.016..0.159 rows=2762 loops=5)
              Filter: (is_coinbase IS NOT TRUE)
              Rows Removed by Filter: 1
              ->  Index Scan using compress_hyper_40_22283_chunk_block_id__ts_meta_sequence_nu_idx on compress_hyper_40_22283_chunk  (cost=0.28..3.90 rows=4 width=104) (actual time=0.003..0.003 rows=3 loops=5)
                    Index Cond: (block_id = _hyper_39_22282_chunk.block_id)
Planning Time: 6.816 ms
Execution Time: 55.388 ms
--- uncompressed:
HashAggregate  (cost=138525.20..138527.20 rows=200 width=28) (actual time=3.812..3.813 rows=5 loops=1)
  Group Key: t.block_id
  Batches: 1  Memory Usage: 40kB
  ->  Nested Loop  (cost=338.34..137465.62 rows=105958 width=16) (actual time=0.196..2.854 rows=13812 loops=1)
        ->  Limit  (cost=0.43..0.78 rows=5 width=12) (actual time=0.158..0.853 rows=5 loops=1)
              ->  Index Scan using _hyper_39_22282_chunk_transactions_time_idx on _hyper_39_22282_chunk  (cost=0.43..296392.26 rows=4238304 width=12) (actual time=0.157..0.852 rows=5 loops=1)
                    Filter: (is_coinbase IS TRUE)
                    Rows Removed by Filter: 11812
        ->  Bitmap Heap Scan on _hyper_39_22282_chunk t  (cost=337.91..27281.04 rows=21192 width=16) (actual time=0.037..0.299 rows=2762 loops=5)
              Recheck Cond: (block_id = _hyper_39_22282_chunk.block_id)
              Filter: (is_coinbase IS NOT TRUE)
              Rows Removed by Filter: 1
              Heap Blocks: exact=777
              ->  Bitmap Index Scan on _hyper_39_22282_chunk_block_idx  (cost=0.00..332.61 rows=42383 width=0) (actual time=0.027..0.027 rows=2763 loops=5)
                    Index Cond: (block_id = _hyper_39_22282_chunk.block_id)
Planning Time: 4.370 ms
Execution Time: 3.839 ms
(17 rows)
Time: 8.570 ms
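For reference, the query that produced both plans above, reconstructed from the plan filters and the smallint variant shown below; the exact bool form is an assumption based on the tutorial, with predicates matching the IS TRUE / IS NOT TRUE filters in the plans:

explain (analyze) WITH recent_blocks AS (
    SELECT block_id FROM transactions
    WHERE is_coinbase IS TRUE    -- matches "Filter: (is_coinbase IS TRUE)" above
    ORDER BY time DESC
    LIMIT 5
)
SELECT
    t.block_id, count(*) AS transaction_count,
    SUM(weight) AS block_weight,
    SUM(output_total_usd) AS block_value_usd
FROM transactions t
INNER JOIN recent_blocks b ON b.block_id = t.block_id
WHERE is_coinbase IS NOT TRUE    -- matches the inner-side filter
GROUP BY t.block_id;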
One reason is that we don't have a specialized compression algorithm for bool columns, and hence no vectorized filters for them. There is a workaround: make is_coinbase a smallint instead. It makes the difference less drastic, but the compressed case is still worse (a sketch of the type change follows the plan below):
explain (costs off, analyze) WITH recent_blocks AS (
    SELECT block_id FROM transactions
    WHERE is_coinbase != 0
    ORDER BY time DESC
    LIMIT 5
)
SELECT
    t.block_id, count(*) AS transaction_count,
    SUM(weight) AS block_weight,
    SUM(output_total_usd) AS block_value_usd
FROM transactions t
INNER JOIN recent_blocks b ON b.block_id = t.block_id
WHERE is_coinbase = 0
GROUP BY t.block_id;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
HashAggregate (actual time=9.693..9.694 rows=5 loops=1)
  Group Key: t.block_id
  Batches: 1  Memory Usage: 24kB
  ->  Nested Loop (actual time=7.869..8.831 rows=13812 loops=1)
        ->  Limit (actual time=7.854..7.855 rows=5 loops=1)
              ->  Sort (actual time=7.853..7.854 rows=5 loops=1)
                    Sort Key: _hyper_39_22282_chunk."time" DESC
                    Sort Method: top-N heapsort  Memory: 25kB
                    ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk (actual time=0.025..7.802 rows=700 loops=1)
                          Vectorized Filter: (is_coinbase <> 0)
                          Rows Removed by Filter: 2718385
                          ->  Seq Scan on compress_hyper_41_22284_chunk compress_hyper_41_22284_chunk_1 (actual time=0.003..0.157 rows=3080 loops=1)
        ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk t (actual time=0.008..0.097 rows=2762 loops=5)
              Vectorized Filter: (is_coinbase = 0)
              Rows Removed by Filter: 1
              ->  Index Scan using compress_hyper_41_22284_chunk_block_id__ts_meta_sequence_nu_idx on compress_hyper_41_22284_chunk (actual time=0.001..0.001 rows=3 loops=5)
                    Index Cond: (block_id = _hyper_39_22282_chunk.block_id)
Planning Time: 6.280 ms
Execution Time: 9.721 ms
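A minimal sketch of the smallint workaround, assuming the transactions hypertable from the tutorial; chunks must be decompressed before the column type can change, and the chunk calls here are illustrative:

-- Decompress all chunks first; ALTER COLUMN cannot run on compressed chunks.
SELECT decompress_chunk(c, true) FROM show_chunks('transactions') c;

-- Convert the bool column to a 0/1 smallint so vectorized filters apply.
ALTER TABLE transactions
    ALTER COLUMN is_coinbase TYPE smallint
    USING ((CASE WHEN is_coinbase THEN 1 ELSE 0 END)::smallint);

-- Re-compress the chunks after the type change.
SELECT compress_chunk(c, true) FROM show_chunks('transactions') c;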
Another problem is that in the compressed case the sort on time is not pushed down to DecompressChunk, because the work_mem setting is low. You can improve things by, e.g., setting work_mem to '64MB'; a snippet for applying the setting follows the plan below. This does not require the smallint workaround, it's an independent improvement:
HashAggregate (actual time=2.664..2.667 rows=5 loops=1)
  Group Key: t.block_id
  Batches: 1  Memory Usage: 40kB
  ->  Nested Loop (actual time=0.554..1.778 rows=13812 loops=1)
        ->  Limit (actual time=0.535..0.780 rows=5 loops=1)
              ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk (actual time=0.534..0.778 rows=5 loops=1)
                    Filter: (is_coinbase <> 0)
                    Rows Removed by Filter: 14654
                    ->  Sort (actual time=0.440..0.441 rows=18 loops=1)
                          Sort Key: compress_hyper_41_22285_chunk_1._ts_meta_max_1 DESC
                          Sort Method: quicksort  Memory: 968kB
                          ->  Seq Scan on compress_hyper_41_22285_chunk compress_hyper_41_22285_chunk_1 (actual time=0.004..0.166 rows=3080 loops=1)
        ->  Custom Scan (DecompressChunk) on _hyper_39_22282_chunk t (actual time=0.010..0.101 rows=2762 loops=5)
              Vectorized Filter: (is_coinbase = 0)
              Rows Removed by Filter: 1
              ->  Index Scan using compress_hyper_41_22285_chunk_block_id__ts_meta_sequence_nu_idx on compress_hyper_41_22285_chunk (actual time=0.001..0.001 rows=3 loops=5)
                    Index Cond: (block_id = _hyper_39_22282_chunk.block_id)
Planning Time: 5.686 ms
Execution Time: 2.697 ms
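The setting can be applied per-session for a quick test, or persisted per-database; the database name below is a placeholder:

-- Per-session, for the current connection only:
SET work_mem TO '64MB';

-- Persisted for new sessions on one database (placeholder name "mydb"):
ALTER DATABASE mydb SET work_mem TO '64MB';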
Added a feature request specifically for a specialized bool compression algorithm with vectorized filters: https://github.com/timescale/timescaledb/issues/7233
What type of bug is this?
Performance issue
What subsystems and features are affected?
Compression
What happened?
Running the query from https://docs.timescale.com/tutorials/latest/blockchain-query/blockchain-compress/#taking-advantage-of-query-speedups gives better performance when the data is decompressed than when it is compressed.
TimescaleDB version affected
2.15.3
PostgreSQL version used
16.3
What operating system did you use?
Linux 6.5.0-15-generic
What installation method did you use?
Docker
What platform did you run on?
Not applicable
Relevant log output and stack trace
No response
How can we reproduce the bug?
Run the query from the linked tutorial, then compress the data again and run the query a few more times; you'll see higher latency no matter what.
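For context, a hedged sketch of that cycle; the compression options are assumptions based on the linked tutorial (segment by block_id, order by time DESC):

-- Assumed compression settings from the tutorial:
ALTER TABLE transactions SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'block_id',
    timescaledb.compress_orderby = 'time DESC'
);

-- Decompress, run the query, then compress again and rerun it:
SELECT decompress_chunk(c, true) FROM show_chunks('transactions') c;
SELECT compress_chunk(c, true) FROM show_chunks('transactions') c;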