Open svenklemm opened 3 years ago
It would be great to get this feature implemented.
Is this actually related to #2897?
This is not related to #2897.
Sharing the content of a Stack Overflow report here as @davidkohn88 suggested it may be related to this report.
I have got a hypertable (around 300 millions rows) with the following schema
CREATE TABLE IF NOT EXISTS candlesticks(
open_time TIMESTAMP NOT NULL,
close_time TIMESTAMP NOT NULL,
open DOUBLE PRECISION,
high DOUBLE PRECISION,
low DOUBLE PRECISION,
close DOUBLE PRECISION,
volume DOUBLE PRECISION,
quote_volume DOUBLE PRECISION,
symbol VARCHAR (20) NOT NULL,
exchange VARCHAR (256),
PRIMARY KEY (symbol, open_time, exchange)
);
After compressing it with this query
ALTER TABLE candlesticks SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'symbol, exchange'
);
The following query takes several minutes (it seems timescale decompress every chunk) whereas before it was only 1/2 seconds :
SELECT DISTINCT ON (symbol) * FROM candlesticks
ORDER BY symbol, open_time DESC;
If i add a time conditions like open_time >= now() - INTERVAL '5 minutes' it's better
Hypertable Query#1: explain select distinct sensor_id from sensor_data;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
-------------------
Unique (cost=6.23..11859.26 rows=200 width=4)
-> Merge Append (cost=6.23..10901.76 rows=383000 width=4)
Sort Key: _hyper_10_25_chunk.sensor_id
-> Custom Scan (SkipScan) on _hyper_10_25_chunk (cost=0.29..66.01 rows=200 width=4)
-> Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk (cost=0.29..1694.95 rows=42311 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_26_chunk (cost=0.29..65.44 rows=200 width=4)
-> Index Only Scan using _hyper_10_26_chunk_sensor_data_sensor_id_time_idx on _hyper_10_26_chunk (cost=0.29..1170.59 rows=29487 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_27_chunk (cost=0.29..65.43 rows=200 width=4)
-> Index Only Scan using _hyper_10_27_chunk_sensor_data_sensor_id_time_idx on _hyper_10_27_chunk (cost=0.29..1016.93 rows=25643 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_28_chunk (cost=0.29..65.42 rows=200 width=4)
-> Index Only Scan using _hyper_10_28_chunk_sensor_data_sensor_id_time_idx on _hyper_10_28_chunk (cost=0.29..1217.82 rows=30769 wid
th=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_29_chunk (cost=0.42..93.49 rows=200 width=4)
-> Index Only Scan using _hyper_10_29_chunk_sensor_data_sensor_id_time_idx on _hyper_10_29_chunk (cost=0.42..28243.61 rows=665279 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_30_chunk (cost=0.42..92.99 rows=200 width=4)
-> Index Only Scan using _hyper_10_30_chunk_sensor_data_sensor_id_time_idx on _hyper_10_30_chunk (cost=0.42..17120.39 rows=403198 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_31_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_31_chunk_sensor_data_sensor_id_time_idx on _hyper_10_31_chunk (cost=0.42..19651.68 rows=463684 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_32_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_32_chunk_sensor_data_sensor_id_time_idx on _hyper_10_32_chunk (cost=0.42..20522.07 rows=483843 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_33_chunk (cost=0.42..93.49 rows=200 width=4)
-> Index Only Scan using _hyper_10_33_chunk_sensor_data_sensor_id_time_idx on _hyper_10_33_chunk (cost=0.42..28243.61 rows=665279 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_34_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_34_chunk_sensor_data_sensor_id_time_idx on _hyper_10_34_chunk (cost=0.42..19667.59 rows=463678 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_35_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_35_chunk_sensor_data_sensor_id_time_idx on _hyper_10_35_chunk (cost=0.42..20522.07 rows=483843 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_36_chunk (cost=0.42..92.99 rows=200 width=4)
-> Index Only Scan using _hyper_10_36_chunk_sensor_data_sensor_id_time_idx on _hyper_10_36_chunk (cost=0.42..17108.42 rows=403200 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_37_chunk (cost=0.42..93.46 rows=200 width=4)
-> Index Only Scan using _hyper_10_37_chunk_sensor_data_sensor_id_time_idx on _hyper_10_37_chunk (cost=0.42..26349.98 rows=623037 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_38_chunk (cost=0.42..92.97 rows=200 width=4)
-> Index Only Scan using _hyper_10_38_chunk_sensor_data_sensor_id_time_idx on _hyper_10_38_chunk (cost=0.42..18397.98 rows=434237 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (SkipScan) on _hyper_10_39_chunk (cost=0.42..92.98 rows=200 width=4)
-> Index Only Scan using _hyper_10_39_chunk_sensor_data_sensor_id_time_idx on _hyper_10_39_chunk (cost=0.42..19205.12 rows=453113 w
idth=4)
Index Cond: (sensor_id > NULL::integer)
-> Custom Scan (DecompressChunk) on _hyper_10_40_chunk (cost=0.10..39.85 rows=380000 width=4)
-> Index Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk (cost=0.27..39.85
rows=380 width=8)
Uncompressed chunk Query#2: explain select distinct sensor_id from _timescaledb_internal._hyper_10_25_chunk;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.29..10.97 rows=33 width=4)
-> Custom Scan (SkipScan) on _hyper_10_25_chunk (cost=0.29..10.89 rows=33 width=4)
-> Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk (cost=0.29..1694.95 rows=42311 width=4)
Index Cond: (sensor_id > NULL::integer)
Compressed chunk Query#3: explain select distinct sensor_id from _timescaledb_internal._hyper_10_40_chunk;
QUERY PLAN
---------------------------------------------------------------------------------------------------
HashAggregate (cost=960.80..961.00 rows=20 width=4)
Group Key: _hyper_10_40_chunk.sensor_id
-> Custom Scan (DecompressChunk) on _hyper_10_40_chunk (cost=0.03..10.80 rows=380000 width=4)
-> Seq Scan on compress_hyper_13_41_chunk (cost=0.00..10.80 rows=380 width=8)
Compressed Internal Chunk Query#4: explain select distinct sensor_id from _timescaledb_internal.compress_hyper_13_41_chunk
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
------------------
Unique (cost=0.27..7.60 rows=20 width=4)
-> Custom Scan (SkipScan) on compress_hyper_13_41_chunk (cost=0.27..7.55 rows=20 width=4)
-> Index Only Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk (cost=0.27..39.85
rows=380 width=4)
Index Cond: (sensor_id > NULL::integer)
Planner Changes:
[x] Transform CustomScan(DecompressChunk)->IndexScan/IndexOnlyScan INTO CustomScan(DecompressChunk)->CustomScan(SkipScan)->IndexScan/IndexOnlyScan.
[x] Modify get_distinct_var() to map var for compressed chunk
What is the status on this issue? This is something that would be great for us!
Wanted to drop in and add my support for this. I was using Timescale for compression, but the lack of skip scan makes distinct on queries unusably slow
The initial SkipScan implementation only works for uncompressed chunks but for compressed chunks a similar optimization could be implemented.
The simplified design memo: https://docs.google.com/document/d/1JFF8UGqpdBEhdnrsJ9ngKREmDzVft5SchzjOoao2NoA/edit#heading=h.zez1r9vckjjp