timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.91k stars 882 forks source link

SkipScan for compressed chunks #3070

Open svenklemm opened 3 years ago

svenklemm commented 3 years ago

The initial SkipScan implementation only works for uncompressed chunks but for compressed chunks a similar optimization could be implemented.

The simplified design memo: https://docs.google.com/document/d/1JFF8UGqpdBEhdnrsJ9ngKREmDzVft5SchzjOoao2NoA/edit#heading=h.zez1r9vckjjp

wrobell commented 3 years ago

It would be great to get this feature implemented.

andnig commented 2 years ago

Is this actually related to #2897?

svenklemm commented 2 years ago

This is not related to #2897.

greenweeds commented 2 years ago

Sharing the content of a Stack Overflow report here as @davidkohn88 suggested it may be related to this report.


I have got a hypertable (around 300 millions rows) with the following schema

CREATE TABLE IF NOT EXISTS candlesticks(
 open_time TIMESTAMP NOT NULL,
 close_time TIMESTAMP NOT NULL,
 open   DOUBLE PRECISION,
 high   DOUBLE PRECISION,
 low    DOUBLE PRECISION,
 close   DOUBLE PRECISION,
 volume      DOUBLE PRECISION,
 quote_volume      DOUBLE PRECISION,
 symbol VARCHAR (20) NOT NULL,
 exchange VARCHAR (256),
 PRIMARY KEY (symbol, open_time, exchange)
);

After compressing it with this query

ALTER TABLE candlesticks SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'symbol, exchange'
);

The following query takes several minutes (it seems timescale decompress every chunk) whereas before it was only 1/2 seconds :

SELECT DISTINCT ON (symbol) * FROM candlesticks 
ORDER BY symbol, open_time DESC;

If i add a time conditions like open_time >= now() - INTERVAL '5 minutes' it's better


shhnwz commented 1 year ago

Current scope of SkipScan :

Hypertable Query#1: explain select distinct sensor_id from sensor_data;

                                                                              QUERY PLAN                                                            

----------------------------------------------------------------------------------------------------------------------------------------------------
-------------------
 Unique  (cost=6.23..11859.26 rows=200 width=4)
   ->  Merge Append  (cost=6.23..10901.76 rows=383000 width=4)
         Sort Key: _hyper_10_25_chunk.sensor_id
         ->  Custom Scan (SkipScan) on _hyper_10_25_chunk  (cost=0.29..66.01 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk  (cost=0.29..1694.95 rows=42311 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_26_chunk  (cost=0.29..65.44 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_26_chunk_sensor_data_sensor_id_time_idx on _hyper_10_26_chunk  (cost=0.29..1170.59 rows=29487 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_27_chunk  (cost=0.29..65.43 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_27_chunk_sensor_data_sensor_id_time_idx on _hyper_10_27_chunk  (cost=0.29..1016.93 rows=25643 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_28_chunk  (cost=0.29..65.42 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_28_chunk_sensor_data_sensor_id_time_idx on _hyper_10_28_chunk  (cost=0.29..1217.82 rows=30769 wid
th=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_29_chunk  (cost=0.42..93.49 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_29_chunk_sensor_data_sensor_id_time_idx on _hyper_10_29_chunk  (cost=0.42..28243.61 rows=665279 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_30_chunk  (cost=0.42..92.99 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_30_chunk_sensor_data_sensor_id_time_idx on _hyper_10_30_chunk  (cost=0.42..17120.39 rows=403198 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_31_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_31_chunk_sensor_data_sensor_id_time_idx on _hyper_10_31_chunk  (cost=0.42..19651.68 rows=463684 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_32_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_32_chunk_sensor_data_sensor_id_time_idx on _hyper_10_32_chunk  (cost=0.42..20522.07 rows=483843 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_33_chunk  (cost=0.42..93.49 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_33_chunk_sensor_data_sensor_id_time_idx on _hyper_10_33_chunk  (cost=0.42..28243.61 rows=665279 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_34_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_34_chunk_sensor_data_sensor_id_time_idx on _hyper_10_34_chunk  (cost=0.42..19667.59 rows=463678 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_35_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_35_chunk_sensor_data_sensor_id_time_idx on _hyper_10_35_chunk  (cost=0.42..20522.07 rows=483843 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_36_chunk  (cost=0.42..92.99 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_36_chunk_sensor_data_sensor_id_time_idx on _hyper_10_36_chunk  (cost=0.42..17108.42 rows=403200 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_37_chunk  (cost=0.42..93.46 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_37_chunk_sensor_data_sensor_id_time_idx on _hyper_10_37_chunk  (cost=0.42..26349.98 rows=623037 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_38_chunk  (cost=0.42..92.97 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_38_chunk_sensor_data_sensor_id_time_idx on _hyper_10_38_chunk  (cost=0.42..18397.98 rows=434237 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (SkipScan) on _hyper_10_39_chunk  (cost=0.42..92.98 rows=200 width=4)
               ->  Index Only Scan using _hyper_10_39_chunk_sensor_data_sensor_id_time_idx on _hyper_10_39_chunk  (cost=0.42..19205.12 rows=453113 w
idth=4)
                     Index Cond: (sensor_id > NULL::integer)
         ->  Custom Scan (DecompressChunk) on _hyper_10_40_chunk  (cost=0.10..39.85 rows=380000 width=4)
               ->  Index Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk  (cost=0.27..39.85
 rows=380 width=8)

Uncompressed chunk Query#2: explain select distinct sensor_id from _timescaledb_internal._hyper_10_25_chunk;

                                                                     QUERY PLAN                                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.29..10.97 rows=33 width=4)
   ->  Custom Scan (SkipScan) on _hyper_10_25_chunk  (cost=0.29..10.89 rows=33 width=4)
         ->  Index Only Scan using _hyper_10_25_chunk_sensor_data_sensor_id_time_idx on _hyper_10_25_chunk  (cost=0.29..1694.95 rows=42311 width=4)
               Index Cond: (sensor_id > NULL::integer)

Compressed chunk Query#3: explain select distinct sensor_id from _timescaledb_internal._hyper_10_40_chunk;

                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 HashAggregate  (cost=960.80..961.00 rows=20 width=4)
   Group Key: _hyper_10_40_chunk.sensor_id
   ->  Custom Scan (DecompressChunk) on _hyper_10_40_chunk  (cost=0.03..10.80 rows=380000 width=4)
         ->  Seq Scan on compress_hyper_13_41_chunk  (cost=0.00..10.80 rows=380 width=8)

Compressed Internal Chunk Query#4: explain select distinct sensor_id from _timescaledb_internal.compress_hyper_13_41_chunk

                                                                              QUERY PLAN                                                            

----------------------------------------------------------------------------------------------------------------------------------------------------
------------------
 Unique  (cost=0.27..7.60 rows=20 width=4)
   ->  Custom Scan (SkipScan) on compress_hyper_13_41_chunk  (cost=0.27..7.55 rows=20 width=4)
         ->  Index Only Scan using compress_hyper_13_41_chunk__compressed_hypertable_13_sensor_id_ on compress_hyper_13_41_chunk  (cost=0.27..39.85 
rows=380 width=4)
               Index Cond: (sensor_id > NULL::integer)
shhnwz commented 1 year ago

Implementation Details:

Planner Changes:

patstrom commented 8 months ago

What is the status on this issue? This is something that would be great for us!

decarbonise commented 1 week ago

Wanted to drop in and add my support for this. I was using Timescale for compression, but the lack of skip scan makes distinct on queries unusably slow