timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/

[Enhancement]: 'time_bucket' queries using 'first' or 'distinct on' are far slower (10000x) than necessary. #7211

Open ewoolsey opened 2 months ago

ewoolsey commented 2 months ago

What type of enhancement is this?

Performance, User experience

What subsystems and features will be improved?

Query planner

What does the enhancement do?

I think this is a really important optimization. Quickly downsampling data on the fly (without creating additional caggs) seems like a very common use case.

I have this simple query:

select
    time_bucket($2::interval, time) as bucket,
    -- using distinct on instead of first has similar perf
    first(price_item -> 'price_state' ->> 'price', time) as price
FROM 
    event,
    jsonb_array_elements(attributes -> 'prices') AS price_item
where price_item ->> 'price_id' = $1
AND   time >= $3::timestamptz
GROUP BY bucket;

I would expect that making the interval larger would improve performance, but it does not. After hours of fooling around I found a solution:

WITH time_intervals AS (
    SELECT
        generate_series(
            $3::timestamptz,
            (SELECT max(time) FROM event WHERE time >= $3::timestamptz),
            $2::interval
        ) AS bucket_start
)
SELECT DISTINCT ON (ti.bucket_start)
    ti.bucket_start AS bucket,
    pd.price
FROM 
    time_intervals ti
LEFT JOIN LATERAL (
    SELECT
        price_item -> 'price_state' ->> 'price' AS price
    FROM
        event,
        jsonb_array_elements(attributes -> 'prices') AS price_item
    WHERE
        event.time >= ti.bucket_start
        AND event.time < ti.bucket_start + $2::interval
        AND price_item ->> 'price_id' = $1
    ORDER BY event.time DESC
    LIMIT 1
) pd ON true
ORDER BY 
    ti.bucket_start;

The second query can be 10000x faster (literally). It's a shame that the simple time_bucket query isn't able to match it. Especially when using something like 'first', I would expect the query planner to figure this out.

Additionally, this fast query can be rewritten as a recursive CTE.
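For reference, a sketch of that recursive-CTE formulation (untested; it assumes the same event table and parameters as above, with $1 = price_id, $2 = bucket interval, $3 = start time):

```sql
-- Hypothetical sketch: generate bucket starts recursively instead of with
-- generate_series, then pick the last row per bucket via LATERAL ... LIMIT 1.
WITH RECURSIVE buckets AS (
    SELECT $3::timestamptz AS bucket_start
    UNION ALL
    SELECT bucket_start + $2::interval
    FROM buckets
    WHERE bucket_start + $2::interval <=
          (SELECT max(time) FROM event WHERE time >= $3::timestamptz)
)
SELECT
    b.bucket_start AS bucket,
    pd.price
FROM buckets b
LEFT JOIN LATERAL (
    SELECT price_item -> 'price_state' ->> 'price' AS price
    FROM event,
         jsonb_array_elements(attributes -> 'prices') AS price_item
    WHERE event.time >= b.bucket_start
      AND event.time < b.bucket_start + $2::interval
      AND price_item ->> 'price_id' = $1
    ORDER BY event.time DESC
    LIMIT 1
) pd ON true
ORDER BY b.bucket_start;
```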

ewoolsey commented 2 months ago

Slow query plan:

HashAggregate  (cost=205051.45..205055.19 rows=214 width=40) (actual time=3244.923..3279.311 rows=11 loops=1)
  Group Key: (time_bucket('1 day'::interval, event."time"))
  Batches: 1  Memory Usage: 40kB
  ->  Gather  (cost=1000.44..202209.13 rows=162418 width=48) (actual time=11.204..3203.753 rows=166764 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Nested Loop  (cost=0.44..184967.33 rows=95540 width=48) (actual time=9.194..3164.508 rows=83382 loops=2)
              ->  Parallel Custom Scan (ChunkAppend) on event  (cost=0.43..40463.08 rows=95540 width=223) (actual time=8.990..482.878 rows=84892 loops=2)
                    Order: event."time"
                    Chunks excluded during startup: 0
                    ->  Parallel Index Scan using _hyper_42_109_chunk_idx_event_time on _hyper_42_109_chunk  (cost=0.43..40463.08 rows=95540 width=223) (actual time=8.987..473.147 rows=84892 loops=2)
                          Index Cond: ("time" >= (now() - '10 days'::interval))
                          Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                          Rows Removed by Filter: 289091
              ->  Function Scan on jsonb_array_elements price_item  (cost=0.01..1.50 rows=1 width=32) (actual time=0.027..0.031 rows=1 loops=169784)
                    Filter: ((value ->> 'price_id'::text) = '0x2b89b9dc8fdf9f34709a5b106b472f0f39bb6ca9ce04b0fd7f2e971688e2e53b'::text)
                    Rows Removed by Filter: 27
Planning Time: 0.652 ms
JIT:
  Functions: 29
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.665 ms, Inlining 0.000 ms, Optimization 0.809 ms, Emission 17.004 ms, Total 19.477 ms
Execution Time: 3280.454 ms

Fast Query Plan:

Unique  (cost=9110.68..9115.68 rows=200 width=40) (actual time=2.072..2.079 rows=10 loops=1)
  ->  Sort  (cost=9110.68..9113.18 rows=1000 width=40) (actual time=2.071..2.075 rows=10 loops=1)
        Sort Key: (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))
        Sort Method: quicksort  Memory: 25kB
        ->  Nested Loop Left Join  (cost=0.91..9060.85 rows=1000 width=40) (actual time=1.310..2.064 rows=10 loops=1)
              ->  ProjectSet  (cost=0.47..5.49 rows=1000 width=8) (actual time=0.030..0.037 rows=10 loops=1)
                    InitPlan 2 (returns $1)
                      ->  Result  (cost=0.46..0.47 rows=1 width=8) (actual time=0.025..0.026 rows=1 loops=1)
                            InitPlan 1 (returns $0)
                              ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.023..0.024 rows=1 loops=1)
                                    ->  Custom Scan (ChunkAppend) on event event_1  (cost=0.43..21022.58 rows=738970 width=8) (actual time=0.022..0.022 rows=1 loops=1)
                                          Order: event_1."time" DESC
                                          Chunks excluded during startup: 0
                                          ->  Index Only Scan Backward using _hyper_42_109_chunk_idx_event_time on _hyper_42_109_chunk _hyper_42_109_chunk_1  (cost=0.43..21022.58 rows=738970 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                                                Index Cond: (("time" IS NOT NULL) AND ("time" >= (now() - '10 days'::interval)))
                                                Heap Fetches: 1
                    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001 rows=1 loops=1)
              ->  Limit  (cost=0.43..9.04 rows=1 width=40) (actual time=0.202..0.202 rows=1 loops=10)
                    ->  Nested Loop  (cost=0.43..82675.70 rows=9613 width=40) (actual time=0.201..0.202 rows=1 loops=10)
                          ->  Custom Scan (ChunkAppend) on event  (cost=0.43..68112.00 rows=9613 width=227) (actual time=0.044..0.044 rows=1 loops=10)
                                Order: event."time" DESC
                                Chunks excluded during startup: 0
                                Chunks excluded during runtime: 10
                                ->  Index Scan Backward using _hyper_42_109_chunk_idx_event_time on _hyper_42_109_chunk  (cost=0.43..420.13 rows=1452 width=223) (actual time=0.012..0.012 rows=1 loops=10)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                      Rows Removed by Filter: 2
                                ->  Index Scan Backward using _hyper_42_86_chunk_idx_event_time on _hyper_42_86_chunk  (cost=0.43..2171.48 rows=1548 width=207) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_85_chunk_idx_event_time on _hyper_42_85_chunk  (cost=0.43..9669.24 rows=1673 width=209) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_84_chunk_idx_event_time on _hyper_42_84_chunk  (cost=0.43..10862.69 rows=1662 width=219) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_82_chunk_idx_event_time on _hyper_42_82_chunk  (cost=0.43..5431.64 rows=981 width=236) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_83_chunk_idx_event_time on _hyper_42_83_chunk  (cost=0.43..4841.69 rows=375 width=261) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_87_chunk_idx_event_time on _hyper_42_87_chunk  (cost=0.43..6394.68 rows=294 width=262) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_91_chunk_idx_event_time on _hyper_42_91_chunk  (cost=0.43..10766.78 rows=323 width=270) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_95_chunk_idx_event_time on _hyper_42_95_chunk  (cost=0.43..6029.81 rows=347 width=243) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_100_chunk_idx_event_time on _hyper_42_100_chunk  (cost=0.43..5069.16 rows=386 width=256) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                                ->  Index Scan Backward using _hyper_42_104_chunk_idx_event_time on _hyper_42_104_chunk  (cost=0.43..6454.70 rows=572 width=260) (never executed)
                                      Index Cond: (("time" >= (generate_series((now() - '10 days'::interval), $1, '1 day'::interval))) AND ("time" < ((generate_series((now() - '10 days'::interval), $1, '1 day'::interval)) + '1 day'::interval)))
                                      Filter: (kind = 'injective.oracle.v1beta1.EventSetPythPrices'::text)
                          ->  Function Scan on jsonb_array_elements price_item  (cost=0.01..1.50 rows=1 width=32) (actual time=0.153..0.153 rows=1 loops=10)
                                Filter: ((value ->> 'price_id'::text) = '0x2b89b9dc8fdf9f34709a5b106b472f0f39bb6ca9ce04b0fd7f2e971688e2e53b'::text)
                                Rows Removed by Filter: 1
Planning Time: 1.667 ms
Execution Time: 2.522 ms

akuzm commented 2 months ago

This family of optimizations for DISTINCT ON is usually called "Skip Scan". IIRC we don't have something like that for time_bucket, unfortunately.

I was wondering, why does your first query have GROUP BY bucket, price;? If you're calculating first(price, time), shouldn't it be GROUP BY bucket?

Also, it might be good to use attributes @> '[{"price": $1}]'::jsonb to run this filter at the event scan level instead of doing it after a join with jsonb_array_elements.
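As a sketch of that suggestion (the index is an assumption, not something from this thread): a jsonb_path_ops GIN index on attributes would let a @> containment predicate prune rows during the event scan, before the jsonb_array_elements join:

```sql
-- Hypothetical: a GIN index so the containment test can use the index
-- at the event scan level (jsonb_path_ops indexes support only @>).
CREATE INDEX event_attributes_gin
    ON event USING gin (attributes jsonb_path_ops);

-- Containment form of the price_id filter; the literal stands in for $1.
-- Note the containment has to match the full document shape, i.e. the
-- 'prices' array nested under attributes:
SELECT count(*)
FROM event
WHERE attributes @>
      '{"prices": [{"price_id": "0x2b89b9dc8fdf9f34709a5b106b472f0f39bb6ca9ce04b0fd7f2e971688e2e53b"}]}'::jsonb;
```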

ewoolsey commented 2 months ago

I was wondering, why does your first query have GROUP BY bucket, price;? If you're calculating first(price, time), shouldn't it be GROUP BY bucket?

Sorry that was a typo when I copied it over. I just edited the post.

Also, it might be good to use attributes @> '[{"price": $1}]'::jsonb to run this filter at the event scan level instead of doing it after a join with jsonb_array_elements.

Indeed I had the same hunch, though I tested this and it had no effect.

ewoolsey commented 2 months ago

This family of optimizations for DISTINCT ON is usually called "Skip Scan". IIRC we don't have something like that for time_bucket, unfortunately.

@akuzm Forgive me, I know nothing about Postgres internals. Are the optimizations I'm asking for here possible? From a very high-level view, I would think they should be possible both for 'first' and 'distinct on'. I'd love to hear your opinion.

akuzm commented 2 months ago

Sorry that was a typo when I copied it over. I just edited the post.

Got it. It seems the explain is still for the query with the typo -- the price is in the Group Key:

              ->  Group  (cost=195980.96..197419.85 rows=95926 width=40) (actual time=9333.476..9352.393 rows=47560 loops=2)
                    Group Key: (time_bucket('1 day'::interval, event."time")), (((price_item.value -> 'price_state'::text) ->> 'price'::text))
                    ->  Sort  (cost=195980.96..196220.78 rows=95926 width=40) (actual time=9333.425..9339.926 rows=83370 loops=2)
                          Sort Key: (time_bucket('1 day'::interval, event."time")), (((price_item.value -> 'price_state'::text) ->> 'price'::text))

Do you mind posting the updated explain as well?

Are the optimizations I'm asking for here possible? From a very high-level view, I would think they should be possible both for 'first' and 'distinct on'. I'd love to hear your opinion.

Technically we could implement a special query plan for it, but it's not done and currently not planned. So for now you'll have to use the workarounds. I like both ideas: generating the buckets separately, or using the recursive CTE.

Some context on how it could work (this is not going to help you solve the problem now, it's more of a note for TimescaleDB developers): the basic Skip Scan we currently support speeds up "distinct on" on a column that has an index. After processing a row, it scrolls the underlying index scan forward to the first value greater than the current row. This can be extended to monotonic expressions like time_bucket, scrolling the index scan even further, to the start of the next bucket. Without this specialized plan, it has to read all the rows in each bucket only to discard them, hence the bad performance.
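For reference, a minimal sketch (table, column, and index names are hypothetical) of the query shape the existing Skip Scan accelerates, versus the time_bucket shape it does not yet cover:

```sql
-- Hypothetical metrics table with an index matching the DISTINCT ON order:
CREATE INDEX ON metrics (device_id, time DESC);

-- Shape currently covered by Skip Scan: DISTINCT ON over the leading
-- indexed column; the scan jumps from one device_id to the next.
SELECT DISTINCT ON (device_id) device_id, time, value
FROM metrics
ORDER BY device_id, time DESC;

-- Shape NOT yet covered: the same "first row per group" pattern, but
-- grouped by a monotonic expression over the indexed column. The planner
-- reads every row in each bucket instead of jumping to the next bucket.
SELECT time_bucket('1 day'::interval, time) AS bucket,
       first(value, time) AS value
FROM metrics
GROUP BY bucket;
```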

ewoolsey commented 2 months ago

I just updated. BTW in the above queries I omitted AND kind = '...' for simplicity, but that's why you see it in the query plan.

ewoolsey commented 2 months ago

Technically we could implement a special query plan for it, but it's not done and currently not planned

This would be amazing, and I think it would benefit a lot of people, as this feels like a pretty common use case! Anyhow, we can leave this issue up until the team has time to work on this. Thanks for the quick response @akuzm.