hydradatabase / hydra

Hydra: Column-oriented Postgres. Add scalable analytics to your project in minutes.
https://www.hydra.so
Apache License 2.0
2.8k stars 73 forks source link

Add vectorization for timestamp[tz] data types #266

Closed japinli closed 2 months ago

japinli commented 3 months ago

Hi,

I find that timestamp[tz] do not support vectorization when filtering, here is a PR to improve this.

Here is my test with debug information.

CREATE EXTENSION columnar;
CREATE TABLE t (id int, ts timestamptz) USING columnar ;
\timing
INSERT INTO t SELECT id, now() + id * '1 day'::interval FROM generate_series(1, 100000) id;

Without patched.

[local]:1288691 postgres=# EXPLAIN (analyze, verbose) SELECT * FROM t WHERE ts between '2025-01-01'::timestamptz and '2025-02-01'::timestamptz;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.t  (cost=0.00..10080.10 rows=100000 width=12) (actual time=2.436..8.841 rows=31 loops=1)
   Output: id, ts
   Filter: ((t.ts >= '2025-01-01 00:00:00+08'::timestamp with time zone) AND (t.ts <= '2025-02-01 00:00:00+08'::timestamp with time zone))
   Rows Removed by Filter: 9969
   Columnar Projected Columns: id, ts
   Columnar Chunk Group Filters: ((ts >= '2025-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2025-02-01 00:00:00+08'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 9
 Planning Time: 1.149 ms
 Execution Time: 8.920 ms
(9 rows)

Time: 11.847 ms

[local]:1288691 postgres=# EXPLAIN (analyze, verbose) SELECT * FROM t WHERE ts between '2026-01-01'::timestamptz and '2026-02-01'::timestamptz;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.t  (cost=0.00..10080.10 rows=100000 width=12) (actual time=2.646..7.665 rows=31 loops=1)
   Output: id, ts
   Filter: ((t.ts >= '2026-01-01 00:00:00+08'::timestamp with time zone) AND (t.ts <= '2026-02-01 00:00:00+08'::timestamp with time zone))
   Rows Removed by Filter: 9969
   Columnar Projected Columns: id, ts
   Columnar Chunk Group Filters: ((ts >= '2026-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2026-02-01 00:00:00+08'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 9
 Planning Time: 1.026 ms
 Execution Time: 7.734 ms
(9 rows)

Time: 10.119 ms

With patched.

[local]:1291472 postgres=# EXPLAIN (analyze, verbose) SELECT * FROM t WHERE ts between '2025-01-01'::timestamptz and '2025-02-01'::timestamptz;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.t  (cost=0.00..10080.10 rows=100000 width=12) (actual time=4.178..4.402 rows=31 loops=1)
   Output: id, ts
   Columnar Projected Columns: id, ts
   Columnar Chunk Group Filters: ((ts >= '2025-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2025-02-01 00:00:00+08'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 9
   Columnar Vectorized Filter: ((ts >= '2025-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2025-02-01 00:00:00+08'::timestamp with time zone))
 Planning Time: 1.024 ms
 Execution Time: 4.554 ms
(8 rows)

Time: 7.176 ms
[local]:1291472 postgres=# EXPLAIN (analyze, verbose) SELECT * FROM t WHERE ts between '2026-01-01'::timestamptz and '2026-02-01'::timestamptz;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Custom Scan (ColumnarScan) on public.t  (cost=0.00..10080.10 rows=100000 width=12) (actual time=4.122..4.334 rows=31 loops=1)
   Output: id, ts
   Columnar Projected Columns: id, ts
   Columnar Chunk Group Filters: ((ts >= '2026-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2026-02-01 00:00:00+08'::timestamp with time zone))
   Columnar Chunk Groups Removed by Filter: 9
   Columnar Vectorized Filter: ((ts >= '2026-01-01 00:00:00+08'::timestamp with time zone) AND (ts <= '2026-02-01 00:00:00+08'::timestamp with time zone))
 Planning Time: 0.926 ms
 Execution Time: 4.472 ms
(8 rows)

Time: 6.861 ms
wuputah commented 2 months ago

Can you please add a test (probably an explain) to cover this, similar to your PR text?

japinli commented 2 months ago

Can you please add a test (probably an explain) to cover this, similar to your PR text?

Of course, I create a new columnar_vectorization.sql for testing vectorization. Updated.