timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.78k stars 886 forks source link

[Bug]: create_hypertable(..., migrate_data=>true) doesn't create a valid partition index #6835

Open alx-net opened 6 months ago

alx-net commented 6 months ago

What type of bug is this?

Performance issue

What subsystems and features are affected?

Query planner

What happened?

Scenario:

  1. Create a normal table with columns (time, value).
  2. Insert data.
  3. Apply create_hypertable(..., migrate_data=>true) to turn this table into a hypertable.

Observation/Bug: The index which gets created on the time column by timescale is broken and unusable for the query planner. Therefore, the query planner always uses seq scans instead of index scans.

This behaviour only occurs when we migrate data, whereas creating the hypertable first and then inserting data doesn't cause any indexing issues.

Reindexing fails: We tried to reindex and received the following results:

-- Query
reindex table test_data_timescale_index_migration_bug;
-- Output
table "_hyper_29683_894782_chunk" has no indexes to reindex
table "_hyper_29683_894783_chunk" has no indexes to reindex
...

We concluded, after checking pg_indexes by executing

select * from pg_indexes where tablename = 'test_data_timescale_index_migration_bug'

that the index on the hypertable is correctly registered, but must be broken in some way.

Query Planner Results

explain analyze confirmed that the timescale index does not work.

Query:

explain analyze
select * from test_data_timescale_index_migration_bug
order by time desc
limit 1;

Query plan using timescales default index:

-- Output
->  Parallel Seq Scan on _hyper_29683_894784_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
->  Parallel Seq Scan on _hyper_29683_894783_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)

Postgres scans all partitions, instead of checking the partition with the youngest timestamp.

Fix/Workaround: We fixed the issue by manually creating a new index on the hypertable.

-- Query
create index time_index on test_data_timescale_index_migration_bug (time desc);

Then we get the expected index scans in the query plan:

-- Output
->  Index Scan using _hyper_29684_894837_chunk_time_index on _hyper_29684_894837_chunk  (cost=0.12..2.34 rows=1 width=12) (actual time=0.020..0.020 rows=1 loops=1)
->  Index Scan using _hyper_29684_894836_chunk_time_index on _hyper_29684_894836_chunk  (cost=0.12..2.34 rows=1 width=12) (never executed)
...

Now we see the expected behaviour.

Credits: @probstal, @theokli, @BastianHentschel, @alx-net

TimescaleDB version affected

2.14.2

PostgreSQL version used

16.2

What operating system did you use?

Ubuntu 21.04 x86_64

What installation method did you use?

Deb/Apt

What platform did you run on?

On prem/Self-hosted

Relevant log output and stack trace

Limit  (cost=2346.30..2346.42 rows=1 width=12) (actual time=21.241..25.953 rows=1 loops=1)
  ->  Gather Merge  (cost=2346.30..15531.90 rows=108120 width=12) (actual time=21.238..25.948 rows=1 loops=1)
        Workers Planned: 6
        Workers Launched: 6
        ->  Sort  (cost=1346.20..1391.25 rows=18020 width=12) (actual time=0.255..0.261 rows=0 loops=7)
"              Sort Key: _hyper_29683_894784_chunk.""time"" DESC"
              Sort Method: top-N heapsort  Memory: 25kB
              Worker 0:  Sort Method: quicksort  Memory: 25kB
              Worker 1:  Sort Method: quicksort  Memory: 25kB
              Worker 2:  Sort Method: quicksort  Memory: 25kB
              Worker 3:  Sort Method: quicksort  Memory: 25kB
              Worker 4:  Sort Method: quicksort  Memory: 25kB
              Worker 5:  Sort Method: quicksort  Memory: 25kB
              ->  Parallel Append  (cost=0.00..1256.10 rows=18020 width=12) (actual time=0.003..0.026 rows=8 loops=7)
                    ->  Parallel Seq Scan on _hyper_29683_894784_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894783_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894782_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894781_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894780_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894779_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.003..0.003 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894778_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894777_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894776_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894775_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894774_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894773_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894772_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894771_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894770_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894769_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894768_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894767_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894766_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894765_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894764_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894763_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894762_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894761_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.006..0.007 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894760_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894759_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.001 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894758_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894757_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894756_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894755_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894754_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894753_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894752_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894751_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894750_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894749_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.001..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894748_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894747_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894746_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894745_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.008..0.008 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894744_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894743_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894742_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894741_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894740_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894739_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.008..0.009 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894738_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894737_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894736_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.003..0.003 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894735_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894734_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.002..0.002 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894733_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.003..0.003 rows=1 loops=1)
                    ->  Parallel Seq Scan on _hyper_29683_894732_chunk  (cost=0.00..22.00 rows=1200 width=12) (actual time=0.014..0.015 rows=1 loops=1)
Planning Time: 2.642 ms
Execution Time: 26.060 ms

How can we reproduce the bug?

drop table if exists test_data_timescale_index_migration_bug;
create table test_data_timescale_index_migration_bug as (select time, random() as value from generate_series(
    '2021-01-01',
    '2022-01-01', INTERVAL '7 days'
) as time);

select create_hypertable('test_data_timescale_index_migration_bug', by_range('time'), migrate_data=>true);

explain analyze
select * from test_data_timescale_index_migration_bug
order by time desc
limit 1;
probstal commented 5 months ago

push

theokli commented 4 months ago

push

RobAtticus commented 2 months ago

I think this is fixed via #7190

RobAtticus commented 2 months ago

cc @fabriziomello to confirm?