timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
16.83k stars 852 forks source link

[Bug]: Could not find pathkey item to sort on partially compressed chunks of a hypertable #6975

Closed srieding closed 1 month ago

srieding commented 1 month ago

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Query executor, Query planner

What happened?

The problem occurs for a query that takes the form:

select *
from measurements
       join devices on measurements.device_id = devices.id
where generated_at > '2023-01-04 12:00:00'
  and generated_at < '2023-01-19 12:17:00'
  and device_id = 1
order by generated_at, location_id, device_id, devices.name;

The query is not a production like query. It instead comes from the included script that has a more minimal reproduction path to get to the same error. Here

It could be similar to https://github.com/timescale/timescaledb/issues/2232 But that is almost impossible to tell since that ticket is neither moving nor does it have a clean enough reproduction path.

TimescaleDB version affected

2.14.2

PostgreSQL version used

14.1

What operating system did you use?

window 11 x64

What installation method did you use?

Docker

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

ERROR: could not find pathkey item to sort

How can we reproduce the bug?

-- starting on an empty database
-- running docker image: image: timescale/timescaledb:2.14.2-pg14
-- with timescale 2.14.2
SELECT extversion FROM pg_extension where extname = 'timescaledb';

CREATE TABLE measurements (
                              "generated_at" timestamptz not null,
                              "value" int4 not null,
                              "location_id" int4 not null,
                              "device_id" int4 not null
);

CREATE TABLE devices (
                            "id" int4 not null,
                            "name" varchar(255) not null
);

DO $$
  BEGIN
    FOR i IN 1..100000 LOOP
        insert into devices (id, name) values (i, 'device ' || i);
      END LOOP;
  END
$$;

ALTER TABLE "measurements" ADD CONSTRAINT measurements_pkey PRIMARY KEY (location_id, generated_at);

SELECT create_hypertable (
               'measurements',
               'generated_at',
               chunk_time_interval => INTERVAL '7 days',
               if_not_exists => TRUE,
               create_default_indexes => FALSE
           );

-- create a bit of test data we want to retrieve
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-01 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-02 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-03 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-04 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-05 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-06 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-07 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-08 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-09 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-11 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-12 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-13 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-14 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-15 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-16 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-17 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-18 12:00:00', 1, 1, 1);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:00:00', 1, 1, 1);

insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-01 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-02 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-03 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-04 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-05 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-06 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-07 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-08 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-09 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-11 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-12 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-13 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-14 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-15 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-16 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-17 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-18 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:01:00', 1, 1, 2);

-- create a lot of junk data that will make sure we don't end up with a SeqScan query plan that will work
DO $$
  BEGIN
    FOR i IN 3..100000 LOOP
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:02:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:03:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:04:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:05:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:06:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:07:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:08:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:09:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:10:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:11:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:12:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:13:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:14:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:15:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:16:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:17:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:18:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:19:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:20:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:21:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:22:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:23:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:24:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:25:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:26:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:27:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:28:00', 1, i, i);
        insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-19 12:29:00', 1, i, i);
      END LOOP;
  END
$$;

-- allow compression
ALTER TABLE measurements SET (timescaledb.compress, timescaledb.compress_segmentby = 'location_id,device_id');

-- compress all chunks
DO $$
  DECLARE chunk regclass;
  BEGIN
    FOR chunk IN SELECT format('%I.%I', chunk_schema, chunk_name)::regclass
                 FROM timescaledb_information.chunks
                 WHERE is_compressed = false and hypertable_name = 'measurements'
      LOOP
        RAISE NOTICE 'compressing cunk %', chunk::text;
        PERFORM compress_chunk(chunk);
      END LOOP;
  END
$$;

-- Now we mess around a bit with the chunk to make it partially compressed
delete from measurements where generated_at = '2023-01-07 12:01:00';
delete from measurements where generated_at = '2023-01-08 12:01:00';
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-07 12:01:00', 1, 1, 2);
insert into measurements (generated_at, value, location_id, device_id) values ('2023-01-08 12:01:00', 1, 1, 2);
update measurements set value = 2 where generated_at = '2023-01-07 12:01:00';

-- And mess with it some more...
DO $$
  BEGIN
    FOR i IN 3..5 LOOP
        update measurements set value = i where generated_at = '2023-01-19 12:02:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:03:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:04:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:05:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:06:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:07:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:08:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:09:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:10:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:11:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:12:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:13:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:14:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:15:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:16:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:17:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:18:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:19:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:20:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:21:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:22:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:23:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:24:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:25:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:26:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:27:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:28:00' and location_id = i and device_id = i;
        update measurements set value = i where generated_at = '2023-01-19 12:29:00' and location_id = i and device_id = i;
      END LOOP;
  END
$$;

-- This now errors with error ERROR: could not find pathkey item to sort.
explain select * from measurements join devices on measurements.device_id = devices.id where generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00' and device_id = 1 order by generated_at,location_id,device_id, devices.name;
-- and so will:
select *
from measurements
       join devices on measurements.device_id = devices.id
where generated_at > '2023-01-04 12:00:00'
  and generated_at < '2023-01-19 12:17:00'
  and device_id = 1
order by generated_at, location_id, device_id, devices.name;

--recompress all chunks:
DO $$
  DECLARE chunk regclass;
  BEGIN
    FOR chunk IN SELECT format('%I.%I', chunk_schema, chunk_name)::regclass
                 FROM timescaledb_information.chunks
                 WHERE hypertable_name = 'measurements'
      LOOP
        RAISE NOTICE 're-compressing cunk %', chunk::text;
        PERFORM decompress_chunk(chunk);
        PERFORM compress_chunk(chunk);
      END LOOP;
  END
$$;

-- This now no longer errors
explain select * from measurements join devices on measurements.device_id = devices.id where generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00' and device_id = 1 order by generated_at,location_id,device_id, devices.name;
-- and neither does:
select * from measurements join devices on measurements.device_id = devices.id where generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00' and device_id = 1 order by generated_at,location_id,device_id, devices.name;
antekresic commented 1 month ago

Thanks for reporting with a great repro case!

Taking a look at this, will update once I have something.

zilder commented 1 month ago

I just realized I haven't sent the comment I've prepared last week. I've reduced the reproduction script, getting the same error.

CREATE TABLE measurements (
    generated_at timestamptz not null,
    device_id int4 not null,
    CONSTRAINT measurements_pkey PRIMARY KEY (generated_at)
);

CREATE TABLE devices (
    id int4 not null,
    name varchar(255) not null
);

SELECT create_hypertable(
    'measurements',
    'generated_at',
    chunk_time_interval => INTERVAL '7 days',
    if_not_exists => TRUE,
    create_default_indexes => FALSE
);

INSERT INTO measurements VALUES ('2023-01-01 12:00', 1), ('2023-01-02 12:00', 1), ('2023-01-08 12:00', 1);

ALTER TABLE measurements SET (timescaledb.compress);

SELECT compress_chunk(format('%I.%I', chunk_schema, chunk_name)::regclass)
FROM timescaledb_information.chunks
WHERE is_compressed = false and hypertable_name = 'measurements';

UPDATE measurements SET device_id = 2 WHERE generated_at = '2023-01-01 12:00:00';

EXPLAIN SELECT * FROM measurements
JOIN devices ON measurements.device_id = devices.id
WHERE device_id = 1 ORDER BY devices.name;
ERROR:  could not find pathkey item to sort

Looks like there is a problem creating a Merge Append plan with partially compressed chunk when the result is sorted by a column that doesn't belong to the hypertable. Here's a stacktrace:

#0  errstart (elevel=21, domain=0x0) at elog.c:336
#1  0x00005f6a72cf2d79 in prepare_sort_from_pathkeys (lefttree=0x5f6a74694898, pathkeys=0x5f6a74650aa8,
    relids=0x5f6a74638ef0, reqColIdx=0x0, adjust_tlist_in_place=<optimized out>,
    p_numsortkeys=0x5f6a74694910, p_sortColIdx=0x5f6a74694918, p_sortOperators=0x5f6a74694920,
    p_collations=0x5f6a74694928, p_nullsFirst=0x5f6a74694930) at createplan.c:6236
#2  0x00005f6a72fd0a5e in create_merge_append_plan (root=0x5f6a74639538, best_path=0x5f6a74694eb8,
    flags=1) at createplan.c:1450
#3  0x00005f6a72fd14e3 in create_customscan_plan (scan_clauses=0x5f6a7464fda0, tlist=0x5f6a74696a30,
    best_path=0x5f6a74694d18, root=0x5f6a74639538) at createplan.c:4256
#4  create_scan_plan (root=0x5f6a74639538, best_path=0x5f6a74694d18, flags=<optimized out>)
    at createplan.c:773
#5  0x00005f6a72fccb0b in create_nestloop_plan (best_path=0x5f6a74695c30, root=0x5f6a74639538)
    at createplan.c:4331
#6  create_join_plan (best_path=0x5f6a74695c30, root=0x5f6a74639538) at createplan.c:1076
#7  create_plan_recurse (root=0x5f6a74639538, best_path=0x5f6a74695c30, flags=<optimized out>)
    at createplan.c:416
#8  0x00005f6a72fd0165 in create_plan (root=root@entry=0x5f6a74639538, best_path=<optimized out>)
    at createplan.c:347
#9  0x00005f6a72fdf888 in standard_planner (parse=parse@entry=0x5f6a74664fb0,
    query_string=query_string@entry=0x5f6a7453ddb0 "explain select * from measurements\njoin devices on measurements.device_id = devices.id\nwhere generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00'\n    and device_id = 1 order b"..., cursorOptions=cursorOptions@entry=2048,
    boundParams=boundParams@entry=0x0) at planner.c:415
#10 0x0000795f18c41138 in timescaledb_planner (parse=0x5f6a74664fb0,
    query_string=0x5f6a7453ddb0 "explain select * from measurements\njoin devices on measurements.device_id = devices.id\nwhere generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00'\n    and device_id = 1 order b"..., cursor_opts=2048, bound_params=0x0)
    at /home/zilder/projects/timescaledb/src/planner/planner.c:543
#11 0x00005f6a730ba23d in pg_plan_query (querytree=0x5f6a74664fb0,
    query_string=query_string@entry=0x5f6a7453ddb0 "explain select * from measurements\njoin devices on measurements.device_id = devices.id\nwhere generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00'\n    and device_id = 1 order b"..., cursorOptions=cursorOptions@entry=2048,
    boundParams=boundParams@entry=0x0) at postgres.c:883
#12 0x00005f6a72eaf714 in ExplainOneQuery (query=<optimized out>, cursorOptions=2048, into=0x0,
    es=0x5f6a74638258,
    queryString=0x5f6a7453ddb0 "explain select * from measurements\njoin devices on measurements.device_id = devices.id\nwhere generated_at > '2023-01-04 12:00:00' and generated_at < '2023-01-19 12:17:00'\n    and device_id = 1 order b"..., params=0x0, queryEnv=0x0) at explain.c:397
...
sgflt commented 1 month ago

Workaround:

SELECT * FROM (
  SELECT * FROM measurements
  JOIN devices ON measurements.device_id = devices.id
  WHERE device_id = 1
) x
 ORDER BY x.name;
antekresic commented 1 month ago

Thanks for all the input, I have a PR up for the fix that should go out with the next release.