timescale / timescaledb

An open-source time-series SQL database optimized for fast ingest and complex queries. Packaged as a PostgreSQL extension.
https://www.timescale.com/
Other
17.72k stars 885 forks source link

Segfault on hypertable query in lateral join with aggregate #579

Closed fvannee closed 6 years ago

fvannee commented 6 years ago

On TimescaleDb 0.9.2 (PG10.4) we observed a segfault on certain queries. I haven't tested on 0.10.0 yet, but I think the part of the code that crashes has not changed in the new version. I've narrowed the query down to the following:

explain
select * FROM normal_table ik
LEFT JOIN LATERAL (SELECT max(time0::bigint) FROM hypertable s WHERE ik.name = 'value' and s.time0 > now()) s ON TRUE

The query is a bit weird, as it has a WHERE clause inside the subquery which references the outer query (ik.name = 'value'). It seems the combination of ik.name='value' and the max aggregate function makes TimescaleDb crash. Removing either one results in a valid plan. Removing the condition on s.time0 > now() also yields a valid plan, but this is just because the constraint_aware_append is not triggered without this condition.

The first part of the stack trace:

0 list_nth_cell (list=0x2d44bd0, n=47494368) at list.c:399

1 0x00000000006239e2 in list_nth (list=, n=) at list.c:413

2 0x00007f58d865d6bb in ca_append_begin (node=0x2d4a550, estate=0x2d486e8, eflags=17) at /home/florisvannee/timescaledb/src/constraint_aware_append.c:165

3 0x00000000005f1712 in ExecInitCustomScan (cscan=cscan@entry=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeCustom.c:105

4 0x00000000005e54fd in ExecInitNode (node=0x2d440b8, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:282

5 0x00000000005fe644 in ExecInitResult (node=node@entry=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at nodeResult.c:223

6 0x00000000005e52fd in ExecInitNode (node=0x2d44408, estate=estate@entry=0x2d486e8, eflags=eflags@entry=17) at execProcnode.c:164

It crashes on the following line in ca_append_begin: RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

The es_range_table is something with 6 entries and scanrelid is a very large number in a T_RESULT node. It tries to access the 47484376th element of the range table list.

(gdb) print estate->es_range_table $1 = {type = T_List, length = 6, head = 0x2d44bb0, tail = 0x2d458a0} (gdb) print scan $2 = {plan = {type = T_Result, startup_cost = 0.14999999999999999, total_cost = 7.2587499999999991, plan_rows = 1, plan_width = 8, parallel_aware = 0 '\000', parallel_safe = 0 '\000', plan_node_id = 7, targetlist = 0x2d4adc8, qual = 0x0, lefttree = 0x2d4aeb8, righttree = 0x0, initPlan = 0x0, extParam = 0x2d4b4b8, allParam = 0x2d4b4d0}, scanrelid = 47494376}

I believe the aggregate function turns the children of the Append node into a T_RESULT (where the T_RESULT children contain the actual sequential/index scan on the relation). This is the plan when I change the query to use Postres partition exclusion instead of the constraint aware append:

Nested Loop Left Join (cost=14.02..10131.40 rows=720 width=93) -> Seq Scan on normal_table ik (cost=0.00..17.20 rows=720 width=85) -> Aggregate (cost=14.02..14.03 rows=1 width=8) -> Append (cost=0.00..13.68 rows=135 width=8) -> Result (cost=0.00..0.00 rows=1 width=8) One-Time Filter: (ik.name = 'value'::text) -> Seq Scan on hyper_table s (cost=0.00..0.00 rows=1 width=8) Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9) -> Result (cost=0.00..13.68 rows=134 width=8) One-Time Filter: (ik.name = 'value'::text) -> Seq Scan on _hyper_383_10587_chunk s_1 (cost=0.00..13.68 rows=134 width=8) Filter: (time0 > '2018-05-20 00:00:00.0 +0200'::timestamp9)

To me it looks like a quick fix to avoid segfaults would be to move:

        AppendRelInfo *appinfo = lfirst(lc_info);
        RangeTblEntry *rte = rt_fetch(scan->scanrelid, estate->es_range_table);

Inside the case block. This avoids the segfault. However, this is not the behavior you'd like, as it will not prune chunks from the plan. Ideally, there'd be a special case for T_RESULT which then looks at the children of T_RESULT to determine if the chunk can be pruned or not. This would result in correct behavior.

Any ideas on how best to approach this?

fvannee commented 6 years ago

I verified that the proposed quick fix to move the lfirst/rt_fetch calls inside the case-block works (eg. the database does not crash anymore).

mfreed commented 6 years ago

Thanks for the detailed bug report, @fvannee . We will try to replicate with your instructions.

fvannee commented 6 years ago

Fixed in 0.10.1 :)