yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com

[YSQL] [randgen] SIGABRT Assertion failure at postgres`cost_gather_merge + 225 at .../yugabyte-db/src/postgres/src/backend/optimizer/path/costsize.c:485 #21733

Open mtakahar opened 6 months ago

mtakahar commented 6 months ago

Jira Link: DB-10605

Description

Problem

YB test# /*+ Set(enable_seqscan OFF) */  SELECT 'l' UNION  ALL (  SELECT 'g' )  ORDER  BY  1   ;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
psql (15.1, server 11.2-YB-2.23.0.0-b0)
YB test#

Backtrace on a debug build:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x00007ff8136ab196 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007ff8136e2ee6 libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007ff813609b45 libsystem_c.dylib`abort + 123
    frame #3: 0x000000010390958d postgres`ExceptionalCondition + 125 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/utils/error/assert.c:54
    frame #4: 0x0000000103564a01 postgres`cost_gather_merge + 225 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/path/costsize.c:485
    frame #5: 0x00000001035d2369 postgres`create_gather_merge_path + 633 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/util/pathnode.c:2127
    frame #6: 0x00000001035a74e4 postgres`create_ordered_paths + 612 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/plan/planner.c:5185
    frame #7: 0x00000001035a2e51 postgres`grouping_planner + 3617 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/plan/planner.c:2290
    frame #8: 0x00000001035a03d1 postgres`subquery_planner + 3073 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/plan/planner.c:981
    frame #9: 0x000000010359efae postgres`standard_planner + 638 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/plan/planner.c:410
    frame #10: 0x0000000107d58799 pg_hint_plan.so`pg_hint_plan_planner + 1385 at /Users/mtakahara/code/yugabyte-db/src/postgres/third-party-extensions/pg_hint_plan/pg_hint_plan.c:3186
    frame #11: 0x000000010359ed05 postgres`planner + 53 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/plan/planner.c:266
    frame #12: 0x00000001036f9569 postgres`pg_plan_query + 137 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:905
    frame #13: 0x00000001036f96bf postgres`pg_plan_queries + 239 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:971
    frame #14: 0x00000001037048a9 postgres`exec_simple_query + 1113 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:1148
    frame #15: 0x0000000103702415 postgres`yb_exec_simple_query_impl + 21 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:4759
    frame #16: 0x000000010370257a postgres`yb_exec_query_wrapper_one_attempt + 346 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:4727
    frame #17: 0x00000001037023ea postgres`yb_exec_query_wrapper + 74 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:4751
    frame #18: 0x00000001036fd665 postgres`yb_exec_simple_query + 69 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:4774
    frame #19: 0x00000001036fbf2e postgres`PostgresMain + 2910 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/tcop/postgres.c:5399
    frame #20: 0x0000000103618e55 postgres`BackendRun + 933 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/postmaster/postmaster.c:4736
    frame #21: 0x0000000103617eff postgres`BackendStartup + 703 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/postmaster/postmaster.c:4400
    frame #22: 0x00000001036168b0 postgres`ServerLoop + 992 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/postmaster/postmaster.c:1778
    frame #23: 0x0000000103612e3e postgres`PostmasterMain + 7566 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/postmaster/postmaster.c:1434
    frame #24: 0x000000010350209b postgres`PostgresServerProcessMain + 779 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/main/main.c:234
    frame #25: 0x0000000103502652 postgres`main + 34
    frame #26: 0x00007ff81338941f dyld`start + 1903
(lldb) f 4
f 4
frame #4: 0x0000000103564a01 postgres`cost_gather_merge + 225 at /Users/mtakahara/code/yugabyte-db/src/postgres/src/backend/optimizer/path/costsize.c:485
   482       * be overgenerous since the leader will do less work than other workers
   483       * in typical cases, but we'll go with it for now.
   484       */
-> 485      Assert(path->num_workers > 0);
            ^
   486      N = (double) path->num_workers + 1;
   487      logN = LOG2(N);
   488
(lldb) p path
p path
(GatherMergePath *) 0x0000000112d27db0
(lldb) p path->num_workers
p path->num_workers
(int) 0

Test Case

No tables are necessary. Just run the query with SeqScan disabled, either via the hint or via the GUC parameter (enable_seqscan).

Issue Type

kind/bug


mtakahar commented 6 months ago

The plan on a release build is shown below. Note "Workers Planned: 0".

                           QUERY PLAN                           
----------------------------------------------------------------
 Gather Merge  (cost=1000.03..1000.04 rows=0 width=32)
   Workers Planned: 0
   ->  Sort  (cost=0.04..0.04 rows=1 width=32)
         Sort Key: ('g'::text)
         ->  Parallel Append  (cost=0.00..0.03 rows=1 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
(7 rows)
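
For readers wondering why the debug build aborts while the release build returns a plan with Workers Planned: 0: PostgreSQL's Assert() is only compiled in when assertion checking is enabled (--enable-cassert). The snippet below is a minimal standalone sketch of that behavior, not the actual PostgreSQL code. SKETCH_ASSERT, SKETCH_ASSERT_CHECKING, and sketch_merge_inputs are hypothetical names made up for illustration; only the asserted condition and the `num_workers + 1` expression come from the costsize.c listing above.

```c
#include <stdlib.h>

/*
 * Hypothetical stand-in for PostgreSQL's Assert(). It is active only when
 * SKETCH_ASSERT_CHECKING is defined, analogous to building with
 * --enable-cassert. Without it, the check compiles to nothing.
 */
#ifdef SKETCH_ASSERT_CHECKING
#define SKETCH_ASSERT(cond) do { if (!(cond)) abort(); } while (0)
#else
#define SKETCH_ASSERT(cond) ((void) 0)
#endif

/*
 * Hypothetical helper mirroring costsize.c lines 485-486 from the listing:
 * the number of streams being merged is the workers plus the leader.
 */
double
sketch_merge_inputs(int num_workers)
{
    SKETCH_ASSERT(num_workers > 0);   /* aborts only in an assert-enabled build */
    return (double) num_workers + 1;  /* 0 workers silently yields N = 1 */
}
```

Compiled without SKETCH_ASSERT_CHECKING, a call with zero workers falls through and costing continues with N = 1, which matches the release build producing a plan instead of crashing. Compiled with it, the same call aborts, as in the backtrace.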

Also, the plan from vanilla PG 15:

                           QUERY PLAN                           
----------------------------------------------------------------
 Gather Merge  (cost=1000.02..1000.03 rows=0 width=32)
   Workers Planned: 0
   ->  Sort  (cost=0.03..0.03 rows=1 width=32)
         Sort Key: ('l'::text)
         ->  Parallel Append  (cost=0.00..0.01 rows=1 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
(7 rows)

mtakahar commented 6 months ago

Tried it on a local build of vanilla PG REL_11_STABLE (11.20) (built with --enable-debug --enable-cassert) to see if it hits the assert, but it wouldn't choose the parallel plan the way the 15.1 installed via Homebrew does.

PG11 test# select version();
                                                      version
--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 11.20 on x86_64-apple-darwin22.6.0, compiled by Apple clang version 15.0.0 (clang-1500.1.0.2.5), 64-bit
(1 row)

PG11 test# /*+ Set(enable_seqscan OFF) */  SELECT 'l' UNION  ALL (  SELECT 'g' )  ORDER  BY  1   ;
 ?column?
----------
 g
 l
(2 rows)

PG11 test# explain /*+ Set(enable_seqscan OFF) */  SELECT 'l' UNION  ALL (  SELECT 'g' )  ORDER  BY  1   ;
                      QUERY PLAN
-------------------------------------------------------
 Sort  (cost=0.06..0.07 rows=2 width=32)
   Sort Key: ('l'::text)
   ->  Append  (cost=0.00..0.05 rows=2 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
         ->  Result  (cost=0.00..0.01 rows=1 width=32)
(5 rows)

PG11 test# set force_parallel_mode=on;
SET
PG11 test# /*+ Set(enable_seqscan OFF) */  SELECT 'l' UNION  ALL (  SELECT 'g' )  ORDER  BY  1   ;
 ?column?
----------
 g
 l
(2 rows)

PG11 test# explain /*+ Set(enable_seqscan OFF) */  SELECT 'l' UNION  ALL (  SELECT 'g' )  ORDER  BY  1   ;
                         QUERY PLAN
-------------------------------------------------------------
 Gather  (cost=1000.06..1000.27 rows=2 width=32)
   Workers Planned: 1
   Single Copy: true
   ->  Sort  (cost=0.06..0.07 rows=2 width=32)
         Sort Key: ('l'::text)
         ->  Append  (cost=0.00..0.05 rows=2 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
               ->  Result  (cost=0.00..0.01 rows=1 width=32)
(8 rows)

andrei-mart commented 5 months ago

@mtakahar, are you sure you had pg_hint_plan enabled when you ran it on vanilla PG? pg_hint_plan has known issues with Parallel Append: https://github.com/ossc-db/pg_hint_plan/issues/95

I was able to reproduce Workers Planned: 0 on vanilla PG and pg_hint_plan (both latest master), but it did not assert. I commented on the upstream issue and suggested a fix. We can wait a while for their response; with Parallel Append disabled, this should not affect us in the meantime.

mtakahar commented 5 months ago

@andrei-mart

"I was able to reproduce Workers Planned: 0 on vanilla PG and pg_hint_plan (both latest master),"

That's in line with what I saw (https://github.com/yugabyte/yugabyte-db/issues/21733#issuecomment-2026451750).

"but it did not assert."

Did you try with a binary built with --enable-debug --enable-cassert? If so, great that you were able to confirm.

I guess you are right. I may have forgotten to build and enable the corresponding pg_hint_plan when I tried the vanilla PG 11 and 15 builds that I made with --enable-debug --enable-cassert, and that may be why I was unable to reproduce the plan with those.

andrei-mart commented 5 months ago

I first built without those flags and then tried to rebuild with --enable-debug --enable-cassert later, so I'm not sure I didn't make a mistake somewhere and use the wrong binary. Anyway, it all comes down to a block of code in pg_hint_plan that does funny things to ParallelAppend. We have ParallelAppend disabled in our code, so we are safe for now; we just need to be careful not to enable ParallelAppend and use pg_hint_plan at the same time. Hence we can wait to see what upstream does with the issue. If nothing happens, we will have to address it ourselves before we re-enable ParallelAppend.
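
If we do end up addressing it ourselves, one possible shape for a defensive check is sketched below. This is only an illustration under stated assumptions: it is not the fix suggested on the upstream pg_hint_plan issue, and SketchPartialPath and sketch_can_gather_merge are hypothetical names, not PostgreSQL or pg_hint_plan identifiers. The idea is simply that a Gather Merge candidate is skipped when the partial path under it reports zero planned workers, which is the condition that trips the assert in cost_gather_merge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a planner partial path. */
typedef struct SketchPartialPath
{
    int parallel_workers;   /* workers the path claims it can use */
} SketchPartialPath;

/*
 * Hypothetical guard: report whether it is safe to build a Gather Merge
 * on top of the given partial path. A path with zero (or negative) planned
 * workers would violate the num_workers > 0 assumption, so it is rejected.
 */
bool
sketch_can_gather_merge(const SketchPartialPath *subpath)
{
    return subpath != NULL && subpath->parallel_workers > 0;
}
```

A caller would test this before creating the Gather Merge path and fall back to the non-parallel ordered path otherwise. Whether such a check belongs in the planner or in pg_hint_plan is exactly the open question for upstream.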