yugabyte / yugabyte-db

YugabyteDB - the cloud native distributed SQL database for mission-critical applications.
https://www.yugabyte.com
Other
8.94k stars 1.06k forks source link

[YSQL] [randgen] Distinct pushdown error: Key is not table tombstone key #22822

Closed timothy-e closed 3 months ago

timothy-e commented 4 months ago

Jira Link: DB-11721

Description

/*+ Set(enable_seqscan OFF) Set(enable_sort OFF) */ SELECT table1.col_int_nokey AS field1 FROM C AS table1 WHERE ( ( table1.col_varchar_key, table1.pk ) = ( SELECT DISTINCT SUBQUERY3_t1.col_varchar_nokey AS SUBQUERY3_field1, SUBQUERY3_t1.col_int_nokey AS SUBQUERY3_field2 FROM ( CC AS SUBQUERY3_t1 INNER JOIN CC AS SUBQUERY3_t2 ON (( SUBQUERY3_t2.col_int_nokey, SUBQUERY3_t1.col_int_nokey ) NOT IN
( SELECT DISTINCT CHILD_SUBQUERY1_t2.col_int_key AS CHILD_SUBQUERY1_field1, CHILD_SUBQUERY1_t1.pk AS child_subquery1_field2 FROM ( D AS CHILD_SUBQUERY1_t1 LEFT JOIN CC AS CHILD_SUBQUERY1_t2 ON (CHILD_SUBQUERY1_t2.col_int_key = CHILD_SUBQUERY1_t1.col_int_nokey ) INNER JOIN C AS CHILD_SUBQUERY1_t3 ON (CHILD_SUBQUERY1_t3.col_varchar_key = CHILD_SUBQUERY1_t2.col_varchar_nokey ) ) ) )  ) ORDER BY 1, 2 LIMIT 1 ) );
ERROR:  Key 47121048800000012153473DDF488000010001102121000021 is not table tombstone key.
EXPLAIN /*+ Set(enable_seqscan OFF) Set(enable_sort OFF) */ SELECT table1.col_int_nokey AS field1 FROM C AS table1 WHERE ( ( table1.col_varchar_key, table1.pk ) = ( SELECT DISTINCT SUBQUERY3_t1.col_varchar_nokey AS SUBQUERY3_field1, SUBQUERY3_t1.col_int_nokey AS SUBQUERY3_field2 FROM ( CC AS SUBQUERY3_t1 INNER JOIN CC AS SUBQUERY3_t2 ON (( SUBQUERY3_t2.col_int_nokey, SUBQUERY3_t1.col_int_nokey ) NOT IN
( SELECT DISTINCT CHILD_SUBQUERY1_t2.col_int_key AS CHILD_SUBQUERY1_field1, CHILD_SUBQUERY1_t1.pk AS child_subquery1_field2 FROM ( D AS CHILD_SUBQUERY1_t1 LEFT JOIN CC AS CHILD_SUBQUERY1_t2 ON (CHILD_SUBQUERY1_t2.col_int_key = CHILD_SUBQUERY1_t1.col_int_nokey ) INNER JOIN C AS CHILD_SUBQUERY1_t3 ON (CHILD_SUBQUERY1_t3.col_varchar_key = CHILD_SUBQUERY1_t2.col_varchar_nokey ) ) ) )  ) ORDER BY 1, 2 LIMIT 1 ) );
                                                                                         QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using c_pkey on c table1  (cost=20000027995.00..20000027999.12 rows=1 width=4)
   Index Cond: (pk = $2051)
   Storage Filter: ((col_varchar_key)::text = ($2050)::text)
   InitPlan 2 (returns $2050,$2051)
     ->  Limit  (cost=20000027995.00..20000027995.00 rows=1 width=12)
           ->  Sort  (cost=20000027995.00..20000027995.50 rows=200 width=12)
                 Sort Key: subquery3_t1.col_varchar_nokey, subquery3_t1.col_int_nokey
                 ->  HashAggregate  (cost=10000027992.00..10000027994.00 rows=200 width=12)
                       Group Key: subquery3_t1.col_varchar_nokey, subquery3_t1.col_int_nokey
                       ->  Nested Loop  (cost=10000007761.50..10000025492.00 rows=500000 width=12)
                             Join Filter: (NOT (hashed SubPlan 1))
                             ->  Index Only Scan using cc_varchar_key on cc subquery3_t1  (cost=0.00..114.00 rows=1000 width=12)
                             ->  Materialize  (cost=0.00..119.00 rows=1000 width=4)
                                   ->  Index Only Scan using cc_varchar_key on cc subquery3_t2  (cost=0.00..114.00 rows=1000 width=4)
                             SubPlan 1
                               ->  HashAggregate  (cost=10000007449.00..10000007699.00 rows=25000 width=8)
                                     Group Key: child_subquery1_t2.col_int_key, child_subquery1_t1.pk
                                     ->  YB Batched Nested Loop Join  (cost=10000000000.00..10000007324.00 rows=25000 width=8)
                                           Join Filter: ((child_subquery1_t2.col_varchar_nokey)::text = (child_subquery1_t3.col_varchar_key)::text)
                                           ->  YB Batched Nested Loop Join  (cost=10000000000.00..10000001304.00 rows=5000 width=16)
                                                 Join Filter: (child_subquery1_t1.col_int_nokey = child_subquery1_t2.col_int_key)
                                                 ->  Seq Scan on d child_subquery1_t1  (cost=10000000000.00..10000000100.00 rows=1000 width=8)
                                                 ->  Distinct Index Only Scan using cc_int_key on cc child_subquery1_t2  (cost=0.00..1.10 rows=10 width=12)
                                                       Index Cond: (col_int_key = ANY (ARRAY[child_subquery1_t1.col_int_nokey, $1, $2, ..., $1023]))
                                                       Distinct Keys: child_subquery1_t2.col_int_key, child_subquery1_t2.col_int_nokey, child_subquery1_t2.col_varchar_nokey
                                           ->  Distinct Index Only Scan using c_varchar_key on c child_subquery1_t3  (cost=0.00..1.10 rows=10 width=8)
                                                 Index Cond: (col_varchar_key = ANY (ARRAY[(child_subquery1_t2.col_varchar_nokey)::text, ($1025)::text, ($1026)::text, ..., ($2047)::text]))
                                                 Distinct Keys: child_subquery1_t3.col_varchar_key
(28 rows)

But succeeds when distinct pushdown is false.

SET yb_enable_distinct_pushdown = false;
/*+ Set(enable_seqscan OFF) Set(enable_sort OFF) */ SELECT table1.col_int_nokey AS field1 FROM C AS table1 WHERE ( ( table1.col_varchar_key, table1.pk ) = ( SELECT DISTINCT SUBQUERY3_t1.col_varchar_nokey AS SUBQUERY3_field1, SUBQUERY3_t1.col_int_nokey AS SUBQUERY3_field2 FROM ( CC AS SUBQUERY3_t1 INNER JOIN CC AS SUBQUERY3_t2 ON (( SUBQUERY3_t2.col_int_nokey, SUBQUERY3_t1.col_int_nokey ) NOT IN
( SELECT DISTINCT CHILD_SUBQUERY1_t2.col_int_key AS CHILD_SUBQUERY1_field1, CHILD_SUBQUERY1_t1.pk AS child_subquery1_field2 FROM ( D AS CHILD_SUBQUERY1_t1 LEFT JOIN CC AS CHILD_SUBQUERY1_t2 ON (CHILD_SUBQUERY1_t2.col_int_key = CHILD_SUBQUERY1_t1.col_int_nokey ) INNER JOIN C AS CHILD_SUBQUERY1_t3 ON (CHILD_SUBQUERY1_t3.col_varchar_key = CHILD_SUBQUERY1_t2.col_varchar_nokey ) ) ) )  ) ORDER BY 1, 2 LIMIT 1 ) );
 field1
--------
(0 rows)
                                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using c_pkey on c table1  (cost=20000027995.00..20000027999.12 rows=1 width=4)
   Index Cond: (pk = $2051)
   Storage Filter: ((col_varchar_key)::text = ($2050)::text)
   InitPlan 2 (returns $2050,$2051)
     ->  Limit  (cost=20000027995.00..20000027995.00 rows=1 width=12)
           ->  Sort  (cost=20000027995.00..20000027995.50 rows=200 width=12)
                 Sort Key: subquery3_t1.col_varchar_nokey, subquery3_t1.col_int_nokey
                 ->  HashAggregate  (cost=10000027992.00..10000027994.00 rows=200 width=12)
                       Group Key: subquery3_t1.col_varchar_nokey, subquery3_t1.col_int_nokey
                       ->  Nested Loop  (cost=10000007761.50..10000025492.00 rows=500000 width=12)
                             Join Filter: (NOT (hashed SubPlan 1))
                             ->  Index Only Scan using cc_varchar_key on cc subquery3_t1  (cost=0.00..114.00 rows=1000 width=12)
                             ->  Materialize  (cost=0.00..119.00 rows=1000 width=4)
                                   ->  Index Only Scan using cc_varchar_key on cc subquery3_t2  (cost=0.00..114.00 rows=1000 width=4)
                             SubPlan 1
                               ->  HashAggregate  (cost=10000007449.00..10000007699.00 rows=25000 width=8)
                                     Group Key: child_subquery1_t2.col_int_key, child_subquery1_t1.pk
                                     ->  YB Batched Nested Loop Join  (cost=10000000000.00..10000007324.00 rows=25000 width=8)
                                           Join Filter: ((child_subquery1_t2.col_varchar_nokey)::text = (child_subquery1_t3.col_varchar_key)::text)
                                           ->  YB Batched Nested Loop Join  (cost=10000000000.00..10000001304.00 rows=5000 width=16)
                                                 Join Filter: (child_subquery1_t1.col_int_nokey = child_subquery1_t2.col_int_key)
                                                 ->  Seq Scan on d child_subquery1_t1  (cost=10000000000.00..10000000100.00 rows=1000 width=8)
                                                 ->  Index Only Scan using cc_int_key on cc child_subquery1_t2  (cost=0.00..1.10 rows=10 width=12)
                                                       Index Cond: (col_int_key = ANY (ARRAY[child_subquery1_t1.col_int_nokey, $1, $2, ..., $1023]))
                                           ->  Index Only Scan using c_varchar_key on c child_subquery1_t3  (cost=0.00..1.10 rows=10 width=8)
                                                 Index Cond: (col_varchar_key = ANY (ARRAY[(child_subquery1_t2.col_varchar_nokey)::text, ($1025)::text, ($1026)::text, ..., ($2047)::text]))
(26 rows)

Schema and data: dump-c-cc-dd.txt

Issue Type

kind/bug

Warning: Please confirm that this issue does not contain any sensitive information

pao214 commented 4 months ago

A delete related issue was resolved before https://github.com/yugabyte/yugabyte-db/issues/19911 cc @tanujnay112 for visibility

pao214 commented 4 months ago
->  Distinct Index Only Scan using cc_int_key on cc child_subquery1_t2  (cost=0.00..1.10 rows=10 width=12)
  Index Cond: (col_int_key = ANY (ARRAY[child_subquery1_t1.col_int_nokey, $1, $2, ..., $1023]))
  Distinct Keys: child_subquery1_t2.col_int_key, child_subquery1_t2.col_int_nokey, child_subquery1_t2.col_varchar_nokey

Suspicious: Include columns are in the prefix. They are not sorted.

pao214 commented 3 months ago

Landed the fix on master, 2024.1 and 2.20.