ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
674 stars 101 forks source link

Hints do not work on subqueries with DISTINCT #187

Open DontBreakAlex opened 2 months ago

DontBreakAlex commented 2 months ago

I could not get some hint to work a bigger query and narrowed the issue to this pattern:

CREATE TABLE t1 (id int PRIMARY KEY, val int);
CREATE TABLE t2 (id int PRIMARY KEY, val int);
EXPLAIN
/*+
    MergeJoin(t1 t2)
    Rows(t1 t2 #1234)
 */
SELECT *
FROM (SELECT DISTINCT ON (id) *
      FROM t1) S
            INNER JOIN t2 ON S.id = t2.id;

QUERY PLAN
Hash Join  (cost=61.01..150.50 rows=2260 width=16)
  Hash Cond: (t1.id = t2.id)
  ->  Unique  (cost=0.15..83.71 rows=2260 width=8)
        ->  Index Scan using t1_pkey on t1  (cost=0.15..78.06 rows=2260 width=8)
  ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
        ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

Without the DISTINCT ON, the hints work fine:

EXPLAIN
/*+
    MergeJoin(t1 t2)
    Rows(t1 t2 #1234)
 */
SELECT *
FROM (SELECT *
      FROM t1) S
            INNER JOIN t2 ON S.id = t2.id;

QUERY PLAN
Merge Join  (cost=0.31..190.01 rows=1234 width=16)
  Merge Cond: (t1.id = t2.id)
  ->  Index Scan using t1_pkey on t1  (cost=0.15..78.06 rows=2260 width=8)
  ->  Index Scan using t2_pkey on t2  (cost=0.15..78.06 rows=2260 width=8)

Using ANY_subquery does not work

EXPLAIN
/*+
    MergeJoin(ANY_subquery t2)
    Rows(ANY_subquery t2 #1234)
 */
SELECT *
FROM (SELECT DISTINCT ON (id) *
      FROM t1) S
            INNER JOIN t2 ON S.id = t2.id;

QUERY PLAN
Hash Join  (cost=61.01..150.50 rows=2260 width=16)
  Hash Cond: (t1.id = t2.id)
  ->  Unique  (cost=0.15..83.71 rows=2260 width=8)
        ->  Index Scan using t1_pkey on t1  (cost=0.15..78.06 rows=2260 width=8)
  ->  Hash  (cost=32.60..32.60 rows=2260 width=8)
        ->  Seq Scan on t2  (cost=0.00..32.60 rows=2260 width=8)

Merge joins are possible, if I set ENABLE_HASHJOIN to OFF, I get:

QUERY PLAN
Merge Join  (cost=0.31..195.66 rows=2260 width=16)
  Merge Cond: (t1.id = t2.id)
  ->  Unique  (cost=0.15..83.71 rows=2260 width=8)
        ->  Index Scan using t1_pkey on t1  (cost=0.15..78.06 rows=2260 width=8)
  ->  Index Scan using t2_pkey on t2  (cost=0.15..78.06 rows=2260 width=8)

I may be missing something, but this looks like a bug. I'm using PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit on AWS RDS.