ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
703 stars 103 forks source link

Join order incorrect when using join method and leading hints #58

Closed gmittal closed 1 year ago

gmittal commented 3 years ago

I have turned geqo off and increased from_collapse_limit and join_collapse_limit to appropriate values. I am attempting to plan the following:

/*+                                                                                                                                                             
  HashJoin(ct t chn cn ci mc rt)                                                                                                                                
  HashJoin(t chn cn ci mc rt)                                                                                                                                   
  HashJoin(chn cn ci mc rt)                                                                                                                                     
  HashJoin(cn ci mc rt)                                                                                                                                         
  HashJoin(ci mc rt)                                                                                                                                            
  HashJoin(ci mc)                                                                                                                                                                                                                                                                                          
  Leading((ct (t (chn (cn ((ci mc) rt))))))                                                                                                                     
 */ explain SELECT MIN(chn.name) AS character,                                                                                                     
       MIN(t.title) AS movie_with_american_producer                                                                                                             
FROM char_name AS chn,                                                                                                                                          
     cast_info AS ci,                                                                                                                                           
     company_name AS cn,                                                                                                                                        
     company_type AS ct,                                                                                                                                        
     movie_companies AS mc,                                                                                                                                     
     role_type AS rt,                                                                                                                                           
     title AS t                                                                                                                                                 
WHERE ci.note LIKE '%(producer)%'                                                                                                                               
  AND cn.country_code = '[us]'                                                                                                                                  
  AND t.production_year > 1990                                                                                                                                  
  AND t.id = mc.movie_id                                                                                                                                        
  AND t.id = ci.movie_id                                                                                                                                        
  AND ci.movie_id = mc.movie_id                                                                                                                                 
  AND chn.id = ci.person_role_id                                                                                                                                
  AND rt.id = ci.role_id                                                                                                                                        
  AND cn.id = mc.company_id                                                                                                                                     
  AND ct.id = mc.company_type_id;

However, pg_hint_plan claims that all hints are used but the returned QEP does not follow the join order specified above.

2020-10-18 07:37:20.852 UTC [12379] LOG:  pg_hint_plan[qno=0xb]: HintStateDump: {used hints:HashJoin(ci mc)HashJoin(ci mc rt)HashJoin(ci cn mc rt)HashJoin(chn ci cn mc rt)HashJoin(chn ci cn mc rt t)HashJoin(chn ci cn ct mc rt t)Leading((ct (t (chn (cn ((ci mc) rt))))))}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}

Resulting execution plan (which very clearly does not follow the join order specified by Leading):

 Aggregate  (cost=10078338320.71..10078338321.21 rows=1 width=64)
   ->  Hash Join  (cost=10077494801.71..10078097989.21 rows=961326 width=33)
         Hash Cond: (ct.id = mc.company_type_id)
         ->  Seq Scan on company_type ct  (cost=0.00..3.00 rows=4 width=4)
         ->  Hash  (cost=10076893972.96..10076893972.96 rows=961326 width=37)
               ->  Gather  (cost=10071476776.84..10076893972.96 rows=961326 width=37)
                     Workers Planned: 4
                     ->  Parallel Hash Join  (cost=10071475776.84..10072086342.96 rows=240332 width=37)
                           Hash Cond: (t.id = mc.movie_id)
                           ->  Parallel Seq Scan on title t  (cost=0.00..431062.38 rows=435273 width=21)
                                 Filter: (production_year > 1990)
                           ->  Parallel Hash  (cost=10070962528.09..10070962528.09 rows=821198 width=28)
                                 ->  Nested Loop  (cost=10005677650.40..10070962528.09 rows=821198 width=28)
                                       ->  Parallel Hash Join  (cost=5677628.90..52478321.67 rows=821198 width=16)
                                             Hash Cond: (cn.id = mc.company_id)
                                             ->  Parallel Seq Scan on company_name cn  (cost=0.00..89390.96 rows=49663 width=4)
                                                   Filter: ((country_code)::text = '[us]'::text)
                                             ->  Parallel Hash  (cost=5272860.78..5272860.78 rows=647629 width=20)
                                                   ->  Hash Join  (cost=965666.13..5272860.78 rows=647629 width=20)
                                                         Hash Cond: (ci.role_id = rt.id)
                                                         ->  Parallel Hash Join  (cost=965651.63..5157824.72 rows=647629 width=24)
                                                               Hash Cond: (ci.movie_id = mc.movie_id)
                                                               ->  Parallel Seq Scan on cast_info ci  (cost=0.00..4028332.83 rows=237535 width=12)
                                                                     Filter: (note ~~ '%(producer)%'::text)
                                                               ->  Parallel Hash  (cost=439617.26..439617.26 rows=841655 width=12)
                                                                     ->  Parallel Seq Scan on movie_companies mc  (cost=0.00..439617.26 rows=841655 width=12
)
                                                         ->  Hash  (cost=7.00..7.00 rows=12 width=4)
                                                               ->  Seq Scan on role_type rt  (cost=0.00..7.00 rows=12 width=4)
                                       ->  Index Scan using char_name_pkey on char_name chn  (cost=21.50..22.51 rows=1 width=20)
                                             Index Cond: (id = ci.person_role_id)
(30 rows)
kasaharatt commented 3 years ago

Hi. Are you pointing out that the join between the chn(char_name) table and the other tables is not a Hash Join as a problem ? If so, please try following workarounds.

If you use PostgreSQL v13, please use a latest pg_hint_plan (HEAD). Latest commit (https://github.com/ossc-db/pg_hint_plan/commit/e891d547ad3f37f4ad971a28834a752564765d18) may fix your problem.

If you use PostgreSQL v12 or earlier, try setting the work_mem parameter to a larger value than it is now, or add SeqScan(chn) to the HINT clause.

Regards,

LumingSun commented 2 years ago

I encountered a similar problem:

job=# /*+  Leading(t mi mi_idx) */ explain select count(*) from title t,  movie_info mi,  movie_info_idx mi_idx where t.id=mi.movie_id and t.id=mi_idx.movie_id and mi.info_type_id > 16 and mi_idx.info_type_id = 100;

but get

 Aggregate  (cost=479389.64..479389.65 rows=1 width=8)
   ->  Hash Join  (cost=118376.83..478371.27 rows=407350 width=0)
         Hash Cond: (mi.movie_id = t.id)
         ->  Seq Scan on movie_info mi  (cost=0.00..347529.39 rows=2237746 width=4)
               Filter: (info_type_id > 16)
         ->  Hash  (cost=112623.81..112623.81 rows=460242 width=8)
               ->  Hash Join  (cost=30938.46..112623.81 rows=460242 width=8)
                     Hash Cond: (t.id = mi_idx.movie_id)
                     ->  Seq Scan on title t  (cost=0.00..61281.03 rows=2528303 width=4)
                     ->  Hash  (cost=25185.44..25185.44 rows=460242 width=4)
                           ->  Seq Scan on movie_info_idx mi_idx  (cost=0.00..25185.44 rows=460242 width=4)
                                 Filter: (info_type_id = 100)

I'm using PG v12 and pg_hint_plan 1.3.7 and the work_mem is 16GB.

michaelpq commented 1 year ago

This does not seem to be relevant anymore, hence closing for now.