Closed lucifer12346 closed 2 days ago
you should use leading hint, Join method hints cannot force join order
However, using leading hint as follow does not correct the join order.
/*+
HashJoin(mc ct)
HashJoin(mi_idx it)
HashJoin(mi_idx it mc ct)
NestLoop(mi_idx it mc ct t)
IndexScan(t)
SeqScan(mi_idx)
SeqScan(it)
SeqScan(mc)
SeqScan(ct)*/
explain SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Warner Bros. Pictures)%' AND (mc.note LIKE '%(co-production)%') AND t.production_year > 2015 AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id;
Here's the output:
Aggregate (cost=51977.95..51977.96 rows=1 width=68)
-> Gather (cost=36133.94..51977.94 rows=1 width=45)
Workers Planned: 2
-> Nested Loop (cost=35133.94..50977.84 rows=1 width=45)
Join Filter: (mc.movie_id = t.id)
-> Hash Join (cost=35133.51..50976.65 rows=2 width=32)
Hash Cond: (mi_idx.info_type_id = it.id)
-> Parallel Hash Join (cost=35131.08..50973.41 rows=212 width=36)
Hash Cond: (mi_idx.movie_id = mc.movie_id)
-> Parallel Seq Scan on movie_info_idx mi_idx (cost=0.00..13685.15 rows=575015 width=8)
-> Parallel Hash (cost=35130.18..35130.18 rows=72 width=28)
-> Hash Join (cost=18.93..35130.18 rows=72 width=28)
Hash Cond: (mc.company_type_id = ct.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..35097.06 rows=3595 width=32)
Filter: (((note)::text !~~ '%(as Warner Bros. Pictures)%'::text) AND ((note)::text ~~ '%(co-production)%'::text))
-> Hash (cost=18.88..18.88 rows=4 width=4)
-> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=4)
Filter: ((kind)::text = 'production companies'::text)
-> Hash (cost=2.41..2.41 rows=1 width=4)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'top 250 rank'::text)
-> Index Scan using title_pkey on title t (cost=0.43..0.58 rows=1 width=25)
Index Cond: (id = mi_idx.movie_id)
Filter: (production_year > 2015)
which still works differently from the expected output.
However, using leading hint as follow does not correct the join order.
/*+ HashJoin(mc ct) HashJoin(mi_idx it) HashJoin(mi_idx it mc ct) NestLoop(mi_idx it mc ct t) IndexScan(t) SeqScan(mi_idx) SeqScan(it) SeqScan(mc) SeqScan(ct)*/ explain SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Warner Bros. Pictures)%' AND (mc.note LIKE '%(co-production)%') AND t.production_year > 2015 AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id;
Here's the output:
Aggregate (cost=51977.95..51977.96 rows=1 width=68) -> Gather (cost=36133.94..51977.94 rows=1 width=45) Workers Planned: 2 -> Nested Loop (cost=35133.94..50977.84 rows=1 width=45) Join Filter: (mc.movie_id = t.id) -> Hash Join (cost=35133.51..50976.65 rows=2 width=32) Hash Cond: (mi_idx.info_type_id = it.id) -> Parallel Hash Join (cost=35131.08..50973.41 rows=212 width=36) Hash Cond: (mi_idx.movie_id = mc.movie_id) -> Parallel Seq Scan on movie_info_idx mi_idx (cost=0.00..13685.15 rows=575015 width=8) -> Parallel Hash (cost=35130.18..35130.18 rows=72 width=28) -> Hash Join (cost=18.93..35130.18 rows=72 width=28) Hash Cond: (mc.company_type_id = ct.id) -> Parallel Seq Scan on movie_companies mc (cost=0.00..35097.06 rows=3595 width=32) Filter: (((note)::text !~~ '%(as Warner Bros. Pictures)%'::text) AND ((note)::text ~~ '%(co-production)%'::text)) -> Hash (cost=18.88..18.88 rows=4 width=4) -> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=4) Filter: ((kind)::text = 'production companies'::text) -> Hash (cost=2.41..2.41 rows=1 width=4) -> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4) Filter: ((info)::text = 'top 250 rank'::text) -> Index Scan using title_pkey on title t (cost=0.43..0.58 rows=1 width=25) Index Cond: (id = mi_idx.movie_id) Filter: (production_year > 2015)
which still works differently from the expected output.
Where is the leading hint?
However, using leading hint as follow does not correct the join order.
/*+ HashJoin(mc ct) HashJoin(mi_idx it) HashJoin(mi_idx it mc ct) NestLoop(mi_idx it mc ct t) IndexScan(t) SeqScan(mi_idx) SeqScan(it) SeqScan(mc) SeqScan(ct)*/ explain SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Warner Bros. Pictures)%' AND (mc.note LIKE '%(co-production)%') AND t.production_year > 2015 AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id;
Here's the output:
Aggregate (cost=51977.95..51977.96 rows=1 width=68) -> Gather (cost=36133.94..51977.94 rows=1 width=45) Workers Planned: 2 -> Nested Loop (cost=35133.94..50977.84 rows=1 width=45) Join Filter: (mc.movie_id = t.id) -> Hash Join (cost=35133.51..50976.65 rows=2 width=32) Hash Cond: (mi_idx.info_type_id = it.id) -> Parallel Hash Join (cost=35131.08..50973.41 rows=212 width=36) Hash Cond: (mi_idx.movie_id = mc.movie_id) -> Parallel Seq Scan on movie_info_idx mi_idx (cost=0.00..13685.15 rows=575015 width=8) -> Parallel Hash (cost=35130.18..35130.18 rows=72 width=28) -> Hash Join (cost=18.93..35130.18 rows=72 width=28) Hash Cond: (mc.company_type_id = ct.id) -> Parallel Seq Scan on movie_companies mc (cost=0.00..35097.06 rows=3595 width=32) Filter: (((note)::text !~~ '%(as Warner Bros. Pictures)%'::text) AND ((note)::text ~~ '%(co-production)%'::text)) -> Hash (cost=18.88..18.88 rows=4 width=4) -> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=4) Filter: ((kind)::text = 'production companies'::text) -> Hash (cost=2.41..2.41 rows=1 width=4) -> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4) Filter: ((info)::text = 'top 250 rank'::text) -> Index Scan using title_pkey on title t (cost=0.43..0.58 rows=1 width=25) Index Cond: (id = mi_idx.movie_id) Filter: (production_year > 2015)
which still works differently from the expected output.
Where is the leading hint?
sorry for my omission and here's my SQL but it also has the wrong output:
/*+
Leading((((mi_idx it)(mc ct))t))
HashJoin(mc ct)
HashJoin(mi_idx it)
HashJoin(mi_idx it mc ct)
NestLoop(mi_idx it mc ct t)
IndexScan(t)
SeqScan(mi_idx)
SeqScan(it)
SeqScan(mc)
SeqScan(ct)*/
explain SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Warner Bros. Pictures)%' AND (mc.note LIKE '%(co-production)%') AND t.production_year > 2015 AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id;
Hi, I tested your SQL with the hint in my PostgreSQL 16.3 version. It works correctly! Original results without hint:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=47780.29..47780.30 rows=1 width=68)
-> Nested Loop (cost=32476.75..47780.28 rows=1 width=45)
Join Filter: (t.id = mc.movie_id)
-> Gather (cost=32476.32..47779.69 rows=1 width=32)
Workers Planned: 2
-> Parallel Hash Join (cost=31476.32..46779.59 rows=1 width=32)
Hash Cond: (mi_idx.movie_id = mc.movie_id)
-> Hash Join (cost=2.43..15286.60 rows=5089 width=4)
Hash Cond: (mi_idx.info_type_id = it.id)
-> Parallel Seq Scan on movie_info_idx mi_idx (cost=0.00..13718.15 rows=575015 width=8)
-> Hash (cost=2.41..2.41 rows=1 width=4)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'top 250 rank'::text)
-> Parallel Hash (cost=31473.73..31473.73 rows=13 width=28)
-> Hash Join (cost=18.93..31473.73 rows=13 width=28)
Hash Cond: (mc.company_type_id = ct.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..31448.82 rows=2269 width=32)
Filter: ((note !~~ '%(as Warner Bros. Pictures)%'::text) AND (note ~~ '%(co-production)%'::text))
-> Hash (cost=18.88..18.88 rows=4 width=4)
-> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=4)
Filter: ((kind)::text = 'production companies'::text)
-> Index Scan using title_pkey on title t (cost=0.43..0.58 rows=1 width=25)
Index Cond: (id = mi_idx.movie_id)
Filter: (production_year > 2015)
Results with the hint:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=48424.83..48424.84 rows=1 width=68)
-> Gather (cost=32480.12..48424.82 rows=1 width=45)
Workers Planned: 2
-> Nested Loop (cost=31480.12..47424.72 rows=1 width=45)
Join Filter: (t.id = mc.movie_id)
-> Parallel Hash Join (cost=31479.69..47424.12 rows=1 width=32)
Hash Cond: (mi_idx.movie_id = mc.movie_id)
-> Hash Join (cost=2.43..15927.77 rows=5089 width=4)
Hash Cond: (mi_idx.info_type_id = it.id)
-> Parallel Seq Scan on movie_info_idx mi_idx (cost=0.00..13718.15 rows=575015 width=8)
-> Hash (cost=2.41..2.41 rows=1 width=4)
-> Seq Scan on info_type it (cost=0.00..2.41 rows=1 width=4)
Filter: ((info)::text = 'top 250 rank'::text)
-> Parallel Hash (cost=31476.70..31476.70 rows=45 width=28)
-> Hash Join (cost=18.93..31476.70 rows=45 width=28)
Hash Cond: (mc.company_type_id = ct.id)
-> Parallel Seq Scan on movie_companies mc (cost=0.00..31448.82 rows=2269 width=32)
Filter: ((note !~~ '%(as Warner Bros. Pictures)%'::text) AND (note ~~ '%(co-production)%'::text))
-> Hash (cost=18.88..18.88 rows=4 width=4)
-> Seq Scan on company_type ct (cost=0.00..18.88 rows=4 width=4)
Filter: ((kind)::text = 'production companies'::text)
-> Index Scan using title_pkey on title t (cost=0.43..0.58 rows=1 width=25)
Index Cond: (id = mi_idx.movie_id)
Filter: (production_year > 2015)
Hmm. Based on the latest update, I am not seeing a bug here, so let's close this except if there is an extra problem.
By the way, I would recommend to send a self-contained test case in the future, as well. That would reduce my investigation time quite a lot so as I don't have to guess the schema you are using from the query. Thanks for doing that.
While using Join and Scan hints in PG v12.16 and pg_hint_plan 1.3.6 , I encountered the problem as follow. Here is a SQl using hints and I expect it join 'mc' and 'ct' first, then 'mi_idx' and 'it' seperately.
However, pg_hint_plan claims that all hints are used but the returned QEP does not follow the join order specified above.
LOG: pg_hint_plan[qno=0x3]: HintStateDump: {used hints:SeqScan(ct)SeqScan(it)SeqScan(mc)SeqScan(mi_idx)IndexScan(t)HashJoin(ct mc)HashJoin(it mi_idx)HashJoin(it mc mi_idx)HashJoin(ct it mc mi_idx)NestLoop(ct it mc mi_idx t)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}
It turned out to be:
which joined table 'mc' with 'mi_idx' and 'it' firstly and then joined 'ct' with three tables secondly and that was not what I have expected: