ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
696 stars 103 forks source link

parallel hint invalid when SQL has only one table #102

Closed cstarc closed 1 year ago

cstarc commented 2 years ago

pg13 pg_hint_plan 13.7

case1 parallel not work when only one table:

chuhx@postgres=# create table test(id int,val int);
CREATE TABLE
chuhx@postgres=# explain select * from test;
                       QUERY PLAN                       
--------------------------------------------------------
 Seq Scan on test  (cost=0.00..32.60 rows=2260 width=8)
(1 row)

chuhx@postgres=# explain select /*+parallel(test 5 hard)*/* from test;
                           QUERY PLAN                           
----------------------------------------------------------------
 Seq Scan on test @"lt#0"  (cost=0.00..32.60 rows=2260 width=8)
(1 row)

case 2 parallel worked :

explain select /*+parallel(test 5 hard)*/* from test,test x;
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..51108.60 rows=5107600 width=16)
   ->  Seq Scan on test x @"lt#0"  (cost=0.00..32.60 rows=2260 width=8)
   ->  Gather  (cost=0.00..0.00 rows=2260 width=8)
         Workers Planned: 5
         ->  Parallel Seq Scan on test @"lt#0"  (cost=0.00..0.00 rows=729 width=8)
(5 rows)

after debug, i find it is because in pg_hint_plan_set_rel_pathlist func ,following code is not executed because of bms_membership(root->all_baserels) == BMS_SINGLETON:

/* Generate gather paths */
if (rel->reloptkind == RELOPT_BASEREL &&
    bms_membership(root->all_baserels) != BMS_SINGLETON)
    generate_gather_paths(root, rel, false);

Generate gather paths is on apply_scanjoin_target_to_paths , here , the guc which is setted by parallel hint had alreadly been resetted,so parallel hint is not used when Generating gather paths.

#1  0x000000000086ce7d in create_gather_path (root=0x1b7bda0, rel=0x1b57ad0, 
    subpath=0x1b66758, target=0x1b668f8, required_outer=0x0, rows=0x0)
    at pathnode.c:1945
#2  0x00000000007fdea0 in generate_gather_paths (root=0x1b7bda0, rel=0x1b57ad0, 
    override_rows=false) at allpaths.c:2711
#3  0x00000000007fe17d in generate_useful_gather_paths (root=0x1b7bda0, 
    rel=0x1b57ad0, override_rows=false) at allpaths.c:2843
#4  0x0000000000843b8e in apply_scanjoin_target_to_paths (root=0x1b7bda0, 
    rel=0x1b57ad0, scanjoin_targets=0x1b669c8, scanjoin_targets_contain_srfs=0x0, 
    scanjoin_target_parallel_safe=true, tlist_same_exprs=true) at planner.c:7676
#5  0x0000000000839611 in grouping_planner (root=0x1b7bda0, 
michaelpq commented 1 year ago

Generate gather paths is on apply_scanjoin_target_to_paths , here , the guc which is setted by parallel hint had alreadly been resetted,so parallel hint is not used when Generating gather paths.

It seems to me that this comment points to the root of the issue. One basically has to set up the parallel GUCs before being able to use hints on a gather node for a single relation. The regression tests in ut-W actually stress this behavior, but there is no coverage for the case where the GUCs are not set. This can be used as a workaround, still that feels unnatural to me.

I honestly don't know what we can do here. Like upstream, pg_hint_plan enforces gather oaths to not be computed when we have an inheritance child, which sounds pretty much right. But the costs are enforced after we regenerate the new paths and a new set of partial_pathlist. Worth noting that it is hard to grasp the intention of 63570f9d7be2db5f503f6c928e2b0e74dcc9eb93, where there is no difference in the regression tests generated.

While looking at this issue, I have noted two holes in the current code: 1) We don't currently check for the case where we have one single relation with single settings and a parallel hint set, which is why this report went missing. Assuming that we'd want to do nothing here and rely on the GUCs to be reset to ensure that the hints are enforced, we'd better have coverage anyway. 2) The check to apply generate_gather_paths() look incorrect to me. First, it seems that we'd better use generate_useful_gather_paths(). Second, we should use bms_equal() to look for the case of inheritance children. See upstream commit d8e34fa7a18fab4aa8eb010edac133d63ecc11c6. In order to fix these two issues, I have sent a patch in #112.

Adding @horiguti and @rjuju for comments.

michaelpq commented 1 year ago

It seems to me that this comment points to the root of the issue. One basically has to set up the parallel GUCs before being able to use hints on a gather node for a single relation. The regression tests in ut-W actually stress this behavior, but there is no coverage for the case where the GUCs are not set. This can be used as a workaround, still that feels unnatural to me.

After thinking more on this one, the use case of being able to enforce hints when it comes to a relation does not seem that obvious to me when it comes to the parallel handling. It looks like we could rework a bit the costing model of upstream so as we don't need to rely on a reset of the GUCs, but it could impact existing use cases for little gain.

Anyway, I have applied the tests to track the default behavior for now so as we keep track of the existing behavior. I have also fixed the check for the top-most relation to be in line with upstream.