Regression with Parallel hints combined with removal of indexes in Index hints

michaelpq commented 8 months ago

A user has reported a regression in the latest release series regarding subject, related to commit 684986acc3b3156aaf22c1571cd44dd34059a346 (applies to all stable branches). Imagine two simple tables:

CREATE TABLE bookings (
    book_ref varchar NOT NULL,
    book_date timestamp with time zone NOT NULL,
    total_amount numeric NOT NULL 
); 
CREATE INDEX ON bookings(total_amount);
CREATE UNIQUE INDEX ON bookings(book_ref) INCLUDE (book_date);
ALTER TABLE bookings ADD CONSTRAINT bookings_pkey PRIMARY KEY
    USING INDEX bookings_book_ref_book_date_idx;
ANALYZE bookings;

CREATE TABLE bookings_tmp
    AS SELECT * FROM bookings ORDER BY book_ref;
ALTER TABLE bookings_tmp ADD PRIMARY KEY(book_ref);
ANALYZE bookings_tmp;

Now here is an exotic case:

/*+
    IndexScan(a bookings_total_amount_idx)
    IndexScan(b bookings_tmp_not_exist)
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

Under 1.4.1, we are getting this plan:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: (a.book_ref = b.book_ref)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on bookings_tmp b
(9 rows)

I think that this is not correct, for two reasons:

IndexScan(b bookings_tmp_not_exist) is listed as used, but it does not make sense because it lists an index that does not exist.
Why is the sequential scan enforced under a gather? Shouldn't this use an Index Scan under the gather based on the pkey of bookings_tmp? I get that both behavior are debatable, but this looks strange for me as historical behavior.

Now, under 1.4.2, we get this plan:

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: (a.book_ref = b.book_ref)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Seq Scan on bookings_tmp b
(7 rows)

This is better in the fact that IndexScan(b bookings_tmp_not_exist) is marked as unused. Still, this is worse in the fact that the Parallel hint with the number of workers gets entirely ignored, leading to a regression.

I've been considering a couple of options about what to tweak in the case where an index restriction is applied, but my mind comes down to a few conclusions:

For now, I would like to choose a revert of 684986acc3b3156aaf22c1571cd44dd34059a346 & friends on stable branches, getting back to the previous-still-somewhat-broken behavior.
Add a few more regression tests based on what I saw reported back to me here. I'm OK to send a patch about that.
@samimseih has mentioned me a potential solution to enforce only a subset of the ENABLE masks for the GUCs controlling if some scans should be used or not, rather than bypass the whole. This slightly impacts the existing plans, so perhaps a revert on HEAD of the original patch makes sense anyway. Or folks here are OK to let the new behavior on HEAD? I mean, it correctly discards indexes, so removing what looks like an improvement in hint detection would be sad, and it has the merit of letting the change brew longer with the release of this project for 17~.

samimseih commented 8 months ago

Thanks for the report @michaelpq

After we spoke offline, I spent time looking further into this. What I observed is the follows:

skipping over setup_scan_method_enforcement https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3955-L3956 does not prevent the setting up of parallel enforcement https://github.com/ossc-db/pg_hint_plan/blob/master/pg_hint_plan.c#L3986 I confirmed with some additional debugging statements.
Also, Testing the repro case on 1.4.1, I discovered some very inconsistent/odd parallel enforcement behavior that 684986a exposes in 1.4.2, but I do not think is the cause of. See the cases below.

postgres=# \dx
                    List of installed extensions
     Name     | Version |   Schema   |         Description          
--------------+---------+------------+------------------------------
 pg_hint_plan | 1.4.1   | hint_plan  | 
 plpgsql      | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size=0;                                                                               
set max_parallel_workers_per_gather To 4;

I have a repro of the test case mentioned in the thread.

postgres=# /*+                            
    IndexScan(a bookings_total_amount_idx)
    IndexScan(b bookings_tmp_not_exist)
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;
                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Gather
         Workers Planned: 4
         ->  Parallel Index Scan using bookings_total_amount_idx on bookings a
               Index Cond: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on bookings_tmp b
(9 rows)

Here is where things get interesting. If only the Parallel hints are supplied, no parallel hint is enforced.

postgres=# /*+                            
    Parallel(a 4 hard)
    Parallel(b 2 hard)
*/

EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

                        QUERY PLAN                        
----------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Seq Scan on bookings a
         Filter: (total_amount > 28000.00)
   ->  Seq Scan on bookings_tmp b
(5 rows)

I can force the parallel plan by keeping only index scans enabled. But, notice it's not a Parallel Seq scan on "b", but rather a Parallel Index Scan.

set enable_mergejoin = off;
set enable_seqscan = off;
set enable_hashjoin = off;
SET
SET
SET
postgres=# /*+                            
    Parallel(a 6 hard)
    Parallel(b 2 hard)
*/                                                                                                                                                                                                           
EXPLAIN(costs false) SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;
                                QUERY PLAN                                 
---------------------------------------------------------------------------
 Nested Loop
   Join Filter: ((a.book_ref)::text = (b.book_ref)::text)
   ->  Gather
         Workers Planned: 6
         ->  Parallel Index Scan using bookings_pkey on bookings a
               Filter: (total_amount > 28000.00)
   ->  Gather
         Workers Planned: 2
         ->  Parallel Index Scan using bookings_tmp_pkey on bookings_tmp b
(9 rows)

It is clear that parallel enforcement, including the "hard" option, is not showing the expected results even in 1.4.1

thoughts?

Regards,

Sami

samimseih commented 8 months ago

For now, I would like to choose a revert of https://github.com/ossc-db/pg_hint_plan/commit/684986acc3b3156aaf22c1571cd44dd34059a346 & friends on stable branches, getting back to the previous-still-somewhat-broken behavior.

With the above said, I do agree to revert the patch on stable branches, since the existing behavior, while not great, is what the users expect.

michaelpq commented 8 months ago

It is clear that parallel enforcement, including the "hard" option, is not showing the expected results even in 1.4.1.

Yeah. The fact that we would completely discard the Parallel hints when mixing more than one table sounds strange to me.

At this stage, I think that we'd better document all that in the shape of tests where one could look at a regression.diffs to see the difference generated when tweaking the code, or development is not going to scale well. I'm OK if we add comments like "strangely, this ignores both Parallel hints, perhaps these should be pushed under two Gather nodes". Or stuff like that.

I'll go put the stable branches in their previous stable with a revert, as I doubt that we've appreciated all the plan manipulations possible yet. Changing that in a major release is less or an issue to me. At this stage I'd rather prioritize plan stability to not impact users across these minor updated.

samimseih commented 8 months ago

I agree. I'll submit a patch for Parallel Hint tests ( including those mentioned above ).

samimseih commented 8 months ago

Attached are tests that cover several of the gaps discovered in this discussion:

1/ Ensure that Parallel enforcement does not break when a non-existent index is used in an IndexScan hint. 2/ Demonstrate that Parallel hints cannot be enforced on empty tables.

0001-v1-Add-additional-parallel-hint-regression-tests.patch

michaelpq commented 8 months ago

Thanks for the patch.

The patch you are proposing for the additional tests has no explanation about what's being done, like the interactions between the non-existing indexes and the hints. It may be good to document now with some big XXX that some of these plans are weird because of YYY.

I am not sure that it is a good idea to use and empty the existing tables t5 and t6 for these new tests. Perhaps splitting that into a new file would make more sense?

samimseih commented 8 months ago

The patch you are proposing for the additional tests has no explanation about what's being done,

I hesitated doing that as it's not the norm to add much documentation to the tests, but that is not a good reason not to. I will add a better description.

I am not sure that it is a good idea to use and empty the existing tables t5 and t6 for these new tests. Perhaps splitting that into a new file would make more sense?

I am creating new tables for these tests in the sql/ut-init.sql file. I did not think it was worth adding a new init and test files for these few tests.

+CREATE TABLE s1.t5 (LIKE s1.t1 INCLUDING ALL);
+CREATE TABLE s1.t6 (LIKE s1.t1 INCLUDING ALL);

samimseih commented 8 months ago

After second thought, showing that the non-existing index does not prevent parallel hint enforcement is not necessary. So, the test now shows the important part. For empty tables, parallel hints can only be enforced on index scans and not sequential scans.

Attached is the revised test.

Regards,

Sami 0001-Add-Parallel-hint-tests-for-empty-tables.patch

michaelpq commented 1 day ago

a3646e1b073c still exists on HEAD and now PG17. Now is the time to do a re-evaluation of what has been committed, knowing the reasons why the patch had to be backed out of past stable branches.

samimseih commented 22 hours ago

Looking at the discussion above, this patch exposed pre-existing odd behavior with parallel enforcement. Especially, if there are no rows in a table, parallel sequential scans cannot be enforced. This scenario is covered by the suggested test case [1]. Do you think there is value in these test?

In the case mentioned at the top of the thread, 1.4.1 is able to perform a parallel sequential scan on an empty table by "accident". Because an IndexScan hint is used on bookings_tmp_not_exist, the sequential scan is disabled by pg_hint_plan ( enable_seqscan = off ). However, because bookings_tmp_not_exist is not an existing index, this leaves Postgres with only the sequential scan access method. The use of parallel with the sequential scan is surprising. I was not able to reproduce the same plan by forcing parallelism.

set max_parallel_workers_per_gather To 4;  
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size=0;                                                                             

EXPLAIN SELECT * FROM bookings a,bookings_tmp b
  WHERE a.book_ref = b.book_ref and a.total_amount > 28000.00;

I think this patch is a good improvement and I am not sure if the pre-existing behavior is the behavior that we want to maintain for the next major version release.

[1] https://github.com/ossc-db/pg_hint_plan/files/13387722/0001-Add-Parallel-hint-tests-for-empty-tables.patch

ossc-db / pg_hint_plan

Regression with Parallel hints combined with removal of indexes in Index hints #164