ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
715 stars 103 forks source link

Supplying IndexScan hint with unavailable index gives unexpected results. #136

Closed samimseih closed 1 year ago

samimseih commented 1 year ago

If an index is dropped or renamed and the hint is not updated, this can cause some unexpected behavior and performance regression until the hint is corrected.

postgres=# create table test ( id int); 
postgres=# insert into test select n from generate_series(1, 50) as n; 
postgres=# create unique index test_idx1 on test(id);
CREATE TABLE
INSERT 0 50
CREATE INDEX
postgres=# explain /*+ IndexScan(t test_idx1) */ select * from test AS t where id = 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using test_idx1 on test t  (cost=0.14..8.16 rows=1 width=4)
   Index Cond: (id = 1)
(2 rows)

postgres=# alter index test_idx1 rename to test_idx1_renamed;
ALTER INDEX
postgres=# explain /*+ IndexScan(t test_idx1) */ select * from test AS t where id = 1;
                                QUERY PLAN
--------------------------------------------------------------------------
 Seq Scan on test t  (cost=10000000000.00..10000000001.62 rows=1 width=4)
   Filter: (id = 1)
(2 rows)

The reason for this is becuase restrict_indexes compares for each rel index if it's in a hint, and if it's not remove it immediately from consideration. So, in cases where all the indexes in the hint are no longer viable because they've been renamed, then we end up unexpected results.

I think it's better to have a safeguard in which if we are about to delete all the indexes, then don't delete any indexes at all.

Here is an idea for a patch below. Instead of deleting the idnexes immediately, store the index OIDs of the candidate indexes to delete. If at the end we observe we are about to delete all the indexes, don't do anything.

diff --git a/pg_hint_plan.c b/pg_hint_plan.c
index 5b8f80a..13c98e3 100644
--- a/pg_hint_plan.c
+++ b/pg_hint_plan.c
@@ -3429,6 +3429,8 @@ restrict_indexes(PlannerInfo *root, ScanMethodHint *hint, RelOptInfo *rel,
        StringInfoData  buf;
        RangeTblEntry  *rte = root->simple_rte_array[rel->relid];
        Oid                             relationObjectId = rte->relid;
+       List           *final_delete = NIL;
+       bool           do_delete = false;

        /*
         * We delete all the IndexOptInfo list and prevent you from being usable by
@@ -3646,11 +3648,37 @@ restrict_indexes(PlannerInfo *root, ScanMethodHint *hint, RelOptInfo *rel,
                }

                if (!use_index)
-                       rel->indexlist = foreach_delete_current(rel->indexlist, cell);
+                       final_delete = lappend_oid(final_delete, info->indexoid);

                pfree(indexname);
        }

+       /*
+        * If all the indexes are marked for deletion, this means our list
+        * of hints in the indexes do not match any valid indexes. In this
+        * case, do not delete any indexes.
+        */
+       do_delete = list_length(final_delete) < list_length(rel->indexlist);
+
+       if (do_delete)
+       {
+               foreach (cell, final_delete)
+               {
+                       Oid final_oid = lfirst_oid(cell);
+                       ListCell           *l;
+
+                       foreach (l, rel->indexlist)
+                       {
+                               IndexOptInfo   *info = (IndexOptInfo *) lfirst(l);
+
+                               if (info->indexoid == final_oid)
+                                       rel->indexlist = foreach_delete_current(rel->indexlist, l);
+                       }
+               }
+       }
+
+       list_free(final_delete);
+
        if (debug_level > 0)
        {
                StringInfoData  rel_buf;

Regards,

Sami Imseih Amazon Web Services (AWS)

michaelpq commented 1 year ago

Thanks for the report. Hm, I see. Indeed, once the index is renamed, it feels more natural to fall back to the index renamed if the IndexScan hint does not include the index wanted, rather than blindly switch to an IndexScan. The fallback plan would depend on the costs, but with enough tuples I would not have expected a sequential scan to be chosen as fallback.

samimseih commented 1 year ago

Correct, and in fact if all the supplied indexes have the wrong name, the only available path becomes the sequential scan.

Thinking about this a bit more, it seems that the correct behavior here is to make the entire hint invalid if it contains an index that does not exist and debug print should make it clear that the hint is not being used.

I prepared a patch to do something along that line.

In the example below, I created a table with foo_idx1 and foo_idx2

postgres=# create table foo ( id int ); create index foo_idx1 on foo (id);  create index foo_idx2 on foo(id); insert into foo select n from generate_series(1, 10000) as n;
CREATE TABLE
CREATE INDEX
CREATE INDEX
INSERT 0 10000

if the IndexScan hint is supplied indexes that do not exist, and even if one of the indexes does in fact exist, it should invalidate the entire hint.

postgres=# explain /*+ IndexScan(foo foo_idx4 foo_idx3 foo_idx2) */ select * from foo where id = 1;
LOG:  pg_hint_plan:
used hint:
not used hint:
duplication hint:
error hint:
IndexScan(foo foo_idx4 foo_idx3 foo_idx2)

                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using foo_idx2 on foo  (cost=0.29..8.30 rows=1 width=4)
   Index Cond: (id = 1)
(2 rows)

with a valid index, the hint is used.


postgres=# explain /*+ IndexScan(foo foo_idx1) */ select * from foo where id = 1;
LOG:  available indexes for IndexScan(foo): foo_idx1
LOG:  pg_hint_plan:
used hint:
IndexScan(foo foo_idx1)
not used hint:
duplication hint:
error hint:

                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using foo_idx1 on foo  (cost=0.29..8.30 rows=1 width=4)
   Index Cond: (id = 1)
(2 rows)

Let me know your thoughts and I can complete the patch with proper test cases.

michaelpq commented 1 year ago

if the IndexScan hint is supplied indexes that do not exist, and even if one of the indexes does in fact exist, it should invalidate the entire hint.

Agreed. That feels more natural to me to just invalidate the whole hint if it includes at least one index that does not exist. It seems to me that we should warn about that at parsing, at least? The module is rather soft when it comes to error handling.

samimseih commented 1 year ago

I don't think it will be possible to do this check at hint parse time since we don't have the list of available indexes at that point. I will double check and prepare a patch for review.

samimseih commented 1 year ago

Agreed. That feels more natural to me to just invalidate the whole hint if it includes at least one index that does not exist

While this is the ideal solution, upon further investigation it's really not possible without introducing a great deal of complexity to deal with partitioned tables. For non-partitioned tables, it's quite simple to validate that all hinted indexes are in the table. However, for partitioned tables the restrict_indexes function is called for each child table ( via the set_rel_pathlist_hook ) and and indexes in the hint that may not be available to one of the children may apply to another child. For example:

/*+IndexScan(p2 p2_c1_id_val_idx)*/
EXPLAIN (COSTS false) SELECT * FROM p2 WHERE id >= 50 AND id <= 51 AND p2.ctid = '(1,1)';
LOG:  available indexes for IndexScan(p2):
LOG:  available indexes for IndexScan(p2_c1): p2_c1_id_val_idx
LOG:  available indexes for IndexScan(p2_c1_c1):
LOG:  available indexes for IndexScan(p2_c1_c2):
LOG:  pg_hint_plan:
used hint:
IndexScan(p2 p2_c1_id_val_idx)
not used hint:
duplication hint:
error hint:

So, it's still a good idea to make the IndexScan safer by simply making sure that we are not destroying the entire index list if none of the indexes in the hint are available.


/*+IndexScan(t5 no_exist)*/
EXPLAIN (COSTS false) SELECT * FROM t5 WHERE t5.id = 1;
LOG:  available indexes for IndexScan(t5):
LOG:  pg_hint_plan:
used hint:
IndexScan(t5 no_exist)
not used hint:
duplication hint:
error hint:

                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Index Scan using t5_idaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa on t5
   Index Cond: (id = 1)
(2 rows)

Attached below is a patch with tests.

0001-Fix-handling-of-unavailable-indexes-in-IndexScan-hin.patch

michaelpq commented 1 year ago
-   ->  Seq Scan on p2 p2_1
-         Filter: ((id >= 50) AND (id <= 51) AND (ctid = '(1,1)'::tid))
+   ->  Index Scan using p2_val2_id on p2 p2_1
+         Index Cond: ((id >= 50) AND (id <= 51))
+         Filter: (ctid = '(1,1)'::tid)
    ->  Index Scan using p2_c1_id_val_idx on p2_c1 p2_2
          Index Cond: ((id >= 50) AND (id <= 51))
          Filter: (ctid = '(1,1)'::tid)
-   ->  Seq Scan on p2_c1_c1 p2_3
-         Filter: ((id >= 50) AND (id <= 51) AND (ctid = '(1,1)'::tid))
-   ->  Seq Scan on p2_c1_c2 p2_4
-         Filter: ((id >= 50) AND (id <= 51) AND (ctid = '(1,1)'::tid))
-(10 rows)
+   ->  Index Scan using p2_c1_c1_id_val_idx on p2_c1_c1 p2_3
+         Index Cond: ((id >= 50) AND (id <= 51))
+         Filter: (ctid = '(1,1)'::tid)
+   ->  Index Scan using p2_c1_c2_id_val_idx on p2_c1_c2 p2_4
+         Index Cond: ((id >= 50) AND (id <= 51))
+         Filter: (ctid = '(1,1)'::tid)
+(13 rows)

I have looked at the tests and.. Hmm. This diff and things around that are giving me a very long pause. The hint specified in this case allows one to specify an index that will be used across one partition while the others would use a SeqScan, but the patch would ignore the index name specified and force an IndexScan across all the partitions.

Okay, one can use /*+IndexScan(p2_c1 p2_c1_id_val_idx)*/ to mimic the previous behavior, still the other test changes also clearly document what happens when an index does not exist.

Anyway, with this patch,, I have found what looks like a case where the hint gets entirely broken. A regular expression with a BitmapScan still forces a BitmapScan even if no indexes match with the expression given. This is tracked by the following case: /*+ BitmapScan(p1 p1_.*val2.*)*/ EXPLAIN (COSTS false) SELECT val FROM p1 WHERE val = 1;

That does not look OK to me, because I would not expect BitmapScan in this case.

I am not sure how to attach a file to this reply, but I have pushed a patch on a local branch to keep track of the plan changes that are affect by this change in the regression tests: https://github.com/michaelpq/pg_hint_plan/tree/index_hint_bug

samimseih commented 1 year ago

I have looked at the tests and.. Hmm. This diff and things around that are giving me a very long pause. The hint specified in this case allows one to specify an index that will be used across one partition while the others would use a SeqScan, but the patch would ignore the index name specified and force an IndexScan across all the partitions.

Okay, one can use /+IndexScan(p2_c1 p2_c1_id_val_idx)/ to mimic the previous behavior, still the other test changes also clearly document what happens when an index does not exist.

Anyway, with this patch,, I have found what looks like a case where the hint gets entirely broken. A regular expression with a BitmapScan still forces a BitmapScan even if no indexes match with the expression given. This is tracked by the following case: /+ BitmapScan(p1 p1_.val2.)/ EXPLAIN (COSTS false) SELECT val FROM p1 WHERE val = 1;

These 2 cases are actually related to the same incorrect behavior the patch introduces. The issue is that the scan method is enforced even without matching indexes. You can think of it as /*+ IndexScan(t5 no_exist) */is the same as/*+ indexScan(t5) */. The reason it works currently is because when none of the available indexes match, we just end up removing all the indexes and end up with a sequential scan.

So, in v2 of the patch, I think this is handled better.

if we are going to end up deleting all the indexes, don't delete anything, and mark the scan hint as UNUSED. Also, don't enforce the scan method. This is essentially as if we did not supply the hint to begin with.

Therefore, instead of forcing a sequential scan, we let the planning occur as if there was no scan hint supplied.

the v2 test changes are also a lot more palatable. 0001-v2-Fix-handling-of-unavailable-indexes-in-Scan-hints.patch

Regards,

michaelpq commented 1 year ago

Based on the patch, this query:

 /*+ IndexScanRegexp(t5 ^.*t5_idaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab)*/
 EXPLAIN (COSTS false) SELECT id FROM t5 WHERE id = 1;

Changes to the following:

 @@ -7302,15 +7302,15 @@ EXPLAIN (COSTS false) SELECT id FROM t5 WHERE id = 1;
  LOG:  available indexes for IndexScanRegexp(t5):
 LOG:  pg_hint_plan:
 used hint:
-IndexScanRegexp(t5 ^.*t5_idaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab)
 not used hint:
+IndexScanRegexp(t5 ^.*t5_idaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab)
 duplication hint:

Okay, I am not surprised by this behavior in the patch. The user tries an IndexScan with a regexp, there is no index matching, and we just fallback to the default of an index-only scan.

@@ -7959,8 +7959,8 @@ LOG:  available indexes for BitmapScanRegexp(p1_c3_c1):
 LOG:  available indexes for BitmapScanRegexp(p1_c3_c2):
 LOG:  pg_hint_plan:
 used hint:
-BitmapScanRegexp(p1 p1[^_].*)
 not used hint:
+BitmapScanRegexp(p1 p1[^_].*)

This one is also less confusing than HEAD. The regexp does not match any of the indexes on the relations, falls back to a SeqScan. And this switches the hint from "used" to "not used", which is also correct because we don't use it.

As a whole, the regression tests are indeed much easier to understand this way.

Regarding the patch, I had a few comments:

+   } else
+       current_hint_state->set_scan_method = false;

This clearly deserves a comment. In short, it seems to me that we are in the case where we don't have any indexes left, and that we'd better explain why this is set? Anyway, do we need this flag at all? Why isn't HintStatus sufficient to track if a hint should be used or not.

+    * then we keep all the indexes and skip enforcing the scan mehthod. i.e.,

s/mehthod/method/

            ScanMethodHint * pshint = current_hint_state->parent_scan_hint; 
-           pshint->base.state = HINT_STATE_USED;
-
            /* Apply index mask in the same manner to the parent. */

Hmm. Not sure that this is completely correct.

samimseih commented 1 year ago

Regarding the patch, I had a few comments:

index_delete_list does not completely reflect what this list of indexes stores? I guess that it would be better to rename that to unused_indexes. delete_indexes does not seem necessary.

good point. I cleaned this up.

This clearly deserves a comment. In short, it seems to me that we are in the case where we don't have any indexes left,

I added a clarifying comment

Why isn't HintStatus sufficient to track if a hint should be used or not.

HintStatus->base.state will track if a hint is used overall. If any of the relations ( parent or child ) in the query use the hint, we mark the hint as USED. the set_scan_method is needed to track if we should enforce the scan method hint for each relation. I also renamed set_scan_method to skip_scan_method and set the default as false.

s/mehthod/method/

fixed

Hmm. Not sure that this is completely correct.

This is not needed in our patch as the HINT_STATE_USED is set by setup_scan_method_enforcement. By keeping this line of code, we are saying that the hint is used before we call restrict_indexes to see if that is really true.

See v3: 0001-v3-Fix-handling-of-unavailable-indexes-in-Scan-hints.patch

Regards,

michaelpq commented 1 year ago

Sorry for the delay in replying back. I have finished looking at that, and I think that v3 is mostly good. One thing I did not like is that we'd want to track if the scan method enforcement should happen or not depending on the static state in the current hint while the check is local to a specific code path. So I have changed things so as restrict_indexes() returns a status result instead, that's afterwards used to see if the second step should happen or not.

Another thing was that the patch was getting iffy in PG11 and PG12 due to the changes in list handlings, so I have skipped these branches, applying the patch down to PG13. With PG11 being EOL at the next release point, changing this behavior was a bit stressing. PG12 could be changed, perhaps.

samimseih commented 1 year ago

Thanks!

I'm OK with backpatching down to 13 for now.