ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
674 stars 101 forks source link

Hint table doesn't seem to have effect on query plans #183

Closed avandras closed 2 months ago

avandras commented 3 months ago

I've got the following table:

test=# \d t_sample
                          Table "test.t_sample"
   Column    |           Type           | Collation | Nullable | Default 
-------------+--------------------------+-----------+----------+---------
 tstamp      | timestamp with time zone |           |          | 
 sensor_id   | integer                  |           |          | 
 measurement | numeric                  |           |          | 
Indexes:
    "t_sample_sensor_id_idx" btree (sensor_id)
    "t_sample_sensor_id_measurement_idx" btree (sensor_id, measurement)

pg_hint_plan is set up like this:

test=# SELECT version();
                                                              version                                                              
-----------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

test=# \dx pg_hint_plan 
           List of installed extensions
     Name     | Version |  Schema   | Description 
--------------+---------+-----------+-------------
 pg_hint_plan | 1.5.1   | hint_plan | 

test=# SELECT name, setting FROM pg_settings WHERE name LIKE '%pg_hint%';
              name              | setting 
--------------------------------+---------
 pg_hint_plan.debug_print       | on
 pg_hint_plan.enable_hint       | on
 pg_hint_plan.enable_hint_table | on
 pg_hint_plan.hints_anywhere    | off
 pg_hint_plan.message_level     | debug1
 pg_hint_plan.parse_messages    | info
(6 rows)

With these settings, I can do the following:

test=# EXPLAIN  /*+ SeqScan(s) */ SELECT * FROM t_sample AS s WHERE sensor_id = 4;
DEBUG:  pg_hint_plan:
used hint:
SeqScan(s)
not used hint:
duplication hint:
error hint:

                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on t_sample s  (cost=0.00..815550.25 rows=473761 width=23)
   Filter: (sensor_id = 4)
(2 rows)

So far, so good.

Now I inserted the following into the hint table:

test=# TABLE hint_plan.hints;
 id |                norm_query_string                | application_name |   hints    
----+-------------------------------------------------+------------------+------------
  3 | SELECT * FROM t_sample AS s WHERE sensor_id = ? |                  | SeqScan(s)
(1 row)

As far as I can tell the norm query string matches my query, but there is still no hint being applied:

test=# EXPLAIN  SELECT * FROM t_sample AS s WHERE sensor_id = 4;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t_sample s  (cost=5276.09..760738.78 rows=473761 width=23)
   Recheck Cond: (sensor_id = 4)
   ->  Bitmap Index Scan on t_sample_sensor_id_idx  (cost=0.00..5157.65 rows=473761 width=0)
         Index Cond: (sensor_id = 4)
(4 rows)

What am I doing wrong? And how to debug this further?

avandras commented 3 months ago

FWIW, upon a hint that the EXPLAIN might result in a different query, I've enabled auto_explain and still see no involvement from the hint table:

SET auto_explain.log_min_duration = 0;

2024-04-16 15:30:14.910 CEST,"postgres","test",753717,"[local]",661d60e9.b8035,40,"SELECT",2024-04-15 19:16:25 CEST,4/1094,0,LOG,00000,"duration: 1558.567 ms  plan:
Query Text: SELECT * FROM t_sample AS s WHERE sensor_id = 4;
Bitmap Heap Scan on t_sample s  (cost=5276.09..760738.78 rows=473761 width=23)
  Recheck Cond: (sensor_id = 4)
  ->  Bitmap Index Scan on t_sample_sensor_id_idx  (cost=0.00..5157.65 rows=473761 width=0)
        Index Cond: (sensor_id = 4)",,,,,,,,,"psql","client backend",,-6175464486641872830
avandras commented 3 months ago

Further analyzing this, I see the hint table is visited:

2024-04-16 16:28:50.942 CEST,"postgres","test",753717,"[local]",661d60e9.b8035,55,"SELECT",2024-04-15 19:16:25 CEST,4/1102,0,LOG,00000,"duration: 0.022 ms  plan:
Query Text: SELECT hints   FROM hint_plan.hints  WHERE norm_query_string = $1    AND ( application_name = $2     OR application_name = '' )  ORDER BY application_name DESC
Sort  (cost=11.31..11.31 rows=1 width=64) (actual time=0.021..0.022 rows=0 loops=1)
  Sort Key: application_name DESC
  Sort Method: quicksort  Memory: 25kB
  ->  Bitmap Heap Scan on hints  (cost=4.17..11.30 rows=1 width=64) (actual time=0.018..0.018 rows=0 loops=1)
        Recheck Cond: (norm_query_string = $1)
        Filter: ((application_name = $2) OR (application_name = ''::text))
        ->  Bitmap Index Scan on hints_norm_and_app  (cost=0.00..4.17 rows=3 width=0) (actual time=0.016..0.017 rows=0 loops=1)
              Index Cond: (norm_query_string = $1)",,,,,"SQL statement ""SELECT hints   FROM hint_plan.hints  WHERE norm_query_string = $1    AND ( application_name = $2     OR application_name = '' )  ORDER BY application_name DESC""",,,,"psql","client backend",,-6175464486641872830
2024-04-16 16:28:50.960 CEST,"postgres","test",753717,"[local]",661d60e9.b8035,56,"SELECT",2024-04-15 19:16:25 CEST,4/1102,0,LOG,00000,"duration: 17.125 ms  plan:
Query Text: SELECT * FROM t_sample AS s WHERE sensor_id = 4;
Bitmap Heap Scan on t_sample s  (cost=1653.55..50483.23 rows=15112 width=23) (actual time=5.504..13.173 rows=14401 loops=1)
  Recheck Cond: (sensor_id = 4)
  Heap Blocks: exact=14019
  ->  Bitmap Index Scan on t_sample_sensor_id_idx  (cost=0.00..1649.77 rows=15112 width=0) (actual time=3.899..3.900 rows=14401 loops=1)
        Index Cond: (sensor_id = 4)",,,,,,,,,"psql","client backend",,-6175464486641872830

but there is no row found matching my query.

My guess right now is that I don't fully understand how to come up with the norm query.

avandras commented 3 months ago

So, after further experimenting, we've found the following:

Based on all that, I think the query normalization approach has to be changed. It would be rather helpful if query strings from pg_stat_statements (or similar) could be used just with replacing positional parameters with question marks.

df7cb commented 3 months ago

There is a query id mechanism that got spun off pg_stat_statements and is now usable by other subsystems, I'd think pg_hint_plan should use that as well.

fakdaddy75 commented 2 months ago

Had/have the same issue. This gets almost impossible to use with complex wrapped queries. I agree with above statement that query_id should be all you need to pass to a function to load the hint table. Or have someway to trace the postgres $1 value for what text its searching in the statement

SELECT hints FROM hint_plan.hints WHERE norm_query_string = $1 AND ( application_name = $2 OR application_name = '' ) ORDER BY application_name DESC

michaelpq commented 2 months ago

There is a query id mechanism that got spun off pg_stat_statements and is now usable by other subsystems, I'd think pg_hint_plan should use that as well.

Yeah, we really need to rip the query normalization entirely off pg_hint_plan and grab all the information we need from the query IDs. I've not been able to get down to do it, though. So if somebody would like to write a patch I could check, that would be welcome.

michaelpq commented 2 months ago

This stuff will be addressed by #190 as an item for the next release, and will not be backpatched to stable branches, so closing this issue for now to avoid a duplicate.