ossc-db / pg_hint_plan

Extension adding support for optimizer hints in PostgreSQL
Other
674 stars 101 forks source link

Next release of pg_hint_plan with PostgreSQL 17 compatibility #190

Open michaelpq opened 2 months ago

michaelpq commented 2 months ago

Hi all,

I have that on my tablets for some time. Let's do a new release of this module at the same time as PostgreSQL 17. I have on my TODO list a few things, as of:

  1. Finish any compatibility work required for HEAD.
  2. Fork a new branch called PG17, with changes on HEAD to plug in PG18.
  3. Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.
  4. Feature: Introduce a proper parser for the hint strings. bfb45447c9d4 has done most of the work to be able to extract the hints, but we still need to translate the hints to binary trees, which is something that the code now does with some manual steps in the hint type callbacks.
  5. A round table of existing bugs, to see what could be addressed.
  6. More work and analysis is needed for a3646e1b073c, which is still on HEAD. This has been reverted from the back-branches following plan instabilities as of bfaabf161824, and I think it's not quite safe yet in terms of parallel handling.
  7. This will be the last release for PG12, with PostgreSQL 12 compatibility, following upstream that will EOL it. So let's be very careful to not break anything on this branch.

The release would be planned for September, potentially in its 2nd week but let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.

If @yamatattsu , @rjuju or any others have any comments, feel free.

andyatkinson commented 2 months ago

Looking forward to trying out new changes! I'm sure @FranckPachot would be interested as well.

yamatattsu commented 2 months ago

Hi Michael-san!

  1. Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?

The release would be planned for September, potentially in its 2nd week but

let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.

If @yamatattsu https://github.com/yamatattsu , @rjuju https://github.com/rjuju or any others have any comments, feel free.

For now, The schedule seems fine for me. I'd like to add this item to the TODO list:

  1. Add new Japanese translation file (*.po) for the document

Message ID: @.***>

samimseih commented 2 months ago

Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.

3 is a really important feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.

Just a thought on this feature:

I do think there needs to be a column that tracks the last time the hint was used. IMO, there are existing reasons a last_used is important to have, but especially with queryId changing if a table is dropped and recreated, pg_dump, etc., the user will need to perform maintenance on the table and remove no longer valid queryIds. Of course there is an overhead to continually updating a last_used, and I don't think that accuracy is required here ( maybe update last_used every so often ).

Regards,

Sami Imseih

fakdaddy75 commented 2 months ago

Cant wait for the query_id enhancement. thx

michaelpq commented 2 months ago

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?

Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

yamatattsu commented 2 months ago

Hi Michael-san and Sami,

Yes, it means to wipe out entirely this dependency from the tree. It would

still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

Ah, I see. In other words, pg_stat_statements is required as a prerequisite for using the hint table of pg_hint_plan. Right?

3 https://github.com/ossc-db/pg_hint_plan/issues/3 is a really important

feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.

As an extra information, pg_plan_advsr automatically generates normalized query strings and automatically registers them in the hint table for use. I don't understand the issue with normalized query strings in detail, but if you are having trouble with it, you might want to try pg_plan_advsr.

On Thu, May 2, 2024 at 6:57 AM Michael Paquier @.***> wrote:

Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?

Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.

— Reply to this email directly, view it on GitHub https://github.com/ossc-db/pg_hint_plan/issues/190#issuecomment-2089198243, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSHQG2VEVZNHSZFDQKYB7DZAFQMLAVCNFSM6AAAAABHBEXSC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBZGE4TQMRUGM . You are receiving this because you were mentioned.Message ID: @.***>

michaelpq commented 2 months ago

I don't understand the issue with normalized query strings in detail, but if you are having trouble with it,

Duplication. The normalization code in pg_hint_plan is a copy-paste of pg_stat_statements.

I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join? @rjuju , thoughts?

rjuju commented 2 months ago

well, there's an assumption that some normalisation stuff is installed, as now pg_stat_statements can work with custom algorithm. so in theory if your use case is to add hints on queries touching temp tables because you don't issue vacuum and you know that a certain plan is better it could be a good idea to rely on a custom jumbling approach that computes a query identifier based on the relation names for temp tables so that you can actually have a stable queryid. I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

michaelpq commented 2 months ago

I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.

pg_hint_plan uses the normalized query to do lookups of the hint table, using the same normalization as pg_stat_statements where the comments don't matter. There's a point that this reduces debugging if relying only on the normalized string if looking at the hint table, but honestly I'm going to hang on my argument about "let's just use pg_stat_statements" for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.

rjuju commented 2 months ago

my point was more with the curent plan of relying on the queryid only. if we do that there's no need for the query string or normalised query string right? so there would be no need for pgss either, just enable core queryid computation or install a custom one that better suits your needs.

yamatattsu commented 2 months ago

I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join?

Ah, right, pg_plan_advsr uses pg_hint_plan's normalize_query.h. I thought the problem was that it was difficult to create a normalized query string, so I proposed to use pg_plan_advsr. But I understand that it is not the problem on this thread.

I'm going to hang on my argument about "let's just use pg_stat_statements"

for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.

The problem here is two things like:

  1. reducing the bloat of the hint table by long query text
  2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems.

P.S. pg_plan_advsr uses queryid by using compute_query_id since PG14, so I guess it's okay to use it in the hint table.

samimseih commented 2 months ago

feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space. As an extra information, pg_plan_advsr automatically generates normalized query strings and automatically registers them in the hint table for use.

I don't understand the issue with normalized query strings in detail, but if you are having trouble with it, you might want to try pg_plan_advsr.

Here is a repro of an example that I have seen on the field a few times. In the below case, my hints table includes the following query text

 SELECT * FROM t1 WHERE t1.id = ?; 
postgres=# select * from hint_plan.hints;
 id |         norm_query_string         | application_name |    hints    
----+-----------------------------------+------------------+-------------
  2 | SELECT * FROM t1 WHERE t1.id = ?; |                  | SeqScan(t1)
(1 row)

However, an application might issue a variant ( or variants ) of the query text in which a comment is added.

/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?; 

In this case, the hints table will only work for the uncommented version of the query text, even though they are semantically similar queries.

postgres=# SELECT * FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x1b]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1c]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG:  pg_hint_plan[qno=0x1b]: planner
LOG:  pg_hint_plan[qno=0x1b]: setup_hint_enforcement index deletion: relation=16404(t1), inhparent=0, current_hint_state=0x11a015578, hint_inhibit_level=0, scanmask=0x1
LOG:  pg_hint_plan[qno=0x1b]: HintStateDump: {used hints:SeqScan(t1)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}
 id 
----
  1
(1 row)

postgres=# /*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1; 
LOG:  pg_hint_plan[qno=0x1d]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x1e]: no match found in table:  application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
INFO:  pg_hint_plan: hint syntax error at or near " APPLICATION 2 "
DETAIL:  Unrecognized hint keyword "APPLICATION".
LOG:  pg_hint_plan[qno=0x1d]: planner: no valid hint
 id 
----
  1
(1 row)

select query, queryid, calls from pg_stat_statements where query like '%t1%';

               query               |       queryid        | calls 
-----------------------------------+----------------------+-------
 SELECT * FROM t1 WHERE t1.id = $1 | -7122654414306258210 |     2
(1 row)

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

postgres=# SELECT *
postgres-# FROM t1 WHERE t1.id = 1;
LOG:  pg_hint_plan[qno=0x3b]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3c]: no match found in table:  application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG:  hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG:  pg_hint_plan[qno=0x3b]: planner: no valid hint
 id 
----
  1
(1 row)

Regards,

Sami

michaelpq commented 2 months ago

This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.

From the point where the hint table requires the values to be inserted, providing a 64b value is just going to be less error-prone because there are less contents to manipulate..

michaelpq commented 2 months ago

The problem here is two things like: 1. reducing the bloat of the hint table by long query text 2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems.

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

fakdaddy75 commented 2 months ago

Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.

From my lonely perspective this is perfect. Works like oracle using a statement id or hash.

Here's a real world example - no-one has been able to make it work. Meaning - postgres user groups, myself and a consultancy Never finds a text match in hint_table ...... noone knows what it should be as we cant expose the normalized query $1 the recursive optimizer is looking for.

https://github.com/ossc-db/pg_hint_plan/issues/180

thanks Michaelpq !!

michaelpq commented 2 months ago

I have begun my butcher work for this release, and began with two things:

  1. Stabilization of the regression tests. A few things have changed in upstream, for the best actually. I have applied these directly to allow the code to run installcheck.
  2. As promised, a patch to integrate query IDs in the hint table: https://github.com/ossc-db/pg_hint_plan/pull/193.
yamatattsu commented 2 months ago

/+ APPLICATION 2 / SELECT * FROM t1 WHERE t1.id = ?;

postgres=# SELECT *

postgres-# FROM t1 WHERE t1.id = 1;

Thanks for the clarification! I understand the problems.

yamatattsu commented 2 months ago

Michael-san,

  1. As promised, a patch to integrate query IDs in the hint table: #193 https://github.com/ossc-db/pg_hint_plan/pull/193.

FYI: I talked about using query_id in the hint table with Horiguchi-san today, and He has no objections. So, It's okay from us. :-D

michaelpq commented 2 months ago

So, It's okay from us. :-D

If you have any questions, we can talk about that on Friday in live.

yamatattsu commented 2 months ago

If you have any questions, we can talk about that on Friday in live.

Sounds good! Thanks.

devrimgunduz commented 6 days ago

What is the status in here? v17 beta2 is already out.

michaelpq commented 5 days ago

What is the status in here? v17 beta2 is already out.

Still need a bit of time, but I am towards a release by the end of August with the core changes pushed by the end of next week, before brushing what I can from the bug list. I don't have more plans for in-core changes with the module for this release, only bug fixes.

michaelpq commented 2 days ago

Looking at my calendar, I am proposing the 29th of August for the tarball release. @devrimgunduz

michaelpq commented 1 day ago

The branches have been structured and prepared for the upcoming release, and a new PG17 branch has been created with a github action job for automated tests. At this point it would be only a matter of tagging and creating the tarballs.

Now is time for the real action: clean up of the existing bug list and address issues. I also have on my list to reassess a3646e1b073c as it has been impacting the plan stability of some customer cases. This may finish by being reverted.