Open michaelpq opened 2 months ago
Looking forward to trying out new changes! I'm sure @FranckPachot would be interested as well.
Hi Michael-san!
- Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.
Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?
The release would be planned for September, potentially in its 2nd week but
let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.
If @yamatattsu https://github.com/yamatattsu , @rjuju https://github.com/rjuju or any others have any comments, feel free.
For now, The schedule seems fine for me. I'd like to add this item to the TODO list:
- Add new Japanese translation file (*.po) for the document
Message ID: @.***>
Feature: plug-in query IDs rather than query string normalizations for the hint table. This has been on the table for some time, and I really want to get rid of that entirely now that query IDs are compiled in core, included for utilities.
Just a thought on this feature:
I do think there needs to be a column that tracks the last time the hint was used. IMO, there are existing reasons a last_used is important to have, but especially with queryId changing if a table is dropped and recreated, pg_dump, etc., the user will need to perform maintenance on the table and remove no longer valid queryIds. Of course there is an overhead to continually updating a last_used, and I don't think that accuracy is required here ( maybe update last_used every so often ).
Regards,
Sami Imseih
Cant wait for the query_id enhancement. thx
Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?
Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.
Hi Michael-san and Sami,
Yes, it means to wipe out entirely this dependency from the tree. It would
still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.
Ah, I see. In other words, pg_stat_statements is required as a prerequisite for using the hint table of pg_hint_plan. Right?
feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space.
As an extra information, pg_plan_advsr automatically generates normalized query strings and automatically registers them in the hint table for use. I don't understand the issue with normalized query strings in detail, but if you are having trouble with it, you might want to try pg_plan_advsr.
On Thu, May 2, 2024 at 6:57 AM Michael Paquier @.***> wrote:
Does it mean the query string normalization column on the hint table will be removed and the query_id column will be added on the table instead?
Yes, it means to wipe out entirely this dependency from the tree. It would still be possible to cross-check the normalized string with pg_stat_statements, which is something that most users use anyway these days for monitoring.
— Reply to this email directly, view it on GitHub https://github.com/ossc-db/pg_hint_plan/issues/190#issuecomment-2089198243, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADSHQG2VEVZNHSZFDQKYB7DZAFQMLAVCNFSM6AAAAABHBEXSC6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAOBZGE4TQMRUGM . You are receiving this because you were mentioned.Message ID: @.***>
I don't understand the issue with normalized query strings in detail, but if you are having trouble with it,
Duplication. The normalization code in pg_hint_plan is a copy-paste of pg_stat_statements.
I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join? @rjuju , thoughts?
well, there's an assumption that some normalisation stuff is installed, as now pg_stat_statements can work with custom algorithm. so in theory if your use case is to add hints on queries touching temp tables because you don't issue vacuum and you know that a certain plan is better it could be a good idea to rely on a custom jumbling approach that computes a query identifier based on the relation names for temp tables so that you can actually have a stable queryid. I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.
I'm not sure how much we rely on having the normalised query string available (either in this extension or pg_plan_asvsr), but if for some reason pgss isn't an option users could also hack a simplified version that only stores the normalised query string, which could made way more efficient than pgss and the garbage collection approach.
pg_hint_plan uses the normalized query to do lookups of the hint table, using the same normalization as pg_stat_statements where the comments don't matter. There's a point that this reduces debugging if relying only on the normalized string if looking at the hint table, but honestly I'm going to hang on my argument about "let's just use pg_stat_statements" for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.
my point was more with the curent plan of relying on the queryid only. if we do that there's no need for the query string or normalised query string right? so there would be no need for pgss either, just enable core queryid computation or install a custom one that better suits your needs.
I can see what you are getting at with pg_plan_advsr as it relies on pg_hint_plan's normalize_query.h. Still it looks like the same thing to me. Is there any point in normalizing the query if we have a query ID. Perhaps that's an overoptimistic assumption, but we can assume that pg_stat_statements is used, meaning that the query would be normalized and that it would be able to retrieve that with a join?
Ah, right, pg_plan_advsr uses pg_hint_plan's normalize_query.h. I thought the problem was that it was difficult to create a normalized query string, so I proposed to use pg_plan_advsr. But I understand that it is not the problem on this thread.
I'm going to hang on my argument about "let's just use pg_stat_statements"
for the mapping and do JOINs, reducing the bloat of the hint table if using long query strings. We're paying twice the normalization costs, currently.
The problem here is two things like:
P.S. pg_plan_advsr uses queryid by using compute_query_id since PG14, so I guess it's okay to use it in the hint table.
feature IMO as the current hint_table is hard to use when dealing with large sql text or the same sql text that could have differences in comments and white space. As an extra information, pg_plan_advsr automatically generates normalized query strings and automatically registers them in the hint table for use.
I don't understand the issue with normalized query strings in detail, but if you are having trouble with it, you might want to try pg_plan_advsr.
Here is a repro of an example that I have seen on the field a few times. In the below case, my hints table includes the following query text
SELECT * FROM t1 WHERE t1.id = ?;
postgres=# select * from hint_plan.hints;
id | norm_query_string | application_name | hints
----+-----------------------------------+------------------+-------------
2 | SELECT * FROM t1 WHERE t1.id = ?; | | SeqScan(t1)
(1 row)
However, an application might issue a variant ( or variants ) of the query text in which a comment is added.
/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;
In this case, the hints table will only work for the uncommented version of the query text, even though they are semantically similar queries.
postgres=# SELECT * FROM t1 WHERE t1.id = 1;
LOG: pg_hint_plan[qno=0x1b]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG: pg_hint_plan[qno=0x1c]: hints from table: "SeqScan(t1)": normalized_query="SELECT * FROM t1 WHERE t1.id = ?;", application name ="psql"
LOG: pg_hint_plan[qno=0x1b]: planner
LOG: pg_hint_plan[qno=0x1b]: setup_hint_enforcement index deletion: relation=16404(t1), inhparent=0, current_hint_state=0x11a015578, hint_inhibit_level=0, scanmask=0x1
LOG: pg_hint_plan[qno=0x1b]: HintStateDump: {used hints:SeqScan(t1)}, {not used hints:(none)}, {duplicate hints:(none)}, {error hints:(none)}
id
----
1
(1 row)
postgres=# /*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;
LOG: pg_hint_plan[qno=0x1d]: no match found in table: application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG: hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
LOG: pg_hint_plan[qno=0x1e]: no match found in table: application name = "psql", normalized_query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = ?;"
LOG: hints in comment=" APPLICATION 2 ", query="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;", debug_query_string="/*+ APPLICATION 2 */ SELECT * FROM t1 WHERE t1.id = 1;"
INFO: pg_hint_plan: hint syntax error at or near " APPLICATION 2 "
DETAIL: Unrecognized hint keyword "APPLICATION".
LOG: pg_hint_plan[qno=0x1d]: planner: no valid hint
id
----
1
(1 row)
select query, queryid, calls from pg_stat_statements where query like '%t1%';
query | queryid | calls
-----------------------------------+----------------------+-------
SELECT * FROM t1 WHERE t1.id = $1 | -7122654414306258210 | 2
(1 row)
This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.
postgres=# SELECT *
postgres-# FROM t1 WHERE t1.id = 1;
LOG: pg_hint_plan[qno=0x3b]: no match found in table: application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG: hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG: pg_hint_plan[qno=0x3c]: no match found in table: application name = "psql", normalized_query="SELECT *
FROM t1 WHERE t1.id = ?;"
LOG: hints in comment="(none)", query="SELECT *
FROM t1 WHERE t1.id = 1;", debug_query_string="SELECT *
FROM t1 WHERE t1.id = 1;"
LOG: pg_hint_plan[qno=0x3b]: planner: no valid hint
id
----
1
(1 row)
Regards,
Sami
This is just on real-world case in which having a queryId will be very useful. There are other cases such as adding whitespace or a linebreak in the query text. All such queries will end up with the same queryId, but the hints table currently is not able to recognize this.
From the point where the hint table requires the values to be inserted, providing a 64b value is just going to be less error-prone because there are less contents to manipulate..
The problem here is two things like: 1. reducing the bloat of the hint table by long query text 2. reducing the costs of normalization string Is my understanding right? If so, using queryid based on compute_query_id on PG core seems better to solve the problems.
Thanks. As far as I can see, there are no major objections about that. I will send a patch that does the switch in the hint table then, tweaking the docs that query ID values inserted in the hint table to match with the entries in pg_stat_statements.
From my lonely perspective this is perfect. Works like oracle using a statement id or hash.
Here's a real world example - no-one has been able to make it work. Meaning - postgres user groups, myself and a consultancy Never finds a text match in hint_table ...... noone knows what it should be as we cant expose the normalized query $1 the recursive optimizer is looking for.
https://github.com/ossc-db/pg_hint_plan/issues/180
thanks Michaelpq !!
I have begun my butcher work for this release, and began with two things:
/+ APPLICATION 2 / SELECT * FROM t1 WHERE t1.id = ?;
postgres=# SELECT *
postgres-# FROM t1 WHERE t1.id = 1;
Thanks for the clarification! I understand the problems.
Michael-san,
- As promised, a patch to integrate query IDs in the hint table: #193 https://github.com/ossc-db/pg_hint_plan/pull/193.
FYI: I talked about using query_id in the hint table with Horiguchi-san today, and He has no objections. So, It's okay from us. :-D
So, It's okay from us. :-D
If you have any questions, we can talk about that on Friday in live.
If you have any questions, we can talk about that on Friday in live.
Sounds good! Thanks.
What is the status in here? v17 beta2 is already out.
What is the status in here? v17 beta2 is already out.
Still need a bit of time, but I am towards a release by the end of August with the core changes pushed by the end of next week, before brushing what I can from the bug list. I don't have more plans for in-core changes with the module for this release, only bug fixes.
Looking at my calendar, I am proposing the 29th of August for the tarball release. @devrimgunduz
The branches have been structured and prepared for the upcoming release, and a new PG17 branch has been created with a github action job for automated tests. At this point it would be only a matter of tagging and creating the tarballs.
Now is time for the real action: clean up of the existing bug list and address issues. I also have on my list to reassess a3646e1b073c as it has been impacting the plan stability of some customer cases. This may finish by being reverted.
Hi all,
I have that on my tablets for some time. Let's do a new release of this module at the same time as PostgreSQL 17. I have on my TODO list a few things, as of:
The release would be planned for September, potentially in its 2nd week but let's be flexible about that, just before PostgreSQL GA. In terms of features, I think that I'll be able to tackle item 3, not sure about 4.
If @yamatattsu , @rjuju or any others have any comments, feel free.