powa-team / powa-archivist

powa-archivist: the powa PostgreSQL extension
http://powa.readthedocs.io/
PostgreSQL License
51 stars 20 forks source link

Unparameterized sql query in powa_statements table #59

Closed plvnv closed 3 months ago

plvnv commented 7 months ago

While viewing statistics via powa-web, I found two sql queries with the same query text, but one of them had constant parameters, the other one had readable constants instead of $1,$2 etc.

I thought that powa takes the query text from the pg_stat_statements view, but I did not find any query with constants there.

For query with constant we have wrong statistics (see screenshot)

изображение

rjuju commented 7 months ago

Hi,

Indeed powa solely relies on pg_stat_statements to retrieve the base queries, and it's pg_stat_statements' job to normalize the queries.

Did you see those queries with the constant still in the query text in the query detail tab or other place in the UI?

It's possible that this query was present in pg_stat_statements and has been evicted since, while powa will retain it for some time. Did they have different queryid? Can you check if that queryid exist in the remote server's pg_stat_statements view, possibly with a different query text? Note that postgrespro is a forked version of postgres, and I don't know whether they changed something in pg_stat_statements.

For the UI bug, that's quite surprising. Note that the next version will come with a fully rewritten interface so this will hopefully fix that problem.

plvnv commented 7 months ago

yes, i see it in the query detail tab (see screenshot) and on screen with all queries list

изображение

Those queries have different id's:

Now i can't see this querys in pg_stat_statements,

only in

powa_statements, powa_kcache_metrics_current, powa_statements_history

rjuju commented 7 months ago

I see. Do you still see new activity for the query in powa since you checked that there's no such record in the original server's pg_stat_statement?

plvnv commented 7 months ago

Right Now I don't see any new query activity, probably because pg_stat_statements no longer shows any of the 2 queries (with and without constants). But yesterday I saw a similar picture, the activity of the query with constants changed when viewed through powa-web and at that moment pg_stat_statements was showing only one query with parameters ($1, $2 ...)

Yesterday I think it was some corruption in powa repository, i update powa-archvist from 4.1.4->4.2.2, powa-web from 4.1.3->4.2.0 and recreate powa repository from scratch, but as we can see, it didn't help.

rjuju commented 7 months ago

Oh, that's interesting. Can you confirm that pg_stat_statementd view contains close to pg_stat_statements.max record?

If yes it means that your workload has more normalized queries that what pg_stat_statements can retain. That would explain that those entries are not present anymore if it's queries that haven't been executed recently.

Also, if your workload contains a lot more normalized queries that what pg_stat_statements can contain, it will keep evicting queries almost constantly. That's quite bad as it will really hurt performance (I've seen report of more than 40% overhead in that case), but it can also lead to pg_stat_statements storing non-normalized query text.

The reason for that is because pg_stat_statements can only normalize the query text after parsing the query (as it's the parser that gives the information about the constants location), but it the underlying entry is evicted between query parsing and the end of the execution, it will create a new entry using the original query text as a fallback. Now it's possible that this entry is then stored by powa, then evicted again and then stored again but during the query parsing, which means that you will see a normalized query text in pg_stat_statements while powa only stored the first version it saw, which wasn't normalized.

That's quite an unlikely scenario. Maybe your workload relies on temporary tables? In that case it would become much more likely, as basically all queries touching a temporary table will get a different queryid, leading to a virtually infinite number of entries that pg_stat_statements will never be able to handle.

plvnv commented 7 months ago
powa=# show pg_stat_statements.max 
powa-# ;
 pg_stat_statements.max 
------------------------
 10000
(1 row)

powa=# select count(*) from pg_stat_statements ;
 count 
-------
  9905
(1 row)

It seems to me that temp tables are not used often in our system

rjuju commented 7 months ago

Thanks for the details.

It seems clear to me that your pg_stat_statements.max is too low. If indeed you're not relying on temp tables, or not much, you may be able to fix this problem and also improve performance globally by raising it. Since powa will retain more queryid than pg_stat_statements it can give you an idea of what value you should use for pg_stat_statements.max.

plvnv commented 7 months ago

Thank you very much for your advice!

I'll try to play with the parameter pg_stat_statements.max

rjuju commented 7 months ago

Ok, let me know if you can find a value not too high that avoids most of the evictions.

There's another scenario I just thought of: partitioned tables with "rolling partitionning". This one is probably a better scenario, as while it can also lead to an infinite number of normalized queries, there will always be only a stable amount of them at any given time so you shouldn't have too frequent eviction (but will still have some of them unfortunately). This however makes it much harder to know what is a good value for pg_stat_statements.max, and since changing it requires a restart it's not always easy to tune it as much as we want.

plvnv commented 7 months ago

Hi, @rjuju

It seems that for test servers I found a value of pg_stat_statements.max = 30000, in the near future I will put it on product instances and report back with the results

rjuju commented 7 months ago

Hi @plvnv. Got it! Fortunately it's not too high so it doesn't seem unreasonable to use that value. If you're wondering, raising pg_stat_statements.max to 30000 will consume a total of a bit less than 12MB of ram, nothing to worry about.

plvnv commented 7 months ago

Hi, @rjuju

Unfortunately, on production servers things are somewhat worse; when setting pg_stat_statements.max = 50000, the maximum is reached while the instance is running; Moreover, queries with constants appears up in the POWA repository even when pg_stat_statements.max was half filled

Is it worth further increasing the value of pg_stat_statements.max in the hope of a positive result?

rjuju commented 7 months ago

That's a very sad news :(

The problem with a high number of entries is that it makes evictions and query text cleanup increasingly expensive. With a maximum number of entries of 50k you're likely to have some random "hiccups" where everything will seems to block for a few hundred ms.

Increasing pg_stat_statements.max is only going to help if you can guarantee that you can actually keep all entries in memory. Since I don't know why you have so many entries, I don't know if you're ever going to be able to achieve that or not.

What you should do is understand why you have so many entries, understand whether it's a finite but big number (like a multi-tenant application with hundreds of schema), an infinite number that can stay stable for sometime (like the partitioning case I mentioned) or just an infinite number that changes all the time (like the temporary table case).

For the queries with constant present, are you by any chance using prepared statements (or the extended protocol) with persistent connections? Those will make the problem more likely as they will limit the number of time queries are parsed.

plvnv commented 7 months ago

At this moment for the queries with constant present (in POWA repository) we don't use prepared statement at app level. In next mainteneace window i'll try to rise up to 100k pg_stat_statements.max and see what happens.

Now with pg_stat_statements.max = 50000 we have some statistics mess

for expample 2 queries with identical text (but with different constants, present in POWA rep and in pg_stat_statements simultaneously)

q1

q2

plvnv commented 7 months ago

Perhaps in Postgres 14+ the developers tried to solve this problem

1) we can change the method for calculating the hash of query IDs with compute_query_id = on 2) the statistics of the pg_stat_statements module itself are tracked and made available via a view named pg_stat_statements_info

rjuju commented 7 months ago

I'm actually the one who wrote this new compute_query_id thing :) The idea is indeed to allow users to use a different source of hashing to compute the identifier if the native one cannot work for some reason. Upgrading to this version and use a custom query hashing module might allow you to fix this issue, but you would have to understand why so many entries are present:

for example 2 queries with identical text (but with different constants, present in POWA rep and in pg_stat_statements simultaneously)

If you get 2 different entries then it has to use different objects somewhere in that query. With identical query text, it has to be temporary tables or unqualified relations with multiple schema used implicitly (with a different search_path).