zubkov-andrei / pg_profile

Postgres historic workload reports
Other
236 stars 33 forks source link

Wait sampling: percentage is not very useful #93

Open portnov opened 6 months ago

portnov commented 6 months ago

Hello.

Currently, in "Wait Sampling" section, the report shows %Total as percentage of particular event time in relation to all events which were registered. This number can be very misleading. For example, we can see something like:

Event Waited (s) %Total
Lock:tuple 90 90
LWLock:WALWrite 10 10

and we say: wow, we have a problem with locks! But the report was taken for 10 minutes, and there were in average 100 active connections during the time, so total database time was 10*60*100 = 60000 seconds. So 90 seconds is 0.15% (tenth of a percent), not 90% at all!

On the other hand,

Event Waited (s) %Total
Client:ClientRead 120000 67
Lock:tuple 59000 33

— this looks like database was waiting for client, but in fact "Lock:tuple" took 59000 out of 60000 database time seconds, i.e. 98% of database time. High "ClientRead" can mean just idle time (lots of sessions that are connected to DB but do nothing).

Suggestion is: in "wait sampling" section, add "% of database time" column, calculated as percentage of "Waited (s)" relative to "Active" column from "Session statistics" block. As far as I understand, that "Active" column more or less corresponds to what Oracle calls "DB Time". I'm not sure, maybe current "%Total" column should be removed. Or maybe someone knows a good use for it, so maybe let there be two percentage columns. Maybe it also worth to show "% of elapsed time" column - relation of "Waited (s)" to ~"Total"~ sum of "Active" + "Idle" from "Session statistics".

Another related point that may be misleading as well is about "Client:ClientRead" event. As far as I understood, it will be almost always in top, just because of the way how pg_wait_sampling works (it collects all data from pg_stat_activity, including idle sessions). So maybe somehow format that row with gray, for example, to do not confuse people who see it occupying most of the time?

portnov commented 5 months ago

After some thought, it seems the same principle is applicable to "Top SQL" sections: it would be good to add "% of DB Time" column. Moreover, that would solve known problem that "%Total" for SQL queries is not quite correct when nested queries are counted.

portnov commented 5 months ago

Hello @zubkov-andrei ! Sorry to bother you, but... any comments on this matter? :)

zubkov-andrei commented 5 months ago

Hi, @portnov Excuse me for late answer, I was fixing some bugs in the last release.

Yes, we have a problem here. However I'm not sure we can implement your suggestion "as is" because of several thoughts:

  1. We don't have session timing statistics before Postgres 14. However pg_profile should behave similar in all versions. Of course, we can add %Active field to the reports, based on samples collected on 14 and up, but we should left something usable in all versions.
  2. Session timing statistics and wait sampling timing statistics are collected using different means. Session stats is true timer values collected on session state changes, while wait sampling is near-real time sampling but it doesn't actually know the exact waiting time. I really don't want to deal with rates between such different values.

But it seems, we should exclude 'idle' waits from %Total calculation. However we should think about what to do with the %Total values for idle wait events.

portnov commented 5 months ago

Hello. Thanks for the answer.

About wait sampling, probably one way to do it would be to modify pg_wait_sampling in such a way that it would record not only event and event_type columns from pg_stat_activity, but state as well. So that monitoring system could filter based on state, like select * from pg_wait_sampling_profile where state = 'active'. This way we could more or less reliably exclude "parasite" ClientRead (and probably other) wait events.

We don't have session timing statistics before Postgres 14.

I do not have objections against having both "% Total" and "% of DB Time" columns at the same time.

Session stats is true timer values collected on session state changes, while wait sampling is near-real time sampling but it doesn't actually know the exact waiting time. I really don't want to deal with rates between such different values.

Yes, such percentage would not be exact, but imho it would be much more exact comparing to existing "%Total".

And what about the same idea (% of db time) with relation to "Top SQL" sections?