postgrespro / pg_wait_sampling

Sampling-based statistics of wait events

Track on CPU events too #74

Closed: ants closed this 3 months ago

ants commented 3 months ago

To avoid counting dead backends as still running on the CPU, we need to detect that a backend has exited. Starting from PG17, proc->pid is reset in ProcKill; before that, we can check whether the process latch has been disowned. It's not nice to be poking around in latch internals like this, but all the alternatives seem to involve scanning the backend status (bestatus) array and correlating PIDs.

I verified that the latch-disown check works on at least PostgreSQL 12 through 16.

It also makes sense to exclude ourselves, since we will always be on the CPU while looking at wait events.

Resolves #10

ants commented 3 months ago

Should I add a GUC to turn this functionality on and off?

shinderuk commented 3 months ago

Thanks for working on this! I ran some tests with pgbench and the results look good. Yes, I think we need a GUC to turn this on. Also, the pg_wait_sampling_current view needs to be patched similarly.

ants commented 3 months ago

Added a sample_cpu GUC and updated the pg_wait_sampling_current view. The GUC defaults to false for backward compatibility. What is the preference here? I think most people would want to see these events, so maybe it should default to true?
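The per-backend decision the sampler makes once the sample_cpu GUC exists can be sketched as follows. This assumes that a backend reporting no wait event (`wait_event_info == 0`) is recorded as on-CPU only when the GUC is enabled, and that the collector skips its own PID as discussed earlier in the thread. `classify_sample`, `SampleKind`, and the C variable `sample_cpu` are hypothetical names; in the extension the variable would be registered with PostgreSQL's `DefineCustomBoolVariable()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the GUC; false matches the default in this PR. */
static bool sample_cpu = false;

typedef enum SampleKind
{
    SAMPLE_SKIP,                /* record nothing for this backend */
    SAMPLE_WAIT,                /* record the reported wait event */
    SAMPLE_ON_CPU               /* record the backend as running on the CPU */
} SampleKind;

/*
 * Decide what to record for one backend in one sampling round.
 * wait_event_info == 0 means the backend reports no wait event.
 */
static SampleKind
classify_sample(uint32_t wait_event_info, int backend_pid, int my_pid,
                bool alive)
{
    /* Exited backends and unused slots must not show up as on-CPU. */
    if (!alive || backend_pid == 0)
        return SAMPLE_SKIP;

    /* The sampler is always on the CPU while sampling; leave it out. */
    if (backend_pid == my_pid)
        return SAMPLE_SKIP;

    if (wait_event_info != 0)
        return SAMPLE_WAIT;

    /* No wait event reported: count it as CPU time only if enabled. */
    return sample_cpu ? SAMPLE_ON_CPU : SAMPLE_SKIP;
}
```

With `sample_cpu` off, the behavior for non-waiting backends is unchanged (they are simply not recorded), which is what makes defaulting to false the backward-compatible choice.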

egor-rogov commented 3 months ago

Thank you guys!