postgrespro / pg_wait_sampling

Sampling based statistics of wait events
Other
147 stars 35 forks source link

Synchronization with collector may be lost if backend is terminated #48

Closed shinderuk closed 2 years ago

shinderuk commented 2 years ago

Consider the following scenario:

  1. Session 1 requests history.
  2. Session 1 is terminated while wating for collector.
  3. Session 2 requests profile.
  4. Collector sends history requested by session 1.
  5. Session 2 receives history instead of profile.

Stop the collector just before LockAcquire(&tag, ExclusiveLock, false, false); in collector_main (collector.c:442).

Session 1:

postgres=# select * from pg_wait_sampling_history limit 5;
FATAL:  terminating connection due to administrator command
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Terminate the backend while it is waiting for collector (with kill or pg_terminate_backend).

Session 2:

postgres=# select * from pg_wait_sampling_profile limit 5;
  pid  | event_type |        event        | queryid |      count      
-------+------------+---------------------+---------+-----------------
 93315 | Activity   | LogicalLauncherMain |       0 | 702740797103094
 93309 | Activity   | CheckpointerMain    |       0 | 702740797103094
 93310 | Activity   | BgWriterHibernate   |       0 | 702740797103094
 93312 | Activity   | WalWriterMain       |       0 | 702740797103094
 93635 | Client     | ClientRead          |       0 | 702740797224477
(5 rows)

After sending the query let the collector continue.

maksm90 commented 2 years ago

Good catch! Thanks for the issue. Generally, such asynchronous behavior of collector and requesting process might incur different failures.

maksm90 commented 2 years ago

PR https://github.com/postgrespro/pg_wait_sampling/pull/52 resolves this issue. Placing acquiring of collector lock before reading of collector_hdr->request would allow to complete history request, that from collector's side will fail on message queue, before backend initiating profile request will get progress (it will block on collector lock). If profile request gets progress before acquiring of collector lock then collector_hdr->request variable will be rewritten before collector read this value and as result history request will be ignored.