postgrespro / pg_wait_sampling

Sampling based statistics of wait events
Other
144 stars 34 forks source link

infinite Lock userlock during select count(*) from pg_wait_sampling_history; #18

Open grin31 opened 4 years ago

grin31 commented 4 years ago

Good day. During query from pg_wait_sampling_history session infinite wait. Query from pg_wait_sampling_current working.

config: pg_wait_sampling.history_period | 100 pg_wait_sampling.history_size | 5000 pg_wait_sampling.profile_period | 100 pg_wait_sampling.profile_pid | on pg_wait_sampling.profile_queries | on

SELECT version(); PostgreSQL 10.11 (Ubuntu 10.11-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit

1. query: [local]:5454 postgres@postgres=# select count(*) from pg_wait_sampling_history;

2. from pg_stat_activity -[ RECORD 18 ]---+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- datid | 12974 datname | postgres pid | 117180 usesysid | 10 usename | postgres application_name | psql client_addr | [null] client_hostname | [null] client_port | -1 backend_start | 2020-04-08 14:53:19.027164+03 xact_start | 2020-04-08 14:53:21.038274+03 query_start | 2020-04-08 14:53:21.038274+03 state_change | 2020-04-08 14:53:21.038276+03 wait_event_type | Lock wait_event | userlock state | active backend_xid | [null] backend_xmin | 32249100 query | select count(*) from pg_wait_sampling_history; backend_type | client backend

3. [local]:5454 postgres@postgres=# SELECT * from pg_locks where pid=117180; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ------------+------------+----------+--------+--------+------------+---------------+---------+--------+----------+--------------------+--------+-----------------+---------+---------- relation | 12974 | 9020314 | [null] | [null] | [null] | [null] | [null] | [null] | [null] | 10/788746 | 117180 | AccessShareLock | t | t virtualxid | [null] | [null] | [null] | [null] | 10/788746 | [null] | [null] | [null] | [null] | 10/788746 | 117180 | ExclusiveLock | t | t userlock | 3398742279 | [null] | [null] | [null] | [null] | [null] | 1 | 0 | 0 | 10/788746 | 117180 | ExclusiveLock | f | f userlock | 3398742279 | [null] | [null] | [null] | [null] | [null] | 0 | 0 | 0 | 10/788746 | 117180 | ExclusiveLock | t | f (4 rows)

4. postgres@postgres=# SELECT relname from pg_class where relfilenode = 9020314; result: pg_wait_sampling_history

maksm90 commented 2 years ago

Hi, @grin31 ! I suppose this issue was resolved in https://github.com/postgrespro/pg_wait_sampling/pull/31 .

From your picture it might be concluded that collector process hangs holding its own lock, I suppose, while working with message queue: in fact, in a whole fragment under collector lock just manipulation with message queue is a single blocking operation. The pg_stat_activity entry for collector process would able to confirm this suggestion. Before the patch such hanging might occur when some last query to pg_wait_sampling function was interrupted, e.g., by Ctrl+C. Now, I don't see any reasons for collector hanging.

@grin31 do you remember, was some previous query to pg_wait_sampling before select count(*) from pg_wait_sampling_history interrupted? If so, I'll close this issue.