powa-team / pg_qualstats

A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing
Other
274 stars 26 forks source link

Hanged randomly in hash_seq_search #6

Closed amaury-d closed 7 years ago

amaury-d commented 8 years ago

Hello,

When I execute the following request around 10-50 times in a row, postgres hangs randomly on the connection. New connections are still working.

The CPU peaks at 100% and the process needs to be SIGKILLed. SIGTERM is not treated.

The request is: SELECT * FROM MEASURE_FLAT_H INNER JOIN ( SELECT max(dt) as dtmax FROM MEASURE_FLAT_H WHERE (ASSETS_NAME = ?) AND (METRIC_NAME = ?) ) d ON (D.DTMAX = DT) WHERE (ASSETS_NAME = ?) AND (METRIC_NAME = ?) At this step, no lock is found in pg_locks.

The process can be found stuck in the hash_seq_search().

Call stack:

0 0x000000000075e919 in hash_seq_search ()

1 0x00007fd0af74fb2b in pgqs_entry_dealloc ()

from /tech/postgres/9.4.1/server/lib/pg_qualstats.so

2 0x00007fd0af750a15 in pgqs_ExecutorEnd ()

from /tech/postgres/9.4.1/server/lib/pg_qualstats.so

3 0x000000000055dade in PortalCleanup ()

4 0x000000000077438a in PortalDrop ()

5 0x000000000077472a in PreCommit_Portals ()

6 0x00000000004b36ec in CommitTransaction ()

7 0x00000000004b4ab5 in CommitTransactionCommand ()

8 0x000000000067a269 in finish_xact_command ()

9 0x000000000067d685 in PostgresMain ()

10 0x000000000062c246 in PostmasterMain ()

11 0x00000000005c4ab8 in main ()

Each time, I've seen the process looping infinitely from 75e910 to 75e927:

0x000000000075e910 <+96>: mov (%r9,%rcx,8),%rax 0x000000000075e914 <+100>: test %rax,%rax 0x000000000075e917 <+103>: jne 0x75e93c <hash_seq_search+140> 0x000000000075e919 <+105>: add $0x1,%edx ---Type to continue, or q to quit--- => 0x000000000075e91c <+108>: cmp %edx,%esi 0x000000000075e91e <+110>: jb 0x75e958 <hash_seq_search+168> 0x000000000075e920 <+112>: add $0x1,%rcx 0x000000000075e924 <+116>: cmp %rcx,%r8 0x000000000075e927 <+119>: jg 0x75e910 <hash_seq_search+96>

For information, other hangs was also met in the past with other requests in similar situations but I don't have traces and can't confirm the issue was also in hash_seq_search().

Best regards,

Amaury

rjuju commented 8 years ago

Thanks for the report!

Can you easily reproduce this issue on a non production database? If a schema-only dump and the query enough to reproduce this issue, could you provide the schema-only dump? (powa (AT) dalibo (dot) com if you need privacy).

Which version of pg_qualstats is it?

rjuju commented 7 years ago

We just received another report which looks similar to this issue.

Here are the steps to reproduce the problem (need 9.6 psql for \gexec command):

CREATE TABLE ttt(id1 integer, id2 integer);
INSERT INTO ttt select i, i from generate_series(1,600) i;
with s(v) as (select string_agg(i::text || ', ' || i::text, '),(') from generate_series(1,500)i)
select 'select count(*) from ttt where (id1, id2) in (( ' || v || '))' from s;
\gexec
rjuju commented 7 years ago

Commit https://github.com/dalibo/pg_qualstats/commit/82d0b086b91d5bdfffc6945ac5942f439393e095 should fix this issue. Could you confirm it also fixes your test case?

amaury-d commented 7 years ago

Thanks, transmitted to our Postgresql team, waiting for their return.

rjuju commented 7 years ago

I'm closing this issue now, since this should now be fixed. If it's not the case, please reopen it.

Thanks again for reporting the problem.