powa-team / pg_qualstats

A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing

Does not work with postgres10 #18

Closed asr1901 closed 6 years ago

asr1901 commented 6 years ago

Cannot install this extension on postgres10. Whenever I start postgres with "pg_qualstats" in "shared_preload_libraries" there is a core dump and postgres won't start.

rjuju commented 6 years ago

That's bad :/

Unfortunately, I can't reproduce your issue:

rjuju=# select version();
                                               version                                                
------------------------------------------------------------------------------------------------------
 PostgreSQL 10.0@0ab77a34f8 on x86_64-pc-linux-gnu, compiled by gcc (Gentoo 6.4.0 p1.0) 6.4.0, 64-bit
(1 row)

rjuju=# show shared_preload_libraries ;
              shared_preload_libraries               
-----------------------------------------------------
 pg_stat_statements,powa,pg_qualstats,pg_stat_kcache
(1 row)

Is there anything special about your server? Can you tell:

If possible, could you provide a stacktrace of the generated core dump? Debug packages for both postgres and pg_qualstats should be installed in order to have a useful stacktrace.

asr1901 commented 6 years ago

We are running 64-bit CentOS 7.4.1708. I tried installing via RPM.

I also tried downloading the source files and running "make install", but got the same result. I tried all of the preload libraries one at a time, and pg_qualstats was the only one that core dumped. What packages do I need to install to get a useful stack trace?

● postgresql-10.service - PostgreSQL 10 database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql-10.service; enabled; vendor preset: disabled)
   Active: failed (Result: core-dump) since Tue 2017-10-24 13:14:46 EDT; 4min 26s ago
  Process: 32202 ExecStart=/usr/pgsql-10/bin/postmaster -D ${PGDATA} (code=dumped, signal=SEGV)
  Process: 32194 ExecStartPre=/usr/pgsql-10/bin/postgresql-10-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)
 Main PID: 32202 (code=dumped, signal=SEGV)

Oct 24 13:14:39 postgres10 systemd[1]: Starting PostgreSQL 10 database server...
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [1-1] user=,db=,remote= > LOG: listening on IPv4 address "0.0.0.0", port 5432
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [2-1] user=,db=,remote= > LOG: listening on IPv6 address "::", port 5432
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [3-1] user=,db=,remote= > LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [4-1] user=,db=,remote= > LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
Oct 24 13:14:46 postgres10 systemd[1]: postgresql-10.service: main process exited, code=dumped, status=11/SEGV
Oct 24 13:14:46 postgres10 systemd[1]: Failed to start PostgreSQL 10 database server.
Oct 24 13:14:46 postgres10 systemd[1]: Unit postgresql-10.service entered failed state.
Oct 24 13:14:46 postgres10 systemd[1]: postgresql-10.service failed.

asr1901 commented 6 years ago

Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [1-1] user=,db=,remote= > LOG: listening on IPv4 address "0.0.0.0", port 5432
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [2-1] user=,db=,remote= > LOG: listening on IPv6 address "::", port 5432
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [3-1] user=,db=,remote= > LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
Oct 24 13:14:40 postgres10 postmaster[32202]: < 2017-10-24 13:14:40 EDT [32202] : [4-1] user=,db=,remote= > LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
Oct 24 13:14:40 postgres10 kernel: postmaster[32202]: segfault at 957ad00 ip 00007fed0a540f5c sp 00007ffe634a8940 error 6 in pg_qualstats.so[7fed0a53e000+6000]
Oct 24 13:14:40 postgres10 abrt-hook-ccpp[32203]: Process 32202 (postgres) of user 26 killed by SIGSEGV - dumping core

asr1901 commented 6 years ago

core_backtrace:

{
  "signal": 11,
  "executable": "/usr/pgsql-10/bin/postgres",
  "stacktrace": [
    {
      "crash_thread": true,
      "frames": [
        {
          "address": 140656057257820,
          "build_id": "bee02421dd736fa987f8f36b9f4b3add59fee940",
          "build_id_offset": 12124,
          "function_name": "pgqs_shmem_startup",
          "file_name": "/usr/pgsql-10/lib/pg_qualstats.so"
        },
        {
          "address": 7294540,
          "build_id": "ccb8f7becee23b3f6649dd3bd8d4a78c665a68ef",
          "build_id_offset": 3100236,
          "function_name": "CreateSharedMemoryAndSemaphores",
          "file_name": "/usr/pgsql-10/bin/postgres"
        },
        {
          "address": 7003987,
          "build_id": "ccb8f7becee23b3f6649dd3bd8d4a78c665a68ef",
          "build_id_offset": 2809683,
          "function_name": "PostmasterMain",
          "file_name": "/usr/pgsql-10/bin/postgres"
        },
        {
          "address": 4700799,
          "build_id": "ccb8f7becee23b3f6649dd3bd8d4a78c665a68ef",
          "build_id_offset": 506495,
          "function_name": "main",
          "file_name": "/usr/pgsql-10/bin/postgres"
        }
      ]
    }
  ]
}

asr1901 commented 6 years ago

GDB output:

[New LWP 32202]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fed0a540f5c in pgqs_shmem_startup () at pg_qualstats.c:1390
1390        pgqs->lock = &(locks[0]).lock;
Missing separate debuginfos, use: debuginfo-install postgresql10-server-10.0-1PGDG.rhel7.x86_64

asr1901 commented 6 years ago

pg_qualstats version (from yum): pg_qualstats10-1.0.2-1.rhel7.x86_64

rjuju commented 6 years ago

Program terminated with signal 11, Segmentation fault.
#0  0x00007fed0a540f5c in pgqs_shmem_startup () at pg_qualstats.c:1390
1390        pgqs->lock = &(locks[0]).lock;

That's really weird; that code has been in pg_qualstats since pg 9.6.

Can you show the content of the locks variable?

(gdb) p locks
$1 = (LWLockPadded *) 0x7fffe4d03780

asr1901 commented 6 years ago

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fcc0d6e3f5c in pgqs_shmem_startup () at pg_qualstats.c:1390
1390        pgqs->lock = &(locks[0]).lock;
Missing separate debuginfos, use: debuginfo-install postgresql10-server-10.0-1PGDG.rhel7.x86_64
(gdb) p locks
$1 = (LWLockPadded *) 0x7fcc01dcd880
(gdb)

rjuju commented 6 years ago

So GetNamedLWLockTranche() did return the tranche.

What about pgqs?

(gdb) p pgqs
$2 = (pgqsSharedState *) 0x7fffeda06400
(gdb) p *pgqs
$3 = {lock = 0x0, querylock = 0x0, sampledlock = 0x0, sampled = 0x7fffeda06418 ""}

asr1901 commented 6 years ago

(gdb) p pgqs
$2 = (pgqsSharedState *) 0xc71dd80

rjuju commented 6 years ago

Sorry, I edited my previous comment to also check the dereferenced variable:

(gdb) p *pgqs
$3 = {lock = 0x0, querylock = 0x0, sampledlock = 0x0, sampled = 0x7fffeda06418 ""}

Just in case, can you also check:

(gdb) p &(locks[0]).lock
$4 = (LWLock *) 0x7fffe4d03780
(gdb) p (locks[0]).lock
$5 = {tranche = 65, state = {value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}

asr1901 commented 6 years ago

(gdb) p *pgqs
Cannot access memory at address 0xc71dd80
(gdb) p &(locks[0]).lock
$3 = (LWLock *) 0x7fcc01dcd880
(gdb) p (locks[0]).lock
$4 = {tranche = 66, state = {value = 536870912}, waiters = {head = 2147483647, tail = 2147483647}}

rjuju commented 6 years ago

Ah, so the pointer returned by ShmemInitStruct() isn't valid. I'm not sure how this can happen, and I'm not sure what pg_qualstats did wrong to make that happen.

Is there anything else except pg_qualstats in shared_preload_libraries?

asr1901 commented 6 years ago

shared_preload_libraries = 'pg_stat_statements, powa, pg_stat_kcache,pg_qualstats'

asr1901 commented 6 years ago

The same issue happens when pg_qualstats is there alone, though; I tested both ways. I also tried on a different machine, with the same result.

rjuju commented 6 years ago

Ah OK, I get it now: version 1.0.2 is not compatible with pg10. Sorry, I didn't realize earlier which version you were using.

If you can compile pg_qualstats, current HEAD should work without any issue. I'll make a new release shortly in any case.

asr1901 commented 6 years ago

OK, will try to download the latest now.

asr1901 commented 6 years ago

Have it working using the HEAD revision. Thank you so much for your help! Please let me know when the new release is out and in the pgdg repo.

rjuju commented 6 years ago

Good news!

Yes, I'll let you know when 1.0.3 is out.

rjuju commented 6 years ago

I just released version 1.0.3! Sorry for the delay. I also warned Devrim, the yum pgdg packager. The updated package should be available within a few days.

rjuju commented 6 years ago

Packages for 1.0.3 and 1.0.4 have been available for quite some time now, so I'm closing this issue.