powa-team / powa-archivist

powa-archivist: the powa PostgreSQL extension
http://powa.readthedocs.io/
PostgreSQL License

Segmentation fault during powa_take_snapshot() on PostgreSQL 9.6.17 #31

Closed DidierSterbecq closed 3 years ago

DidierSterbecq commented 4 years ago

Hi, we use PoWA 3.2 (3.2.0-2) with our PostgreSQL 9.6.x databases. We recently updated to PostgreSQL 9.6.17, and since then we get a segmentation fault during execution of the function powa_take_snapshot(); shortly afterwards PostgreSQL enters recovery mode. The database server runs Red Hat 7.6. An extract of the PostgreSQL log file:

2020-05-12 00:41:20 CEST [53719]: [1594-1] LOG: 00000: worker process: powa (PID 13153) was terminated by signal 11: Segmentation fault
2020-05-12 00:41:20 CEST [53719]: [1595-1] DETAIL: Failed process was running: SELECT powa_take_snapshot()
2020-05-12 00:41:20 CEST [53719]: [1596-1] LOCATION: LogChildExit, postmaster.c:3581
2020-05-12 00:41:20 CEST [53719]: [1597-1] LOG: 00000: terminating any other active server processes

This is followed by several repeated warnings such as:

2020-05-12 00:41:20 CEST [19985]: [5-1] WARNING: 57P02: terminating connection because of crash of another server process
2020-05-12 00:41:20 CEST [19985]: [6-1] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Thanks in advance for your help.

rjuju commented 4 years ago

Hello,

Is there any chance that you could get a stack trace of the generated coredump, if any? (see https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD).

Otherwise, could you provide:

DidierSterbecq commented 4 years ago

The errors occur on our production platform, so we disabled the PoWA collector to avoid repeated crashes. Today we are trying to set up a test platform to reproduce the problem.

For information:

powa=# \dx
                                     List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+-----------------------------------------------------------
 btree_gist         | 1.2     | public     | support for indexing common datatypes in GiST
 pg_qualstats       | 1.0.7   | public     | An extension collecting statistics about quals
 pg_stat_kcache     | 2.1.1   | public     | Kernel statistics gathering
 pg_stat_statements | 1.4     | public     | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 powa               | 3.2.0   | public     | PostgreSQL Workload Analyser-core

powa=# select distinct module from powa_functions where enabled ;
          module
--------------------------
 powa_stat_user_functions
 pg_qualstats
 pg_stat_statements
 powa_stat_all_relations
 pg_stat_kcache
(5 rows)

rjuju commented 4 years ago

Ok!

I'll also try to reproduce the same environment. Just in case, are you using pgdg rpm packages or installing from source?

DidierSterbecq commented 4 years ago

We use pgdg rpm packages.

rjuju commented 4 years ago

Ok!

Since you're using RHEL 7 you should be able to use coredumpctl to get a stack trace.
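A minimal sketch of what that could look like, assuming systemd-coredump is actually capturing core dumps on the host and that gdb and the PostgreSQL debuginfo packages are installed. The helper name `powa_backtrace`, the temporary core file path, and the default binary path `/usr/pgsql-9.6/bin/postgres` are assumptions for illustration, not something stated in this thread.

```shell
# Hypothetical helper: print a backtrace for a crashed backend, given its PID.
# Assumes systemd-coredump stored a core for that PID and gdb is installed.
powa_backtrace() {
    local pid="$1"
    local core="${2:-/tmp/core.$pid}"
    # Assumed location of the server binary; adjust to the local installation.
    local exe="${3:-/usr/pgsql-9.6/bin/postgres}"

    # Show what was recorded for this PID (executable, signal, timestamp).
    coredumpctl info "$pid" || return 1

    # Write the core to a file, then ask gdb for a non-interactive backtrace.
    coredumpctl dump "$pid" --output="$core" || return 1
    gdb -batch -ex bt "$exe" "$core"
}
```

With the PID reported in the log extract above, the call would be `powa_backtrace 13153`. Without debug symbols the backtrace will mostly show addresses rather than function names.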

DidierSterbecq commented 4 years ago

Hi,

I could not reproduce the error in our test environment. I will try another test case. More to follow.

rjuju commented 4 years ago

Oh BTW, you mentioned that the issue started right after upgrading postgres. Is there any chance that you also upgraded pg_qualstats to v2.0.1 at the same time, loaded this version when you restarted postgres, and afterward didn't upgrade the extension? If so, that's probably a duplicate of https://github.com/powa-team/pg_qualstats/issues/30, which has been fixed in v2.0.2.
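One way to check for that kind of mismatch, sketched under the assumption that psql can connect to the database where the extension is installed: compare the version recorded in the catalog with the version shipped by the installed package. The helper name `ext_versions` is hypothetical; `pg_available_extensions` is a standard PostgreSQL view.

```shell
# Hypothetical helper: show the packaged (default) and installed version of an
# extension in a given database, as "name|default_version|installed_version".
ext_versions() {
    local db="$1" ext="$2"
    # -X: ignore psqlrc, -A: unaligned output, -t: rows only
    psql -X -A -t -d "$db" -c \
        "SELECT name, default_version, installed_version
           FROM pg_available_extensions
          WHERE name = '$ext'"
}
```

For this thread the call would be `ext_versions powa pg_qualstats`. A `default_version` that differs from `installed_version` means the package on disk is newer than the extension's SQL objects. `ALTER EXTENSION pg_qualstats UPDATE` brings them in line when an update script between the two versions exists; when none exists the command fails with an error, and the extension has to be dropped and re-created instead.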

DidierSterbecq commented 4 years ago

That looks like a good lead, as our upgrade process includes OS libraries and PostgreSQL extensions. I will check whether this is the case for pg_qualstats.

rjuju commented 4 years ago

Hello,

Do you have any update on this issue?

rjuju commented 3 years ago

Hearing no news, I'm assuming that this was due to pg_qualstats 2.0.1 and upgrading to 2.0.2 fixed the problem. Feel free to reopen this issue if needed.