powa-team / powa

PostgreSQL Workload Analyzer
http://powa.readthedocs.io/
PostgreSQL License
777 stars 57 forks source link

background worker "powa" (PID 2536) was terminated by signal 11: Segmentation fault #131

Closed aleszeleny closed 4 years ago

aleszeleny commented 4 years ago

Hello,

we are experiencing in the last few weeks. Unfortunately haven't isolate reliable test case yet, so I can provide only observations available. Symptoms are the same - issue subject error message in PostgreSQL log (and therefore server restart). The common workaround is to remove powa from shared_preload_libraries, once this is done, the failing job proceeded - this worked for all cases where we suffered from segfault issue.

OS: Ubuntu 18.04.2 LTS Postgres versions where the issue occurs:

shared_preload_libraries: Same on servers/versions listed above - pg_stat_statements,pg_stat_kcache,powa,pg_qualstats,pg_partman_bgw,pg_cron,pg_prewarm

As soon as powa was removed from shared_preload_libraries the job proceeded.

One case was a function creating and dropping thousand of temporary tables (once I spot this, the developer was asked to refactor the code), the second issue was from a java based ETL tools performing upsert (several hundred thousands of rows - I have no access to the java ETL details to say whether there were some batches or a transaction per row). It never happened immediately, always after a few minutes of the workload runs.

I there something I can test or setup better logging to identify the issue root cause or any workaround to avoid the segfaults while having powa in place?

Thanks Ales

rjuju commented 4 years ago

Hello, Is there any chance that this started after a package upgrade, more precisely having pg_qualstats 2.0.0 or 2.0.1 installed, server restarted, but the pg_qualstats extension not being dropped and created (so still in version 1.X.Y)?

If yes, you're hitting https://github.com/powa-team/pg_qualstats/issues/30. This has been fixed and released, although I'm not sure when a debian package will be available.

In the meantime, dropping and creating pg_qualstats extension will also fix the problem.

aleszeleny commented 4 years ago

Thanks for the tip, I'll try to find a better reproducible test and reinstall pg_qualstats.

aleszeleny commented 4 years ago

After one of our two tests, it seems it helped. Once I'll have a while will try to run the second test case as well. As usual, it seems to be my user error, because once you point the potential version issue, I've checked qualstats extension and binaries and they were different. The problem was not caused by pgdg ubuntu packages, we have deployed new server using ansible-playbook and it installed qualstats in latest version while on production DB server is an older version and database was restored from a physical backup. That is the way, how old extension version landed into the database wile more recent binaries were installed on OS level.

rjuju commented 4 years ago

Thank for the confirmation! Well, pg_qualstats should have provided backward compatibility with v1.x version and properly detected mismatch with future version, but I failed to do that in v2.0.0 and v2.0.1, it's definitely not a user error.

rjuju commented 4 years ago

For the record debian packages for v2.0.2 are now available on pgdg repository, see https://www.postgresql.org/message-id/E1jgoIm-00027Z-4y@atalia.postgresql.org.

aleszeleny commented 4 years ago

I've run all remaining know test-cases causing segfault and once database extension was re/created so it matches binaries from the installed OS-level package all test succeed.

Thanks a lot for Your help!

P.S.: thanks for the new pg_qualstats package and for powa4 packages, I like the remote mode option, but I did not want to use home compiled extensions with the one form pgdg packages, on prod, so except some testing, I was waiting for the version 4. Great work!

rjuju commented 4 years ago

Thank you very much, I'm glad that you like this new remote mode option!