powa-team / powa

PostgreSQL Workload Analyzer
http://powa.readthedocs.io/
PostgreSQL License
768 stars 57 forks source link

segmentation fault in pg_qualstats #52

Closed lazystone closed 5 years ago

lazystone commented 9 years ago

[628594.420330] postgres[24488]: segfault at 8 ip 00007f295308a03c sp 00007ffef54b16a0 error 4 in pg_qualstats.so[7f2953088000+5000]

The only hint - it fails on a very long transaction with a lot of inserts.

rdunklau commented 9 years ago

Would you happen to have a log ? At least of the last executed query ? Thank you.

lazystone commented 9 years ago

2015-06-04 03:02:58 CEST LOG: server process (PID 24488) was terminated by signal 11: Segmentation fault 2015-06-04 03:02:58 CEST DETAIL: Failed process was running: copy (select * from service_passenger_departure_view) to '/tmp/extract_and_build_database/service_passenger_departure.dmp'

rdunklau commented 9 years ago

What is the definition of the view, and the underlying tables column types, for column involved in a JOIN or WHERE clause ?

Does it crash every time you run this query, or only during your long transaction ?

Thank you.

lazystone commented 9 years ago

I've substituted field names by their types:

CREATE OR REPLACE VIEW service_passenger_departure_view AS 
 SELECT DISTINCT psd.integer_column,
    pbd.integer_column,
    st.character_varying_2,
    st.character_varying_64,
    s.character_varying_10,
    s.character_varying_256,
    st.integer_column,
    s.character_varying_40,
    ps.boolean_column,
    ps.character_varying_3,
    ps.timestamp_with_time_zone,
    ps.integer_column,
    ps.timestamp_with_time_zone
   FROM st, s, pbd, psd, ps
  WHERE ps.integer_column IS NOT NULL AND st.inter_column = (s.inter_column + 0) AND s.integer_column = (ps.integer_column + 0) AND pbd.integer_column = (ps.integer_column + 0) AND psd.integer_column = (ps.integer_column + 0);
lazystone commented 9 years ago

I guess that it fails every time. But can't say for sure right now.

lazystone commented 9 years ago

Tried to enable it once more - fails on another query now:

16:02:30 CEST LOG:  server process (PID 49306) was terminated by signal 11: Segmentation fault
2015-06-04 16:02:30 CEST DETAIL:  Failed process was running: /* load collection se.paxport.paxoutput.core.model.DocumentEntity.documentReceivers */ select documentre0_.document_id as document1_2_0_, documentre0_.address as address2_5_0_, documentre0_.as_attachment as as_attac3_5_0_, documentre0_.carbon_copy as carbon_c4_5_0_, documentre0_.name as name5_5_0_, documentre0_.receive_only_reportable_data as receive_6_5_0_, documentre0_.receiver_id as receiver7_5_0_, documentre0_.sent_at as sent_at8_5_0_, documentre0_.split_mail_pnl as split_ma9_5_0_, documentre0_.type as type10_5_0_, documentre0_.use_adl_all_group as use_adl11_5_0_ from t_document_receiver documentre0_ where documentre0_.document_id=$1
rjuju commented 9 years ago

I tried to reproduce your issue, but without success. Could you provide a self-contained test or a dump of the required data ? (you can send it at powa (at) dalibo (dot) com if you need privacy).

Can you also specify the exact postgresql and pg_qualstats versions ?

Regards.

lazystone commented 9 years ago

I will try to enable this extension again next week. The problem is that it's reproducible on production server only. Postgresql version was 9.4.3 at the moment of failure, now it's the latest one - 9.4.4. pg_qualstats seems to be version 0.0.6

rdunklau commented 9 years ago

Instead of enabling it on the production server, could you send us the whole database schema, as well as logs from the instance when it crashed ?

What are your configuration options for pg_qualstats ?

rjuju commented 8 years ago

Hello,

Do you have any update on this issue ?

lazystone commented 8 years ago

Hello, No. I haven't gotten a chance to test it again. I guess you can close the issue. If something comes up, then I'll create another one.

rjuju commented 8 years ago

Ok, thanks for the answer!

visualphoenix commented 8 years ago

Just coming here to add a "me too."

I'm getting segmentation faults using PoWA. I havent managed to isolate it yet specifically to pg_qualstats yet.

powa-archivist 3.0.1 pg_track_settings 1.0.0 pg_stat_kcache at git d7734fb8ce60f8d8ce3fa9efae627c9c730c9488 pg_qualstats at git 1ab0e68398376d8518b4cd6bfa4eae418762a7d2

visualphoenix commented 8 years ago

Additionally - sadly I can not provide the database schema or data.

rjuju commented 8 years ago

You should be able to see in postgrsql logs the query that triggered this segfault. Can you share it?

visualphoenix commented 8 years ago

Give me some time to spin up a nonprod db.

Do you have any preference for Postgres log config to aid this debugging? I have most logging off.

On Thursday, February 11, 2016, Julien Rouhaud notifications@github.com wrote:

Reopened #52 https://github.com/dalibo/powa/issues/52.

— Reply to this email directly or view it on GitHub https://github.com/dalibo/powa/issues/52#event-546842290.

Raymond "VisualPhoenix" Barbiero

rjuju commented 8 years ago

The only relevant info than you'll have in the log is the query that caused the segmentation fault. We only need this and a definition of the relations involved (and of course ideally sample data to reproduce this issue).

If you need privacy, you can send any data to powa(AT)dalibo.com.

perdiesman commented 6 years ago

I know this is an old thread, but I just installed powa on Enterprise DB 9.6 and I am having the same issue. I have narrowed it down to when I am joining to a particular foreign table.

2018-05-03 09:43:37 EDT LOG: server process (PID 9102) was terminated by signal 11: Segmentation fault 2018-05-03 09:43:37 EDT DETAIL: Failed process was running: select ...

I tried recreating the table as a local table and querying against that and it works fine, just not as a foreign table. Are there any limitations on foreign tables with powa?

rjuju commented 6 years ago

@mbroxson which version of pg_qualstats are you using? I'm not aware of any issue with foreign tables. A quick local test doesn't cause any segfault using a foreign table. Can you provide a way to reproduce it?

perdiesman commented 6 years ago

I'm using pg_qualstats 1.0.4. I'm trying to get a way to reproduce it, but it only seems to be happening with this one foreign table so trying to replicate is proving difficult. I'll post something as soon as I can reproduce reliably

rjuju commented 6 years ago

that's indeed strange. Maybe setting pg_qualstats.sample_rate = 1 could help to reproduce?

perdiesman commented 6 years ago

finally got a schema setup that I can send you to reproduce. I have two databases set up on the same server and used a fdw to link the two.

First database can be anything, run powa_test.sql and it will load a powa_test schema with tables that match the layout of our database. Second database I called powa_test_foreign and set a postgres_fdw server up as local_powa_test_foreign, then you can run the powa_test_foreign.sql

The query that I am trying to run is in powa_test_query.sql and fails even without data in these tables.

powa_test.tar.gz

rjuju commented 6 years ago

Thanks for the test case, I can reproduce the issue. The problematic query can be reduced to

select 1
from powa_test.sdr t
join powa_test.p on p.int_id = t.int_1
join powa_test.iui on iui.varchar_1 = p.varchar_17;

I'm looking at it.

rjuju commented 5 years ago

I'm very I took me so long to come back to this. I just pushed https://github.com/powa-team/pg_qualstats/commit/6fb70ee68bd8bab98a102927b10c29f3a46e06e5 which should fix the issue.

Thanks a lot to @lazystone @visualphoenix and @mbroxson to report the issue and @mbroxson for the reproducer. It was super helpful.

Sorry again for the time needed to fix this.