powa-team / powa

PostgreSQL Workload Analyzer
http://powa.readthedocs.io/
PostgreSQL License
768 stars 57 forks source link

WARNING caused by execution_count > 10^19 (or negative!) #101

Closed Krysztophe closed 5 years ago

Krysztophe commented 6 years ago

(@rjuju: we've already discussed this 2 weeks ago but I went nowhere)

Powa : 3.1.1 pg_qualstats: 1.0.2 PG 9.6.5 on x86_64

Only powa_statements contains this 5824-parameters query for the queryid:

queryid | 4254500497
query   | select anonymized as y0_ from ... this_ left outer join ... informatio1_ on this_.intirc_id=informatio1_.intirc_id where ( anonymized in ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, 
...$999, $1000) 
or this_.intint_code_vision_apporteur in ($1001, ....      ...and so on...      $5823)) 
and informatio1_.anonymzed2 like $5824
rjuju commented 6 years ago

I don't see obvious bug in pg_qualstats about execution_count counter. It's internally an uint64, so yes its representation as a bigint can be negative, and no check is done to make sure there's no overflow safeguard.

However, powa does reset pg_qualstats counters on each snapshot, so in order to get negative values in stored counters, you'd have to process more than 2^63 rows between two snapshots. so more than 3e16 row per second, which is quite unlikely :)

Can you confirm where you find negative execution count ? (pg_qualstats() output, or in powa_qualstats_constvalues_history_current or ?)

Does the 20 min interval show an unexpected high amount of activity in other tables ?

The easiest way to get though this (assuming you don't want to keep the data for debugging) would be to delete the apparently inconsistent rows. owever, since aggregation apparently didn't work since a month, doing this would probably result in a huge aggregation activity (the whole month) followed by a huge delete activity of the just aggregated data. If this is an option for you, it'd be way better to call powa_reset() and "loose" the current interval.

Krysztophe commented 6 years ago
rjuju commented 5 years ago

@Krysztophe after re reading this, I see that it's pg_qualstats 1.0.2. v1.0.3 includes https://github.com/powa-team/pg_qualstats/commit/e70233cbc035345fa49b4561215900aba78c1eba which is probably the source cause of this issue. I probably looked at current head instead of 1.0.2 when I checked this issue, and forgot about this fix.

I'd suggest to upgrade to 1.0.6, and do the required cleanup explained before, this should fix the problem.

There are too many open issues on the project, so I mark this one as closed, feel free to reopen it if the suggested fix doesn't work as expected.