powa-team / powa-archivist

powa-archivist: the powa PostgreSQL extension
http://powa.readthedocs.io/
PostgreSQL License
51 stars 20 forks source link

duplicate key value violates unique constraint "powa_statements_pkey" #47

Closed gjedeer closed 2 years ago

gjedeer commented 2 years ago

I've tried adding this server twice, and there's always this error. Remote setup, pg14 on Powa server and the monitored server, latest Powa from pgdg.

The Powa server is already monitoring 4 other servers so I think it's set up correctly.

Error in GUI:

    db3.cwatch.io: powa_take_snapshot(13): function "powa_statements_snapshot" failed: duplicate key value violates unique constraint "powa_statements_pkey"

Error when I try to execute this function from CLI:

powa=# select powa_take_snapshot(13);
WARNING:  powa_take_snapshot(13): function "powa_statements_snapshot" failed:
              state  : 23505
              message: duplicate key value violates unique constraint "powa_statements_pkey"
              detail : Key (srvid, queryid, dbid, userid)=(13, 3012963866342342194, 16605, 364028942) already exists.
              hint   : 
              context: SQL statement "WITH capture AS(
        SELECT *
        FROM powa_statements_src(_srvid)
    ),
    mru as (UPDATE powa_statements set last_present_ts = now()
            FROM capture
            WHERE powa_statements.queryid = capture.queryid
              AND powa_statements.dbid = capture.dbid
              AND powa_statements.userid = capture.userid
              AND powa_statements.srvid = _srvid
    ),
    missing_statements AS(
        INSERT INTO public.powa_statements (srvid, queryid, dbid, userid, query)
            SELECT DISTINCT _srvid, queryid, dbid, userid, query
            FROM capture c
            WHERE NOT EXISTS (SELECT 1
                              FROM powa_statements ps
                              WHERE ps.queryid = c.queryid
                              AND ps.dbid = c.dbid
                              AND ps.userid = c.userid
                              AND ps.srvid = _srvid
            )
    ),

    by_query AS (
        INSERT INTO public.powa_statements_history_current
            SELECT _srvid, queryid, dbid, userid,
            ROW(
                ts, calls, total_exec_time, rows,
                shared_blks_hit, shared_blks_read, shared_blks_dirtied,
                shared_blks_written, local_blks_hit, local_blks_read,
                local_blks_dirtied, local_blks_written, temp_blks_read,
                temp_blks_written, blk_read_time, blk_write_time,
                plans, total_plan_time,
                wal_records, wal_fpi, wal_bytes
            )::powa_statements_history_record AS record
            FROM capture
    ),

    by_database AS (
        INSERT INTO public.powa_statements_history_current_db
            SELECT _srvid, dbid,
            ROW(
                ts, sum(calls),
                sum(total_exec_time), sum(rows), sum(shared_blks_hit),
                sum(shared_blks_read), sum(shared_blks_dirtied),
                sum(shared_blks_written), sum(local_blks_hit),
                sum(local_blks_read), sum(local_blks_dirtied),
                sum(local_blks_written), sum(temp_blks_read),
                sum(temp_blks_written), sum(blk_read_time), sum(blk_write_time),
                sum(plans), sum(total_plan_time),
                sum(wal_records), sum(wal_fpi), sum(wal_bytes)
            )::powa_statements_history_record AS record
            FROM capture
            GROUP BY dbid, ts
    )

    SELECT count(*)                     FROM capture"
PL/pgSQL function powa_statements_snapshot(integer) line 15 at SQL statement
SQL statement "SELECT powa_statements_snapshot(13)"
PL/pgSQL function powa_take_snapshot(integer) line 63 at EXECUTE
 powa_take_snapshot 
--------------------
                  1
(1 row)

The statements snapshot doesn't have any entries for this server:

powa=# select * from powa_statements where srvid=13;
 srvid | queryid | dbid | userid | query | last_present_ts 
-------+---------+------+--------+-------+-----------------
(0 rows)
rjuju commented 2 years ago

Hi,

I already saw similar report but was never able to reproduce it locally or find the root problem unfortunately.

What does this query return?

SELECT DISTINCT srvid, queryid, dbid, userid, query
FROM powa_statements_src(13)
WHERE srvid = 13
AND queryid = 3012963866342342194
AND dbid = 16605
AND userid = 364028942;
gjedeer commented 2 years ago

@rjuju thank you for the quick reply!

powa=# SELECT DISTINCT srvid, queryid, dbid, userid, query
powa-# FROM powa_statements_src(13)
powa-# WHERE srvid = 13
powa-# AND queryid = 3012963866342342194
powa-# AND dbid = 16605
powa-# AND userid = 364028942;
ERROR:  column "srvid" does not exist
LINE 1: SELECT DISTINCT srvid, queryid, dbid, userid, query
                        ^
powa=# 
gjedeer commented 2 years ago

And a list of columns, if it helps:

powa=# select * from powa_statements_src(13);
              ts               |  userid   |   dbid    |       queryid        |                                                                                                                                                                                                 

                                                                                                                                                                                                                 query                                                          

                                                                        | calls |    total_exec_time    | rows  | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written | local_blks_hit | local_blks_read | local_blks_dirtied | local_blks_written | 
temp_blks_read | temp_blks_written |    blk_read_time    |   blk_write_time   | plans | total_plan_time | wal_records | wal_fpi | wal_bytes 
rjuju commented 2 years ago

Ah yes sorry that column isn't return, so just

SELECT DISTINCT queryid, dbid, userid, query
FROM powa_statements_src(13)
WHERE queryid = 3012963866342342194
AND dbid = 16605
AND userid = 364028942;
gjedeer commented 2 years ago
powa=# SELECT DISTINCT queryid, dbid, userid, query
powa-# FROM powa_statements_src(13)
powa-# WHERE queryid = 3012963866342342194
powa-# AND dbid = 16605
powa-# AND userid = 364028942;
       queryid       | dbid  |  userid   |           query            
---------------------+-------+-----------+----------------------------
 3012963866342342194 | 16605 | 364028942 | select current_setting($1)
 3012963866342342194 | 16605 | 364028942 | SELECT current_setting($1)
(2 rows)

...sooo, case sensitivity maybe?

rjuju commented 2 years ago

oh wow ok, that explains everything. Apparently this queryid is unpopular enough so it's been discarded, but then reexecuted again and collected again by pg_stat_statements, but with a different case as the client issued an equivalent but not identical statement. That's indeed possible, although not very likely.

You could run this on your powa database to fix the situation:

DELETE FROM powa_statements_src_tmp
WHERE srvid = 13 AND dbid = 16605 AND queryid = 3012963866342342194 AND userid = 364028942
AND query = 'select current_setting($1)';

ie. remove one of the versions. The rest should work without problem (unless the same problem exist for another query). I'll fix the extension to make sure that only one version is retained in that case. Dropping one is not a problem as the queries should be equivalent (unless you're really unlucky and have a real hash collision happening when the entry is discarded).

gjedeer commented 2 years ago

Thank you! It worked, there were no more instances of that problem. And if it happens again, I know how to solve it until the next release.

16 rows deleted, BTW.

rjuju commented 2 years ago

Great news! Yes this problem is quite unlikely to happen (which is why I couldn't have it fixed until now) so hopefully you won't face it again.

Also, I just pushed a fix for that, so you could hot-patch the powa_statements_snapshot() function on the powa repository if needed: https://github.com/powa-team/powa-archivist/commit/f636868ee0148579ad441556d5c77f1643725a0f#diff-a18be2bbc79c829a7f673d5cfe0044c4e5dd248661e575edc20ecedca0499cd6R2539-R2548 instead.

I will publish a new release soon, thanks a lot for the report and the help to diagnose the problem!

banlex73 commented 2 years ago

had the same issue and in my case, same queryID but slightly different queries: SELECT 1 FROM ONLY "public"."persons" x and SELECT $2 FROM ONLY "public"."persons" x

PS: fixed you mentioned works like a charm, merci @rjuju

rjuju commented 2 years ago

@banlex73 thanks a lot for the confirmation!

I will take care of the release this weekend.

banlex73 commented 2 years ago

de rien

сб, 12 лют. 2022 р. о 00:17 Julien Rouhaud @.***> пише:

@banlex73 https://github.com/banlex73 thanks a lot for the confirmation!

I will take care of the release this weekend.

— Reply to this email directly, view it on GitHub https://github.com/powa-team/powa-archivist/issues/47#issuecomment-1037059587, or unsubscribe https://github.com/notifications/unsubscribe-auth/AIHWEYDOWVJQFHVEHCFO5SLU2YJQPANCNFSM5LV33UYA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you were mentioned.Message ID: @.***>

rjuju commented 2 years ago

I just released version 4.1.3!