powa-team / powa-archivist

powa-archivist: the powa PostgreSQL extension
http://powa.readthedocs.io/
PostgreSQL License
56 stars 20 forks source link

Problem with remote collection since toplevel was added #57

Closed frost242 closed 12 months ago

frost242 commented 1 year ago
          So, all powa-archivist extensions were updated to 4.2.0. But now I came to another issue with the collector :
2023-11-09 09:47:15,815 hostname:5432 DEBUG : Calling public.powa_statements_src(0)...
2023-11-09 09:47:15,839 hostname:5432 WARNING: Error while inserting data:
invalid input syntax for type bigint: "t"
CONTEXT:  COPY powa_statements_src_tmp, line 1, column queryid: "t"

In the repository DB log :

2023-11-09 09:47:15 CET pid=2545344 : 172.23.101.3(35058) - powa CONTEXT:  COPY powa_statements_src_tmp, line 1, column queryid: "t"
2023-11-09 09:47:15 CET pid=2545344 : 172.23.101.3(35058) - powa STATEMENT:  COPY powa_statements_src_tmp FROM stdin

To my surprise, this happens alos on PostgreSQL 15 instances.

Function powa_statements_src outputs the toplevel column before queryid, and in powa_statements_src_tmp, toplevel is the last column of the table. Isn't there a mismatch in the column order between the data output function and the integration in the powa repo ? See : https://github.com/powa-team/powa-archivist/blob/b5835fe0e5cc122b80309c0d0bcdd07e55621838/powa--4.2.0.sql#L2419C1-L2419C1 And : https://github.com/powa-team/powa-archivist/blob/b5835fe0e5cc122b80309c0d0bcdd07e55621838/powa--4.2.0.sql#L756

Originally posted by @frost242 in https://github.com/powa-team/powa-collector/issues/18#issuecomment-1803445516

frost242 commented 1 year ago

Just did a quick hack on the repository server and recreated the extension (yep, history is lost, but it's a non-production system, it's OK). The toplevel was physically moved at the same place as in pg_stat_statements :

root@powahp1:~# diff /usr/share/postgresql/16/extension/powa--4.2.0.sql  ~postgres/powa--4.2.0.sql
761d760
<     toplevel boolean NOT NULL,
783c782,783
<     wal_bytes numeric NOT NULL
---
>     wal_bytes numeric NOT NULL,
>     toplevel boolean NOT NULL

Then a quick test with one remote instance works OK :

2023-11-09 10:39:22,895 hostname:5432 DEBUG : Connecting on repository...
2023-11-09 10:39:22,910 hostname:5432 DEBUG : Connected.
2023-11-09 10:39:22,915 hostname:5432 DEBUG : Working on module pg_stat_kcache
2023-11-09 10:39:22,915 hostname:5432 DEBUG : Calling public.powa_kcache_src(0)...
2023-11-09 10:39:22,921 hostname:5432 DEBUG : Working on module pg_stat_statements
2023-11-09 10:39:22,921 hostname:5432 DEBUG : Calling public.powa_databases_src(0)...
2023-11-09 10:39:22,923 hostname:5432 DEBUG : Working on module powa_stat_user_functions
2023-11-09 10:39:22,923 hostname:5432 DEBUG : Calling public.powa_user_functions_src(0)...
2023-11-09 10:39:22,924 hostname:5432 DEBUG : Working on module pg_stat_bgwriter
2023-11-09 10:39:22,924 hostname:5432 DEBUG : Calling public.powa_stat_bgwriter_src(0)...
2023-11-09 10:39:22,925 hostname:5432 DEBUG : Working on module pg_qualstats
2023-11-09 10:39:22,925 hostname:5432 DEBUG : Calling public.powa_qualstats_src(0)...
2023-11-09 10:39:22,932 hostname:5432 DEBUG : Calling SELECT pg_qualstats_reset()...
2023-11-09 10:39:22,933 hostname:5432 DEBUG : Working on module powa_stat_all_relations
2023-11-09 10:39:22,933 hostname:5432 DEBUG : Calling public.powa_all_relations_src(0)...
2023-11-09 10:39:22,935 hostname:5432 DEBUG : Working on module pg_stat_statements
2023-11-09 10:39:22,935 hostname:5432 DEBUG : Calling public.powa_statements_src(0)...
2023-11-09 10:39:22,949 hostname:5432 DEBUG : Calling powa_take_snapshot(1)...
2023-11-09 10:39:22,977 hostname:5432 DEBUG : Committing transaction

The UI seems to behave correctly, but I didn't do an extended test.

Originally posted by @frost242 in https://github.com/powa-team/powa-collector/issues/18#issuecomment-1803487443

rjuju commented 1 year ago

You're right, there's another problem here :(. Unfortunately I missed it as I'm mostly working on the v5 for now, where this problem has been fixed in some other refactoring.

frost242 commented 1 year ago

Do you need me to provide a PR to update the extension to version 4.2.2 ?

rjuju commented 1 year ago

I don't think that your approach is possible unfortunately. The new version should be both installable and upgradable, and should give the same result each time (I have a tool to automatically check it).  It means that new column has to be added at the end of the table. I don't how I missed it, I'm pretty sure that I tested a full remote setup with the 4.2.x version.

I will work on that, and do more thorough testing to avoid yet another big bug like that on the 4.2.x series.

frost242 commented 1 year ago

Thanks. Please let me know here, as I've hacked the extension to use the correct column order and I want to put the extension in good shape for the next upgrade.

rjuju commented 1 year ago

I just pushed d61e829b7af37b234766585e3a2db615c97870cb which should fix the problem and work for both installed and upgrade extensions. Do you have some non-production system where you can double check it?

JosefMachytkaNetApp commented 12 months ago

One of our clients tested that commit because they have the same problem after upgrading to 4.2.0, but they have got an error:

ERROR:  syntax error at or near "timestamp"
LINE 2:     OUT ts timestamp with time zone,

We will try to reproduce it on our lab to give you more info.

rjuju commented 12 months ago

Thanks for the report @JosefMachytkaNetApp . I couldn't reproduce that locally, if you have more information on what commands were used, and which version of powa-archivist, it would be great!

jw1u1 commented 12 months ago

@rjuju

CREATE OR REPLACE FUNCTION powa_statements_src(IN _srvid integer,
    OUT ts timestamp with time zone,
    OUT userid oid,
    OUT dbid oid,
    OUT queryid bigint,
    OUT query text,
    OUT calls bigint,
    OUT total_exec_time double precision,
    OUT rows bigint,
    OUT shared_blks_hit bigint,
    OUT shared_blks_read bigint,
    OUT shared_blks_dirtied bigint,
    OUT shared_blks_written bigint,
    OUT local_blks_hit bigint,
    OUT local_blks_read bigint,
    OUT local_blks_dirtied bigint,
    OUT local_blks_written bigint,
    OUT temp_blks_read bigint,
    OUT temp_blks_written bigint,
    OUT blk_read_time double precision,
    OUT blk_write_time double precision,
    OUT plans bigint,
    OUT total_plan_time float8,
    OUT wal_records bigint,
    OUT wal_fpi bigint,
    OUT wal_bytes numeric,
    OUT toplevel boolean
)
RETURNS SETOF record
STABLE
AS $PROC$
DECLARE
    v_pgss integer[];
BEGIN
    IF (_srvid = 0) THEN
        SELECT regexp_split_to_array(extversion, E'\\.') INTO STRICT v_pgss
        FROM pg_extension
        WHERE extname = 'pg_stat_statements';

        IF (v_pgss[1] = 1 AND v_pgss[2] >= 10) THEN
            RETURN QUERY SELECT now(),
                pgss.userid, pgss.dbid, pgss.queryid, pgss.query,
                pgss.calls, pgss.total_exec_time,
                pgss.rows, pgss.shared_blks_hit,
                pgss.shared_blks_read, pgss.shared_blks_dirtied,
                pgss.shared_blks_written, pgss.local_blks_hit,
                pgss.local_blks_read, pgss.local_blks_dirtied,
                pgss.local_blks_written, pgss.temp_blks_read,
                pgss.temp_blks_written, pgss.blk_read_time, pgss.blk_write_time,
                pgss.plans, pgss.total_plan_time,
                pgss.wal_records, pgss.wal_fpi, pgss.wal_bytes, pgss.toplevel
            FROM pg_stat_statements pgss
            JOIN pg_database d ON d.oid = pgss.dbid
            JOIN pg_roles r ON pgss.userid = r.oid
            WHERE pgss.query !~* '^[[:space:]]*(DEALLOCATE|BEGIN|PREPARE TRANSACTION|COMMIT PREPARED|ROLLBACK PREPARED)'
            AND NOT (r.rolname = ANY (string_to_array(
                        powa_get_guc('powa.ignored_users', ''),
                        ',')));
        ELSIF (v_pgss[1] = 1 AND v_pgss[2] >= 8) THEN
            RETURN QUERY SELECT now(),
                pgss.userid, pgss.dbid, pgss.queryid, pgss.query,
                pgss.calls, pgss.total_exec_time,
                pgss.rows, pgss.shared_blks_hit,
                pgss.shared_blks_read, pgss.shared_blks_dirtied,
                pgss.shared_blks_written, pgss.local_blks_hit,
                pgss.local_blks_read, pgss.local_blks_dirtied,
                pgss.local_blks_written, pgss.temp_blks_read,
                pgss.temp_blks_written, pgss.blk_read_time, pgss.blk_write_time,
                pgss.plans, pgss.total_plan_time,
                pgss.wal_records, pgss.wal_fpi, pgss.wal_bytes, true::boolean
            FROM pg_stat_statements pgss
            JOIN pg_database d ON d.oid = pgss.dbid
            JOIN pg_roles r ON pgss.userid = r.oid
            WHERE pgss.query !~* '^[[:space:]]*(DEALLOCATE|BEGIN|PREPARE TRANSACTION|COMMIT PREPARED|ROLLBACK PREPARED)'
            AND NOT (r.rolname = ANY (string_to_array(
                        powa_get_guc('powa.ignored_users', ''),
                        ',')));
        ELSE
            RETURN QUERY SELECT now(),
                pgss.userid, pgss.dbid, pgss.queryid, pgss.query,
                pgss.calls, pgss.total_time,
                pgss.rows, pgss.shared_blks_hit,
                pgss.shared_blks_read, pgss.shared_blks_dirtied,
                pgss.shared_blks_written, pgss.local_blks_hit,
                pgss.local_blks_read, pgss.local_blks_dirtied,
                pgss.local_blks_written, pgss.temp_blks_read,
                pgss.temp_blks_written, pgss.blk_read_time,pgss.blk_write_time,
                0::bigint, 0::double precision,
                0::bigint, 0::bigint, 0::numeric, true::boolean

            FROM pg_stat_statements pgss
            JOIN pg_database d ON d.oid = pgss.dbid
            JOIN pg_roles r ON pgss.userid = r.oid
            WHERE pgss.query !~* '^[[:space:]]*(DEALLOCATE|BEGIN|PREPARE TRANSACTION|COMMIT PREPARED|ROLLBACK PREPARED)'
            AND NOT (r.rolname = ANY (string_to_array(
                        powa_get_guc('powa.ignored_users', ''),
                        ',')));
        END IF;
    ELSE
        RETURN QUERY SELECT pgss.ts,
            pgss.userid, pgss.dbid, pgss.queryid, pgss.query,
            pgss.calls, pgss.total_exec_time,
            pgss.rows, pgss.shared_blks_hit,
            pgss.shared_blks_read, pgss.shared_blks_dirtied,
            pgss.shared_blks_written, pgss.local_blks_hit,
            pgss.local_blks_read, pgss.local_blks_dirtied,
            pgss.local_blks_written, pgss.temp_blks_read,
            pgss.temp_blks_written, pgss.blk_read_time, pgss.blk_write_time,
            pgss.plans, pgss.total_plan_time,
            pgss.wal_records, pgss.wal_fpi, pgss.wal_bytes, pgss.toplevel
        FROM powa_statements_src_tmp pgss WHERE srvid = _srvid;
    END IF;
END;
$PROC$ LANGUAGE plpgsql; 

paste it in psql or load it with psql -f returns 114: ERROR: syntax error at or near "timestamp" LINE 2:  OUT ts timestamp with time zone,

jw1u1 commented 12 months ago

Version is: powa_15.x86_64 4.1.4-1.rhel7

rjuju commented 12 months ago

This works fine locally:

rjuju=# CREATE OR REPLACE FUNCTION powa_statements_src(IN _srvid integer,
    OUT ts timestamp with time zone,
    OUT userid oid,
    OUT dbid oid,
    OUT queryid bigint,
    OUT query text,
    OUT calls bigint,
    OUT total_exec_time double precision,
    OUT rows bigint,
    OUT shared_blks_hit bigint,
    OUT shared_blks_read bigint,
    OUT shared_blks_dirtied bigint,
    OUT shared_blks_written bigint,
    OUT local_blks_hit bigint,
    OUT local_blks_read bigint,
    OUT local_blks_dirtied bigint,
    OUT local_blks_written bigint,
    OUT temp_blks_read bigint,
    OUT temp_blks_written bigint,
    OUT blk_read_time double precision,
    OUT blk_write_time double precision,
    OUT plans bigint,
    OUT total_plan_time float8,
    OUT wal_records bigint,
    OUT wal_fpi bigint,
    OUT wal_bytes numeric,
    OUT toplevel boolean
)
RETURNS SETOF record
STABLE
AS $PROC$
DECLARE
    v_pgss integer[];
BEGIN
    IF (_srvid = 0) THEN
        SELECT regexp_split_to_array(extversion, E'\\.') INTO STRICT v_pgss
        FROM pg_extension
        WHERE extname = 'pg_stat_statements';

        IF (v_pgss[1] = 1 AND v_pgss[2] >= 10) THEN
            RETURN QUERY SELECT now(),
                pgss.userid, pgss.dbid, pgss.queryid, pgss.query,
                pgss.calls, pgss.total_exec_time,
                pgss.rows, pgss.shared_blks_hit,
                pgss.shared_blks_read, pgss.shared_blks_dirtied,
                pgss.shared_blks_written, pgss.local_blks_hit,
                pgss.local_blks_read, pgss.local_blks_dirtied,
                pgss.local_blks_written, pgss.temp_blks_read,
                pgss.temp_blks_written, pgss.blk_read_time, pgss.blk_write_time,
                pgss.plans, pgss.total_plan_time,
                pgss.wal_records, pgss.wal_fpi, pgss.wal_bytes, pgss.toplevel
$PROC$ LANGUAGE plpgsql;;nts_src_tmp pgss WHERE srvid = _srvid;s.toplevelme,NSACTION|COMMIT PREPARED|ROLLBACK PREPARED)'
CREATE FUNCTION

psql displays a bit of a mess, but in any case it doesnet complain with the function prototype.

rjuju commented 12 months ago

I noticed that your error message says:

LINE 2:  OUT ts timestamp with time zone,

with a weird character before the OUT. Maybe you have a non-breakable space or something like that that leads to the problem?

marco44 commented 12 months ago

I've been having the same problem, and tested the 4.2.2 git version. It solves the problem. Note: I only did a create extension, not an update, so I didn't test the update path

jw1u1 commented 12 months ago

indeed there were non-printable characters.

rjuju commented 12 months ago

Thanks @marco44 and @jw1u1 for testing!

I just released the version 4.2.2 with that fix. This will hopefully be the last problem on the 4.2.x versions.