powa-team / powa-collector

PoWA Collector daemon for remote snapshot
http://powa.readthedocs.io/
PostgreSQL License
7 stars 4 forks source link

Issue with powa-collector 4.2 with powa < 4.2 #18

Closed frost242 closed 1 year ago

frost242 commented 1 year ago

Hello,

I have upgraded my whole PoWA stack and noticed that powa-collector is not able to handle a missing toplevel column when the distant powa version is < 4.1.

It seems the collector generates a list of column that should be present in the COPY data.

2023-11-09 08:25:39,958 host1:5432 WARNING: Error while inserting data:
missing data for column "toplevel"
CONTEXT:  COPY powa_statements_src_tmp, line 1: "7      2023-11-09 08:25:39.356709+01   17761   17766   6805081989492591965     SELECT..."

Also, column toplevel in table powa_statements_src_tmp has a NOT NULL constraint. Removing this constraint does not help to fix the issue btw, but this will probably make the process to fail in a further step.

Sorry, I can't help here, I'm not Python fluent.

Kind regards

frost242 commented 1 year ago

Also, to fix this, I'll update PoWA on my servers. I should have done that before upgrading the repo. Don't know if the issue is easily fixable.

rjuju commented 1 year ago

Hi,

powa-collector doesn't have any knowledge of the COPY command executed. Your problem here is that you have two different major version of powa on your remote server and your repository server.

The whole architecture is based on the powa-archivist extension taking care of all the backward compatibility issues providing a consistent representation of the data. In this case it means that it generates a constant "true" value for the toplevel field if you have an older pg_stat_statements version installed on your server. It makes everything simpler, as the snapshot function don't have to deal with it, the UI doesn't have to deal with it and so on. The cost of that approach is to have identical major powa-archivist (and powa-web) version, here 4.2.x everywhere.

I however realize now that powa-collector could actually detect such an inconsistency and raise an error rather than consistently failing on queries that it can predict will fail.

frost242 commented 1 year ago

Thanks a lot for your reply and confirmation that I need to update powa-archivist on my servers. That make 340 instances to update, it's OK ! ;)

frost242 commented 1 year ago

So, all powa-archivist extensions were updated to 4.2.0. But now I came to another issue with the collector :

2023-11-09 09:47:15,815 hostname:5432 DEBUG : Calling public.powa_statements_src(0)...
2023-11-09 09:47:15,839 hostname:5432 WARNING: Error while inserting data:
invalid input syntax for type bigint: "t"
CONTEXT:  COPY powa_statements_src_tmp, line 1, column queryid: "t"

In the repository DB log :

2023-11-09 09:47:15 CET pid=2545344 : 172.23.101.3(35058) - powa CONTEXT:  COPY powa_statements_src_tmp, line 1, column queryid: "t"
2023-11-09 09:47:15 CET pid=2545344 : 172.23.101.3(35058) - powa STATEMENT:  COPY powa_statements_src_tmp FROM stdin

To my surprise, this happens alos on PostgreSQL 15 instances.

Function powa_statements_src outputs the toplevel column before queryid, and in powa_statements_src_tmp, toplevel is the last column of the table. Isn't there a mismatch in the column order between the data output function and the integration in the powa repo ? See : https://github.com/powa-team/powa-archivist/blob/b5835fe0e5cc122b80309c0d0bcdd07e55621838/powa--4.2.0.sql#L2419C1-L2419C1 And : https://github.com/powa-team/powa-archivist/blob/b5835fe0e5cc122b80309c0d0bcdd07e55621838/powa--4.2.0.sql#L756

frost242 commented 1 year ago

Just did a quick hack on the repository server and recreated the extension (yep, history is lost, but it's a non-production system, it's OK). The toplevel was physically moved at the same place as in pg_stat_statements :

root@powahp1:~# diff /usr/share/postgresql/16/extension/powa--4.2.0.sql  ~postgres/powa--4.2.0.sql
761d760
<     toplevel boolean NOT NULL,
783c782,783
<     wal_bytes numeric NOT NULL
---
>     wal_bytes numeric NOT NULL,
>     toplevel boolean NOT NULL

Then a quick test with one remote instance works OK :

2023-11-09 10:39:22,895 hostname:5432 DEBUG : Connecting on repository...
2023-11-09 10:39:22,910 hostname:5432 DEBUG : Connected.
2023-11-09 10:39:22,915 hostname:5432 DEBUG : Working on module pg_stat_kcache
2023-11-09 10:39:22,915 hostname:5432 DEBUG : Calling public.powa_kcache_src(0)...
2023-11-09 10:39:22,921 hostname:5432 DEBUG : Working on module pg_stat_statements
2023-11-09 10:39:22,921 hostname:5432 DEBUG : Calling public.powa_databases_src(0)...
2023-11-09 10:39:22,923 hostname:5432 DEBUG : Working on module powa_stat_user_functions
2023-11-09 10:39:22,923 hostname:5432 DEBUG : Calling public.powa_user_functions_src(0)...
2023-11-09 10:39:22,924 hostname:5432 DEBUG : Working on module pg_stat_bgwriter
2023-11-09 10:39:22,924 hostname:5432 DEBUG : Calling public.powa_stat_bgwriter_src(0)...
2023-11-09 10:39:22,925 hostname:5432 DEBUG : Working on module pg_qualstats
2023-11-09 10:39:22,925 hostname:5432 DEBUG : Calling public.powa_qualstats_src(0)...
2023-11-09 10:39:22,932 hostname:5432 DEBUG : Calling SELECT pg_qualstats_reset()...
2023-11-09 10:39:22,933 hostname:5432 DEBUG : Working on module powa_stat_all_relations
2023-11-09 10:39:22,933 hostname:5432 DEBUG : Calling public.powa_all_relations_src(0)...
2023-11-09 10:39:22,935 hostname:5432 DEBUG : Working on module pg_stat_statements
2023-11-09 10:39:22,935 hostname:5432 DEBUG : Calling public.powa_statements_src(0)...
2023-11-09 10:39:22,949 hostname:5432 DEBUG : Calling powa_take_snapshot(1)...
2023-11-09 10:39:22,977 hostname:5432 DEBUG : Committing transaction

The UI seems to behave correctly, but I didn't do an extended test.

frost242 commented 1 year ago

New issue opened in powa-archivist. I close this one. Thanks for the help on this issue !