powa-team / powa

PostgreSQL Workload Analyzer
http://powa.readthedocs.io/
PostgreSQL License
768 stars 57 forks source link

PoWA4 #118

Closed Ikrar-k closed 3 years ago

Ikrar-k commented 5 years ago

I am getting this error on PoWA 4. Also the TAB "Background Writer" is always refreshing, May be due to below missing table.

2019-08-01 08:08:53.115 EDT [19746] CONTEXT: PL/pgSQL function powa_take_snapshot(integer) line 73 at RAISE 2019-08-01 08:22:20.239 EDT [23031] ERROR: relation "powa_stat_bgwriter_history" does not exist at character 3329

Also Hypopg and powa in configuration showing errored(crossed). powa4

rjuju commented 5 years ago

Hello,

The required table has been added in powa-archivist 3 or 4 days ago. Did you do a full install using the last source? If yes, a DROP/CREATE EXTENSION powa could should fix your problem.

Hypopg and powa aren't stat extension, so they're indeed not sampled, so the output is intended, although quite unfriendly :(

Ikrar-k commented 5 years ago

Hi Julien,

It worked. Thank you!!

Ikrar-k commented 5 years ago

I am not seeing any issue on local machine after recreating the PoWA extension, But it not able to create a snapshot on remote node with the below error:

powa_take_snapshot(2): function "powa_stat_bgwriter_snapshot" failed: relation "powa_stat_bgwriter" does not exist

I have checked and this table does not exist on any of the servers.

rjuju commented 5 years ago

Oh right, there was an error in the source function, thanks a lot for spotting it!

This should be fixed inhttps://github.com/powa-team/powa-archivist/commit/18a94a07108762b89a3557378a0241ec381beb06. This requires an install / drop extension / create extension on your repository server, of manually fixing the powa_stat_bgwriter_src function

Ikrar-k commented 5 years ago

Hi Julien,

I am not seeing any issue with the snapshot now, But powa-web is blank for background writer.

powa=# SELECT powa_stat_bgwriter_snapshot(2); powa_stat_bgwriter_snapshot

(1 row)

powa=#

Background Writer

rjuju commented 5 years ago

Le ven. 2 août 2019 à 08:08, Ikrar-k notifications@github.com a écrit :

Hi Julien,

I am not seeing any issue with the snapshot now,

good news!

But powa-web is blank for background writer. powa=# SELECT powa_stat_bgwriter_snapshot(2); powa_stat_bgwriter_snapshot

(1 row)

powa=#

[image: Background Writer] https://user-images.githubusercontent.com/51739691/62348241-181f6a80-b51a-11e9-9b21-e9d9f7236a4d.PNG

that's probably intended, the UI works by computing delta from multiple snapshots, so you need at least two snapshots to display a point, and three to start drawing lines.

https://user-images.githubusercontent.com/51739691/62348241-181f6a80-b51a-11e9-9b21-e9d9f7236a4d.PNG

Ikrar-k commented 5 years ago

Thanks for the above explanation.

Now i am started seeing one more error related to pg_track_settings.

select powa_take_snapshot(4); WARNING: powa_take_snapshot(4): function "pg_track_settings_snapshot_settings" failed: state : 23502 message: null value in column "ts" violates not-null constraint detail : Failing row contains (4, null, allow_system_table_mods, null, t, null). hint : context: SQL statement "WITH src AS ( SELECT * FROM public.pg_track_settings_settings_src(_srvid) ), dropped AS ( SELECT l.srvid, l.name FROM public.pg_track_settings_list l LEFT JOIN src s ON s.name = l.name WHERE l.srvid = _srvid AND s.name IS NULL ), mark_dropped AS ( INSERT INTO public.pg_track_settings_history (srvid, ts, name, setting, setting_pretty, is_dropped) SELECT srvid, _snap_ts, name, NULL, NULL, true FROM dropped ) DELETE FROM public.pg_track_settings_list l USING dropped d WHERE d.name = l.name AND d.srvid = l.srvid AND l.srvid = _srvid" PL/pgSQL function pg_track_settings_snapshot_settings(integer) line 16 at SQL statement SQL statement "SELECT pg_track_settings_snapshot_settings(4)" PL/pgSQL function powa_take_snapshot(integer) line 63 at EXECUTE powa_take_snapshot

              1
rjuju commented 5 years ago

Nice catch! This was I think an oversight in pg_track_settings, it should be fixed in https://github.com/rjuju/pg_track_settings/commit/bc0c061739c10100c7a1f3acb8f872a2a3e9b0db. In your case, install / drop extension / create extension on the repository server only should be enough to fix this issue.

Ikrar-k commented 5 years ago

HI Julien,

The extension activate is not working now.

powa=# SELECT powa_deactivate_extension(4, 'pg_track_settings'); powa_deactivate_extension

t (1 row)

powa=# SELECT powa_activate_extension(4, 'pg_track_settings'); ERROR: column "extname" does not exist LINE 5: AND extname = _extname ^ QUERY: UPDATE powa_functions SET enabled = true WHERE enabled = false AND srvid = _srvid AND extname = _extname CONTEXT: PL/pgSQL function powa_activate_extension(integer,text) line 15 at SQL statement powa=#

Ikrar-k commented 5 years ago

Also Block written and IO is always zero. Is there any other extension required to activate this.

image

rjuju commented 5 years ago

There was an issue for activating extensions that are already activated. This should be fixed as of https://github.com/powa-team/powa-archivist/commit/c71e7a9fa78028b635b934a6403fcd15f57fd65e. Thank again for the report :)

rjuju commented 5 years ago

IO figures are only available if you activate track_io_timing (see https://www.postgresql.org/docs/current/runtime-config-statistics.html#GUC-TRACK-IO-TIMING).

For block written, this is probably because none of the query that ran since you created powa extension flushed any buffer or allocated new blocks.

Ikrar-k commented 5 years ago

Thank you for the explanation.

Every time I recreate a powa extension after modification, I started seeing errors like below.

  1. powa_take_snapshot(1): function "powa_qualstats_snapshot" failed: insert or update on table "powa_qualstats_quals" violates foreign key constraint "powa_qualstats_quals_srvid_fkey"
  1. Also while purging the server is showing foreign constraint issue:

powa=# SELECT powa_delete_and_purge_server(1);

ERROR: update or delete on table "powa_servers" violates foreign key constraint "powa_wait_sampling_history_current_srvid_fkey" on table "powa_wait_sampling_history_current" DETAIL: Key (id)=(1) is still referenced from table "powa_wait_sampling_history_current". CONTEXT: SQL statement "DELETE FROM public.powa_servers WHERE id = _srvid" PL/pgSQL function powa_delete_and_purge_server(integer) line 9 at SQL statement

This can be resolved by deleting srvid manually from the below tables:

powa=# DELETE FROM public."powa_wait_sampling_history_current_db" WHERE srvid=1; DELETE 57 powa=# powa=# DELETE FROM public."powa_wait_sampling_history_current" WHERE srvid=1; DELETE 86 powa=# SELECT powa_delete_and_purge_server(1); powa_delete_and_purge_server

t (1 row)

  1. So if I purge the next id before adding a new remote server, resolve the first issue. (violates foreign key constraint "powa_qualstats_quals_srvid_fkey")
Ikrar-k commented 5 years ago

Also after setting _track_iotiming I started seeing the output for IO RW.

But even after multiple checkpoints between snapshots, The Blocks Written is always zero.

Ikrar-k commented 5 years ago

Sorry, Purging the next ID didn't resolve the issue. It's persistent.

rjuju commented 5 years ago

I fixed the issue for powa_delete_and_purge_server(), there was missing specifications on some FK constraints, thanks!

For the powa_qualstats_quals_srvid_fkey issue, I already saw the problem but I'm struggling to reproduce it. I'll continue to look at it.

Ikrar-k commented 5 years ago

HI Julien,

I reinstalled the PoWA4 and I am still seeing blocks written to zero, Even i have updated the data more than the shared buffers.

DB Writer issue

rjuju commented 5 years ago

I can't reproduce that locally. Could you check if pg_stat_statements report the same behavior?

You can for instance compare the results of this query to make that there are actually blocks detected as written by the queries:

select queryid, userid, dbid, shared_blks_written from pg_stat_statements where shared_blks_written > 0;

For the bgwriter, it seems that you don't perform enough write to trigger it.

Ikrar-k commented 5 years ago

Yeah sorry,

There was not enough data to write to database.

Ikrar-k commented 5 years ago

HI Julien,

I don't see any issue at the moment. This can be closed. Thanks for all the help.

rjuju commented 5 years ago

Great, thanks a lot for the spotted problems! Feel free to open other issues if needed.

ghost commented 4 years ago

Hello i am having the following problem

alias: powa_take_snapshot ($ id): function "powa_qualstats_snapshot" failed: insert or update on table "powa_qualstats_quals" violates foreign key constraint "powa_qualstats_quals_srvid_queryid_dbid_userid_fkey"

This problem only occurs when monitoring production and standby All servers are at powa version 4.0.1 monitor triggers collections production standby

Is there any adjustment to be made in the configuration or is this a problem in the extension code?

rjuju commented 4 years ago

Hello @danilo19ee

Yes this is a know behavior. I checked the code multiple time but couldn't spot any problem. I think that this is only a side effect of sampling data that comes from shared memory feed by the extensions rather than table data, which means that the content of each view isn't transactionnal at all. Since there are FK dependencies between some tables where we store the data but now when they're gathered in shared_memory, this can happen. However, this should only happen for the first snapshot and is automatically fixed with the next ones. Isn't that the case with your setup?

ghost commented 4 years ago

Thanks for the quick reply @rjuju, not even oracle responds so fast lol Yes, exactly what happens. Only the warning that bothers me. I am monitoring normally, the tool is excellent. But with each screen update the alert is displayed. There is a table that contains the errors. Where can I clean and the message stops showing?

ghost commented 4 years ago

This error is very strange, I just accessed my monitoring this morning and the problem just disappeared like magic. I am really interested in understanding and solving this problem if you need any support I am available. I realized that the error occurred only in a certain situation. I registered 3 servers in the remote monitoring in almost the same minute using the function

SELECT powa_register_server (hostname => '$ hostaname', port => $ port, alias => 'example-prim' :: text, username => 'powa' :: text, password => '****' :: text, dbname => 'powa' :: text, frequency => 120, powa_coalesce => 100, retention => '00: 00: 00 ':: interval, allow_ui_connection => true, extensions => '{pg_stat_kcache, pg_qualstats, pg_wait_sampling}');

a day before I had set up a test environment where I registered the production monitoring and it took me about 30 minutes setting up the standby and only after I enabled the monitoring of it. In this situation the problem did not occur. Is there any relation to the fact that the registration of the servers occurred in the same minute? Because the first snapshot will occur based on the time of registration.

Note. Today in the morning I restarted the powa-web and powa-collector service once again, I have noticed that after registering a new host the behavior of the tool is better after restarting both services.

rjuju commented 4 years ago

The alert present in all pages is voluntary, as there should never be usually, so you don't want to miss them.

Also, as I mentioned this issue should fix itself after another snapshot, or maybe multiple depending on how much qual and queries you store and how much you need to evict. So the problem disappearing like magic is also surprisingly expected behavior. Do you know if in your case after like 3 or 4 snapshots the issue was still present on the UI?

I'll also try to have another look on the issue, in case I missed something there.

ghost commented 4 years ago

It does not count the number of snapshots or identify the problem, but it may not have a maximum of four. Then the errors stop occurring. I'm working with postgres 12.2 to be the data repository for all monitoring. I already did some tests with postgres 11.4 and it worked without any errors. I am having a problem related to the database version.

you will upgrade to version 12.3 of postgres and follow up. In a second step, you will test the same environment running in version 11.4. Because the problem does seem to be linked to some memory management. For behaving intermittently

rjuju commented 3 years ago

The issue should be fixed as of https://github.com/powa-team/powa-archivist/commit/0e8915f928ed298a73f908b4564aa05ebfc0cc19

Feel free to reopen this issue if needed.