powa-team / powa-web

PoWA user interface
http://powa.readthedocs.io/

Growing repository database. #114

Closed TezkaRabota closed 2 years ago

TezkaRabota commented 3 years ago

Hi,

I have a PoWA configuration with 20 monitored PostgreSQL servers and a dedicated repository database server. Of the 20 monitored servers, 15 are working fine, but for the other 5 servers I get this message:
prsipg02m: powa_take_snapshot(7): function "powa_qualstats_snapshot" failed: insert or update on table "powa_qualstats_quals" violates foreign key constraint "powa_qualstats_quals_srvid_fkey"

A side effect of this is that data from these servers is not deleted from the central repository after the retention time. I tried to reinstall powa on these clients and on the central repository server, with no effect.

Any idea what could cause this problem?

Regards Domen

rjuju commented 3 years ago

Hello,

The error on powa_qualstats_quals is already known and, as far as I know, is due to the lack of consistency when sampling data from the various datasources. It should also "fix" itself automatically during the next snapshots.

Could you give more detail on the retention issue? For the record, the current behavior is that retention is applied when doing a snapshot. So if you removed some remote servers, no snapshot will be done and it's expected that you'll keep up to the configured retention amount of data until you explicitly remove the remote server (as in DELETE FROM powa_servers WHERE id = ...). This is done this way because removing a server is probably not a common practice, and people might want to keep some historical data.

TezkaRabota commented 3 years ago

In my experience the error doesn't fix itself. Whenever I check powa-web the error is there, and always for the same five PostgreSQL servers.

If the retention is applied while doing a snapshot, then it is obvious that the data is never purged, because the powa_qualstats_quals error is permanent for these five servers. It is also interesting that the repository database keeps growing, but in powa-web I can only see data for the last 24 hours (as expected), which is my retention time.

rjuju commented 3 years ago

Oh that's strange. I'll look at the code again and see if I can spot any problem.

About retention, that shouldn't be the case. The code should handle any error happening during the snapshot, log it in some table so you can see the information on the UI and continue with the rest of the snapshot. So even if there's an error while doing the pg_qualstats part of the snapshot, the rest should be executed as expected, including the purge of old data.
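For reference, a sketch of how to read those logged errors directly on the repository database (this assumes the powa_snapshot_metas table layout that comes up later in this thread):

-- list the last snapshot errors recorded for each remote server
SELECT s.id, s.alias, m.errors
FROM powa_servers s
JOIN powa_snapshot_metas m ON m.srvid = s.id
WHERE m.errors IS NOT NULL;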

Are you sure that the problem is actually that the retention isn't applied, rather than, for instance, an autovacuum issue on the underlying tables?

TezkaRabota commented 3 years ago

I can see in powa-web that autovacuum is active all the time on the powa repository database.

I'm sending you two graphs. On graph 1 you can see disk space usage on the powa repository over two days (Mon-Wed) when I tested the five problematic servers. As you can see, space usage didn't stop rising after the retention period of 24 hours. On graph 2 you can see disk space usage on the powa repository in week 35, when I added the 5 problematic servers to the powa repository. You can see that the behaviour of the powa repository was normal in the previous weeks 33 and 34.

I hope this will help.

[image: Graph 2]

rjuju commented 3 years ago

Thanks! Do you know what happened on the server every time the disk usage went low?

TezkaRabota commented 3 years ago

I don't understand your question. If you mean the powa repository server, this happens after I get the "powa_qualstats_quals" message on the five problematic servers. As I mentioned, after the retention period the data for these five servers is not deleted from the powa repository database. The graphs show usage of the disk where the powa repository database resides.

banlex73 commented 3 years ago

When time permits, please check the powa%tmp tables - they should be empty. I suspect that once the process failed, it left a temporary data set there... you can run TRUNCATE on those tables, but make sure there's no snapshot running.
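For example, a quick sketch to see whether they hold anything, using the statistics views (row counts are only estimates):

-- estimated row counts of the per-datasource temporary tables;
-- they should all be close to 0 when no snapshot is in progress
SELECT relname, n_live_tup
FROM pg_stat_user_tables
WHERE relname LIKE 'powa%tmp'
ORDER BY n_live_tup DESC;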

rjuju commented 3 years ago

Agreed with @banlex73, by design the records in powa%tmp tables are only removed after a successful call, so in case of error it's possible to diagnose the issue.

@TezkaRabota You said that you added 5 new remote servers on Monday. What I can see from your graph is that, as a consequence, the disk usage dropped from 150k to almost 0, then rose to around 100k in 2 days and dropped again. I don't see how adding 5 servers would lead to this pattern, so I'm assuming that something else happened on the server. If not, it seems to me that the retention is applied, but probably not as fast as you want.

Note that by design the purge is only attempted every (coalesce + 1) snapshots. So with a 5 minute interval that's roughly every 8.3 hours, meaning I'd expect the purge to be effective after a bit more than 25 hours. That's still less than 2 days, but maybe you have a different configuration, or something else is happening.
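As an illustration, a sketch to check that per server, assuming the powa 4.x powa_servers columns (frequency, powa_coalesce, retention):

-- per-server snapshot frequency, coalesce and retention, plus the rough
-- interval between purge attempts (frequency * (coalesce + 1) seconds)
SELECT id, alias, frequency, powa_coalesce, retention,
       (frequency * (powa_coalesce + 1)) * interval '1 second' AS approx_purge_interval
FROM powa_servers;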

TezkaRabota commented 3 years ago

Every time the disk (powa repository database) was full, I dropped the repository database and created it again. That's why you see the disk usage rising and dropping.

rjuju commented 3 years ago

I see, thanks.

While looking again at your initial message, I see that it mentions an error on the powa_qualstats_quals_srvid_fkey constraint. However, unless I'm missing something, there shouldn't be such a constraint.

Which version of powa-archivist did you install? Can we see the output of \dx and \d powa_qualstats_quals on the powa database of the repository server?

TezkaRabota commented 3 years ago

Here it is:

powa=# \dx
                                   List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 btree_gist         | 1.5     | public     | support for indexing common datatypes in GiST
 pg_qualstats       | 2.0.2   | public     | An extension collecting statistics about quals
 pg_stat_kcache     | 2.1.1   | public     | Kernel statistics gathering
 pg_stat_statements | 1.6     | public     | track execution statistics of all SQL statements executed
 pg_track_settings  | 2.0.0   | public     | Track settings changes
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 powa               | 4.0.1   | public     | PostgreSQL Workload Analyser-core
(7 rows)

powa=# \d powa_qualstats_quals
            Table "public.powa_qualstats_quals"
 Column  |    Type     | Collation | Nullable | Default
---------+-------------+-----------+----------+---------
 srvid   | integer     |           | not null |
 qualid  | bigint      |           | not null |
 queryid | bigint      |           | not null |
 dbid    | oid         |           | not null |
 userid  | oid         |           | not null |
 quals   | qual_type[] |           |          |
Indexes:
    "powa_qualstats_quals_pkey" PRIMARY KEY, btree (srvid, qualid, queryid, dbid, userid)
    "powa_qualstats_quals_srvid_queryid_idx" btree (srvid, queryid)
Foreign-key constraints:
    "powa_qualstats_quals_srvid_fkey" FOREIGN KEY (srvid, queryid, dbid, userid) REFERENCES powa_statements(srvid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
Referenced by:
    TABLE "powa_qualstats_constvalues_history_current" CONSTRAINT "powa_qualstats_constvalues_history_current_srvid_fkey" FOREIGN KEY (srvid, qualid, queryid, dbid, userid) REFERENCES powa_qualstats_quals(srvid, qualid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "powa_qualstats_constvalues_history" CONSTRAINT "powa_qualstats_constvalues_history_srvid_fkey" FOREIGN KEY (srvid, qualid, queryid, dbid, userid) REFERENCES powa_qualstats_quals(srvid, qualid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "powa_qualstats_quals_history_current" CONSTRAINT "powa_qualstats_quals_history_current_srvid_fkey" FOREIGN KEY (srvid, qualid, queryid, dbid, userid) REFERENCES powa_qualstats_quals(srvid, qualid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE
    TABLE "powa_qualstats_quals_history" CONSTRAINT "powa_qualstats_quals_history_srvid_fkey" FOREIGN KEY (srvid, qualid, queryid, dbid, userid) REFERENCES powa_qualstats_quals(srvid, qualid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE

TezkaRabota commented 3 years ago

Hi, I have another clue that might help find out what is wrong. There are five servers that cause problems: srv01, srv02, srv03, srv04, srv05. srv01 was included in powa a few months ago and it worked OK. The other servers are all VMware clones of srv01. The PostgreSQL database was initialized after cloning. Could it be that there is some OS or DB identifier that was cloned (meaning it is now the same on all five servers) and is used by powa to distinguish between servers in the collector database?

Regards, Domen

banlex73 commented 3 years ago

powa has a unique server identifier (the powa_servers PK), so every time you register a new target in powa it gets a unique ID. I would recommend finding the biggest tables in the powa repository database, with a query along these lines (the original query was cut off here; completing it against pg_tables is an assumption): select pg_relation_size(tablename::text), pg_size_pretty(pg_relation_size(tablename::text)), tablename from pg_tables where schemaname = 'public' order by 1 desc limit 10;

TezkaRabota commented 3 years ago

Somehow your command didn't work, so I used another one; here are the results:

 table_schema |             table_name             | total_size | data_size | external_size
--------------+------------------------------------+------------+-----------+---------------
 public       | powa_all_relations_history_current | 88 GB      | 76 GB     | 12 GB
 public       | powa_all_relations_history         | 6608 MB    | 6113 MB   | 495 MB
 public       | powa_all_relations_src_tmp         | 5759 MB    | 5758 MB   | 1584 kB
 public       | powa_statements_src_tmp            | 4242 MB    | 796 MB    | 3446 MB
 public       | powa_wait_sampling_history_current | 1953 MB    | 1398 MB   | 556 MB
 public       | powa_statements_history_current    | 1505 MB    | 1183 MB   | 322 MB
 public       | powa_kcache_metrics_current        | 1153 MB    | 954 MB    | 198 MB
 public       | powa_wait_sampling_history         | 762 MB     | 604 MB    | 158 MB
 public       | powa_statements_history            | 480 MB     | 418 MB    | 62 MB
 public       | powa_kcache_metrics                | 442 MB     | 353 MB    | 89 MB

rjuju commented 3 years ago

I'm sorry for the late answer, I'm a bit busy lately. Thanks @TezkaRabota for the various information.

So, I don't think it can be related to a cloned VM, especially if the databases were initialized after the cloning. One reason for that is that the server id isn't even known by the foreign server, it's entirely handled by powa-collector and added when retrieving the data.

Thanks also @banlex73! So 2 of the biggest tables are _src_tmp tables. Can you do a count(*) on those tables to see if it's due to bloat or rows not being removed?
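For instance, using the table names from the size listing above:

SELECT count(*) FROM powa_all_relations_src_tmp;
SELECT count(*) FROM powa_statements_src_tmp;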

Coming back to the displayed error, it's due to this constraint:

"powa_qualstats_quals_srvid_fkey" FOREIGN KEY (srvid, queryid, dbid, userid) REFERENCES powa_statements(srvid, queryid, dbid, userid) MATCH FULL ON UPDATE CASCADE ON DELETE CASCADE

So somehow you have quals that belong to queries that apparently don't exist in the remote servers' pg_stat_statements. There's some code to try to avoid that (there's a priority for the snapshot functions and pg_stat_statements' one is executed first), so there's something else going on.

Can you check if the queryid reported in the error message exists in the remote server's pg_stat_statements view? I think this issue can happen if you use temporary tables with quals, and if the underlying query gets evicted.
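For example, something like this on the remote server (the queryid value below is only a placeholder):

-- run on the monitored (remote) server
SELECT queryid, calls, left(query, 80) AS query
FROM pg_stat_statements
WHERE queryid = -1234567890;  -- placeholder: use the queryid from the error message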

I think the only way forward is to simply give up on those quals during the snapshot. I'll work on a new version shortly. In the meantime, you can probably disable pg_qualstats support for the problematic remote servers as a temporary workaround.

banlex73 commented 3 years ago

Thank you for sharing it... from what I see, powa fails to process statements at least... most likely you see an error in the UI... What I would do: find that error in the postgres log on the repository database... you might see something like:

Error while calling public.powa_..

So just find the root cause of the issue and implement a workaround... one day it will be fixed, or you can contribute a fix to the project.

TezkaRabota commented 3 years ago

@banlex73 there are no error messages in the postgresql log on the repository database. However, I have more than 31 million rows in the powa_all_relations_history_current table in just 48 hours, for just 15 managed clients.

rjuju commented 3 years ago

@TezkaRabota yes, powa_all_relations* can be quite huge. 31 million rows in 48h represent ~3500 objects per remote server. Do you have servers with a high number of objects, or are there a few servers with a very high number of objects?

If yes, you could try to deactivate powa_all_relations for those servers. Alternatively, you could try to lower the coalesce value a bit, so you pack fewer records into the aggregated tables, but more often.

TezkaRabota commented 3 years ago

@rjuju but if I have a retention time of 24 hours, shouldn't the table stay at approximately the same size after the first 24 hours of collecting data? But it doesn't stop growing.

rjuju commented 3 years ago

Unfortunately, without more information I can't say what the problem is. What does this query return on the repository database?

select max((record).ts) - min((record).ts), srvid
FROM powa_all_relations_history_current
GROUP BY 2 ORDER BY 1 DESC;

banlex73 commented 3 years ago

You can also find the error details in the database: select alias, errors from powa_servers join powa_snapshot_metas m on m.srvid = powa_servers.id where errors is not null; In my case powa_all_relations_history_current has 60,242,989 records and I monitor 29 PG servers (2 days retention). What I noticed is that autovacuum cannot deal with this super busy database, tables are getting bloated within a day, so I scheduled a daily pg_repack to fix it.

rjuju commented 3 years ago

@TezkaRabota Sorry for the late reply.

I came up with a simple fix for the powa_qualstats_snapshot() function locally. Could you try to modify the powa_qualstats_snapshot() function (for instance using \ef powa_qualstats_snapshot with psql), applying the following change to the capture CTE, and see if that also solves the issue for you:

   WITH capture AS (
     SELECT *
-    FROM powa_qualstats_src(_srvid)
+    FROM powa_qualstats_src(_srvid) q
+    WHERE EXISTS (SELECT 1
+      FROM powa_statements s
+      WHERE s.srvid = _srvid
+      AND q.queryid = s.queryid
+      AND q.dbid = s.dbid
+      AND q.userid = s.userid)

If needed, here's the full function declaration if you prefer a plain CREATE OR REPLACE statement:

CREATE OR REPLACE FUNCTION powa_qualstats_snapshot(_srvid integer) RETURNS void as $PROC$
DECLARE
    result     bool;
    v_funcname text := 'powa_qualstats_snapshot';
    v_rowcount bigint;
BEGIN
  PERFORM powa_log(format('running %I', v_funcname));

  PERFORM powa_prevent_concurrent_snapshot(_srvid);

  WITH capture AS (
    SELECT *
    FROM powa_qualstats_src(_srvid) q
    WHERE EXISTS (SELECT 1
      FROM powa_statements s
      WHERE s.srvid = _srvid
      AND q.queryid = s.queryid
      AND q.dbid = s.dbid
      AND q.userid = s.userid)
  ),
  missing_quals AS (
      INSERT INTO powa_qualstats_quals (srvid, qualid, queryid, dbid, userid, quals)
        SELECT DISTINCT _srvid AS srvid, qs.qualnodeid, qs.queryid, qs.dbid, qs.userid,
          array_agg(DISTINCT q::qual_type)
        FROM capture qs,
        LATERAL (SELECT (unnest(quals)).*) as q
        WHERE NOT EXISTS (
          SELECT 1
          FROM powa_qualstats_quals nh
          WHERE nh.srvid = _srvid
            AND nh.qualid = qs.qualnodeid
            AND nh.queryid = qs.queryid
            AND nh.dbid = qs.dbid
            AND nh.userid = qs.userid
        )
        GROUP BY srvid, qualnodeid, qs.queryid, qs.dbid, qs.userid
      RETURNING *
  ),
  by_qual AS (
      INSERT INTO powa_qualstats_quals_history_current (srvid, qualid, queryid,
        dbid, userid, ts, occurences, execution_count, nbfiltered,
        mean_err_estimate_ratio, mean_err_estimate_num)
      SELECT _srvid AS srvid, qs.qualnodeid, qs.queryid, qs.dbid, qs.userid,
          ts, sum(occurences), sum(execution_count), sum(nbfiltered),
          avg(mean_err_estimate_ratio), avg(mean_err_estimate_num)
        FROM capture as qs
        GROUP BY srvid, ts, qualnodeid, qs.queryid, qs.dbid, qs.userid
      RETURNING *
  ),
  by_qual_with_const AS (
      INSERT INTO powa_qualstats_constvalues_history_current(srvid, qualid,
        queryid, dbid, userid, ts, occurences, execution_count, nbfiltered,
        mean_err_estimate_ratio, mean_err_estimate_num, constvalues)
      SELECT _srvid, qualnodeid, qs.queryid, qs.dbid, qs.userid, ts,
        occurences, execution_count, nbfiltered, mean_err_estimate_ratio,
        mean_err_estimate_num, constvalues
      FROM capture as qs
  )
  SELECT COUNT(*) into v_rowcount
  FROM capture;

  perform powa_log(format('%I - rowcount: %s',
        v_funcname, v_rowcount));

    IF (_srvid != 0) THEN
        DELETE FROM powa_qualstats_src_tmp WHERE srvid = _srvid;
    END IF;

  result := true;

  -- pg_qualstats metrics are not accumulated, so we force a reset after every
  -- snapshot.  For local snapshot this is done here, remote snapshots will
  -- rely on the collector doing it through query_cleanup.
  IF (_srvid = 0) THEN
    PERFORM pg_qualstats_reset();
  END IF;
END
$PROC$ language plpgsql; /* end of powa_qualstats_snapshot */

TezkaRabota commented 3 years ago

After implementing the change in the function there are no more snapshot error messages, so this problem is solved. But the amount of data in the collector's powa database is still permanently growing, in spite of the fact that it should stay approximately the same after 24 hours (with a retention time of 24 hours).

rjuju commented 3 years ago

That's at least some progress! I pushed that change, which will be available as of v4.0.2.

So, for the increasing size, can you show what the 10 biggest tables are now, with their sizes? Please include both pg_relation_size() and pg_indexes_size().
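For instance, a sketch of one way to produce that listing, assuming the powa tables live in the public schema:

-- 10 biggest tables with heap, index and total sizes
SELECT c.relname                                     AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))       AS table_size,
       pg_size_pretty(pg_indexes_size(c.oid))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public' AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;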

banlex73 commented 3 years ago

"permanently growing" let's separate the issue if record count is growing (unlikely) it means that purge is not running. If table/index size is growing (what I observe on my powa system) I would recommend to implement pg_repack daily job.

"pg_repack http://reorg.github.com/pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes"

Hope it will help

TezkaRabota commented 3 years ago

@rjuju here is the data you requested:

                  table_name                   | table_size | indexes_size | total_size
-----------------------------------------------+------------+--------------+------------
 "public"."powa_all_relations_history_current" | 119 GB     | 17 GB        | 135 GB
 "public"."powa_all_relations_history"         | 6933 MB    | 499 MB       | 7431 MB
 "public"."powa_wait_sampling_history_current" | 2197 MB    | 1023 MB      | 3219 MB
 "public"."powa_statements_history_current"    | 1818 MB    | 356 MB       | 2175 MB
 "public"."powa_all_relations_src_tmp"         | 2033 MB    | 0 bytes      | 2033 MB
 "public"."powa_kcache_metrics_current"        | 1210 MB    | 311 MB       | 1522 MB
 "public"."powa_wait_sampling_history"         | 945 MB     | 263 MB       | 1209 MB
 "public"."powa_statements_src_tmp"            | 957 MB     | 0 bytes      | 957 MB
 "public"."powa_statements_history"            | 523 MB     | 52 MB        | 575 MB
 "public"."powa_kcache_metrics"                | 460 MB     | 72 MB        | 532 MB

rjuju commented 3 years ago

@banlex73 indeed, we need to track down what's really happening here.

Since you already found out that in your setup it's a bloat issue, were you able to track down the root cause? Maybe vacuum isn't aggressive enough for some tables, or something like that? If you want to dig more into that, feel free to open a new issue.
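If vacuum tuning does turn out to be the culprit, here is a hedged sketch of making autovacuum more aggressive on the busiest table (the values are only illustrative, adjust them to your workload):

ALTER TABLE powa_all_relations_history_current
  SET (autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows instead of the 20% default
       autovacuum_vacuum_cost_delay = 0);      -- remove cost-based throttling for this table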

rjuju commented 3 years ago

@TezkaRabota thanks! So clearly your biggest problem is (still) the heap size of powa_all_relations_history_current. Can you send the output of this query:

SELECT srvid, max((record).ts) - min((record).ts), count(*)
FROM powa_all_relations_history_current
GROUP BY 1 ORDER BY 2 DESC;

That should tell us if there's a retention issue, and a bit more about the number of objects per remote server.

TezkaRabota commented 3 years ago

@banlex73 I decided to use pgcompacttable instead of pg_repack because of the lack of free space on the data disk. The statistics for the second biggest table are: (powa:public.powa_all_relations_history) Processing results: 490214 pages left (576957 pages including toasts and indexes), size reduced by 2.928GB (2.874GB including toasts and indexes) in total. The cleaning process took half an hour. The biggest table (powa_all_relations_history_current) is 20 times bigger; it would take very long to clean it.

TezkaRabota commented 3 years ago

@rjuju here is the data you requested:

 srvid |        ?column?        |  count
-------+------------------------+----------
    14 | 1 day 17:12:39.954616  | 25427193
    19 | 1 day 16:07:53.926241  | 10298728
     9 | 1 day 15:03:08.876088  | 16269990
    17 | 1 day 04:51:28.273711  | 15762572
    18 | 1 day 03:28:06.18608   | 17448136
    16 | 1 day 03:25:11.146667  | 18082070
    13 | 1 day 01:53:01.53532   |    54918
     3 | 1 day 01:41:09.808587  |   139443
     8 | 1 day 01:39:17.873825  |   776568
     2 | 1 day 01:37:26.744071  |   315674
     5 | 1 day 01:28:05.617064  |   359650
    11 | 1 day 00:26:14.784226  |   146833
    10 | 22:58:58.465006        |    41013
    12 | 22:40:23.314718        |  1391924
     1 | 22:32:05.814654        |  1702252
     4 | 22:30:01.191628        |  1106683
     7 | 22:27:40.28044         |  1729392
     6 | 19:55:34.296337        | 13314388
    15 | 03:22:20.966081        |  1805049
     0 | 02:11:20.85245         |    11382

banlex73 commented 3 years ago

I agree that vacuum should be more aggressive, but... in my case I see ~70% I/O utilization on an idle system (only the collector is working); if I add more vacuum I will overload it... so I decided to schedule pg_repack every night to fix the bloating issue.

banlex73 commented 3 years ago

I am pretty sure you have performance issues generating almost any report in powa due to the huge table... so sooner or later you need to reduce the size. I would run this script or something similar https://wiki.postgresql.org/wiki/Show_database_bloat to get the % of bloated objects. In my case I run pg_repack for any table where the % is >30. Also, to improve performance on my system I had to partition many tables and tweak the code to make postgres use the partitions (by adding filters on the partition key).

banlex73 commented 3 years ago

Those 2 tables, "public"."powa_statements_src_tmp" and "public"."powa_all_relations_src_tmp", should be empty when powa-collector is not running. I think it's safe to truncate them (make sure the collector is off). Why are they not empty? Every time powa fails, it leaves records in the _tmp tables...
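A minimal sketch of that cleanup, assuming powa-collector is stopped first:

-- only run this while powa-collector is stopped, so no snapshot is in progress
TRUNCATE powa_statements_src_tmp, powa_all_relations_src_tmp;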

rjuju commented 3 years ago

@TezkaRabota thanks for the output!

If I understand correctly, on server 14 you have around 50k objects (tables or indexes), database-wide. Is that the case? If so, that's quite a lot of objects and it will always need some space.

As @banlex73 mentioned, could you try to see how bloated the powa_all_relations_history_current table is? You could use the query mentioned in the wiki, or, if possible, https://www.postgresql.org/docs/current/pgstattuple.html. This module should give more precise results.
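For example, a sketch with pgstattuple (the _approx variant is much cheaper on a table this size):

-- requires the pgstattuple extension on the repository database
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- exact figures (reads the whole table)
SELECT * FROM pgstattuple('powa_all_relations_history_current');
-- or the cheaper, approximate variant
SELECT * FROM pgstattuple_approx('powa_all_relations_history_current');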

rjuju commented 2 years ago

Without an update in more than a year, I'm closing this issue. Feel free to reopen if needed.