Slow bloat query - Githubissues

gozdal commented 3 years ago

I am trying to use pgmetrics on a big (10TB+), busy (1GB/s RW) database. It takes around 5 minutes for pgmetrics to run. I traced the problem to the "bloat query" spinning in CPU, doing no I/O.

I have traced the problem to the bloated pg_class (the irony: pgmetrics does not collect bloat on pg_catalog): vacuum (full, analyze, verbose) pg_class;

INFO:  vacuuming "pg_catalog.pg_class"
INFO:  "pg_class": found 1 removable, 7430805 nonremovable row versions in 158870 pages
DETAIL:  7429943 dead row versions cannot be removed yet.
CPU 1.36s/6.40u sec elapsed 9.85 sec.
INFO:  analyzing "pg_catalog.pg_class"
INFO:  "pg_class": scanned 60000 of 158869 pages, containing 295 live rows and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
VACUUM

pg_class has so many dead rows because the workload is temp-table heavy (creating/destroying 1M+ temporary tables per day) and has long running analytics queries running for 24h+.

PG query planner assumes that index scan on pg_class will be very quick and plans Nested loop with Index scan. However, the index scan has 7M dead tuples to filter out and the query takes more than 200 seconds.

If I create a temp table from pg_class to contain only the live tuples:

CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid,* from pg_class;
CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON pg_class_alive(relname, relnamespace);
CREATE INDEX pg_class_tblspc_relfilenode_index ON pg_class_alive(reltablespace, relfilenode);
ANALYZE pg_class_alive;

and run the bloat query on pg_class_alive instead of pg_class:

SELECT
   nn.nspname AS schemaname,
   cc.relname AS tablename,
   COALESCE(cc.reltuples,0) AS reltuples,
   COALESCE(cc.relpages,0) AS relpages,
   COALESCE(CEIL((cc.reltuples*((datahdr+8-
     (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8 END))+nullhdr2+4))/(8192-20::float)),0) AS otta
 FROM
    pg_class_alive cc
 JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <> 'information_schema'
 LEFT JOIN
 (
   SELECT
     foo.nspname,foo.relname,
     (datawidth+32)::numeric AS datahdr,
     (maxfracsum*(nullhdr+8-(case when nullhdr%8=0 THEN 8 ELSE nullhdr%8 END))) AS nullhdr2
   FROM (
     SELECT
       ns.nspname, tbl.relname,
       SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS datawidth,
       MAX(coalesce(null_frac,0)) AS maxfracsum,
       23+(
         SELECT 1+count(*)/8
         FROM pg_stats s2
         WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename = tbl.relname
       ) AS nullhdr
     FROM pg_attribute att
     JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
     JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
     LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
     AND s.tablename = tbl.relname
     AND s.inherited=false
     AND s.attname=att.attname
     WHERE att.attnum > 0 AND tbl.relkind='r'
     GROUP BY 1,2
   ) AS foo
 ) AS rs
 ON cc.relname = rs.relname AND nn.nspname = rs.nspname
 LEFT JOIN pg_index i ON indrelid = cc.oid
 LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid

it runs in 10s, 20x faster

WDYT about adding such a hack (maybe under an option?) to pgmetrics?

mdevan commented 3 years ago

Thanks for this report. Yes, it was always the intention that alternate bloat queries be supported. While pgmetrics uses this one, there are other queries like this and this for example.

Will investigate this query you proposed, and see how it can be integrated.

As for collecting pg_catalog metrics, perhaps an option can be added, but will take it up as another issue/PR.

agolden3 commented 2 years ago

Is there a way to omit the bloat query from ./pgmetrics reports? I receive the error pretty quickly after execution, and have not been able to generate a report.

pgmetrics: bloat query failed: context deadline exceeded.

mdevan commented 2 years ago

@agolden3 There is not, currently. But I suppose it does make sense to add an omit option for bloat; it can be time consuming.

You can either try increasing the timeout using the -t option, or rebuild pgmetrics with this line commented out.

mdevan commented 2 years ago

@agolden3 An option to omit bloat query is now available in the just released v1.13.0. Use "--omit=bloat".

rapidloop / pgmetrics

Slow bloat query #42