Open mpzrtauio opened 1 year ago
Do you have JIT enabled in PostgreSQL? We were having similar issues and thought it was the exporter as well, but disabling JIT in postgresql solved this issue
https://github.com/CrunchyData/pgmonitor/issues/182#issuecomment-706559384
If you don't want to disable it completely, you can do it for just your monitoring user. That is what we did
ALTER ROLE ccp_monitoring SET jit TO 'off';
We have already turned off JIT for monitoring users, and the issue is still present.
Were you able to narrow it down to any specific queries? That's what helped us figure out it was JIT in our case. Could help diagnose whether it's an upstream issue in PG or something related to the exporter itself.
@keithf4 Apologies for the delayed response. We turned on pg_stat_statements PG extension and started to collect metrics about exporter's query.
SELECT *
FROM pg_stat_statements
where userid::regrole = 'monitoring'::regrole
ORDER BY (shared_blks_hit+shared_blks_dirtied)
userid | dbid | toplevel | queryid | query | plans | total_plan_time | min_plan_time | max_plan_time | mean_plan_time | stddev_plan_time | calls | total_exec_time | min_exec_time | max_exec_time | mean_exec_time | stddev_exec_time | rows | shared_blks_hit | shared_blks_read | shared_blks_dirtied | shared_blks_written |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
6681618 | 16447 | true | -10637612914751176 | SELECT pg_database_size($1) | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 915420 | 4135248.9797929386 | 0.415105 | 179.16520699999998 | 4.517324266230871 | 8.040072400672269 | 915420 | 1832208 | 0 | 0 | 0 |
6681618 | 16447 | true | -2954216581073452720 | SELECT pg_database.datname, tmp.state, tmp2.usename, tmp2.application_name, COALESCE(count,$1) as count, COALESCE(max_tx_duration,$2) as max_tx_duration FROM ( VALUES ($3), ($4), ($5), ($6), ($7), ($8) ) AS tmp(state) CROSS JOIN pg_database LEFT JOIN ( SELECT datname, state, usename, application_name, count(*) AS count, MAX(EXTRACT($9 FROM now() - xact_start))::float AS max_tx_duration FROM pg_stat_activity GROUP BY datname,state,usename,application_name) AS tmp2 ON tmp.state = tmp2.state AND pg_database.datname = tmp2.datname | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 26361.279016000426 | 0.103602 | 14.475577 | 0.1441160691242474 | 0.132534524797907 | 5542986 | 548759 | 0 | 0 | 0 |
6681618 | 16447 | true | -182263461300699970 | SELECT pg_database.datname,tmp.mode,COALESCE(count,$1) as count FROM ( VALUES ($2), ($3), ($4), ($5), ($6), ($7), ($8), ($9), ($10) ) AS tmp(mode) CROSS JOIN pg_database LEFT JOIN (SELECT database, lower(mode) AS mode,count(*) AS count FROM pg_locks WHERE database IS NOT NULL GROUP BY database, lower(mode) ) AS tmp2 ON tmp.mode=tmp2.mode and pg_database.oid = tmp2.database ORDER BY 1 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 78858.32328600078 | 0.199402 | 63.891402 | 0.43111533256066775 | 1.6316327571492935 | 8231265 | 183430 | 0 | 0 | 0 |
6681618 | 16447 | true | -5932576113725481334 | SELECT pg_database.datname FROM pg_database | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 183084 | 2100.63370199979 | 0.0055 | 5.0638630000000004 | 0.011473606115225648 | 0.029558547476256736 | 915420 | 183084 | 0 | 0 | 0 |
6681618 | 16447 | true | -3196603689379148377 | SELECT * FROM pg_stat_database_conflicts | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 323691.1460849984 | 1.04641 | 34.539633 | 1.769606685463864 | 2.1520858588312666 | 914585 | 182917 | 0 | 0 | 0 |
6681618 | 16447 | true | -5444365506547626638 | SELECT * FROM pg_stat_database | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182916 | 379707.6526510065 | 1.118914 | 36.115644 | 2.075858058622502 | 2.3765647629448967 | 1097496 | 182916 | 0 | 0 | 0 |
6681618 | 16447 | true | 2859277235430319612 | SELECT name, setting, COALESCE(unit, $1), short_desc, vartype FROM pg_settings WHERE vartype IN ($2, $3, $4) AND name != $5 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 305611.66955899115 | 1.01731 | 64.357009 | 1.6707669027974452 | 0.7644764082973766 | 49753424 | 1311 | 49 | 0 | 0 |
6681618 | 16447 | true | -3033217080036596150 | SELECT checkpoints_timed, checkpoints_req ,checkpoint_write_time ,checkpoint_sync_time ,buffers_checkpoint ,buffers_clean ,maxwritten_clean ,buffers_backend ,buffers_backend_fsync ,buffers_alloc ,stats_reset FROM pg_stat_bgwriter | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 183084 | 2131866.113088992 | 1.337615 | 74.049995 | 11.644196724394355 | 2.2696155408569916 | 183084 | 0 | 0 | 0 | 0 |
6681618 | 16447 | true | -9113088414397004364 | SELECT *, extract($1 from now() - last_archived_time) AS last_archive_age FROM pg_stat_archiver | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 429206.3244129997 | 1.0390110000000001 | 40.687288 | 2.3464539895855387 | 2.9282936754452984 | 182917 | 0 | 0 | 0 | 0 |
6681618 | 16447 | true | -2563412158741520104 | SELECT version() | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 1364.2742470001874 | 0.0037 | 2.147126 | 0.007458433316750187 | 0.029891294116721038 | 182917 | 0 | 0 | 0 | 0 |
6681618 | 16447 | true | -3693365941426935874 | SELECT slot_name, database, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 1406.4713770002957 | 0.005001 | 11.418737 | 0.0076891233564949915 | 0.03974505046098803 | 0 | 0 | 0 | 0 | 0 |
6681618 | 16447 | true | 7763026806317272650 | SELECT slot_name, pg_current_wal_lsn() - $1 AS current_wal_lsn, coalesce(confirmed_flush_lsn, $2) - $3, active FROM pg_replication_slots | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 183084 | 2160.2764899999106 | 0.0056 | 9.693518 | 0.011799373456992284 | 0.039906631555195596 | 0 | 0 | 0 | 0 | 0 |
6681618 | 16447 | true | -5644479190607423912 | SELECT *, (case pg_is_in_recovery() when $1 then $2 else pg_current_wal_lsn() end) AS pg_current_wal_lsn, (case pg_is_in_recovery() when $3 then $4 else pg_wal_lsn_diff(pg_current_wal_lsn(), pg_lsn($5))::float end) AS pg_current_wal_lsn_bytes, (case pg_is_in_recovery() when $6 then $7 else pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)::float end) AS pg_wal_lsn_diff FROM pg_stat_replication | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 182917 | 10725.877506000359 | 0.039701 | 14.47156 | 0.05863794784519792 | 0.10586158018482977 | 0 | 0 | 0 | 0 | 0 |
I hope it can give you some extra information. We welcome any suggestion for new measurements ideas.
@mpzrtauio do you have such problem on latest version ? I plan to upgrade, but in case latest version has same issue i'll think twice
did anyone find a resolution for this, I am facing the same issue
What did you do? Run PostgreSQL and postgres_exporter on the same VM (Ubuntu) in Azure. We scrape postgres_exporter endpoint in every 15 sec.
What did you expect to see? Almost constant and low memory usage at Postgres query's process (which belongs to postgres_exporter).
What did you see instead? Under which circumstances?
As you can seen there is a sawtooth in memory consumption. We started to measure our Linux processes memory usage with htop and pg_top. We found Postgres query's process (which belongs to postgres_exporter) eats more and more memory (at every scraping, in our case 1MB / 1 min). When we stopped our postgres_exporter's service then the issue disappeared. We also tried to stop scraping then memory consumption stay constant. And finally we tried to kill Postgres query's idle process and after that occupied memory has been freed.
Our workaround is based on this information. We wrote a bash script which kills Postgres query's process which belongs to exporter in every 15 minutes and in this case memory usage sawtooth is much smaller.
The largest PostgreSQL database size is around 2Tb. We use the following PG extensions:
pg_trgm 1.6
We turn off any data transformation and loading tasks in our ELT system.
Environment
Linux 5.15.0-1038-azure x86_64
postgres_exporter, version 0.12.0 (branch: HEAD, revision: f9a1edbbc6d629663aa658378e1d989293e6e488) build user: root@463311dfdfb6 build date: 20230321-23:52:03 go version: go1.20.2 platform: linux/amd64 tags: netgo static_build
We don't use any flags, we use default settings.
PostgreSQL 14.4 (Ubuntu 14.4-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit