prometheus-community / postgres_exporter

A PostgreSQL metric exporter for Prometheus
Apache License 2.0

Exporter container log of replica instance always reports pg_replication_slots pq: recovery is in progress #962

Open dominik0711 opened 10 months ago

dominik0711 commented 10 months ago

What did you do? I have set up a Crunchy Postgres cluster on my OpenShift cluster with 1 master and 2 replica instances. The exporter container runs as a sidecar container. All the replicas log the following error message:

ts=2023-11-16T13:00:00.362Z caller=namespace.go:236 level=info err="Error running query on database \"localhost:5432\": pg_replication_slots pq: recovery is in progress"
ts=2023-11-16T13:00:00.379Z caller=postgres_exporter.go:731 level=error err="queryNamespaceMappings returned 1 errors"

What did you expect to see? Replication slots on replicas are always inactive and in recovery mode, so I don't expect to see any errors here.

What did you see instead? Under which circumstances?

All replicas report the same messages listed here:

ts=2023-11-16T13:00:00.362Z caller=namespace.go:236 level=info err="Error running query on database \"localhost:5432\": pg_replication_slots pq: recovery is in progress"
ts=2023-11-16T13:00:00.379Z caller=postgres_exporter.go:731 level=error err="queryNamespaceMappings returned 1 errors"

Environment

OpenShift 4.11 on Azure

Linux 4.18.0-372.76.1.el8_6.x86_64 x86_64

postgres_exporter, version 0.10.1 (branch: HEAD, revision: 6cff384d7433bcb1104efe3b496cd27c0658eb09) build user: root@eb21848025d7 build date: 20220114-17:20:30 go version: go1.17.6 platform: linux/amd64

        - name: CONFIG_DIR
          value: /opt/cpm/conf
        - name: POSTGRES_EXPORTER_PORT
          value: '9187'
        - name: PGBACKREST_INFO_THROTTLE_MINUTES
          value: '10'
        - name: PG_STAT_STATEMENTS_LIMIT
          value: '20'
        - name: PG_STAT_STATEMENTS_THROTTLE_MINUTES
          value: '-1'
        - name: EXPORTER_PG_HOST
          value: localhost
        - name: EXPORTER_PG_PORT
          value: '5432'
        - name: EXPORTER_PG_DATABASE
          value: postgres
        - name: EXPORTER_PG_USER
          value: ccp_monitoring
        - name: EXPORTER_PG_PASSWORD
          valueFrom:
            secretKeyRef:
              name: flexis-io-dev-scm-billing-monitoring
              key: password

psql (PostgreSQL) 13.6

heitatta commented 8 months ago

It seems to me that the pg_current_wal_lsn() function call in queries.go causes this issue.

Other collectors use an idiom like (case pg_is_in_recovery() when 't' then null else pg_current_wal_lsn() end) AS pg_current_wal_lsn, but this query does not. You can't call this function on a PostgreSQL instance that is in recovery, for example one that is both sending and receiving replication (this situation happens on the "child" in a parent-child-grandchild replication scenario).

To avoid this error, either fix this query or try the --no-collector.replication_slot option, which might help.

DJLebedev commented 5 months ago

Same problem with PostgreSQL 14.8 and postgres_exporter 0.15.0, and --no-collector.replication_slot does not fix this.

ihordyrman commented 4 months ago

I'm facing the same issue. PostgreSQL: 16.2.0, Exporter: postgres-exporter:v0.15.0

postgres-exporter ts=2024-05-14T02:00:02.451Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.3:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:02.451Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors"
postgres-exporter ts=2024-05-14T02:00:05.266Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:05.348Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors" 
postgres-exporter ts=2024-05-14T02:00:05.956Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"
postgres-exporter ts=2024-05-14T02:00:05.956Z caller=postgres_exporter.go:682 level=error err="queryNamespaceMappings returned 1 errors"
postgres-exporter ts=2024-05-14T02:00:08.350Z caller=namespace.go:236 level=info err="Error running query on database \"192.168.0.2:5432\": pg_replication_slots pq: recovery is in progress"

dusatvoj commented 2 months ago

Same on a Patroni cluster with pglogical replicating to another cluster.