prometheus-community / postgres_exporter

A PostgreSQL metric exporter for Prometheus
Apache License 2.0
2.8k stars 739 forks source link

exclude-databases does not properly work #588

Open psvampa opened 2 years ago

psvampa commented 2 years ago

What did you do? I ran an exporter process including the auto-discover-databases and exclude-databases flags. Also pulling from a custom query files:

exec postgres_exporter --web.listen-address :9189 --disable-default-metrics --auto-discover-databases --exclude-databases postgres,template0,template1 --extend.query-path /tmp/queries.yaml

/tmp/queries.yaml file includes two different queries:

    pg_database:
      query: "SELECT pg_database.datname, pg_database_size(pg_database.datname) as size_bytes FROM pg_database"
      master: true
      metrics:
        - datname:
            usage: "LABEL"
            description: "Name of the database"
        - size_bytes:
            usage: "GAUGE"
            description: "Disk space used by the database"

    pg_database_2:
      query: "SELECT pg_database.datname, pg_database_size(pg_database.datname) as size_bytes FROM pg_database WHERE datname = current_database()"
      master: true
      metrics:
        - datname:
            usage: "LABEL"
            description: "Name of the database"
        - size_bytes:
            usage: "GAUGE"
            description: "Disk space used by the database"

DBs on my instance:

postgres=# \l
                                 List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 nsoengas  | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =Tc/postgres         +
           |          |          |            |            | postgres=CTc/postgres+
           |          |          |            |            | nsoengas=C/postgres
 postgres  | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
 template0 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 | =c/postgres          +
           |          |          |            |            | postgres=CTc/postgres
(4 rows)

postgres=#

What did you expect to see? Since I am using exclude-databases flag and then excluding postgres,template0,template1 DBs, then my expectation is postgres exporter connecting to nsoengas DB and performing the queries on my custom file (/tmp/queries.yaml). According to those queries, pg_database should retrieve datname and size for every single database on given instance. pg_database_2 should ONLY show datname and size for the database I am connected to. I've included the proper query filter to do that. So, pg_database_2 should connect to "nsoengas" DB and report its datname and size.

What did you see instead? Under which circumstances? pg_database seems to be "showing" what it suppose to show. However, pg_database_2 is showing "postgres" as datname. Since I explicitly included the filter WHERE datname = current_database() AFAIK it is connecting to "postgres" database and retrieving its size It could be seen while scrapping on the exporter port:

[root@nsoengas-db-0 /]# curl http://localhost:9189/metrics | grep pg_database
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP pg_database_2_size_bytes Disk space used by the database
# TYPE pg_database_2_size_bytes gauge
pg_database_2_size_bytes{datname="postgres",server="localhost:5432"} 7.526936e+06
# HELP pg_database_size_bytes Disk space used by the database
# TYPE pg_database_size_bytes gauge
pg_database_size_bytes{datname="nsoengas",server="localhost:5432"} 7.58428e+06
pg_database_size_bytes{datname="postgres",server="localhost:5432"} 7.526936e+06
pg_database_size_bytes{datname="template0",server="localhost:5432"} 7.414276e+06
pg_database_size_bytes{datname="template1",server="localhost:5432"} 7.414276e+06
100 51180    0 51180    0     0  4543k      0 --:--:-- --:--:-- --:--:-- 4543k
[root@nsoengas-db-0 /]#

Environment

[root@nsoengas-db-0 ~]$ uname -srm
Linux 5.4.17-2102.202.5.el7uek.x86_64 x86_64
[root@nsoengas-db-0 ~]$
[root@nsoengas-db-0 /]# postgres_exporter --version
postgres_exporter, version 0.10.0 (branch: HEAD, revision: 57719ba53cac428769aaf3c4c0bb742df3cfca98)
  build user:       root@4dcb2c7f1315
  build date:       20210709-11:49:20
  go version:       go1.16.5
  platform:         linux/amd64
[root@nsoengas-db-0 /]#
exec postgres_exporter --web.listen-address :9189 --disable-default-metrics --auto-discover-databases --exclude-databases postgres,template0,template1 --extend.query-path /tmp/queries.yaml
postgres=# SELECT version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.15 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)

postgres=#
dginhoux commented 2 years ago

Hello,

I've the same behavior. I use v0.10 in docker. It return all databases.

guruguruguru commented 2 years ago

Hey, same here. Exclude Parameter doesnt seem to do anything. We use 0.10.0 as well

sasadangelo commented 2 years ago

Hey, same here. According to the documentation when you specify --auto-discover-databases (or env variables PG_EXPORTER_AUTO_DISCOVER_DATABASES) the databases are retrieved with the following query:

SELECT datname FROM pg_database WHERE datallowconn = true AND datistemplate = false and datname != current_database()

so postgres (that is the current database and templateX should be automatically discarded even if you don't put them in the exclude list). I verified the query on my DB and it works, but when executed by the postgres_exporter it doesn't. Can someone explain why this occurs?

sasadangelo commented 2 years ago

I think the answer to this question is here: https://github.com/prometheus-community/postgres_exporter/issues/353 see sfalkon comment.

First of all, you don't need to specify postgres, template and template1 in the exclude list. If you define a connection string referencing the database postgres, i.e. like this: DATA_SOURCE_URI=:/postgres?sslmode=disabled

the autodiscovery will be done using this query:

SELECT datname FROM pg_database WHERE datallowconn = true AND datistemplate = false and datname != current_database()

Now this doesn't mean that you don't see these DBs in your metrics:

because it depends on the king of statistics table the postgres exporter query. For some of them (global data) it is inevitable to see them. This is my understanding. Correct me if I am wrong.

sasadangelo commented 2 years ago

This is the PR that introduced the autodiscovery query mentioned above: https://github.com/prometheus-community/postgres_exporter/pull/297

alekseiplotnikov commented 2 years ago

+1

AWS RDS rdsadmin database is not accessible and it is not excluded when using --exclude-databases rdsadmin parameter.

ts=2022-04-29T11:10:53.607Z caller=log.go:168 level=info err="Error running query on database \"....eu-west-1.rds.amazonaws.com:5432\": pg_database pq: permission denied for database rdsadmin"
pbousek commented 2 years ago

+1 same on 0.11.0 and aws rds - rdsadmin table is still there although excluded

raynigon commented 2 years ago

+1

same on 0.11.1 and azure postgresql. Last version it worked for me was v0.10.0.

sysadmind commented 2 years ago

I think I understand the problem. The builtin pg_database collector is trying to collect metrics on those databases automatically. It appears that the pg_database_size() function requires connect permissions. I think the fix is to pass those excluded databases into the collector module and adjusting the built in query to exclude that list.

avdicl commented 2 years ago

+1. Still have this issue on v0.11.1 with azure psql.

yann-soubeyrand commented 2 years ago

I may be wrong, but regarding the original issue, if the Postgres exporter is connecting to the database using postgres user and no database is specified, the connection will be done to the postgres database. Since the queries specify master: true, they will be done on the connection database only (postgres in this case), hence the result for the pg_database_2_size_bytes metric. In my opinion, the original issue doesn’t indicate that auto-discovery nor database exclusion isn’t working. To get the desired behaviour, the desired database (nsoengas?) should be specified in the connection string. Again, this is my analysis of the original issue based on the elements at my disposal and I may be wrong. In addition, latest comments seem to indicate there’s really an issue with database exclusion, but it’s another issue than this one to me.

The-Seyed commented 1 year ago

Same issue while using --inlude-databases flag; pg_database_size_bytes is being calculated for all databases. v0.11.1 on Docker

yann-soubeyrand commented 1 year ago

@The-Seyed I don’t think you’re in the same situation as the one describe in this report (see my comment above for what I think is the explanation of the observed behaviour, which is not a bug). It seems to me that you’re facing the issue of the newly introduced pg_database collector (https://github.com/prometheus-community/postgres_exporter/releases/tag/v0.11.0) not respecting the --exclude-databases option. A PR is open which should fix this: https://github.com/prometheus-community/postgres_exporter/pull/697.

Tomasz-Kluczkowski commented 1 year ago

this is broken when using helm chart 4.5.0 (v0.11.0)

database: postgres v 11 (azure for postgres single server)

confirmed command for the container:

--extend.query-path=/etc/config.yaml --web.listen-address=:9187 --auto-discover-databases --exclude-databases azure_maintenance

setting:

config:
  autoDiscoverDatabases: true
  excludeDatabases:
    - "azure_maintenance"

Log from pg exporter pod (retried every minute):

ts=2023-06-28T15:05:04.771Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.102746443 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:06:04.923Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.254488422 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:07:04.772Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.102904026 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:08:04.758Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.090230632 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:09:04.799Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.130927502 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:10:07.409Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.187812674 err="pq: permission denied for database azure_maintenance"
ts=2023-06-28T15:11:04.765Z caller=collector.go:196 level=error msg="collector failed" name=database duration_seconds=0.097777735 err="pq: permission denied for database azure_maintenance"

any ideas?

yann-soubeyrand commented 1 year ago

Hello @Tomasz-Kluczkowski, if you have the pg_database collector enabled, either disable it or upgrade your postgres exporter.

Tomasz-Kluczkowski commented 1 year ago

Thx a lot, I reverted to 0.8.0 since trying 0.13.1 breaks some queries due to breaking changes - but then the exclude databases works properly. I will fix the queries at some point and migrate to 0.13+.

sad to see that exclude databases is deprecated though...