Closed - btmc closed this 1 year ago
Are you using the multi target feature (/probe) or the traditional /metrics endpoint?
To clarify, is this multiple systems scraping the exporter, which is connected to a single postgres server? Or are there multiple postgres servers?
Are you using the multi target feature (/probe) or the traditional /metrics endpoint?
I'm using /metrics endpoint.
To clarify, is this multiple systems scraping the exporter, which is connected to a single postgres server? Or are there multiple postgres servers?
Multiple systems are scraping one exporter connected to one postgres server.
Got the same, after upgrading to 0.14.0
same here. we went back to 0.13.2 and the open connections are back to normal (we went from 200-ish to 1000-ish as soon as 0.14 went up - yes we do have many dbs.)
I think I see the problem now. The instance{} is shared when using /metrics and it's limited to a single connection. I'm working on a fix to clone the instance for each scrape with a separate connection, but it's a bit more tricky to test, so it may take a bit of time to work through that.
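Roughly, the fix looks like the sketch below. This is an illustration, not the actual patch: the type and function names and the DSN are placeholders that follow the pattern of collector/instance.go. The point is that the shared collector keeps only the DSN, and every scrape opens its own one-connection instance and closes it when it is done.

package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq"
)

// instance mirrors the idea in collector/instance.go: one DSN, one *sql.DB.
type instance struct {
	dsn string
	db  *sql.DB
}

// newInstance opens a dedicated connection pool for a single scrape.
func newInstance(dsn string) (*instance, error) {
	db, err := sql.Open("postgres", dsn)
	if err != nil {
		return nil, err
	}
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)
	return &instance{dsn: dsn, db: db}, nil
}

func (i *instance) Close() error { return i.db.Close() }

// collect runs one scrape against its own instance and always releases it,
// so concurrent scrapes no longer pile up idle connections.
func collect(dsn string) error {
	inst, err := newInstance(dsn)
	if err != nil {
		return err
	}
	defer inst.Close() // the per-scrape connection is torn down here
	// ... query pg_stat_* views and emit metrics against inst.db ...
	return inst.db.Ping()
}

func main() {
	dsn := "postgres://localhost/postgres?sslmode=disable" // placeholder DSN
	if err := collect(dsn); err != nil {
		log.Fatal(err)
	}
}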
Experiencing the same
I think I see the problem now. The instance{} is shared when using /metrics and it's limited to a single connection. I'm working on a fix to clone the instance for each scrape with a separate connection, but it's a bit more tricky to test, so it may take a bit of time to work through that.
For a moment I thought @btmc was one of my colleagues, as we also have 3 vmagents scraping the same exporter 😄
But, to complicate the setup even more, we access postgres_exporter through exporter_exporter, which is a dedicated reverse proxy for exporters.
So, in our case, I'm not certain it's easy to distinguish where the scraping connections are coming from. Well, hopefully, connections from the proxy are distinct enough:
tcp ESTAB 0 0 127.0.0.1:9187 127.0.0.1:45690
tcp ESTAB 0 0 127.0.0.1:45688 127.0.0.1:9187
tcp ESTAB 0 0 127.0.0.1:45690 127.0.0.1:9187
tcp ESTAB 0 0 127.0.0.1:9187 127.0.0.1:45688
This was pretty bad - it brought down one of our db servers last night. Any chance you can roll a point release?
@SuperQ this caused downtime on our servers as well. When can we expect the release of the fix?
I'm not sure whether this is fixed or not, or why it happened twice on our systems, but we use version 0.15 and the exporter took up 100 and then 500 connections (after we raised the connection limit), with a scrape interval of 15 seconds.
Docker images in use: docker.io/bitnami/postgres-exporter:0.15.0-debian-11-r7 and docker.io/bitnami/postgres-exporter:0.15.0-debian-12-r13 (bitnami/postgresql helm chart in use).
Both times the exporter stopped issuing metrics, which may be an important detail; it stopped at "round" times, 22:00 UTC and 23:00 UTC.
@sysadmind maybe open a new issue for this?
What did you do?
I run postgresql-exporter in an environment with three vmagents scraping the exporter.
They happen to do it almost simultaneously every time: all three HTTP requests arrive before the first response starts to be returned; I can see that in tcpdump.
On the exporter side I see multiple 'collector failed' errors on every scrape round, on random collector modules.
On the postgres side I see the following:
On the first round of scrapes, 3 new connections appear in postgres; two of them have 'select version()' as their last query and stay idle, and one is functional. On every subsequent round of scrapes, 2 more connections appear (the previous idle ones remain), which are also idle, while the first functional connection continues to be used.
I tried running exporter version 0.13.2 in the same vmagent setup and it was fine: there were two connections on the postgres side, and they were being reused.
There are also no leaks on version 0.14.0 when I make the HTTP requests one by one.
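To reproduce the concurrent scrapes without three vmagents, something like the small program below should work (my own reproduction sketch; the exporter address is assumed to be the default 127.0.0.1:9187): fire several /metrics requests at once, then count the connections in pg_stat_activity.

package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

func main() {
	const scrapers = 3 // three vmagents in the original setup
	var wg sync.WaitGroup
	for i := 0; i < scrapers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			resp, err := http.Get("http://127.0.0.1:9187/metrics")
			if err != nil {
				log.Println(err)
				return
			}
			defer resp.Body.Close()
			// Read the full response, the way a real scraper would.
			io.Copy(io.Discard, resp.Body)
		}()
	}
	wg.Wait()
}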
I guess it might be related to the sql.Open call in the instance.setup method, which is called on every incoming request in 0.14.0, but only once at collector initialization in 0.13.2: https://github.com/prometheus-community/postgres_exporter/blob/v0.14.0/collector/instance.go#L46
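If that guess is right, the effect is the same as in this minimal standalone illustration (not the exporter's real code; the DSN and port are placeholders): each request builds a new *sql.DB pool that is never closed, so SetMaxOpenConns(1) limits every pool to one connection, but the number of pools, and therefore of server connections, keeps growing.

package main

import (
	"database/sql"
	"log"
	"net/http"

	_ "github.com/lib/pq"
)

func main() {
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		// A new pool per request, similar to instance.setup() running per scrape.
		db, err := sql.Open("postgres", "postgres://localhost/postgres?sslmode=disable")
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		db.SetMaxOpenConns(1) // limits this pool only; every request gets its own pool

		var version string
		if err := db.QueryRow("SELECT version()").Scan(&version); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		// db.Close() is never called, so the idle connection stays open in postgres.
		w.Write([]byte(version))
	})
	log.Fatal(http.ListenAndServe(":9187", nil))
}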
What did you expect to see?
Postgres connections are correctly handled.
What did you see instead? Under which circumstances?
Postgres connections are used up to the limit.
Environment
postgres_exporter flags:
PostgreSQL version: