percona / mongodb_exporter

A Prometheus exporter for MongoDB including sharding, replication and storage engines
Apache License 2.0

Can't get metric if first host is not available (others are) #749

Closed elafontaine closed 7 months ago

elafontaine commented 9 months ago

Describe the bug

We use the same URI configuration for different server types, so our Mongo connection URI looks something like mongodb://<REDACTED>:27018,<REDACTED>:27019/admin. The important thing to understand is that on one type of server port 27018 will exist, while on the other port 27019 will exist. The bug seems to be that only the first host is checked when the connection is initially established, so the exporter does not expose the metrics (see the other bug about loss of connectivity, which I guess is related). I know this wasn't the case until yesterday, when Bitnami updated the latest version of their container (we use the Bitnami container). Before yesterday, the nodes with port 27019 were reported even though they had nothing on port 27018, and likewise the nodes with port 27018 were reported even though they had nothing on port 27019. Since the update, I have lost the metrics of my nodes with port 27019, but not those of the nodes with port 27018.

We tried a couple of things:


Node with port 27019 behaviour: when we start the Bitnami container, we get the following log on repeat for the containers that have port 27019:

time="2023-11-22T13:23:05Z" level=error msg="Cannot connect to MongoDB: cannot connect to MongoDB: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: <REDACTED>:27018, Type: Unknown, Last error: dial tcp <REDACTED>:27018: connect: connection refused }, ] }"

The exporter then reports nothing and we don't receive any data.


Node with port 27018 behaviour: when we start the Bitnami container, we get the following log ONCE:

time="2023-11-22T12:15:54Z" level=error msg="Cannot connect to MongoDB: cannot connect to MongoDB: server selection error: server selection timeout, current topology: { Type: Unknown, Servers: [{ Addr: <REDACTED>:27019, Type: Unknown, Last error: dial tcp <REDACTED>:27019: connect: connection refused }, ] }"

The exporter reports the metrics it observes from port 27018.
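Each of the two errors above lists a single server in its topology rather than both hosts from the URI. A minimal sketch for pulling the server entries out of such a log line is shown here. The regular expression is an assumption derived only from the two log lines quoted in this report, not from a documented driver format, and `servers_in_topology` is a hypothetical helper name.

```python
import re

# Pattern inferred from the log lines in this report (an assumption,
# not a documented format of the MongoDB driver).
SERVER_RE = re.compile(
    r"\{ Addr: (?P<addr>[^,]+), Type: (?P<type>\w+), Last error: (?P<error>.*?) \}"
)


def servers_in_topology(log_line):
    """Return one dict per server entry found in a 'current topology' error."""
    return [match.groupdict() for match in SERVER_RE.finditer(log_line)]


if __name__ == "__main__":
    line = (
        "server selection timeout, current topology: { Type: Unknown, "
        "Servers: [{ Addr: <REDACTED>:27018, Type: Unknown, Last error: "
        "dial tcp <REDACTED>:27018: connect: connection refused }, ] }"
    )
    for server in servers_in_topology(line):
        print(server["addr"], "->", server["error"])
```

Counting the entries returned for each log line makes it easy to see whether the exporter attempted one host or several.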


I believe the bug is due to the mongo driver, which is imported at https://github.com/percona/mongodb_exporter/blob/main/exporter/exporter.go#L31 and used to connect (?) at https://github.com/percona/mongodb_exporter/blob/main/exporter/exporter.go#L376, but I'm not a specialist, so I wanted some confirmation before moving on.

To Reproduce

Steps to reproduce the behaviour:

  1. What parameters are being passed to mongodb_exporter: a Mongo URI with multiple hosts, the first one being unreachable.
  2. Steps to reproduce the issue: start the exporter and observe the failure to expose the second host's metrics.

Expected behaviour

I would expect the metrics observed through the second host to be reported.

Logs

(see above)

Environment

Additional context

(see above)

elafontaine commented 9 months ago

We changed our mongodb-exporter to use 0.39.0 in the meantime.

pflong commented 9 months ago

@elafontaine Maybe you could try starting an exporter on each port instead of using multi-target; it would be much simpler.

You can see my issue: https://github.com/percona/mongodb_exporter/issues/761

gthieleb commented 8 months ago

Sorry for the naive question. I would like to connect the exporter to all instances of a replica set by using a MongoDB connection string with multiple IP addresses in the URL, like the example in the original post. In my opinion this is something that should also work in versions <0.40 (without multi-target support). Is this supported by the exporter?

elafontaine commented 8 months ago

@pflong the way our IaC is done, we would like to avoid adding variables as much as possible. This multi-target setup was working fine previously and did what we expected it to do. The fact that it stops after the first host fails is worrisome to me, as I wonder how the failure scenario is being handled.

This feels like a standard Go thing that I'm not aware of... Maybe you could point me in the right direction on this, as I've seen another library that started to fail in the same way as mongo-exporter.

I reported the issue so that other people could see it and maybe discuss a resolution. For now, we preferred to stay on 0.39 until there is a resolution (if it ever comes...). We would rather stay on "latest" so as to get updates, but that would require changes in our IaC that we aren't ready to look into yet.

elafontaine commented 8 months ago

@gthieleb, I believe you may want to look at the documentation (https://github.com/percona/mongodb_exporter#example), as they support Mongo connection URIs. What we use is "multi-target", which means that we target 2 endpoints on a local machine. One will exist, the other will not.

For a Mongo connection URI, this issue means that if the first host in the URI's host list isn't available for some reason, the exporter will simply fail to connect to the 2nd and 3rd instances of the same replica set (the connection URI string).

elafontaine commented 8 months ago

Could I get somebody else to corroborate what I said? I want to ensure this is not just a "me" problem because of some setting somewhere that I've missed.

elafontaine commented 7 months ago

@pflong I understand that this is not a standard case, but still, in a deployment context where someone deploys both the mongo-exporter and mongo on a server, if the first mongo in the list isn't available (e.g. roll-forward configs), then the whole thing fails to start. I do not believe this is desired behaviour in general; it is probably linked to the mongo library used.

elafontaine commented 7 months ago

OK, we have partly figured out what is happening.

We use New Relic for "scraping" the mongo exporter, but we didn't specify any "target", so we believe only the "first" target was returned. The update to 0.40 seems to have split the multi-host URI into individual "targets", which are available by querying the individual hosts at GET /scrape?target=<host1:port1> and GET /scrape?target=<host2:port2>.
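A minimal sketch of building one scrape URL per target, as described above. The exporter base URL `http://localhost:9216` and the host names are placeholder assumptions, and `scrape_urls` is a hypothetical helper; only the `/scrape?target=` path comes from this thread.

```python
from urllib.parse import urlencode


def scrape_urls(exporter_base, targets):
    """Build one /scrape URL per target, percent-encoding the target value."""
    base = exporter_base.rstrip("/")
    return [f"{base}/scrape?{urlencode({'target': target})}" for target in targets]


if __name__ == "__main__":
    # Placeholder exporter address and targets (assumptions for illustration).
    for url in scrape_urls("http://localhost:9216", ["host1:27018", "host2:27019"]):
        print(url)
```

Each resulting URL would then be configured as its own scrape endpoint, so that a target that is down on a given node does not hide the one that is up.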

In other words, we ended up using a single host and port, and in doing so we realized what was happening. BTW, our expectation was that

mongodb://<REDACTED>:27018,<REDACTED>:27019/admin

meant

mongodb://<REDACTED>:27018/admin,mongodb://<REDACTED>:27019/admin

but we never realized that prior to version 0.40, the first one was seen as a single target by the exporter (from what I can figure out). So our downfall was that we didn't use the "target" parameter of the scrape URI. It's all well documented here, but we failed to see a warning that the two weren't the same:

https://github.com/percona/mongodb_exporter#multi-target-support
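The difference between the two URI forms above can be made concrete with a small sketch that expands a multi-host URI into one single-host URI per host. It is plain string handling, not the exporter's own code; `split_mongo_uri` is a hypothetical helper, and it assumes a `mongodb://` URI whose credentials, if any, are percent-encoded.

```python
def split_mongo_uri(uri):
    """Expand mongodb://h1,h2/db into [mongodb://h1/db, mongodb://h2/db]."""
    scheme = "mongodb://"
    if not uri.startswith(scheme):
        raise ValueError("expected a mongodb:// URI")
    rest = uri[len(scheme):]
    # Everything before the first "/" is "[user:pass@]host1,host2,...".
    authority, slash, tail = rest.partition("/")
    credentials, at, host_list = authority.rpartition("@")
    return [
        f"{scheme}{credentials}{at}{host}{slash}{tail}"
        for host in host_list.split(",")
        if host
    ]


if __name__ == "__main__":
    for single in split_mongo_uri("mongodb://host1:27018,host2:27019/admin"):
        print(single)
```

The single multi-host URI names one deployment reachable through several hosts, whereas the expanded list names several independent targets, which is the distinction the multi-target documentation draws.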