percona / mongodb_exporter

A Prometheus exporter for MongoDB including sharding, replication and storage engines
Apache License 2.0

mongodb_exporter 0.32.0 could not get response when mongodb instance is down #483

Closed sitoc closed 5 months ago

sitoc commented 2 years ago

**Describe the bug** I am using mongodb_exporter 0.32.0. When the MongoDB instance is down, the exporter returns no response at all. At a minimum, the mongodb_up metric should be set to 0 when MongoDB crashes, so that Prometheus Alertmanager can raise an alert.

**To Reproduce**
  1. Start mongodb_exporter and a MongoDB instance. At this point everything works and all metrics are available.
  2. Manually shut down the MongoDB instance using 'db.shutdownServer()'.
  3. mongodb_exporter then stops responding, as shown below:

    [root@wmdb10 j-jiajianlong-jk]# time curl http://127.0.0.1:27318/metrics
    curl: (52) Empty reply from server

    real 0m10.007s
    user 0m0.002s
    sys 0m0.003s

  4. If the MongoDB instance is down, I think mongodb_exporter should set mongodb_up to 0.

**Environment**
 - OS : CentOS Linux release 7.4.1708
 - MongoDB version: 4.0
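
The requested behaviour can be sketched as follows. This is a minimal, standard-library-only illustration and not the exporter's actual code: `pinger`, `downDB`, `metricsHandler`, and `scrape` are hypothetical names, and the real exporter uses the Prometheus client library rather than hand-written exposition text. The point is only that a failed ping produces `mongodb_up 0` in a normal response instead of an empty reply.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is a hypothetical stand-in for a MongoDB client's Ping method.
type pinger interface {
	Ping(ctx context.Context) error
}

// downDB simulates an unreachable MongoDB instance.
type downDB struct{}

func (downDB) Ping(context.Context) error { return errors.New("connection refused") }

// metricsHandler always answers the scrape. It reports mongodb_up as 0
// when there is no client or the ping fails, instead of dropping the connection.
func metricsHandler(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 5*time.Second)
		defer cancel()

		up := 1
		if db == nil || db.Ping(ctx) != nil {
			up = 0
		}
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintf(w, "# TYPE mongodb_up gauge\nmongodb_up %d\n", up)
	}
}

// scrape runs one in-memory request against the handler and returns the body.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Body.String()
}

func main() {
	fmt.Print(scrape(metricsHandler(downDB{})))
}
```

With a response like this, an alerting rule on `mongodb_up == 0` has something to evaluate; with an empty reply, Prometheus can only see a failed scrape.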

ShashankSinha252 commented 2 years ago

@sitoc Can you provide details about the issue? I see that our standard template for reporting an issue is left untouched.

sitoc commented 2 years ago

@ShashankSinha252 I have rewritten the issue details, please check them. Looking forward to your answer, thanks a lot.

ShashankSinha252 commented 2 years ago

This is a valid bug. We are looking into it.

johnsonchuu commented 2 years ago

Hi all, a similar issue happened to me, but in my case it can be reproduced whether MongoDB is running or not. I can open the page on port 9216, but localhost:9216/metrics always returns an empty reply.

monotek commented 2 years ago

For me the exporter crashes as soon as you try to access the metrics endpoint and no mongodb is running.

 docker run -it --rm --net=host -e MONGODB_URI=mongodb://127.0.0.1:27017  percona/mongodb_exporter:0.32.0 

When I run:

curl http://localhost:9216/metrics

I get the following error in exporter logs:

level=info ts=2022-06-02T10:21:33.917Z caller=tls_config.go:195 msg="TLS is disabled." http2=false
ERRO[0023] Cannot connect to MongoDB: cannot connect to MongoDB: server selection error: context deadline exceeded, current topology: { Type: Single, Servers: [{ Addr: 127.0.0.1:27017, Type: Unknown, Last error: connection() error occured during connection handshake: dial tcp 127.0.0.1:27017: connect: connection refused }, ] } 
2022/06/02 10:21:57 http: panic serving [::1]:40218: runtime error: invalid memory address or nil pointer dereference
goroutine 44 [running]:
net/http.(*conn).serve.func1()
    /opt/hostedtoolcache/go/1.17.8/x64/src/net/http/server.go:1802 +0xb9
panic({0xafc820, 0x11ffce0})
    /opt/hostedtoolcache/go/1.17.8/x64/src/runtime/panic.go:1047 +0x266
go.mongodb.org/mongo-driver/mongo.newDatabase(0x0, {0xbbaf32, 0x5}, {0x0, 0x30, 0xc00006f000})
    /home/runner/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/mongo/database.go:47 +0x5c
go.mongodb.org/mongo-driver/mongo.(*Client).Database(...)
    /home/runner/go/pkg/mod/go.mongodb.org/mongo-driver@v1.8.4/mongo/client.go:837
github.com/percona/mongodb_exporter/exporter.getClusterRole({0xd501e0, 0xc000201560}, 0x40d054)
    /home/runner/work/mongodb_exporter/mongodb_exporter/exporter/topology_info.go:170 +0x8b
github.com/percona/mongodb_exporter/exporter.(*topologyInfo).loadLabels(0xc00011b440, {0xd501e0, 0xc000201560})
    /home/runner/work/mongodb_exporter/mongodb_exporter/exporter/topology_info.go:103 +0xeb
github.com/percona/mongodb_exporter/exporter.newTopologyInfo({0xd501e0, 0xc000201560}, 0x0)
    /home/runner/work/mongodb_exporter/mongodb_exporter/exporter/topology_info.go:73 +0x8b
github.com/percona/mongodb_exporter/exporter.(*Exporter).Handler.func1({0xd4d5b8, 0xc0003200e0}, 0xc0002b0500)
    /home/runner/work/mongodb_exporter/mongodb_exporter/exporter/exporter.go:302 +0x3a5
net/http.HandlerFunc.ServeHTTP(0x0, {0xd4d5b8, 0xc0003200e0}, 0x463e4e)
    /opt/hostedtoolcache/go/1.17.8/x64/src/net/http/server.go:2047 +0x2f
net/http.serverHandler.ServeHTTP({0xc0002ddc50}, {0xd4d5b8, 0xc0003200e0}, 0xc0002b0500)
    /opt/hostedtoolcache/go/1.17.8/x64/src/net/http/server.go:2879 +0x43b
net/http.(*conn).serve(0xc00033e1e0, {0xd50218, 0xc0002dd500})
    /opt/hostedtoolcache/go/1.17.8/x64/src/net/http/server.go:1930 +0xb08
created by net/http.(*Server).Serve
    /opt/hostedtoolcache/go/1.17.8/x64/src/net/http/server.go:3034 +0x4e8

0.31.0 works without problems.

denisok commented 2 years ago

@monotek Thanks for the update to the Helm chart! We have #265 for that, so it will close it.

denisok commented 2 years ago

The panic was probably fixed by: https://github.com/percona/mongodb_exporter/pull/388/commits/9c743bcfaf1c2f12ae736a8a5856518e95023532 https://jira.percona.com/browse/PMM-9757

monotek commented 2 years ago

But shouldn't that already be released in: https://github.com/percona/mongodb_exporter/releases/tag/v0.31.1 ?

olivierboudet commented 2 years ago

Hello, same issue here using helm chart bitnami/mongodb 12.1.19 with metrics enabled.

thopewell commented 2 years ago

I think I'm seeing a similar problem: if there is no primary node, the exporter panics. I would have thought it could still collect metrics from a secondary, enough to detect via metrics that MongoDB is unhealthy.

I'm using helm chart v3.1.1 and the helm chart values.yaml looks like:

prometheus-mongodb-exporter:
  mongodb:
    uri: mongodb://admin:somepassword@mongodb01:27017,mongodb02:27017
  image:
    tag: "0.31.1"
  service:
    annotations: 
      prometheus.io/port: "9216"
      prometheus.io/scrape: "true"
  serviceMonitor:
    enabled: false
  extraArgs:
    - --collect-all
    - --no-mongodb.direct-connect

(there is actually a 3rd mongo instance in the replica set but I've temporarily disabled it and removed it from the rs to make testing a bit easier).

If I stop mongod on mongo1, the exporter panics. (I'm not sure why mongo2 doesn't become PRIMARY, or what effect that would have.)

Before:

rs01:PRIMARY> rs.status().members.forEach(function(z){printjson(z.name);printjson(z.stateStr);})
"mongodb02:27017"
"PRIMARY"
"mongodb01:27017"
"SECONDARY"

After stopping the service on mongo1, I see the exporter panic:

time="2022-08-31T17:04:41Z" level=error msg="Cannot connect to MongoDB: cannot connect to MongoDB: server selection error: context canceled, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: mongodb01:27017, Type: Unknown, Last error: connection() error occured during connection handshake: dial tcp 10.80.6.120:27017: connect: connection refused }, { Addr: mongodb02:27017, Type: RSSecondary, Average RTT: 1938756 }"
2022/08/31 17:04:41 http: panic serving 100.64.164.253:60736: runtime error: invalid memory address or nil pointer dereference
goroutine 183792 [running]:

I would have thought the exporter would still be able to collect metrics from mongo2:

rs01:PRIMARY> rs.status().members.forEach(function(z){printjson(z.name);printjson(z.stateStr);})
"mongodb02:27017"
"SECONDARY"
"mongodb01:27017"
"(not reachable/healthy)"
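
On the question of why no node is PRIMARY after the stop: a replica set needs a strict majority of its voting members to elect or keep a primary. The sketch below works through that arithmetic. It assumes both remaining members are voting members with default settings, which the comment above does not confirm, and `canElectPrimary` is a hypothetical helper, not a MongoDB or exporter API.

```go
package main

import "fmt"

// canElectPrimary reports whether the reachable voting members
// form a strict majority of all voting members.
func canElectPrimary(votingMembers, reachable int) bool {
	return reachable > votingMembers/2
}

func main() {
	fmt.Println(canElectPrimary(2, 1)) // two-member set, one node stopped
	fmt.Println(canElectPrimary(3, 2)) // three-member set, one node stopped
}
```

Under that assumption, a two-member set with one node stopped has no majority, which is consistent with the `ReplicaSetNoPrimary` topology in the log above. With the third member restored, losing one node would still leave a majority.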

denisok commented 2 years ago

@ShashankSinha252 Are we planning this for 0.35? Have we identified the fix?

elghazal-a commented 1 year ago

I'm seeing a similar issue with 0.35.0: the exporter crashes when MongoDB is down. I'd expect to get mongodb_up=0.

rageofgods commented 1 year ago

Hey, any updates on this?

igroene commented 5 months ago

fixed by https://github.com/percona/mongodb_exporter/pull/653