percona / mongodb_exporter

A Prometheus exporter for MongoDB including sharding, replication and storage engines
Apache License 2.0

Add getReplicationInfo to available metrics #270

Closed caiorcferreira closed 4 months ago

caiorcferreira commented 3 years ago

The easiest way to track oplog size and oplog window is through db.getReplicationInfo().

These are critical metrics that are missing from the project right now.

percona-csalguero commented 3 years ago

Hello,

Would you like to create a Jira ticket with more detailed information? You can do that at our project's Jira and also, if you want to provide a fix and need help to start, just ping me and I'll be glad to help you.

Thanks. Regards

caiorcferreira commented 3 years ago

Hi @percona-csalguero,

I was setting up the project locally in order to provide this feature, but ran into some problems with the tests.

Steps:

  1. Run make test-cluster
  2. Run make test

Output of the tests:

goroutine 356 [IO wait]:
internal/poll.runtime_pollWait(0xd42c900, 0x77, 0xc00001c180)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc00038c798, 0x77, 0x4d0f500, 0xc000286ba0, 0xc00038c780)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitWrite(...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/internal/poll/fd_poll_runtime.go:96
internal/poll.(*FD).WaitWrite(...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/internal/poll/fd_unix.go:498
net.(*netFD).connect(0xc00038c780, 0x4d0f520, 0xc000286ba0, 0x0, 0x0, 0x4cff020, 0xc00009c3e0, 0x0, 0x0, 0x0, ...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:152 +0x257
net.(*netFD).dial(0xc00038c780, 0x4d0f520, 0xc000286ba0, 0x4d14c20, 0x0, 0x4d14c20, 0xc0004c5440, 0x0, 0x1, 0xc00079a390)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/sock_posix.go:149 +0xff
net.socket(0x4d0f520, 0xc000286ba0, 0x4a2e7e7, 0x3, 0x2, 0x1, 0x0, 0x0, 0x4d14c20, 0x0, ...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/sock_posix.go:70 +0x1c0
net.internetSocket(0x4d0f520, 0xc000286ba0, 0x4a2e7e7, 0x3, 0x4d14c20, 0x0, 0x4d14c20, 0xc0004c5440, 0x1, 0x0, ...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/ipsock_posix.go:141 +0x141
net.(*sysDialer).doDialTCP(0xc00038c700, 0x4d0f520, 0xc000286ba0, 0x0, 0xc0004c5440, 0x49454a0, 0x5349530, 0x0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/tcpsock_posix.go:65 +0xc2
net.(*sysDialer).dialTCP(0xc00038c700, 0x4d0f520, 0xc000286ba0, 0x0, 0xc0004c5440, 0x4069df0, 0xc00079a5a0, 0xb691af3c)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/tcpsock_posix.go:61 +0xd7
net.(*sysDialer).dialSingle(0xc00038c700, 0x4d0f520, 0xc000286ba0, 0x4d04fe0, 0xc0004c5440, 0x0, 0x0, 0x0, 0x0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/dial.go:581 +0x60a
net.(*sysDialer).dialSerial(0xc00038c700, 0x4d0f520, 0xc000286ba0, 0xc00009bd30, 0x1, 0x1, 0x0, 0x0, 0x0, 0x0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/dial.go:549 +0x14f
net.(*Dialer).DialContext(0xc000286b40, 0x4d0f4a0, 0xc000645b40, 0x4a2e7e7, 0x3, 0xc00003caf0, 0x10, 0x0, 0x0, 0x0, ...)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/dial.go:426 +0x6d8
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*connection).connect(0xc00038b180, 0x4d0f4a0, 0xc000645b40)
    /Users/caioferreira/workspace/b2w/persistencia/mongodb_exporter/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology/connection.go:136 +0x242
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).setupHeartbeatConnection(0xc0004648c0, 0x0, 0x0)
    /Users/caioferreira/workspace/b2w/persistencia/mongodb_exporter/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:590 +0x115
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).check(0xc0004648c0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /Users/caioferreira/workspace/b2w/persistencia/mongodb_exporter/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:637 +0x92a
go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).update(0xc0004648c0)
    /Users/caioferreira/workspace/b2w/persistencia/mongodb_exporter/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:493 +0x350
created by go.mongodb.org/mongo-driver/x/mongo/driver/topology.(*Server).Connect
    /Users/caioferreira/workspace/b2w/persistencia/mongodb_exporter/vendor/go.mongodb.org/mongo-driver/x/mongo/driver/topology/server.go:198 +0x201

goroutine 494 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000572480, 0xc0001a7f80, 0xc0005719e0, 0xc000571980)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 493 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005723c0, 0xc0001a7e80, 0xc000571740, 0xc0005716e0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 601 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000164f60, 0xc000334a00, 0xc000400ea0, 0xc000400e40)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 510 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000286420, 0xc000453e80, 0xc0000ff380, 0xc0000ff320)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 521 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000164a20, 0xc000334300, 0xc000181ce0, 0xc000181c80)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 531 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000572840, 0xc00022e480, 0xc00023c660, 0xc00023c600)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 505 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000286000, 0xc000453080, 0xc0000fe720, 0xc0000fe6c0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 530 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000572780, 0xc00022e380, 0xc00023c3c0, 0xc00023c360)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 478 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005e2ae0, 0xc0005f4100, 0xc0005eeb40, 0xc0005eeae0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 516 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005e28a0, 0xc0005c1e00, 0xc000181620, 0xc0001815c0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 533 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000572900, 0xc00022e580, 0xc00023c900, 0xc00023c8a0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 545 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000572d80, 0xc00022eb80, 0xc00023d7a0, 0xc00023d740)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 564 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005e2d20, 0xc0005f4400, 0xc0005ef200, 0xc0005ef1a0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 509 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000286300, 0xc000453d80, 0xc0000ff0e0, 0xc0000ff080)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 480 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005e2ba0, 0xc0005f4200, 0xc0005eed80, 0xc0005eed20)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 511 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0002865a0, 0xc000453f80, 0xc0000ff620, 0xc0000ff5c0)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 546 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000286660, 0xc00038c080, 0xc0000ff860, 0xc0000ff800)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 506 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0002860c0, 0xc000453a80, 0xc0000fe960, 0xc0000fe900)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 518 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc000164900, 0xc000334200, 0xc000181aa0, 0xc000181a40)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f

goroutine 534 [select]:
net.(*netFD).connect.func2(0x4d0f520, 0xc0005729c0, 0xc00022e680, 0xc00023cba0, 0xc00023cb40)
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:129 +0xba
created by net.(*netFD).connect
    /Users/caioferreira/.asdf/installs/golang/1.14.6/go/src/net/fd_unix.go:128 +0x22f
FAIL    github.com/percona/mongodb_exporter/exporter    32.199s
?       github.com/percona/mongodb_exporter/internal/tu [no test files]
FAIL

This is logged repeatedly, so I believe it is caused by some loop. Do you have any clues?

Env:

percona-csalguero commented 3 years ago

Hello. Could you try with a newer Go version and modules enabled? I cannot see the errors:

go test -v -timeout 30s ./...
=== RUN   TestBuildExporter
time="2021-05-21T09:49:27-03:00" level=debug msg="Compatible mode: true"
time="2021-05-21T09:49:27-03:00" level=debug msg="Connection URI: mongodb://usr:pwd@127.0.0.1/"
--- PASS: TestBuildExporter (0.00s)
PASS
ok      github.com/percona/mongodb_exporter     0.008s
=== RUN   TestCollStatsCollector
--- PASS: TestCollStatsCollector (0.07s)
=== RUN   TestDebug
--- PASS: TestDebug (0.00s)
=== RUN   TestDiagnosticDataCollector
--- PASS: TestDiagnosticDataCollector (0.03s)
=== RUN   TestAllDiagnosticDataCollectorMetrics
--- PASS: TestAllDiagnosticDataCollectorMetrics (0.05s)
=== RUN   TestConnect
=== RUN   TestConnect/Connect_without_SSL
=== RUN   TestConnect/Test_per-request_connection
=== RUN   TestConnect/Test_global_connection
--- PASS: TestConnect (0.49s)
    --- PASS: TestConnect/Connect_without_SSL (0.01s)
    --- PASS: TestConnect/Test_per-request_connection (0.27s)
    --- PASS: TestConnect/Test_global_connection (0.21s)
=== RUN   TestGeneralCollector
time="2021-05-21T09:49:27-03:00" level=error msg="error while checking mongodb connection: client is disconnected. mongo_up is set to 0"
--- PASS: TestGeneralCollector (0.00s)
=== RUN   TestIndexStatsCollector
--- PASS: TestIndexStatsCollector (0.12s)
=== RUN   TestSanitize
=== RUN   TestSanitize/With_building
=== RUN   TestSanitize/Without_building
--- PASS: TestSanitize (0.00s)
    --- PASS: TestSanitize/With_building (0.00s)
    --- PASS: TestSanitize/Without_building (0.00s)
=== RUN   TestMetricName
--- PASS: TestMetricName (0.00s)
=== RUN   TestPrometeusize
--- PASS: TestPrometeusize (0.00s)
=== RUN   TestMakeRawMetric
--- PASS: TestMakeRawMetric (0.00s)
=== RUN   TestRawToCompatibleRawMetric
--- PASS: TestRawToCompatibleRawMetric (0.00s)
=== RUN   TestReplsetStatusCollector
--- PASS: TestReplsetStatusCollector (0.00s)
=== RUN   TestReplsetStatusCollectorNoSharding
--- PASS: TestReplsetStatusCollectorNoSharding (0.00s)
=== RUN   TestSecondaryLag
    secondary_lag_test.go:58: This is failing in GitHub actions. Cannot make secondary to lag behind
--- SKIP: TestSecondaryLag (0.00s)
=== RUN   TestServerStatusDataCollector
--- PASS: TestServerStatusDataCollector (0.02s)
=== RUN   TestTopologyLabels
--- PASS: TestTopologyLabels (0.00s)
=== RUN   TestWalkTo
--- PASS: TestWalkTo (0.00s)
=== RUN   TestMakeLockMetric
--- PASS: TestMakeLockMetric (0.00s)
=== RUN   TestAddLocksMetrics
--- PASS: TestAddLocksMetrics (0.00s)
=== RUN   TestSumMetrics
=== RUN   TestSumMetrics/timeAcquire
=== RUN   TestSumMetrics/timeAcquire#01
--- PASS: TestSumMetrics (0.00s)
    --- PASS: TestSumMetrics/timeAcquire (0.00s)
    --- PASS: TestSumMetrics/timeAcquire#01 (0.00s)
=== RUN   TestCreateOldMetricFromNew
--- PASS: TestCreateOldMetricFromNew (0.00s)
PASS
ok      github.com/percona/mongodb_exporter/exporter    0.825s
?       github.com/percona/mongodb_exporter/internal/tu [no test files]

Iliyass commented 3 years ago

any update on this? I'm interested in these metrics.

Thanks

daniel-shuy commented 2 years ago

Isn't this already captured by the getDiagnosticData collector under local.oplog.rs.stats (aliased as oplog_stats)? See https://github.com/percona/mongodb_exporter/blob/v0.34.0/exporter/testdata/get_diagnostic_data.json#L3-L20

jeffersongirao commented 2 years ago

@daniel-shuy, I don't think so. For instance, tFirst and tLast from db.getReplicationInfo() do not seem to be captured anywhere. I ended up here while trying to calculate the oplog window, pretty much the same context as the OP.
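For context, the oplog window discussed here is simply the distance between tFirst and tLast. Below is a minimal Go sketch of that arithmetic, assuming the seconds part of the oldest and newest oplog timestamps has already been read. The `ReplicationWindow` type and `NewReplicationWindow` function are hypothetical names for illustration and are not part of the exporter; the rounding of the hours value follows what the mongo shell helper appears to do and should be treated as an assumption.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// ReplicationWindow mirrors the time-related fields reported by
// db.getReplicationInfo(). The type itself is hypothetical.
type ReplicationWindow struct {
	TFirst        time.Time // timestamp of the oldest oplog entry
	TLast         time.Time // timestamp of the newest oplog entry
	TimeDiff      float64   // oplog window in seconds
	TimeDiffHours float64   // oplog window in hours, rounded to two decimals
}

// NewReplicationWindow derives the oplog window from the seconds part of the
// first and last oplog timestamps (Unix epoch seconds).
func NewReplicationWindow(firstSec, lastSec uint32) ReplicationWindow {
	diff := float64(lastSec) - float64(firstSec)
	return ReplicationWindow{
		TFirst:   time.Unix(int64(firstSec), 0).UTC(),
		TLast:    time.Unix(int64(lastSec), 0).UTC(),
		TimeDiff: diff,
		// Assumption: seconds / 3600, rounded to two decimal places.
		TimeDiffHours: math.Round(diff/36) / 100,
	}
}

func main() {
	// Illustrative values only: two timestamps exactly one day apart.
	w := NewReplicationWindow(1621600000, 1621686400)
	fmt.Printf("tFirst=%s tLast=%s window=%.0fs (%.2fh)\n",
		w.TFirst.Format(time.RFC3339), w.TLast.Format(time.RFC3339),
		w.TimeDiff, w.TimeDiffHours)
}
```

An exporter metric built this way would only need the two timestamps; the subtraction could equally be left to the query layer.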

daniel-shuy commented 2 years ago

@jeffersongirao If I'm not mistaken, local.oplog.rs.stats_start is the tFirst and local.oplog.rs.stats_end is the tLast.

Maybe someone at Percona can kindly confirm if this is true or not?

jeffersongirao commented 2 years ago

@daniel-shuy thanks for pointing that out, but unfortunately those seem to be different fields. The timestamp values do not correspond to the head and tail entries of the oplog, either at the exporter or at the source.

vineelyalamarthy commented 2 years ago

Happy to help on this issue. But I need some time.

bvalente commented 7 months ago

As a workaround, to get the oplog window, I divided the size of the local database by the rate of replication network bytes (available in server status). This gives a rough estimate of the time that replication takes to fill the oplog.rs collection.
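The estimate above is a single division, sketched here in Go. This is only an illustration of the arithmetic; `EstimateOplogWindow` and its parameters are hypothetical names, and the inputs are assumed to come from whatever metrics report the local database size (bytes) and the replication network rate (bytes per second).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// EstimateOplogWindow approximates how long replication traffic takes to
// fill the oplog: size in bytes divided by the replication rate in bytes
// per second. It is a rough estimate, not the real head-to-tail window.
func EstimateOplogWindow(sizeBytes, replBytesPerSec float64) (time.Duration, error) {
	if sizeBytes < 0 {
		return 0, errors.New("size must not be negative")
	}
	if replBytesPerSec <= 0 {
		// With no replication traffic the estimate is undefined.
		return 0, errors.New("replication rate must be positive")
	}
	seconds := sizeBytes / replBytesPerSec
	return time.Duration(seconds * float64(time.Second)), nil
}

func main() {
	// Illustrative values only: 10 GiB filled at 1 MiB/s.
	d, err := EstimateOplogWindow(10*1024*1024*1024, 1024*1024)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("estimated oplog window:", d)
}
```

The zero-rate guard matters in practice: an idle replica set would otherwise produce a division by zero and an infinite window.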

I would prefer to have the getReplicationInfo metrics still.

What would be the best suited project to create a ticket in?

igroene commented 4 months ago

You can calculate the oplog window with the following query:

avg by (service_name) (mongodb_mongod_replset_oplog_head_timestamp{service_name=~"$service_name"}-mongodb_mongod_replset_oplog_tail_timestamp{service_name=~"$service_name"})

Oplog size is exposed as mongodb_oplog_stats_storageStats_size. Does this help, or do you still see a reason for capturing db.getReplicationInfo()?
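To make the query above concrete, here is a small Go sketch of the same computation: subtract the tail timestamp from the head timestamp for each sample, then average the differences per service_name. The `Sample` type and `AvgWindowByService` function are hypothetical and only model the arithmetic; they assume head and tail values have already been paired per replica-set member, which is what the label matching in the query does.

```go
package main

import "fmt"

// Sample is a hypothetical pair of gauge values for one replica-set member.
type Sample struct {
	ServiceName string
	HeadSeconds float64 // value of mongodb_mongod_replset_oplog_head_timestamp
	TailSeconds float64 // value of mongodb_mongod_replset_oplog_tail_timestamp
}

// AvgWindowByService returns the mean of (head - tail) in seconds for each
// service name, mirroring "avg by (service_name) (head - tail)".
func AvgWindowByService(samples []Sample) map[string]float64 {
	sums := make(map[string]float64)
	counts := make(map[string]int)
	for _, s := range samples {
		sums[s.ServiceName] += s.HeadSeconds - s.TailSeconds
		counts[s.ServiceName]++
	}
	avg := make(map[string]float64, len(sums))
	for name, sum := range sums {
		avg[name] = sum / float64(counts[name])
	}
	return avg
}

func main() {
	// Illustrative values only.
	samples := []Sample{
		{ServiceName: "rs0", HeadSeconds: 1000, TailSeconds: 400},
		{ServiceName: "rs0", HeadSeconds: 1000, TailSeconds: 600},
		{ServiceName: "rs1", HeadSeconds: 2000, TailSeconds: 1800},
	}
	for name, window := range AvgWindowByService(samples) {
		fmt.Printf("%s: %.0fs\n", name, window)
	}
}
```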