wyukawa / hadoop_exporter

Hadoop exporter
Apache License 2.0
52 stars 64 forks source link

"nil pointer dereference" exception in exporter's Collect func resp.Body.Close() when http connection failed with connection refused #5

Open lhoss opened 6 years ago

lhoss commented 6 years ago

bug detected in journalnode-exporter (shares exact same logic) from Datatamer fork. Before the stacktrace below, got a useful error log that shows the export got a connection refused, just before the NIL pointer exception:

time="2017-10-23T08:33:52Z" level=error msg="Get http://localhost:8480/jmx: dial tcp 127.0.0.1:8480: getsockopt: connection refused" file="journalnode_exporter.go" line=97

exception full stack trace:

panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x402c8b]

goroutine 335365 [running]:
panic(0x816880, 0xc820010080)
/usr/lib/go-1.6/src/runtime/panic.go:464 +0x3e6
main.(*Exporter).Collect(0xc8200b8120, 0xc8202fe1e0)
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/journalnode/journalnode_exporter.go:99 +0x135b
vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc820258190, 0xc8202fe1e0, 0x7f8c9296baa0, 0xc8200b8120)
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/src/vendor/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x58
created by vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/opt/prometheus/downloads/git/hadoop-exporter-52d6c8331c03775ae8d8b92e834024e1cd8959d8/src/vendor/github.com/prometheus/client_golang/prometheus/registry.go:434 +0x360
miono commented 5 years ago

I suggest the following solution: Add an up-gauge that get's set to 1 or 0 in the Collect-function. If the http-request returns an error, set this to 0, Collect just that metric and then return from the function.

This way we won't kill the exporter just because the namenode/resourcemanger get's killed, and we will be able to create alerts where we consume the exporters metrics.

I can add this code if you think it's a good idea @wyukawa

miono commented 5 years ago

@wyukawa Any thoughts on this? I have the fix pretty much ready in my head. And a PR could be submitted within a day. What's your thoughts on the approach?