Open lhoss opened 6 years ago
I suggest the following solution: add an up gauge that gets set to 1 or 0 in the Collect function. If the HTTP request returns an error, set the gauge to 0, collect just that metric, and then return from the function.
This way we won't kill the exporter just because the namenode/resourcemanager gets killed, and we will be able to create alerts based on the exporter's metrics.
I can add this code if you think it's a good idea @wyukawa
@wyukawa Any thoughts on this? I have the fix pretty much ready in my head, and a PR could be submitted within a day. What are your thoughts on the approach?
Bug detected in journalnode-exporter (which shares the exact same logic) from the Datatamer fork. Before the stack trace below, I got a useful error log showing that the exporter got a connection refused, just before the nil pointer exception:
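One common way this sequence arises in Go (an assumption about the cause, not something confirmed by the exporter code in this thread) is logging the error from `http.Get` and then still touching the response. On a connection error the returned response is nil, so `resp.Body` panics. A hedged sketch of the guarded version, where `fetchBody` and the URL are hypothetical:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// fetchBody is a hypothetical helper that returns early on a request error.
func fetchBody(url string) ([]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		// resp is nil here; dereferencing resp.Body would panic.
		return nil, fmt.Errorf("scrape %s: %w", url, err)
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}

func main() {
	// Hypothetical address; nothing is expected to listen on this port.
	if _, err := fetchBody("http://127.0.0.1:1/jmx"); err != nil {
		fmt.Println("scrape failed:", err)
	}
}
```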
exception full stack trace: