Best practice for "missing" metrics?

SwampFalc commented 4 years ago

Allow me to first explain a bit the work I'm doing. I'm trying to build a nagios check based on the contents of a logfile of a different program I wrote. Now I'm maybe being way too defensive, but I'm trying not to assume anything about the correct format of said log.

This of course means that the Resource I built can yield metrics that represent broken/unknown state. So far I have done this by yielding Metrics whose value is None, but for example, a ScalarContext chokes on that because of course None cannot be compared to the ranges.

So I was pondering writing my own ScalarContext and whether or not Metrics that are None even make sense, when I had the idea that I could just not yield those Metrics.

After all, you can provide as many Contexts as you want to a Check, only those that get matched up to a Metric will become Results.

But as I was continuing along that train of thought, it struck me that this would lead to an issue. Missing Metrics do not lead to Results, so when Check tries to determine its state, there is no way to have a missing Metric lead to a critical state, and there's nothing you can do in a Summary to fix it. Yes I could subclass Check, but at some point I start feeling like I'm rewriting the entire library...

Which to some degree I wouldn't mind doing, but at that point I would probably turn it into a pull request. But of course, if you as the maintainer disagree with the choices made, that would be a lot of wasted effort...

And so we reach the discussion if there should not be an "official" stance about the proper practice in this case.

Option 1 - Metrics whose value is None. If this is chosen as best practice, I would recommend changing the ScalarContext to take this into account, so that people have a proper working example of the best practice.

Option 2 - Do not yield missing metrics. This would then require support for this chosen path in Check, so that a missing metric can properly lead to a Critical state. If this is not chosen, perhaps Check should still be adapted so that each Context has to be paired with a Metric (in addition to vice versa), to more clearly dissuade this approach.

Option 3 - Inspired by #3, actually. Yield a Metric with a different name, and have Contexts with both possible names ready to pair up and give a Result. This approach could do with some support, probably in Check.

Like I said, I'm willing to put my money where my mouth is and help code, but we should first find a consensus on what is the best practice in this case.

mpounsett commented 3 years ago

Apologies for the long delay responding.

I'm trying to understand what the use case is here so that I can understand the problem. Is it that you're trying to parse logs and are not able to find the metrics you're looking for?

If that's the case, then what I think you want is to return UNKNOWN from your check, which you can do by raising an exception. I think this may not be clearly described in the documentation, but if you read closely you'll see that the guarded() decorator will cause the check script to return UNKNOWN if any uncaught exception is raised.

Does that solve your problem?

SwampFalc commented 3 years ago

I'm on holiday, I'll try to remember to get back to this in two weeks.

SwampFalc commented 3 years ago

Since the project has been on the backburner for a few months now, I don't think I can usefully continue this discussion right now. I do vaguely remember finding out about the UNKNOWN return, but something about that didn't quite work out 100%... But I cannot remember exactly what. So feel free to close this, if you wish. I'll come back if/when the project goes forward again.

mpounsett / nagiosplugin

Best practice for "missing" metrics? #30