munin-monitoring / contrib

Contributed stuff for munin (plugins, tools, etc...)
http://munin-monitoring.org

ping: Use better default values in case of failure #1441

Closed: jonglezb closed this 2 weeks ago

jonglezb commented 2 weeks ago

When the ping fails (e.g. unresolvable name, complete probe loss, or any other system error), the default value for both ping time and packet loss is currently "U". Tools that expect a numeric value cannot parse this.

Instead, let's use "NaN" by default for ping times, and "100%" by default for packet loss.

kenyon commented 2 weeks ago

Why do you want to make this change?

U is used because it means something special in RRDtool. From https://oss.oetiker.ch/rrdtool/doc/rrdupdate.en.html:

If there is no data for a certain data-source, the letter U (e.g., N:0.1:U:1) can be specified.

Also documented at https://guide.munin-monitoring.org/en/latest/master/network-protocol.html#fetch, https://guide.munin-monitoring.org/en/latest/develop/plugins/plugin-concise.html#fetch, and https://guide.munin-monitoring.org/en/latest/reference/plugin.html under attribute {fieldname}.value.
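
For illustration, here is a minimal Python sketch of how a plugin could render its fetch output so that a missing measurement becomes U. The helper `format_value` and the field name `ping` are hypothetical; they are not taken from the actual ping plugin.

```python
from typing import Optional


def format_value(fieldname: str, value: Optional[float]) -> str:
    """Render one fetch line; None means "no data" and is reported as U."""
    if value is None:
        return f"{fieldname}.value U"
    return f"{fieldname}.value {value}"


# Hypothetical field name, for illustration only.
print(format_value("ping", 0.0123))  # a successful probe
print(format_value("ping", None))    # a failed probe: no data
```

Reporting U rather than a made-up number lets RRDtool record the sample as unknown instead of storing a misleading value.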

jonglezb commented 2 weeks ago

Interesting, thanks, I did not know about this special rrdtool value.

We use a Nagios script (check_munin.pl from Julien Rottenberg) that connects to munin-node directly and sends alerts when a value is above a threshold. This script crashes when it receives "U" values, so I thought I needed to fix the plugin generating these values. But it seems that we need to fix our Nagios script instead.
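
A rough sketch of the kind of handling a checker would need, written in Python rather than the Perl of check_munin.pl and not based on that script's code. The function names `parse_fetch` and `nagios_status` are hypothetical. It assumes the fetch response consists of `{fieldname}.value` lines terminated by a line containing a single `.`, and it maps "U" to the Nagios UNKNOWN state instead of trying to compare it with a threshold.

```python
from typing import Dict, Iterable, Optional

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_fetch(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Parse munin-node fetch output; a value of U is stored as None."""
    values: Dict[str, Optional[float]] = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and the terminating "." line.
        if not line or line == "." or line.startswith("#"):
            continue
        key, _, raw = line.partition(" ")
        if not key.endswith(".value"):
            continue
        field = key[: -len(".value")]
        raw = raw.strip()
        values[field] = None if raw == "U" else float(raw)
    return values


def nagios_status(value: Optional[float], warn: float, crit: float) -> int:
    """Compare a value with thresholds; unknown data yields UNKNOWN."""
    if value is None:
        return UNKNOWN
    if value > crit:
        return CRITICAL
    if value > warn:
        return WARNING
    return OK


# Hypothetical field names, for illustration only.
result = parse_fetch(["ping.value U", "packetloss.value 100", "."])
print(nagios_status(result["ping"], warn=0.1, crit=0.5))
print(nagios_status(result["packetloss"], warn=10, crit=50))
```

Whether an unknown value should raise UNKNOWN or CRITICAL is a policy choice for the monitoring setup; the sketch only shows that "U" has to be handled before any numeric comparison.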