munin-monitoring / munin

Main repository for munin master / node / plugins
http://munin-monitoring.org

Clarify warning message when a plugin defines a DS label, but does not submit a DS value #268

Open jeffsilverm opened 9 years ago

jeffsilverm commented 9 years ago

I am running munin 2.0.19.

I'm getting "returned no data for label" errors in /var/log/munin/munin-update.log. What does this mean?

I have run my plugin with munin-run cpu_byproc and munin-run cpu_byproc config and I don't see anything obviously wrong.

munin-run cpu_byproc
cpu0_user.value 263411550
cpu0_nice.value 32953
cpu0_system.value 22897578
cpu0_idle.value 4361318137
cpu0_iowait.value 110004158
cpu0_irq.value 5903
cpu0_softirq.value 3855195
...

munin-run cpu_byproc config
update_rate 60
graph_title CPU time time by proc
graph_args --upper-limit 100 -l 0
graph_vlabel %
graph_scale no
graph_category system
cpu0_user.label cpu0_user
cpu0_user.type DERIVE
cpu0_user.min 0
cpu0_user.graph no
cpu0_nice.label cpu0_nice
cpu0_nice.type DERIVE
cpu0_nice.min 0
cpu0_nice.graph no
cpu0_system.label cpu0_system
cpu0_system.type DERIVE
cpu0_system.min 0
cpu0_system.graph no
cpu0_idle.label cpu0_idle
cpu0_idle.type DERIVE
cpu0_idle.min 0
cpu0_idle.graph no
cpu0_iowait.label cpu0_iowait
cpu0_iowait.type DERIVE
cpu0_iowait.min 0
cpu0_iowait.graph no
cpu0_irq.label cpu0_irq
cpu0_irq.type DERIVE
cpu0_irq.min 0
cpu0_irq.graph no
cpu0_softirq.label cpu0_softirq
cpu0_softirq.type DERIVE
cpu0_softirq.min 0
cpu0_softirq.graph no
...
cpu0.label cpu0
cpu0.sum cpu0_user cpu0_system

I can telnet to the agent port 4949 and do a fetch cpu_byproc, and I see the variables coming across with no errors.

I don't know what cpu0.sum means. I assume that it is adding together the CPU time spent in user mode and CPU time spent in system mode.

The graphs seem to work, so I am wondering if the warning message is spurious?

I found the warning message in the source code at "UpdateWorker.pm" line 614. Unfortunately, I don't understand the source code well enough to understand what it means - I don't have the context and I am reluctant to reverse engineer something as complicated as munin.

Thank you

Jeff Silverman

ssm commented 9 years ago

This means that the plugin's "config" output announces more labels than its data output has values for.

Example.

If "someplugin config" returns:

foo.label "Foo"
bar.label "Bar"
zoo.label "Zoo"

and "someplugin" returns:

foo.value 1
bar.value 2

Then munin will log that "zoo" did not return a value, although one was expected.
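
The comparison described above can be reproduced outside of munin with a short script. This is only a sketch: it is not the check in UpdateWorker.pm (which is Perl), and the helper names field_names and labels_without_values are made up for illustration. It assumes the plain "field.attribute value" line format shown in the example.

```python
def field_names(output, attribute):
    """Return the field names that carry the given attribute, in order."""
    names = []
    for line in output.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        # "zoo.label" -> ("zoo", ".", "label"); "graph_title" has no dot
        field, dot, attr = parts[0].rpartition(".")
        if dot and attr == attribute and field not in names:
            names.append(field)
    return names


def labels_without_values(config_output, fetch_output):
    """List fields announced via .label that have no matching .value."""
    fetched = set(field_names(fetch_output, "value"))
    return [name for name in field_names(config_output, "label")
            if name not in fetched]


# The example from this comment: three labels, two values.
CONFIG = """\
foo.label "Foo"
bar.label "Bar"
zoo.label "Zoo"
"""

FETCH = """\
foo.value 1
bar.value 2
"""

print(labels_without_values(CONFIG, FETCH))
```

Feeding it the output of "munin-run someplugin config" and "munin-run someplugin" should list the fields that would trigger the warning.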

ssm commented 9 years ago

The question is, where is the bug?

steveschnepp commented 9 years ago

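I think that @jeffsilverm is correct in saying that it's difficult to understand for an end user. Yet @ssm is also right in asking what we should do.

My take on it would be:

  • I usually prefer strict checks, so it is normal that the log entry has WARNING severity. It is done on purpose: to catch typos in plugins.
  • That said, it might be much easier on plugin writers to relax this and offer an optional relaxed_checking configuration variable to silence those warnings. Note that I'm in favor of keeping the default.

This leads to 2 corollary points:

  • Core plugins that exhibit this have to be fixed, but there is no need to keep a special state between runs. If the data is volatile enough to change between config and fetch, the plugin should be made dirty-config compatible instead.
  • munin-check should be able to detect those. But as it is a master-only tool, munin-run should get a new --check option and do the various checks on the node system.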

jeffsilverm commented 9 years ago

It would be sufficient, from my point of view, to clarify the error message. There is nothing wrong with long, descriptive error messages. You might even output the variable(s) that are missing between config and fetch.

Jeff

On Thu, Nov 6, 2014 at 6:56 AM, Steve Schnepp notifications@github.com wrote:

I think that @jeffsilverm https://github.com/jeffsilverm is correct in saying that it's difficult to understand for an end user. Yet @ssm https://github.com/ssm is also right in asking what we should do.

My take on it would be:

  • I usually prefer strict checks, so it is normal that the log entry has WARNING severity. It is done on purpose: to catch typos in plugins.
  • That said, it might be much easier on plugin writers to relax this and offer an optional relaxed_checking configuration variable to silence those warnings. Note that I'm in favor of keeping the default.

This leads to 2 corollary points:

  • Core plugins that exhibit this have to be fixed, but there is no need to keep a special state between runs. If the data is volatile enough to change between config and fetch, the plugin should be made dirty-config compatible instead.
  • munin-check should be able to detect those. But as it is a master-only tool, munin-run should get a new --check option and do the various checks on the node system.
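
For readers unfamiliar with the "dirty-config" suggestion above, here is a rough sketch of what a dirty-config compatible plugin could look like. It assumes the node sets the MUNIN_CAP_DIRTYCONFIG environment variable to 1 when values may be sent together with the configuration. The field names, the graph title and the read_values helper are placeholders, not taken from any real plugin.

```python
import os
import sys


def read_values():
    # Placeholder measurement; a real plugin would query the system here.
    return {"foo": 1, "bar": 2}


def render(argv, environ):
    """Build the plugin output for one invocation."""
    values = read_values()  # one snapshot feeds both labels and values
    value_lines = [f"{name}.value {value}" for name, value in values.items()]
    if len(argv) > 1 and argv[1] == "config":
        lines = ["graph_title Example graph"]
        lines += [f"{name}.label {name}" for name in values]
        # With dirty config, the values are emitted right after the labels,
        # so the two lists cannot drift apart between config and fetch.
        if environ.get("MUNIN_CAP_DIRTYCONFIG") == "1":
            lines += value_lines
        return lines
    return value_lines


if __name__ == "__main__":
    print("\n".join(render(sys.argv, os.environ)))
```

Because labels and values come from the same snapshot, a field that disappears between two runs is never announced without a value.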


wferi commented 9 years ago

I'm hitting this from a different angle. In the second graph of a multigraph plugin, I'm borrowing a data source from the first graph for computation purposes only:

graph_order replies=anycast_dns_checks.replies duration
replies.graph no
duration.cdef duration,replies,/,1000000,/

Is there a way to suppress the spurious "... returned no data for label replies" warning message?

sumpfralle commented 6 years ago

@wferi: I am not sure that I can follow your example.

Can you explain why a field.value needs to be missing in your multigraph plugin? (in case you still remember)

@jeffsilverm wrote:

It would be sufficient, from my point of view, to clarify the error message.

How about this new phrasing of the warning?

- WARN "[WARNING] Service $service on $nodedesignation returned no data for label $ds_name";
+ WARN "[WARNING] Service $service on $nodedesignation announced '$ds_name.label' in its config, but '$ds_name.value' is missing in data";

wferi commented 6 years ago

@sumpfralle: the plugin queries 3 values from the system and produces two graphs, each from a different pair of them. I've got 3 RRDs, and "borrow" one data source from the first graph into the second with the code shown above, because I don't want to store the common value (replies) twice. The graphs are created just fine, but I get

Missing required attribute 'label' for data source 'replies' in service anycast_dns_duration on X
[WARNING] Service anycast_dns_duration on X returned no data for label replies

on every update run.