munin-monitoring / munin

Main repository for munin master / node / plugins
http://munin-monitoring.org

cpuspeed plugin problems #1617

Closed: baranyaib90 closed this issue 6 months ago

baranyaib90 commented 6 months ago

Dear munin developers and maintainers! I would like to share my observations and solutions regarding the cpuspeed plugin with you.

Describe the bug

  1. The graph shows a nan value when the frequency is below the minimum. Example: the minimum is 700 MHz, my value is 699964 kHz (just under 700 MHz).
  2. The nan value is misleading: it is a real number, just below the minimum value.
  3. The CPU load of the munin update procedure itself raises the frequency above its normal value. This distorts the measurement, so the graph is useless by default (see attached images).

Expected behavior

  1. Allow values below the minimum by adding MINHZ=$(( $MINHZ - $MINHZ / 10 )) here, as was done in https://github.com/munin-monitoring/munin/commit/76fe8bf5fc0a1830b967a5b1dc7e68e0060ac4f9
  2. A simple dash or "oor" (out of range) would be more helpful for debugging than nan.
  3. For Intel CPUs, I added sleep 0.2 in the script (here) before the CPU frequencies are measured. In my case, 200 ms is enough for the Intel N100 to drop back to its normal frequency, and for me that is an acceptable delay.
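Points 1 and 3 above can be sketched in POSIX shell as follows. This is not the plugin's actual code: the function names widened_min_hz and read_freqs_hz and the SYSFS_CPU and SETTLE_DELAY overrides are hypothetical, added so the logic can be tried against fake files. The idea behind point 1 is that rrd discards values below a field's minimum (which shows up as nan), so lowering the minimum by 10% keeps readings that are only slightly under the nominal value. The cpufreq sysfs file scaling_cur_freq reports kHz.

```shell
#!/bin/sh
# Hypothetical sketch, not the real cpuspeed plugin.
# SYSFS_CPU is an assumed override so the functions can be tested on fake files.
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}

# Point 1: widen the lower bound by 10% so readings slightly below the
# nominal minimum (e.g. 699964 kHz vs. 700000 kHz) are not discarded as nan.
widened_min_hz() {
    # $1: nominal minimum frequency in Hz
    echo $(( $1 - $1 / 10 ))
}

# Point 3: give the CPU time to drop back after the burst caused by munin's
# own update run, then print "cpuN <frequency in Hz>" for every core.
read_freqs_hz() {
    # Fractional sleep works with GNU coreutils but is not guaranteed by POSIX.
    sleep "${SETTLE_DELAY:-0.2}"
    for f in "$SYSFS_CPU"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue          # skip if the glob matched nothing
        cpu=${f#"$SYSFS_CPU"/}           # "cpu0/cpufreq/scaling_cur_freq"
        cpu=${cpu%%/*}                   # "cpu0"
        khz=$(cat "$f")
        echo "$cpu $(( khz * 1000 ))"    # sysfs reports kHz; convert to Hz
    done
}
```

For example, a nominal minimum of 700 MHz becomes 630 MHz, so a reading of 699964 kHz (699964000 Hz) stays inside the graphed range.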

To Reproduce

Just run an idle system with an Alder Lake or newer Intel CPU.

Screenshots & Logs

nan value when the frequency is below the minimum (I had to zoom in to provide proof): [image: nan]
Graph by default: [image: default_mess]
After adding a 200 ms sleep before the measurement and fixing MINHZ: [image: after_sleep]

Environment

Additional context

I have an Intel N100 CPU, which changes frequency very often (many times per second). My setup is idle most of the time, so the CPU mostly runs at its minimum frequency.

steveschnepp commented 6 months ago

This is an inherent issue with the way the plugin was written: it measures while it runs.

So a 200 ms delay gives the CPU time to slow down after the initial burst.

baranyaib90 commented 6 months ago

Yes. I'm not interested in the CPU frequency while the munin update is running. I want to graph it under normal circumstances.

steveschnepp commented 6 months ago

Then you have to emit the average over the interval between two runs.

That means rewriting the plugin to keep state, which it currently doesn't do.

This driver delivers only instant information about the CPU speed (at the time of the munin data collection). This is not necessarily representative of the real CPU speed history.
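Keeping state between runs could look roughly like the following. It is a hypothetical sketch rather than a proposed patch: the function name and state-file handling are made up, and it assumes the kernel's cumulative cpufreq statistics file stats/time_in_state (one "<frequency in kHz> <time>" pair per line) is available, which may not be the case, for example with the intel_pstate driver. The average over the interval is the time-weighted mean sum(f_i * dt_i) / sum(dt_i), where dt_i is how much the counter for level i grew since the previous run.

```shell
#!/bin/sh
# Hypothetical sketch: average frequency between two plugin runs, computed
# from cumulative cpufreq stats. Assumes time_in_state exists for the CPU.

# $1: time_in_state file, $2: state file holding the previous snapshot.
# Prints the average frequency in Hz over the interval, or "U" (unknown)
# when there is no previous snapshot or no positive counter increase.
avg_hz_since_last_run() {
    tis=$1
    state=$2
    if [ -r "$state" ]; then
        avg=$(awk '
            NR == FNR { prev[$1] = $2; next }    # first file: previous snapshot
            ($1 in prev) {
                d = $2 - prev[$1]                 # time at this level since last run
                if (d > 0) { num += $1 * d; den += d }
            }
            END { if (den > 0) printf "%.0f\n", num * 1000 / den; else print "U" }
        ' "$state" "$tis")
    else
        avg=U
    fi
    cp "$tis" "$state"    # remember the current counters for the next run
    echo "$avg"
}
```

In a real plugin the state file would live under $MUNIN_PLUGSTATE and one average would be computed per CPU. "U" is munin's marker for an unknown value, used here for the first run and when no counter increased (for example right after a reset).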

steveschnepp commented 6 months ago

I would suggest looking at https://github.com/munin-monitoring/contrib/blob/master/plugins/cpu/multicpu1sec or its C version to build a cpuspeed1sec plugin.
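A cpuspeed1sec plugin could be structured as a background sampler plus a fetch step that hands over timestamped samples. The sketch below is hypothetical and only loosely modeled on the multicpu1sec idea; it is not that plugin's code. The cache path, function names and field name are assumptions, and the update_rate line and the "value epoch:value" output format reflect munin 2.0's supersampling support as we understand it, so they should be checked against the munin documentation.

```shell
#!/bin/sh
# Hypothetical cpuspeed1sec sketch (not an existing plugin).
SYSFS_CPU=${SYSFS_CPU:-/sys/devices/system/cpu}
CACHE=${CACHE:-/tmp/cpuspeed1sec.cache}   # assumed path; a real plugin would use $MUNIN_PLUGSTATE

# Mean of the current per-core frequencies, in Hz ("U" if none are readable).
mean_freq_hz() {
    cat "$SYSFS_CPU"/cpu[0-9]*/cpufreq/scaling_cur_freq 2>/dev/null |
        awk '{ s += $1; n++ } END { if (n) printf "%.0f\n", s * 1000 / n; else print "U" }'
}

# Runs in the background: append one "epoch:value" line per second.
# date +%s is a GNU/BusyBox extension, not POSIX.
acquire() {
    while :; do
        echo "$(date +%s):$(mean_freq_hz)" >> "$CACHE"
        sleep 1
    done
}

config() {
    echo "graph_title CPU frequency (1 s samples)"
    echo "graph_vlabel Hz"
    echo "update_rate 1"                       # munin 2.0 supersampling attribute
    echo "cpuspeed.label average of all cores"
}

# Called by munin-node on each run: emit all buffered samples, then drop them.
fetch() {
    [ -f "$CACHE" ] || return 0
    mv "$CACHE" "$CACHE.fetch"                 # new samples go to a fresh file
    sed 's/^/cpuspeed.value /' "$CACHE.fetch"
    rm -f "$CACHE.fetch"
}
```

acquire would be started once and detached (for example with nohup). Because every one-second sample is reported, the CPU burst of the munin update itself affects only a few samples instead of being the single sample taken per run.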

niclan commented 6 months ago

The maddening thing here is that some (many?) years ago there was a CPU-governor-related /proc (or /sys) file that counted the time spent at each CPU speed level. It would be the perfect data source for this: just let rrd track the change in the counter for each speed level.

And I wrote a plugin for it that I can no longer find.

But it might have been tied to a specific version of the speed governor and CPU.
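Letting rrd track the per-level counters could be sketched as a munin plugin that emits one DERIVE field per frequency level. This is a hypothetical sketch, not the plugin niclan mentions: it reads only cpu0, the TIS override is an assumption for testing, and it requires the cpufreq stats file stats/time_in_state, whose time column the kernel documents in 10 ms units.

```shell
#!/bin/sh
# Hypothetical sketch: one DERIVE field per frequency level, fed from the
# cumulative time_in_state counters of cpu0. Not an existing plugin.
TIS=${TIS:-/sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state}

# Field names must be plain identifiers, so the kHz value gets an "f" prefix.
config() {
    echo "graph_title Time spent per CPU frequency (cpu0)"
    echo "graph_vlabel % of time"
    echo "graph_category system"
    while read -r khz _; do
        echo "f$khz.label $(( khz / 1000 )) MHz"
        echo "f$khz.type DERIVE"       # rrd stores the counter's rate of change
        echo "f$khz.min 0"             # negative rate after a reset becomes unknown
        echo "f$khz.draw AREASTACK"
    done < "$TIS"
}

# Emit the raw cumulative counters; rrd turns them into per-second rates.
fetch() {
    while read -r khz ticks; do
        echo "f$khz.value $ticks"
    done < "$TIS"
}
```

Because one counter unit is 10 ms, a rate of 100 units per second means the whole second was spent at that level, so each DERIVE rate reads directly as a percentage of time and the stacked areas add up to 100%.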

steveschnepp commented 6 months ago

Having the kernel track it would be perfect indeed.

niclan commented 6 months ago

I've looked over everything I could think of on my Intel laptop, and this seems to be a pipe dream.

Taking performance metrics while munin-node is running the plugins is going to be a headache forever. And with the current information from /proc and /sys, it seems we can only get the "instant" speed reading. I see the frequency change very often. ... If only the CPU governor had a way to keep stats.

baranyaib90 commented 6 months ago

Just to clarify: I don't want this graph to draw the average frequency of the last 5 minutes. I know that's not feasible. I'm fine with the instant frequency value. I only wanted to take the munin update procedure's CPU burst out of the picture, because it added a lot of noise in my case (ref: "Graph by default" image). My 200 ms sleep is good enough to let the CPU return to the frequency the system mostly runs at. I hope that makes my point clear.

I would recommend changing the plugin according to points 1 and 3 under "Expected behavior" to mitigate the situation, since there is no easy, proper solution. This plugin is enabled on almost every munin installation. It would be nice if the graph were useful to users instead of being disabled by them.

Anyway, I'm fine with closing this ticket; I just wanted to let you know. Thank you for checking it!

steveschnepp commented 6 months ago

Can you check #1618?

baranyaib90 commented 6 months ago

Yes, I just did. Perfectly fine for me. Thank you very much!