konkor / cpufreq

System Monitor and Power Manager
https://konkor.github.io/cpufreq/
GNU General Public License v3.0

Cpufreq wrongly says "system overload" / 100% whereas "top" or "gnome monitor" says 54% CPU used #106

Open Saroumane opened 5 years ago

Saroumane commented 5 years ago

Cpufreq wrongly says "system overload" and 100% whereas "top" or "gnome monitor" says 54% CPU used

CPU: Intel Core i5 2500K
OS: Ubuntu 19.04
Kernel: 5.0.0-13-generic
Driver: Intel PState
PowerProfile: Balanced
Governor: Performance

konkor commented 5 years ago

GNOME Monitor doesn't show ROOT processes. It's a different value.

EDIT: I'm using the kernel's /proc/loadavg value * 100 / active threads. So it can even be higher than 100 percent.

Saroumane commented 5 years ago

I don't understand exactly why you speak about "root" processes, but for the record GNOME System Monitor can show root processes: you have to check "All processes" in the preferences (see screenshot). Besides that, "top" and GNOME System Monitor give consistent results.

[Screenshot: gnome monitor]

konkor commented 5 years ago

Okay. How did you understand that 54%?

Saroumane commented 5 years ago

I don't understand your question. I don't think there is anything special about that precise value. Anyway, in case it could help, I've done some tests by stressing all 4 cores with "stress-ng". After 262 seconds of 100% usage on all 4 cores, cpufreq says "148%". As you use /proc/loadavg, which reflects system load increases very slowly, I understand that cpufreq should display 400% after a full 15 min test. Am I right?

So the question is: as the maximum system load seems to be 400% (for a 4-core system), why does cpufreq report "system overload" when I cross the 100% threshold (which is very easy to reach with only 25% on each core)?

Edit: For the record, I can also get numeric CPU usage on a "400%" scale if I uncheck "Divide CPU usage by CPU count" in the GNOME System Monitor preferences. I personally prefer the "100%" scale. And it seems the "system overload" detection also uses a "100%" scale.

konkor commented 5 years ago

I'm showing the 1-minute value. So that value could be higher than the CPU core count. It depends on the CPU cycles consumed by active processes.
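To illustrate how quickly a 1-minute load average reacts, here is a minimal sketch. It assumes the commonly documented behaviour that the kernel updates the value roughly every 5 seconds as an exponentially damped average with a 60-second time constant. The function name and the floating-point arithmetic are illustrative only (the kernel uses fixed-point math), and nothing here is taken from the cpufreq source.

```python
import math

SAMPLE_INTERVAL_S = 5.0   # assumed update period of the load average
WINDOW_S = 60.0           # time constant of the 1-minute average


def simulate_loadavg_1min(runnable_tasks: float, seconds: float, start: float = 0.0) -> float:
    """Simulate the 1-minute load average under a constant number of runnable tasks.

    Each update blends the previous value with the current task count:
    load = load * decay + runnable_tasks * (1 - decay)
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / WINDOW_S)
    load = start
    for _ in range(int(seconds // SAMPLE_INTERVAL_S)):
        load = load * decay + runnable_tasks * (1.0 - decay)
    return load


# Four always-runnable tasks, starting from an idle system.
print(round(simulate_loadavg_1min(4, 60), 2))    # about 2.53 after one minute
print(round(simulate_loadavg_1min(4, 262), 2))   # about 3.95 after 262 seconds
```

Under this model the 1-minute value gets close to its steady state within a few minutes rather than 15, and any additional runnable tasks push it above the number of stress workers.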

But I modified that value a bit to be more human-readable and relative to the active CPU threads: CPUFreq LA = Kernel LA * 100 / Ncpu

In your case 148% corresponds to a kernel load average of 148 * 4 / 100 = 5.92, not to 400%.
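The conversion described above can be sketched in a few lines of Python. The function names are hypothetical and only mirror the formula quoted in this comment; they are not taken from the extension's code.

```python
def cpufreq_load_percent(kernel_loadavg: float, n_cpus: int) -> float:
    """CPUFreq LA = Kernel LA * 100 / Ncpu, as stated in the thread."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return kernel_loadavg * 100 / n_cpus


def kernel_loadavg_from_percent(percent: float, n_cpus: int) -> float:
    """Inverse conversion: recover the raw kernel load average."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return percent * n_cpus / 100


# Worked example from the thread: 148% shown on a 4-thread CPU.
print(kernel_loadavg_from_percent(148, 4))   # 5.92
# A raw load average equal to the thread count maps to exactly 100%.
print(cpufreq_load_percent(4.0, 4))          # 100.0
```

Read this way, the 100% mark already accounts for the number of CPU threads: it is reached when the raw load average equals Ncpu.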

Gnome Monitor doesn't show more than 100%

Saroumane commented 5 years ago

On the screenshot provided, gnome monitor displays a total of 293% CPU Usage just for 4 processes. But I'm not here to defend gnome :) I just want to improve cpufreq which is already a great tool !

Do you understand the problem I described, about the "system overload" detection? As I understand it, it does not take into account how many cores the system has. In my case, I have 4 cores, so the threshold should be 400%, not 100%. Same problem for the "system busy" threshold, which seems to be at 75% instead of "number of cores * 75%".

[Screenshot from 2019-05-09 18-29-27]

konkor commented 5 years ago

Okay. It looks like GNOME Monitor has changed if it shows more than 25% per process (in your case, for 4 cores).

But you have to understand that there are more active processes than the foreground stress-ng test threads :) kernel, dbus, systemd, xorg, video drivers, shell, GNOME Monitor itself and a couple of thousand more ;)

BTW I can revert it to just loadavg in percentages like before. But there were issues about too big values, like 3200% in some cases.

So I don't really know what is better.

Saroumane commented 5 years ago

Hello, it seems the extension update you released today corrected everything ! It's all more coherent and responsive. Thanks !

Saroumane commented 5 years ago

I spoke too fast... On a long test (> 15min) problems are still there...

konkor commented 5 years ago

System Loading is a kernel parameter that depends on the number of all tasks/processes/threads and their CPU time utilization. The OS kernel works via main loops (e.g. 4 main loops for a CPU with 4 core threads). Each loop has a fixed amount of time for execution (e.g. 10 msec). The kernel tries to give time to all tasks. Some tasks are not active at that time and finish execution immediately after loading because they are waiting for system events, user actions or async IO operations, so the kernel could run hundreds of them. Other tasks are busy/active (or blocked by sync IO operations). They could take all of the loop time, and the kernel will pause and unload them at the end of the loop to give other tasks a chance in the next loop. So if there are tasks that could not be loaded, System Loading is going to be greater than 100% * number_of_cores. I also divide that System Loading by the number of active CPU cores to make it more similar to CPU Utilization and to avoid overly large values.

CPU Loading/Utilization is just derived from the proportion of idle time to all processor states in the time interval (it is the non-idle share). It can't be greater than 100%.
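As a rough illustration of that definition, the sketch below computes utilization from two samples of the aggregate "cpu" counters in /proc/stat. The function name and sample numbers are made up, and counting iowait as idle time is an assumption of this sketch (tools differ on that point).

```python
from typing import Sequence


def cpu_utilization(prev: Sequence[int], curr: Sequence[int]) -> float:
    """Percent of non-idle time between two samples of the aggregate "cpu" line.

    Each sample holds the counters that follow the "cpu" label:
    user, nice, system, idle, iowait, irq, softirq, steal, ...
    """
    deltas = [c - p for p, c in zip(prev, curr)]
    # Only the first eight counters are summed; guest time is already
    # accounted for inside user/nice.
    total = sum(deltas[:8])
    # Assumption: iowait is treated as idle time.
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total


# Made-up counters: 100 ticks elapsed, 46 of them idle.
before = [1000, 0, 500, 8000, 100, 0, 0, 0]
after = [1040, 0, 514, 8046, 100, 0, 0, 0]
print(cpu_utilization(before, after))   # 54.0
```

Because the idle share can never be negative, the result stays within 0 to 100% no matter how many tasks are waiting for CPU time.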

Saroumane commented 5 years ago

Ok I understand now, I never really got the difference between the 2 concepts. Until now.

konkor commented 5 years ago

@Saroumane

1. GNOME System Monitor is trying to show the average CPU Loading. If you open the CPU Time column there, you will see that even processes with 0 usage increase their CPU time. It means their usage is not really 0.

2. GNOME System Monitor uses a shorter time interval than the 1 minute of the kernel's loading.

Nice :)

konkor commented 5 years ago

You can find the CPU Usage stats in /proc/stat and the System Loading in /proc/loadavg.
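For readers who want to look at those two files, here is a small parsing sketch. The helper names and the sample strings are invented for illustration; the field layout follows the usual Linux format (three load averages first in /proc/loadavg, an aggregate "cpu" line in /proc/stat).

```python
from typing import List, Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_cpu_counters(stat_text: str) -> List[int]:
    """Return the tick counters of the aggregate "cpu" line from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError('no aggregate "cpu" line found')


# Made-up sample contents, shaped like the real files.
SAMPLE_LOADAVG = "5.92 3.10 1.45 6/812 12345\n"
SAMPLE_STAT = (
    "cpu  1040 0 514 8046 100 0 0 0 0 0\n"
    "cpu0 260 0 128 2011 25 0 0 0 0 0\n"
)

print(parse_loadavg(SAMPLE_LOADAVG))
print(parse_cpu_counters(SAMPLE_STAT))
```

On a Linux machine the same functions can be fed with `pathlib.Path("/proc/loadavg").read_text()` and `pathlib.Path("/proc/stat").read_text()`.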

Saroumane commented 5 years ago

So maybe, to avoid further confusion, you should avoid any "%" symbol in System Load? (The Linux manual doesn't use it in its /proc/loadavg explanations.)

konkor commented 5 years ago

But this is a percentage value:

System Loading = Needed CPU time / Maximum CPU time (one loop per core)

EDIT: So System Loading equals CPU Loading as long as the Needed CPU time is less than the Maximum CPU time for a single main loop.
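The relation in this comment can be written down as a toy model. This is only a simplified reading of the explanation above, with hypothetical function names and made-up time values; it is not how the kernel or the extension actually computes anything.

```python
def system_loading(needed_cpu_time: float, maximum_cpu_time: float) -> float:
    """System Loading = Needed CPU time / Maximum CPU time, in percent."""
    return 100.0 * needed_cpu_time / maximum_cpu_time


def cpu_loading(needed_cpu_time: float, maximum_cpu_time: float) -> float:
    """CPU Loading: the CPU cannot spend more than the available time, so cap at 100%."""
    return 100.0 * min(needed_cpu_time, maximum_cpu_time) / maximum_cpu_time


# Assumed example: 4 cores with a 10 ms loop each give 40 ms of available time.
MAXIMUM_MS = 40

# Demand below capacity: both values agree.
print(system_loading(20, MAXIMUM_MS), cpu_loading(20, MAXIMUM_MS))   # 50.0 50.0
# Demand above capacity: only System Loading keeps growing.
print(system_loading(60, MAXIMUM_MS), cpu_loading(60, MAXIMUM_MS))   # 150.0 100.0
```

In this model the two numbers only diverge once tasks ask for more CPU time than one loop per core can provide.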

Saroumane commented 5 years ago

What I mean: it seems to be common practice to write System Load as a decimal value like x.yz (not divided by the number of cores) and to keep % for "CPU use" to avoid confusion.

Reference (among others): https://askubuntu.com/questions/532845/what-is-system-load

So maybe you should calculate and write System Load as usual, OR keep it your way but use a name other than "System Load"?

Anyway it's your plugin, you do it your way :)

ayoungethan commented 4 years ago

Is this more of an indicator of responsiveness? E.g., if the CPU Load is exceeded, does it mean that tasks are piling up for the CPU, which is operating too slowly or with too few resources to handle the load in real time, so the load grows (at least temporarily) and the CPU can become bogged down with a backlog of processing tasks to complete? If so, then it is an important indicator for low-latency performance :)

Does this calculation take into account that there is only one FPU per physical core? CPU utilization is often a deceptive performance measure: people wonder why they get buffer xruns at <100% CPU utilization, and it's because the FPU is >100% utilized even though the rest of the physical core is <100%. I see this as a common question in discussion forums.

konkor commented 4 years ago

@ayoungethan Eh, there are so many questions. I will try to answer them. I don't think it matters whether it is FPU, CPU, IO or other internal activity in the cores, or activity related to CPU hardware like USB controllers etc. If a core is busy, it affects system loading.

I'm using the kernel's loadavg parameter for the last minute of activity, but it's slightly modified. This parameter is used by the kernel itself and by all admins to understand what's going on. I'm just multiplying it by 100 to get percentages and dividing it by the number of active online core threads to get a more understandable value for an average user. This value corresponds to CPU utilization if it's below or equal to 100%. If the value is bigger, it shows how much CPU power is needed to get all active tasks done. So admins or the system could raise the CPU frequency, turn offline cores on, shut down some processes etc. I think this parameter is more important than just the CPU utilization. Also, it's already calculated by the kernel itself. It's like how Thermal Throttle (TT) is more important than CPU temperature. If the temperature is high but there is no TT, there are no big worries. Otherwise, it's telling you that the CPU shut down some cores, lowered performance etc. to cool the CPU down...