htop-dev / htop

htop - an interactive process viewer
https://htop.dev/
GNU General Public License v2.0

CPU list must use available (selected) core numbers #1020

Open sergey-dryabzhinsky opened 2 years ago

sergey-dryabzhinsky commented 2 years ago

Machine Info:

How to reproduce:

What I expected to see:

What really happens:

Possible explanation:

Since Proxmox 6.4 uses cgroups v1 and lxcfs does not cover all sys files, there are several ways for things to get mixed up. Inside the container:

  1. /proc/cpuinfo shows the right count of available cores: 2. This was the htop 2 behaviour, as I recall.
  2. /sys/devices/system/cpu/cpu*/online shows the count of cores available on the whole SYSTEM, not the ones SELECTED by the hypervisor.
  3. The Cpus_allowed_list line in /proc/1/status shows the cores SELECTED by the hypervisor: Cpus_allowed_list: 7,10. This should be taken into account.

On hypervisor:

  1. The Cpus_allowed_list line in /proc/1/status shows all AVAILABLE cores: Cpus_allowed_list: 0-11.

Solution (Linux)? Parse /proc/1/status to count the available cores. The selection may be a list of individual cores (7,10), a range (0-1), or both (0-1,7,10). Reading /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq for the selected cores only would then give more accurate information.
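To illustrate the parsing step proposed above, here is a minimal Python sketch (htop itself is written in C; this is not htop code). The function name `parse_cpu_list` is hypothetical, and the sketch assumes the list contains only single numbers and `first-last` ranges, as in the examples in this comment.

```python
def parse_cpu_list(text):
    """Expand a CPU list such as '0-1,7,10' into a sorted list of core numbers."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty fields, e.g. a trailing comma
        if "-" in part:
            # A range such as "0-11" includes both endpoints.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

The length of the returned list gives the core count, and its elements are the core numbers that could be substituted into the cpu%u part of the sysfs path.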

sergey-dryabzhinsky commented 2 years ago

One more thing: cgroups in kernels up to 3.19.3 may be affected: https://github.com/lxc/lxc/issues/427

fasterit commented 2 years ago

Duplicate of #993, solved in #995. Correct?

sergey-dryabzhinsky commented 2 years ago

Wait, I'll check.

sergey-dryabzhinsky commented 2 years ago

Yes! Looks good. Closing.

sergey-dryabzhinsky commented 2 years ago

There is still a question, though.

Will htop show the right CPU meters, frequencies, etc. if it is not pointed at the right available core numbers?

sergey-dryabzhinsky commented 2 years ago

I think that scanning /sys/devices/system/cpu/cpu%u/cpufreq/scaling_cur_freq will produce erroneous output.
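The concern can be made concrete with a small Python sketch (not htop code) that reads scaling_cur_freq for an explicit list of core numbers instead of assuming cores 0..N-1. The function name `read_cur_freqs` and the `sysfs_root` parameter are hypothetical; the parameter exists only so the sketch can be pointed at a test directory.

```python
from pathlib import Path


def read_cur_freqs(cpu_ids, sysfs_root="/sys/devices/system/cpu"):
    """Map each core number to its scaling_cur_freq value in kHz, or None if unreadable."""
    freqs = {}
    for cpu in cpu_ids:
        path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_cur_freq"
        try:
            freqs[cpu] = int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a number: report "unknown".
            freqs[cpu] = None
    return freqs
```

With two visible cores, a caller that passes [0, 1] reads cpu0 and cpu1, while the container may actually be pinned to 7 and 10. Passing the pinned numbers avoids that particular mismatch, assuming the sysfs entries inside the container refer to host cores.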

sergey-dryabzhinsky commented 2 years ago

And Linux/LXC still messes with the core info:

# grep 'core id' /proc/cpuinfo 
core id         : 2
core id         : 3
# grep Cpus_allowed_list /proc/1/status
Cpus_allowed_list:      2,9

sergey-dryabzhinsky commented 2 years ago

I suggest using /proc/cpuinfo only if /proc/1/status has no core information.
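A rough Python sketch of that fallback order (not htop code) might look like the following. The names `expand` and `visible_cpus` are hypothetical, and the functions take file contents as strings so they can be tried without a container.

```python
import re


def expand(cpu_list):
    """Expand a list such as '0-1,7,10' into [0, 1, 7, 10]."""
    cpus = []
    for part in cpu_list.split(","):
        first, _, last = part.strip().partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


def visible_cpus(status_text, cpuinfo_text):
    """Prefer Cpus_allowed_list from a status file; fall back to cpuinfo."""
    match = re.search(r"^Cpus_allowed_list:[ \t]*(\S+)", status_text, re.MULTILINE)
    if match:
        return "status", expand(match.group(1))
    # Fallback: the "processor : N" entries listed in /proc/cpuinfo.
    procs = re.findall(r"^processor[ \t]*:[ \t]*(\d+)", cpuinfo_text, re.MULTILINE)
    return "cpuinfo", [int(p) for p in procs]
```

The first element of the result records which source was used, which would make it easier to see why a given set of core numbers was chosen.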

fasterit commented 2 years ago

I think that goes beyond the level of hackery I'd personally support to work around bad design decisions from lxc. I lean more towards disabling CPU temp and frequency support when inside such a container. /DLange

sergey-dryabzhinsky commented 2 years ago

Okay. Maybe I will make a PR one day. Please mark the issue: future, help wanted.

fasterit commented 2 years ago

It would be good if somebody using lxc would test if the CPU meters reflect the workload running inside such a container.

sergey-dryabzhinsky commented 2 years ago

Well, it's not as simple as I thought.

# grep 'core id' /proc/cpuinfo 
core id     : 0
core id     : 1
core id     : 2
core id     : 3
# grep 'Cpus_all' /proc/1/status
Cpus_allowed:   fff
Cpus_allowed_list:  0-11

OpenVZ enabled only 4 cores, but allows all 12 to be selected. And /sys/devices/system/cpu/cpu* is empty. All these virtualization systems are such a mess.
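For reference, the Cpus_allowed value is a hexadecimal bitmask in which bit N stands for CPU N. A short Python sketch (the function name `cpus_from_mask` is hypothetical) shows how the mask maps to core numbers; it assumes longer masks are written as comma-separated hex words, most significant first.

```python
def cpus_from_mask(mask_text):
    """Decode a hex CPU mask such as 'fff' into the list of set bit positions."""
    # Drop the commas that separate hex words in longer masks.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]
```

Decoding fff yields the twelve cores 0 through 11, which matches the 0-11 list above and not the 4 cores listed in /proc/cpuinfo, so the allowed mask alone does not give the enabled core count on this OpenVZ setup.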

sergey-dryabzhinsky commented 2 years ago

@fasterit Yes, it can't be helped. Once we use a container, it becomes "detached" from the hardware. Even if we pin the container to selected cores, everything inside is messed up: /proc/cpuinfo, /sys/devices/system/cpu/cpu*. We can't be sure which real cores the container uses.

So disabling the features (temperature, frequency) is the simplest way.

sergey-dryabzhinsky commented 2 years ago

> CPU meters reflect the workload

AFAIK the CPU cores read from /proc/cpuinfo have corresponding indexes in /proc/stat, so the meters should reflect the in-container workload.
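One way to check this assumption inside a container is to compare the per-CPU indexes in the two files. The following Python sketch is only an illustration (all function names are hypothetical) and takes the file contents as strings.

```python
import re


def stat_cpu_indexes(stat_text):
    """Indexes of the per-CPU 'cpuN' lines in /proc/stat (the aggregate 'cpu' line is skipped)."""
    return [int(n) for n in re.findall(r"^cpu(\d+)\s", stat_text, re.MULTILINE)]


def cpuinfo_indexes(cpuinfo_text):
    """Indexes from the 'processor : N' entries in /proc/cpuinfo."""
    found = re.findall(r"^processor[ \t]*:[ \t]*(\d+)", cpuinfo_text, re.MULTILINE)
    return [int(n) for n in found]


def indexes_agree(stat_text, cpuinfo_text):
    """True if both files list the same CPU indexes in the same order."""
    return stat_cpu_indexes(stat_text) == cpuinfo_indexes(cpuinfo_text)
```

If `indexes_agree` returns False for the container's /proc/stat and /proc/cpuinfo, the meters may not line up with the visible cores.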