Closed fbegyn closed 5 years ago
@fbegyn I checked it out, and found that cpuinfo_cur_freq
is only readable by root, which is not allowed here, as the node_exporter
should be run as user.
See,
# ll /sys/devices/system/cpu/cpu0/cpufreq/*
-r--r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/affected_cpus
-r--------. 1 root root 4096 Dec 6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
-r--r--r--. 1 root root 4096 Dec 6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq
-r--r--r--. 1 root root 4096 Dec 6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq
-r--r--r--. 1 root root 4096 Dec 6 09:42 /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_transition_latency
-r--r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/related_cpus
-r--r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
-r--r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
-r--r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver
-rw-r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
-rw-r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-rw-r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-rw-r--r--. 1 root root 4096 Dec 6 09:46 /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
# grep . /sys/devices/system/cpu/cpu0/cpufreq/{cpuinfo_cur_freq,scaling_cur_freq}
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq:1200103
/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1200103
Interesting, honestly, I didn't look that closely at the documentation. I assumed they were the same, just a rename between different kernel versions. We originally changed the reading preference to work around bugs in the RHEL kernel permissions for the proc file.
@mjtrangoni That's an old redhat kernel bug, on the upstream kernel and more recent patch versions of the RHEL kernel, the permission problem is fixed.
I guess we need to re-read the kernel docs more carefully.
/cc @rtreffer
Discovered it while comparing node-exporter metrics from a few months ago with the current ones because we couldn't make sense of it. We changed cpu goverenor to perormance
but didn't see the change on the metrics. While the expected behaviour did show up in the data a from a few months ago, it didn't now.
Couldn't wrap my head around it, so dove into the kernel a bit and found this out.
I'm not fully comprehending the use for scaling_cur_freq
, but definitely see a usecase for cpuinfo_cur_freq
.
I'll have to review the docs, but after first glance, you're probably right. We shouldn't be using the scaling_
files. sigh
Totally unrelated, performance
on Intel doesn't really do what you expect on modern cores. It still uses sleep states to throttle down the CPU. Basically it's a noop on sandy bridge and newer from my testing.
@SuperQ This is read-only upstream now. See,
$ nl -b a drivers/cpufreq/cpufreq.c|grep -B5 -A5 "cpuinfo_cur_freq, 0400"
868 return sprintf(buf, "%u\n", limit);
869 }
870 return sprintf(buf, "%u\n", policy->cpuinfo.max_freq);
871 }
872
873 cpufreq_freq_attr_ro_perm(cpuinfo_cur_freq, 0400);
874 cpufreq_freq_attr_ro(cpuinfo_min_freq);
875 cpufreq_freq_attr_ro(cpuinfo_max_freq);
876 cpufreq_freq_attr_ro(cpuinfo_transition_latency);
877 cpufreq_freq_attr_ro(scaling_available_governors);
878 cpufreq_freq_attr_ro(scaling_driver);
And here Dave Jones says why,
Reading this file causes reads from hardware on some cpufreq drivers. This can be a slow operation, so a user could degrade system performance for everyone else by repeatedly cat'ing it.
@mjtrangoni I think a number of distributions patch that file.
Looking at a couple of my machines, it appears cpuinfo_cur_freq
doesn't even exist on on them (>= 4.15).
It seems like my systems have the pcc_cpufreq
0 module loaded.
@SuperQ FWIW, quoting Documentation/admin-guide/pm/cpufreq.rst
:
cpuinfo_cur_freq
Current frequency of the CPUs belonging to this policy as obtained from
the hardware (in KHz).
This is expected to be the frequency the hardware actually runs at.
! If that frequency cannot be determined, this attribute should not
! be present.
@fbegyn Out of curiosity, what platform/driver are you using?
Since the scaling_cur_freq
and cpuinfo_cur_freq
contain different data, I think we should change the library to read both, and return float pointers if values are found.
I've created https://github.com/prometheus/procfs/pull/123 as a possible solution.
@fbegyn Out of curiosity, what platform/driver are you using?
@SuperQ Westmere CPUs and Debian 9 (I believe on 4.18 kernel). Personal system is Dell XPS13 9360 on 4.19
Is there anything more to do on this issue since #123 has been merged?
@pgier I don't think so. I lost track of this issue, but I'm closing it in relation to #123
@SuperQ
Totally unrelated, performance on Intel doesn't really do what you expect on modern cores. It still uses sleep states to throttle down the CPU. Basically it's a noop on sandy bridge and newer from my testing.
This is OT but do you have any more details here? On Ivy Bridge I've been trying to figure out how to get a stable cpu frequency for benchmarking (or even figure out how to know if I'm getting a stable clock speed).
I found setting the governor to performance
stabilized performance somewhat in my benchmarks (and made things significantly faster), but the various metrics from /sys/...
or /proc/cpuinfo
are showing cpu frequency bouncing around wildly still.
There seems to be a lot of misinformation about all of this out there...
@jberryman Yes, it's very complicated, as there's a lot of trickery going on under the hood in the hardware to perform power and thermal envelope management between cores and other parts of the chip. I don't have any specific documentation links off the top of my head. But I would look around for documentation on the various kernel modules that implement the changes. Things like intel_cstate
.
Since PR #94 it has the default behaviour of reading
scaling_cpu_freq
in favor ofcpuinfo_cur_freq
. These 2 represent different metrics in my eyes. The scaling one reflects at which frequency the linux kernel thinks the cpu runs, while cpuinfo reflects the actual HW frequency of the CPU. source: https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txtIMHO cpuinfo should still be the default value to be read here, or these 2 values should maybe represent 2 different metrics. I'd like to know the reasoning for picking the
scaling
over thecpuinfo
statistic.