Open davehayes opened 3 years ago
Thanks for the detailed issue. It'd be nice to get some more info from FreeBSD experts on this one.
I've been searching around various mailing lists. It seems there's information I didn't see before that might help in this case. This information is from the machine that had the bug:
# sysctl -d kern.smp
kern.smp: Kernel SMP
kern.smp.forward_signal_enabled: Forwarding of a signal to a process on a different CPU
kern.smp.topology: Topology override setting; 0 is default provided by hardware.
kern.smp.cores: Number of physical cores online
kern.smp.threads_per_core: Number of SMT threads online per core
kern.smp.cpus: Number of CPUs online
kern.smp.disabled: SMP has been disabled from the loader
kern.smp.active: Indicates system is running in SMP mode
kern.smp.maxcpus: Max number of CPUs that the system was compiled for.
kern.smp.maxid: Max CPU ID.
# sysctl kern.smp
kern.smp.forward_signal_enabled: 1
kern.smp.topology: 0
kern.smp.cores: 16
kern.smp.threads_per_core: 1
kern.smp.cpus: 16
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 256
kern.smp.maxid: 31
I think this section of the sysctl MIB will tell you all you need to know. I suggest using kern.smp.cores to limit kern.cp_times myself.
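A minimal sketch of that suggestion, in Go since that is what node_exporter is written in. It assumes kern.cp_times is a flat array of 5 counters per CPU (user, nice, sys, intr, idle) and that the CPU count has already been read from kern.smp.cores or hw.ncpu. The function name `limitCPUTimes` and the sample data are hypothetical, not node_exporter's actual code.

```go
package main

import "fmt"

// cpuStates is the number of per-CPU counters in kern.cp_times
// (user, nice, sys, intr, idle).
const cpuStates = 5

// limitCPUTimes splits a flat kern.cp_times array into per-CPU groups and
// keeps at most ncpu of them. ncpu is assumed to come from kern.smp.cores
// (or hw.ncpu); reading the sysctl itself is left out of this sketch.
func limitCPUTimes(flat []uint64, ncpu int) [][]uint64 {
	groups := len(flat) / cpuStates
	if ncpu < groups {
		groups = ncpu
	}
	if groups < 0 {
		groups = 0
	}
	out := make([][]uint64, 0, groups)
	for i := 0; i < groups; i++ {
		out = append(out, flat[i*cpuStates:(i+1)*cpuStates])
	}
	return out
}

func main() {
	// Hypothetical data: 2 real CPUs followed by 2 zero-filled slots.
	flat := []uint64{
		10, 1, 5, 2, 100,
		11, 1, 6, 2, 99,
		0, 0, 0, 0, 0,
		0, 0, 0, 0, 0,
	}
	for cpu, t := range limitCPUTimes(flat, 2) {
		fmt.Printf("cpu=%d times=%v\n", cpu, t)
	}
}
```

One caveat with this approach: on a machine with SMT enabled, kern.smp.cores would be smaller than the number of logical CPUs, so kern.smp.cpus or hw.ncpu may be the safer limit.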
On a 14.1-RELEASE system, I have the following.
$ sysctl -d kern.smp
kern.smp: Kernel SMP
kern.smp.forward_signal_enabled: Forwarding of a signal to a process on a different CPU
kern.smp.topology: Topology override setting; 0 is default provided by hardware.
kern.smp.cores: Number of physical cores online
kern.smp.threads_per_core: Number of SMT threads online per core
kern.smp.cpus: Number of CPUs online
kern.smp.disabled: SMP has been disabled from the loader
kern.smp.active: Indicates system is running in SMP mode
kern.smp.maxcpus: Max number of CPUs that the system was compiled for.
kern.smp.maxid: Max CPU ID.
$ sysctl kern.smp
kern.smp.forward_signal_enabled: 1
kern.smp.topology: 0
kern.smp.cores: 8
kern.smp.threads_per_core: 1
kern.smp.cpus: 8
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 1024
kern.smp.maxid: 7
$ sysctl kern.cp_times
kern.cp_times: 1355469 75965 1390756 319500 49046024 1365745 77106 1395159 315505 49034199 1332836 75027 1263967 529761 48986123 1364558 77122 1393860 321999 49030175 1365142 76037 1404528 312600 49029407 1368658 77946 1384338 309564 49047208 1366558 74164 1401735 315365 49029892 1360227 74035 1370883 357813 49024756
$ sysctl hw.ncpu
hw.ncpu: 8
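For comparison, the kern.cp_times line above can be checked against hw.ncpu by counting its fields. The sketch below again assumes 5 counters per CPU; `cpusInCPTimes` is a hypothetical helper name, and the sample string is copied from the output above.

```go
package main

import (
	"fmt"
	"strings"
)

// cpuStates is the number of per-CPU counters in kern.cp_times
// (user, nice, sys, intr, idle).
const cpuStates = 5

// sampleCPTimes is the kern.cp_times value from the 14.1-RELEASE system above.
const sampleCPTimes = "1355469 75965 1390756 319500 49046024 " +
	"1365745 77106 1395159 315505 49034199 " +
	"1332836 75027 1263967 529761 48986123 " +
	"1364558 77122 1393860 321999 49030175 " +
	"1365142 76037 1404528 312600 49029407 " +
	"1368658 77946 1384338 309564 49047208 " +
	"1366558 74164 1401735 315365 49029892 " +
	"1360227 74035 1370883 357813 49024756"

// cpusInCPTimes reports how many per-CPU groups a kern.cp_times value holds.
func cpusInCPTimes(value string) (int, error) {
	n := len(strings.Fields(value))
	if n%cpuStates != 0 {
		return 0, fmt.Errorf("%d fields is not a multiple of %d", n, cpuStates)
	}
	return n / cpuStates, nil
}

func main() {
	n, err := cpusInCPTimes(sampleCPTimes)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// 40 fields / 5 states = 8, matching hw.ncpu on this system.
	fmt.Println("CPUs in kern.cp_times:", n)
}
```

Here kern.smp.maxid is 7 and the array holds exactly 8 groups, so no extra entries appear on this machine.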
Host operating system: output of uname -a
FreeBSD 12.2-STABLE r368820 amd64
node_exporter version: output of node_exporter --version
node_exporter, version 1.0.1 (branch: release-1.0, revision: 0) build user: root build date:
go version: go1.15.6
node_exporter command line flags
--collector.textfile.directory=/some/where --collector.devstat --collector.ntp
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
So hw.ncpu is 16; that's 16 cores. machdep.hyperthreading_allowed: 0 is also set. This is a Ryzen 3950X.
node_cpu_seconds_total{ cpu="30", mode="idle" }
... this value is 0. According to our discussion in Matrix, that's a bug.
It turns out that kern.cp_times is likely the culprit, as it has a bunch of 0s appended here:
Here's the dmesg information on the CPU I have:
What did you expect to see?
I expect to see one cpu label in node_cpu_seconds_total per actual CPU, with no cpu label greater than the value of hw.ncpu.
What did you see instead?
Let C be the value of hw.ncpu. I saw node_cpu_seconds_total with labels from C to 2C by 1, each with an actual value of 0.
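The expectation above can be written as a small check over the cpu labels an exporter emits. This is only a sketch: it assumes labels are zero-based decimal strings, so the valid range is 0 through C-1, and `labelsOutOfRange` is a hypothetical helper, not part of node_exporter.

```go
package main

import (
	"fmt"
	"strconv"
)

// labelsOutOfRange returns every cpu label that is not a CPU index in
// [0, ncpu). Labels that do not parse as integers are reported as well.
func labelsOutOfRange(labels []string, ncpu int) []string {
	var bad []string
	for _, l := range labels {
		id, err := strconv.Atoi(l)
		if err != nil || id < 0 || id >= ncpu {
			bad = append(bad, l)
		}
	}
	return bad
}

func main() {
	// Hypothetical reproduction of the report: labels 0..31 with hw.ncpu = 16.
	labels := make([]string, 0, 32)
	for i := 0; i < 32; i++ {
		labels = append(labels, strconv.Itoa(i))
	}
	bad := labelsOutOfRange(labels, 16)
	fmt.Printf("%d labels beyond hw.ncpu: %v\n", len(bad), bad)
}
```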