Open Derppening opened 3 years ago
Could you provide the output of cat /proc/cpuinfo and sensors -u please?
We can't reproduce this on the Zen2 systems some of us have (you have Zen3).
Right, I forgot to mention that I was unable to reproduce the issue with my Zen 2 Renoir APU as well. :^)
@Derppening thank you!
If you look at your own sensors.txt, it seems your system has no temp info on the processor cores themselves.
Can you try to fix that and see whether it fixes htop's display as well?
I am not sure what you mean. I thought this section is the temperature of the processor cores:
k10temp-pci-00c3
Adapter: PCI adapter
Tctl:
temp1_input: 35.500
Tdie:
temp2_input: 35.500
Tccd1:
temp3_input: 33.750
Tccd2:
temp4_input: 32.000
Does htop not use the k10temp module for CPU temperature on Ryzen systems, but instead rely on the motherboard-specific temperature information (nct6798 on my ASUS system)?
You have one temp per chiplet (Tccd1/2), one temp for the die (Tdie) and one I can't guess off the top of my head (Tctl). Makes four.
You have 12 CPU cores (24 with hyperthreading), so htop would like to see 12 or 24 temperatures (plus one for the die average, but that's optional). And sensors would need to show this. No idea how, we have no Zen3 system around. Maybe @cgzones has access to one?
My understanding is that Tdie and Tctl are only useful for older Zen processors where an offset needs to be applied to the temperature to allow for a more optimal fan curve (see here). It is not used for Zen 3 processors.
It would be nice if there were a way to determine the topology and assign each of Tccd1 and Tccd2 to the cores under the same chiplet. However, if this is a non-trivial fix, I'd prefer to see the previous htop 3.0.5 behavior where all CPU temperatures just use the same sensor (I presume Tdie).
I am also noticing TEMP for the second CPU not showing. When using 3.0.5 on Ubuntu 21.04 it would report the temp for each core (not hyperthreads).
I just compiled version 3.1.0. It does not report any TEMP for the second CPU (N/A). In the image below those are cores.
This is broken in 3.0.x as well.
I get this on 3.0.5-7 off Debian:
1[||||| 9.9% 36°C] 7[|| 1.9% N/A] 13[| 1.2% 36°C] 19[ 0.0% N/A]
2[| 0.6% 39°C] 8[ 0.0% N/A] 14[ 0.0% 39°C] 20[ 0.0% N/A]
3[||| 6.9% 40°C] 9[| 0.6% 38°C] 15[ 0.0% 40°C] 21[|| 1.2% 38°C]
4[|| 1.2% N/A] 10[| 0.6% 37°C] 16[ 0.0% N/A] 22[ 0.0% 37°C]
5[| 1.9% N/A] 11[||| 4.9% 38°C] 17[| 0.6% N/A] 23[ 0.0% 38°C]
6[ 0.0% N/A] 12[ 0.0% N/A] 18[| 0.6% N/A] 24[ 0.0% N/A]
And on b7248f6cb82350b683adf42d0fd4ec917397ea05:
1[ 0.0% 36°C] 7[ 0.0% N/A] 13[ 0.0% N/A] 19[ 0.0% N/A]
2[ 0.0% 38°C] 8[ 0.0% N/A] 14[ 0.0% N/A] 20[ 0.0% N/A]
3[ 0.0% 40°C] 9[ 0.0% 37°C] 15[ 0.0% N/A] 21[ 0.0% N/A]
4[ 0.0% N/A] 10[ 0.0% 40°C] 16[ 0.0% N/A] 22[ 0.0% N/A]
5[ 0.0% N/A] 11[ 0.0% 39°C] 17[ 0.0% N/A] 23[||||||||||||||||||||||||||100.0%|N/A]
6[ 0.0% N/A] 12[ 0.0% N/A] 18[ 0.0% N/A] 24[ 0.0% N/A]
So, unrolled, 3.0.5-7 lists the temperature for CPUs 1-3,9-11,13-15,21-23 (total 12, half of them), and b7248f6cb82350b683adf42d0fd4ec917397ea05 for 1-3,9-11 (total 6, a quarter (conspicuously similar to half of the first package? but the CPU distribution is wrong)).
This system is in a 2x6x2 configuration (two sockets, E5645 in each, HT on), lscpu says:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 44
Model name: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
Stepping: 2
Frequency boost: enabled
CPU MHz: 1595.949
CPU max MHz: 2395.0000
CPU min MHz: 1596.0000
BogoMIPS: 4787.86
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 3 MiB
L3 cache: 24 MiB
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23
Vulnerability Itlb multihit: KVM: Mitigation: VMX disabled
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 popcnt aes lahf_lm epb pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid dtherm ida arat flush_l1d
sensors says:
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +35.0°C (high = +79.0°C, crit = +89.0°C)
Core 1: +35.0°C (high = +79.0°C, crit = +89.0°C)
Core 2: +36.0°C (high = +79.0°C, crit = +89.0°C)
Core 8: +34.0°C (high = +79.0°C, crit = +89.0°C)
Core 9: +39.0°C (high = +79.0°C, crit = +89.0°C)
Core 10: +35.0°C (high = +79.0°C, crit = +89.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Core 0: +35.0°C (high = +79.0°C, crit = +89.0°C)
Core 1: +38.0°C (high = +79.0°C, crit = +89.0°C)
Core 2: +40.0°C (high = +79.0°C, crit = +89.0°C)
Core 8: +38.0°C (high = +79.0°C, crit = +89.0°C)
Core 9: +38.0°C (high = +79.0°C, crit = +89.0°C)
Core 10: +38.0°C (high = +79.0°C, crit = +89.0°C)
These are oddly non-continuous and match the populated output ranges (0-2,8-10), at least where one is populated at all.
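The gaps in the labels are consistent with the "Core N" label carrying the kernel's core_id, which need not be contiguous. A minimal sketch of how per-core readings could be matched to logical CPUs, assuming that label-to-core_id correspondence and assuming each coretemp chip maps to one package (both are assumptions, not something confirmed in this thread). The function names are hypothetical; the sysfs files physical_package_id and core_id under /sys/devices/system/cpu/cpuN/topology/ are standard Linux.

```python
from pathlib import Path


def read_topology(root="/sys/devices/system/cpu"):
    """Return {logical_cpu: (package_id, core_id)} read from sysfs."""
    topology = {}
    for cpu_dir in Path(root).glob("cpu[0-9]*"):
        topo_dir = cpu_dir / "topology"
        if not topo_dir.is_dir():
            continue  # offline CPUs may lack a topology directory
        cpu = int(cpu_dir.name[3:])
        package = int((topo_dir / "physical_package_id").read_text())
        core = int((topo_dir / "core_id").read_text())
        topology[cpu] = (package, core)
    return topology


def assign_core_temps(topology, readings):
    """Give every logical CPU the reading of its physical core.

    topology: {logical_cpu: (package_id, core_id)}
    readings: {(package_id, core_id): temperature in degrees Celsius}
    CPUs without a matching reading map to None (shown as N/A).
    """
    return {cpu: readings.get(key) for cpu, key in sorted(topology.items())}
```

With this kind of lookup, both hyperthreads of a core share one reading, and a non-contiguous core_id such as 8 is not mistaken for logical CPU 8.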
Okay, had access to a Zen3 system today. Here's some information I collected.
For further details feel free to ask …
So the tl;dr from @BenBE's info here is that a Zen3 system, even with kernel 5.13, does not expose the core temperatures in a way that libsensors is able to read them.
I don't think that is true. The Zen3 system @BenBE had access to is a 5800H, which is a Zen 3 APU. However, Zen 3 APU temperature reporting in k10temp was only merged into the kernel in 5.15 (see here). For my 5900X, support for the sensor has been available since Linux 5.12 (see here).
In addition, I think 5800H APUs use 8-core CCDs, and since the 5800H only has 8 cores, it might mean that only Tctl will be exposed. I have cross-checked with a friend's 5900HS laptop (which also has 8 cores) and it does only show Tctl in sensors.
The following are sensors -u and /proc/cpuinfo outputs for my 5900X:
I don't think that is true.
Hm? Your sensors -u doesn't give any CPU core temperatures either. Just the chiplet temperatures Tccd1 and Tccd2.
If you are referring to the temperatures of the individual cores, then yeah, it seems that the k10temp driver doesn't expose it.
I still think that, if possible, htop should show Tctl (or Tdie if the measurement is available) for the Average CPU [Bar], and based on the topology of the processor, show Tccd1 and Tccd2 on all of the cores of each chiplet.
This is with an Intel(R) Core(TM) i3-6100T CPU @ 3.20GHz:
Avg[| 0.3% 36°C]
0[| 0.0% 36°C] 2[| 1.3% 31°C]
1[ 0.0% 31°C] 3[ 0.0% N/A]
And sensors -u output:
power_meter-acpi-0
Adapter: ACPI interface
power1:
power1_average: 4294967.295
power1_average_interval: 4294967.295
acpitz-acpi-0
Adapter: ACPI interface
temp1:
temp1_input: 27.800
temp1_crit: 119.000
temp2:
temp2_input: 29.800
temp2_crit: 119.000
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:
temp1_input: 36.000
temp1_max: 84.000
temp1_crit: 100.000
temp1_crit_alarm: 0.000
Core 0:
temp2_input: 31.000
temp2_max: 84.000
temp2_crit: 100.000
temp2_crit_alarm: 0.000
Core 1:
temp3_input: 31.000
temp3_max: 84.000
temp3_crit: 100.000
temp3_crit_alarm: 0.000
pch_skylake-virtual-0
Adapter: Virtual device
temp1:
temp1_input: 37.500
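For comparing dumps like the one above across machines, a small parser can help. This is a minimal sketch that turns sensors -u text into nested dictionaries (chip, then feature label, then subfeature value). The function name is hypothetical, and the parsing rules are inferred from the outputs pasted in this thread rather than from a format specification.

```python
def parse_sensors_u(text):
    """Parse `sensors -u` text into {chip: {feature_label: {subfeature: float}}}."""
    chips = {}
    chip = None
    feature = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if ":" not in line:
            # A line without a colon starts a new chip, e.g. "coretemp-isa-0000".
            chip = chips.setdefault(line, {})
            feature = None
        elif line.startswith("Adapter:"):
            continue
        elif line.endswith(":"):
            # A label line such as "Core 0:" or "Tccd1:" opens a feature.
            if chip is not None:
                feature = chip.setdefault(line[:-1], {})
        elif feature is not None:
            name, _, value = line.rpartition(":")
            try:
                feature[name.strip()] = float(value)
            except ValueError:
                pass  # ignore values that are not numeric
    return chips
```

Applied to the dump above, the coretemp chip yields three labels (Package id 0, Core 0, Core 1), which matches the three temperatures htop shows for four logical CPUs.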
AMD Ryzen 5 5600H with Radeon Graphics / Ubuntu Jammy / 5.15 kernel
AMD Ryzen 5 3600X 6-Core Processor / Ubuntu Jammy / 5.15 kernel
I've just updated my OS to Ubuntu 22.04 LTS, and now my core temps are gone - i'd like them back, to state the obvious.
htop --version
htop 3.0.5
Observation: it worked fine on 21.04 Hirsute Hippo, also htop 3.0.5
Deduction: something must've changed, underneath, so htop can't gather per-core temps anymore
Where should i file this bug next? lm-sensors/libsensors5?
How do we drive this issue forward, so it gets resolved in a timely manner? Any duplicates, dependencies - WHAT is the actual issue here?
EDIT: my box is an AMD Ryzen 5800X on B550 (ASUS TUF Gaming B550-Plus) - nct6775, i guess - that's what sensors-detect told me.
The nct6775 kernel mod is loaded, and sensors says
~ $ sensors
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +42.9°C
Tccd1: +32.8°C
it doesn't even tell me voltages anymore - so, is the lm-sensors code to blame?
Most likely nct6775 is blocked due to conflicts in ACPI resources (see here).
Your kernel logs should show the following:
nct6775: Enabling hardware monitor logical device mappings.
nct6775: Found NCT6793D or compatible chip at 0x2e:0x290
ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\_GPE.HWM) (20210730/utaddress-204)
ACPI: OSL: Resource conflict; ACPI support missing from driver?
You may try to upgrade to a newer kernel (5.17 is the first kernel version which implements reading from ASUS's own motherboard sensors without using nct6775), or work around this by adding acpi_enforce_resources=lax to the kernel boot parameters (USE AT YOUR OWN RISK).
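To check whether a machine is hitting this conflict, the kernel log can be scanned for the warning quoted above. A minimal sketch, with a hypothetical function name; the pattern is modelled only on the log line in this comment, so other kernel versions may word the message differently.

```python
import re

# Modelled on the "ACPI Warning: SystemIO range ... conflicts with OpRegion ..."
# line quoted above; the exact wording is an assumption for other kernels.
CONFLICT_RE = re.compile(
    r"ACPI Warning: SystemIO range (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+) "
    r"conflicts with OpRegion (0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+)"
)


def find_acpi_conflicts(log_text):
    """Return (io_start, io_end, region_start, region_end) tuples as ints."""
    return [
        tuple(int(group, 16) for group in match.groups())
        for match in CONFLICT_RE.finditer(log_text)
    ]
```

The input would be the text of the kernel log (for example, saved dmesg output). An empty result only means this particular warning was not found, not that the driver loaded correctly.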
Either way, the voltage issue does not appear to be related to the CPU temperature issue. The CPU temperature issue seems to stem from the fact that:
- Tctl and Tccd* are supported by the kernel
- htop reads Tctl as the CPU's temperature and displays it in the Average CPU bar (which is correct)
- htop reads Tccd1 as CPU core 1 (so the second core)'s temperature and displays it in the CPU 1 bar
- hwmon and lm-sensors
This algorithm would work on most Intel processors (it does on my i9-9900K), since each core has its own core temperature. However, for AMD CPUs, where temperatures are exposed on a per-CCX level, the new algorithm does not handle this, and so it defaults to the current display of displaying the CCX temperature as the core temperature.
How do we drive this issue forward, so it gets resolved in a timely manner?
I have suggested in an earlier reply that maybe htop can determine the topology of AMD CPUs (i.e. which cores belong to which CCX) in some way (whatever lstopo uses?), and then assign the temperature of each CCX to all of its cores.
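One possible way to get that topology without extra tooling is the L3 cache sharing information in sysfs, since cores that share an L3 sit in the same CCX. A minimal sketch of the idea, not htop's implementation: the function names are hypothetical, and the rule that the n-th L3 domain (ordered by lowest CPU number) corresponds to Tccd(n) is an assumption that would need checking on real hardware. The strings are what /sys/devices/system/cpu/cpuN/cache/index3/shared_cpu_list contains.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0-5,12-17' into a sorted tuple of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        low, _, high = part.partition("-")
        cpus.update(range(int(low), int(high or low) + 1))
    return tuple(sorted(cpus))


def temps_by_l3_domain(shared_cpu_lists, ccd_temps):
    """Assign one CCD temperature to every CPU sharing an L3 cache.

    shared_cpu_lists: {cpu: contents of its index3/shared_cpu_list}
    ccd_temps: [Tccd1, Tccd2, ...] in label order
    ASSUMPTION: L3 domains sorted by lowest CPU match Tccd1, Tccd2, ...
    """
    domains = sorted({parse_cpu_list(s) for s in shared_cpu_lists.values()})
    result = {}
    for index, domain in enumerate(domains):
        temp = ccd_temps[index] if index < len(ccd_temps) else None
        for cpu in domain:
            result[cpu] = temp
    return result
```

If there are more L3 domains than Tccd readings, the extra CPUs get None, so a caller could fall back to Tctl or Tdie for them.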
Optimally, this issue should be fixed in the kernel (if Windows can read the temperature of each core, what's stopping Linux from doing so?). Unfortunately, I don't know enough to start tinkering with that, and I don't know of any efforts to rectify this as of this moment.
EDIT: I guess another way could be just to blacklist AMD CPUs from using the new temperature detection algorithm, and revert to an older working version?
I have a 5700X and was experiencing this issue myself using zenpower - I "fixed" (worked around) it by putting
chip "zenpower-pci-00c3"
ignore temp3
into an appropriate sensors.d file. Obviously this means I'm losing that information but it makes the htop output look more sensible, at least for now. I hope someone can work out how to assign the CCD temperatures to the cores from the topology at some point.
For clarity, my -u output is:
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:
in1_input: 0.882
SVI2_SoC:
in2_input: 1.007
Tdie:
temp1_input: 33.125
temp1_max: 95.000
Tctl:
temp2_input: 33.125
Tccd1:
temp3_input: 32.250
SVI2_P_Core:
power1_input: 9.878
SVI2_P_SoC:
power2_input: 3.260
SVI2_C_Core:
curr1_input: 12.517
SVI2_C_SoC:
curr2_input: 2.943
After upgrading from htop 3.0.5 to 3.1.0, htop displays temperatures for only 3 cores, and marks the temperature of other cores as N/A. When cross-referencing with the output from sensors, it appears that htop may have been treating Tdie, Tccd1 and Tccd2 as three cores.
For reference, the following screenshot is taken using htop 3.1.0 on a Ryzen 9 5900X system.
I also ran a git bisect and got the following: