Open RMerl opened 5 years ago
I can confirm that on my test pool in 8.0. The module is present but won't load with the "no such device" error. I don't know whether this means that my CPUs are not supported or a more general issue.
Maybe @rushikeshjadhav has an idea?
Since it worked for me with 7.x, and I couldn't find any commit in kernel.org's commit log that removed support for older devices, I suspect it might be a regression. It could be a missing kernel option, for example. I had a quick glance at /proc/config.gz, and nothing in particular stood out to me.
I see that the module is present at /lib/modules/4.19.0+1/kernel/drivers/hwmon/coretemp.ko
I get the same error on my nested XCP-NG, but that could be valid since no CPU temperature monitors are passed through.
Need to check if there is any BIOS setting that enables this.
If it works on a factory installation of XCP-NG 7.6 and doesn't on XCP-NG 8.0 (same BIOS settings) then we can take a deeper look.
I don't remember if I ever tried it with 7.6, but it was definitely working correctly with 7.4 without any BIOS change.
Hi, Is there any update on this issue?
I have a similar PC/server to @RMerl's (Qotom Q355G4 based on i5-5250U). After the upgrade to XCP-ng 8.0, lm-sensors no longer detects any sensors. It was working on XCP-ng 7.6; I checked it before the upgrade.
modprobe results in the same error:
[22:56 XCP ~]# modprobe coretemp
modprobe: ERROR: could not insert 'coretemp': No such device
@rushikeshjadhav any idea what we could do in this case? Testing a more recent kernel?
@olivierlambert I think we might have to backport coretemp from an earlier working kernel to this one and test. @TurtleFX can you bear with us for testing and help in fixing this issue?
@rushikeshjadhav ok, I can try to help in testing.
So it seems there is not much change in coretemp.ko itself from previous versions. Can you share the output of # dmidecode -t processor?
# dmidecode -t processor
# dmidecode 3.0
Getting SMBIOS data from sysfs.
SMBIOS 2.8 present.
Handle 0x0041, DMI type 4, 42 bytes
Processor Information
Socket Designation: SOCKET 0
Type: Central Processor
Family: Core i5
Manufacturer: Intel(R) Corporation
ID: D4 06 03 00 FF FB EB BF
Signature: Type 0, Family 6, Model 61, Stepping 4
Flags:
FPU (Floating-point unit on-chip)
VME (Virtual mode extension)
DE (Debugging extension)
PSE (Page size extension)
TSC (Time stamp counter)
MSR (Model specific registers)
PAE (Physical address extension)
MCE (Machine check exception)
CX8 (CMPXCHG8 instruction supported)
APIC (On-chip APIC hardware supported)
SEP (Fast system call)
MTRR (Memory type range registers)
PGE (Page global enable)
MCA (Machine check architecture)
CMOV (Conditional move instruction supported)
PAT (Page attribute table)
PSE-36 (36-bit page size extension)
CLFSH (CLFLUSH instruction supported)
DS (Debug store)
ACPI (ACPI supported)
MMX (MMX technology supported)
FXSR (FXSAVE and FXSTOR instructions supported)
SSE (Streaming SIMD extensions)
SSE2 (Streaming SIMD extensions 2)
SS (Self-snoop)
HTT (Multi-threading)
TM (Thermal monitor supported)
PBE (Pending break enabled)
Version: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
Voltage: 0.9 V
External Clock: 100 MHz
Max Speed: 1600 MHz
Current Speed: 2500 MHz
Status: Populated, Enabled
Upgrade: Socket BGA1168
L1 Cache Handle: 0x003E
L2 Cache Handle: 0x003F
L3 Cache Handle: 0x0040
Serial Number: NULL
Asset Tag: To Be Filled By O.E.M
Part Number: To Be Filled By O.E.M
Core Count: 2
Core Enabled: 2
Thread Count: 4
Characteristics:
64-bit capable
I checked the kernel.org commit history back when I was looking into this, and indeed saw very few changes to that code. I wonder if it might be a missing or misconfigured kernel config option? It would be worth comparing config.gz between 7.6 and 8.0.
Also, please try the following kernel module on your XCP-NG 8 host to read the CPU capability:
# wget https://gist.github.com/rushikeshjadhav/220fbe8ea68cef32bfe2a7a6ea99000d/raw/f6043688fd0b67dbeb88b8fa7e22e60acb463522/eax.ko
# insmod eax.ko
# dmesg | tail
# rmmod eax
[253233.175185] Hi!
[253233.175186] No such device
[253233.175187] vendor : 71
[253233.175191] cpuid_eax : 0
Could it be something as simple as a missing /dev node? I haven't studied the coretemp.ko code, so I don't know how it interfaces with the system.
So even though your CPU has TM (Thermal monitor supported), cpuid_eax (CPUID.06H:EAX.[7]) is returning 0. On a system where it's working, the output is cpuid_eax : 128.
What's your output for # sensors-detect?
# sensors-detect
# sensors-detect revision 3.4.0-6 (2016-06-01)
# Board: INTEL Corporation Q3XXG4-P
# Kernel: 4.19.0+1 x86_64
# Processor: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz (6/61/4)
This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.
Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no):
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 16h thermal sensors... No
AMD Family 17h thermal sensors... No
AMD Family 15h power sensors... No
AMD Family 16h power sensors... No
Intel digital thermal sensor... No
Intel AMB FB-DIMM thermal sensor... No
Intel 5500/5520/X58 thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No
Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no):
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... Yes
Found unknown chip with ID 0x8785
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No
Some systems (mainly servers) implement IPMI, a set of common interfaces
through which system health data may be retrieved, amongst other things.
We first try to get the information from SMBIOS. If we don't find it
there, we have to read from arbitrary I/O ports to probe for such
interfaces. This is normally safe. Do you want to scan for IPMI
interfaces? (YES/no):
Probing for `IPMI BMC KCS' at 0xca0... No
Probing for `IPMI BMC SMIC' at 0xca8... No
Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (YES/no):
Probing for `National Semiconductor LM78' at 0x290... No
Probing for `National Semiconductor LM79' at 0x290... No
Probing for `Winbond W83781D' at 0x290... No
Probing for `Winbond W83782D' at 0x290... No
Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no):
Found unknown SMBus adapter 8086:9ca2 at 0000:00:1f.3.
Sorry, no supported PCI bus adapters found.
Module i2c-dev loaded successfully.
Next adapter: SMBus I801 adapter at f040 (i2c-0)
Do you want to scan it? (YES/no/selectively):
Client found at address 0x50
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Probing for `EDID EEPROM'... No
Sorry, no sensors were detected.
Either your system has no sensors, or they are not supported, or
they are connected to an I2C or SMBus adapter that is not
supported. If you find out what chips are on your board, check
http://www.lm-sensors.org/wiki/Devices for driver status.
It should show something similar to the following:
Intel digital thermal sensor... Success!
(driver `coretemp')
Will check more.
I wish I could run 7.6 to do the same tests for an A-B compare against 8.0, unfortunately this server is in production, so that's not really possible in my case.
Let me know if you need any further tests to be done on my current server. We could possibly try inserting a module version with increased debug logging, for instance to see which actual function is returning the No such device error message.
Regarding "for instance to see which actual function is returning the No such device error message": I did code that in eax.ko.
What is your output for # cat /proc/cpuinfo | grep flags?
See the first post for the complete cpuinfo output.
On my machine I have exactly the same results for # dmidecode -t processor, eax.ko and sensors-detect as @RMerl.
My cpuflags:
# cat /proc/cpuinfo | grep flags
flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx xsaveopt
flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx xsaveopt
flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx xsaveopt
flags : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx xsaveopt
The flags should show dtherm, which is a CPU flag used by DTS/coretemp to read the temperature.
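The check described above can be scripted for every logical CPU at once. This is only a rough sketch: the helper name `missing_thermal_flags` is made up for illustration, and the sample text is a shortened version of the flags posted in this thread. `dtherm` and `pts` are the names the Linux kernel uses in /proc/cpuinfo for the digital thermal sensor and the package thermal status.

```python
def missing_thermal_flags(cpuinfo_text, wanted=("dtherm", "pts")):
    """Return {cpu_index: [flags from `wanted` that are absent]}.

    One entry is produced per "flags" line, in the order the lines appear.
    """
    result = {}
    cpu = 0
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present = set(value.split())
            result[cpu] = [flag for flag in wanted if flag not in present]
            cpu += 1
    return result


# Shortened sample (not the full flag list) based on the output above.
SAMPLE = """\
flags : fpu de tsc msr pae mce cx8 apic hypervisor cpuid_fault ssbd
flags : fpu de tsc msr pae mce cx8 apic hypervisor ssbd
"""

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        # Fall back to the sample when /proc/cpuinfo is not available.
        text = SAMPLE
    for cpu, missing in missing_thermal_flags(text).items():
        print(f"cpu {cpu}: missing {missing if missing else 'nothing'}")
```

On a host where coretemp loads normally, the list of missing flags would be expected to be empty for every CPU.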
@TurtleFX @RMerl, did either of you see thermal information in the BIOS, e.g. direct temperatures or some temperature-related settings?
I can't remember if I did, it's been a while since I've checked (this is a headless server). I only remember that lm-sensors was working with XCP-NG 7.4.
I can't reboot the server right now, but I could check it later tonight, unless @TurtleFX remembers if he's seen that information on his own end.
I can see temperature in the BIOS:
OK, found something interesting: cpuid_fault, which usually shows up on CPU0 and means that Intel CPUID faulting is supported. I will look for more information on why cpuid would fault.
@olivierlambert it could be on XenServer 8 as well. It seems this needs to be logged on the XS bug system.
Edit: This is a new feature added to Intel processors and supported in kernels above 4.15. Ref: http://xenbits.xenproject.org/docs/xtf/test-cpuid-faulting.html
@RMerl @TurtleFX please install cpuid with # yum install cpuid --enablerepo=base and share the output of # cpuid.
Here you go.
Here is my cpuid output: turtletx-cpuid-output.txt
@RMerl @TurtleFX Please try this new module with more verbose information.
# wget https://gist.github.com/rushikeshjadhav/220fbe8ea68cef32bfe2a7a6ea99000d/raw/5358bc1a1de826b4df5c6864b1481d5f26eb6844/eax2.ko
# insmod eax2.ko
# dmesg | tail -n 20
# rmmod eax2
Edit : Updated eax2.ko
Here are the dmesg results with eax2.ko:
[75441.485233] Hi!
[75441.485235] m->vendor 0
[75441.485236] m->family 0
[75441.485236] m->model 0
[75441.485237] m->feature 448
[75441.485237] c->vendor : 0
[75441.485238] c->family : 6
[75441.485239] c->model 61
[75441.485239] c->vendor id : GenuineIntel
[75441.485240] model name : Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
[75441.485242] cpuid_eax 0 : 0
[75441.485244] cpuid_eax 6 : 0
[75441.485245] X86_VENDOR_ANY 65535 X86_FAMILY_ANY 0 X86_MODEL_ANY 0 X86_FEATURE_ANY 0 X86_FEATURE_DTHERM 448
[75441.485246] Pass 4
[75441.485246] No X86_FEATURE_DTHERM
[75441.485246] Has X86_FEATURE_FPU
[75441.485247] No Suitable Device for Coretemp
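A note on reading the log above: the value 448 printed for X86_FEATURE_DTHERM is a feature number, not a bitmask. In the kernel's cpufeatures.h each feature is numbered as word * 32 + bit, and in kernel 4.19 word 14 holds CPUID leaf 0x06 EAX. The sketch below uses made-up helper names (they are not kernel APIs) to show why an EAX of 0 from leaf 6 makes the match fail, while the raw value 0x77 reported by cpuid -r later in this thread would pass.

```python
def split_feature(feature):
    """Split a kernel X86_FEATURE_* number into (capability word, bit)."""
    return divmod(feature, 32)


def has_feature(word_value, feature):
    """Test the feature's bit inside the 32-bit value of its capability word."""
    _, bit = split_feature(feature)
    return bool((word_value >> bit) & 1)


X86_FEATURE_DTHERM = 448  # value printed by eax2.ko above

if __name__ == "__main__":
    word, bit = split_feature(X86_FEATURE_DTHERM)
    print(f"DTHERM lives in capability word {word}, bit {bit}")
    # EAX of leaf 6 as seen by the module on XCP-ng 8.0 (from the log above)
    print("module view :", has_feature(0x00000000, X86_FEATURE_DTHERM))
    # EAX of leaf 6 as reported by the raw cpuid dump in this thread
    print("raw cpuid   :", has_feature(0x00000077, X86_FEATURE_DTHERM))
```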
Are you using a pool, and were these fresh installs or upgrades?
Can you fetch the output of # xe host-cpu-info and # xl dmesg?
@stormi can you try the same module on your test pool where coretemp did not work, and also share the output of # xe host-cpu-info and # xl dmesg?
In my case, in the beginning it was a single host XenServer (hosting home router and not so important VMs), then it was upgraded to XCP-ng 7.4 or 7.5 (after XenServer licensing/features debacle, don't remember exact version), afterwards it was upgraded to XCP-ng 7.6 and 8.0.
Here is the output of # xe host-cpu-info:
# xe host-cpu-info
cpu_count : 4
socket_count: 1
vendor: GenuineIntel
speed: 1596.232
modelname: Intel(R) Core(TM) i5-5250U CPU @ 1.60GHz
family: 6
model: 61
stepping: 4
flags: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc rep_good nopl nonstop_tsc cpuid pni pclmulqdq monitor est ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp fsgsbase bmi1 avx2 bmi2 erms rdseed adx xsaveopt
features: 7ffafbbf-bfebfbff-00000121-2c100800
features_pv: 1fc9cbf5-f6f83203-2191cbf5-00000123-00000001-000c0329-00000000-00000000-00001000-8c000400-00000000-00000000-00000000-00000000
features_hvm: 1fcbfbff-f7fa3223-2d93fbff-00000523-00000001-001c07ab-00000000-00000000-00001000-9c000400-00000000-00000000-00000000-00000000
And the output of # xl dmesg is in the text file: TurtleFx-xl-dmesg.txt
I'm looking for the dtherm CPU flag, which is masked in some cases.
Can you check if your MB can export temperature info via IPMI?
# yum install freeipmi --enablerepo=base
# ipmi-locate
# ipmi-sensors
Edit: Ignore this if your sensors-detect output contains:
Probing for `IPMI BMC KCS' at 0xca0... No
Probing for `IPMI BMC SMIC' at 0xca8... No
This machine does not have IPMI, and it has those two lines in sensors-detect.
Please install cpuid with # yum install cpuid --enablerepo=base and share the output of # cpuid -r -1.
Essentially, something like the following comes up:
# cpuid -r -1
Disclaimer: cpuid may not support decoding of all cpuid registers.
CPU:
0x00000000 0x00: eax=0x0000000d ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
0x00000001 0x00: eax=0x000306c3 ebx=0x03100800 ecx=0x7ffafbff edx=0xbfebfbff
0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c10000
0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x02: eax=0x1c004143 ebx=0x01c0003f ecx=0x000001ff edx=0x00000000
0x00000004 0x03: eax=0x1c03c163 ebx=0x03c0003f ecx=0x00001fff edx=0x00000006
0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x00042120
0x00000006 0x00: eax=0x00000077 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
0x00000007 0x00: eax=0x00000000 ebx=0x000027ab ecx=0x00000000 edx=0x9c000400
The important line here is 0x00000006 0x00: eax=0x00000077, which indicates that a thermal sensor is present in the CPU. Ref: https://www.felixcloutier.com/x86/cpuid
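To make that line easier to read, here is a small sketch that pulls leaf 0x06 out of `cpuid -r` style text and names the EAX bits that are set. The helper names are hypothetical, and the bit names follow the CPUID leaf 06H description in the Intel SDM as I understand it; bit 0 is the digital thermal sensor that coretemp relies on, and bit 6 is package thermal management.

```python
import re

# CPUID.06H:EAX bits relevant here (bit 3 is reserved and omitted).
LEAF6_EAX_BITS = {
    0: "digital thermal sensor",
    1: "Turbo Boost",
    2: "ARAT",
    4: "PLN",
    5: "ECMD",
    6: "package thermal management",
}

LINE_RE = re.compile(
    r"^\s*0x([0-9a-f]{8}) 0x([0-9a-f]{2}): eax=0x([0-9a-f]{8})", re.M
)


def leaf_eax(raw_output, leaf, subleaf=0):
    """Return EAX for the given leaf/subleaf in `cpuid -r` text, or None."""
    for match in LINE_RE.finditer(raw_output):
        if int(match.group(1), 16) == leaf and int(match.group(2), 16) == subleaf:
            return int(match.group(3), 16)
    return None


def decode_leaf6(eax):
    """List the names of the known leaf 6 EAX bits that are set."""
    return [name for bit, name in sorted(LEAF6_EAX_BITS.items()) if (eax >> bit) & 1]


# The leaf 6 line from the dump above.
SAMPLE = "0x00000006 0x00: eax=0x00000077 ebx=0x00000002 ecx=0x00000009 edx=0x00000000\n"

if __name__ == "__main__":
    eax = leaf_eax(SAMPLE, 0x06)
    print(hex(eax), decode_leaf6(eax))
```

Feeding it the full output of `cpuid -r -1` should work the same way, since only the leaf 6 line is matched.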
On XCP-NG 8, it seems the kernel module is not able to read the correct value of the EAX register:
[91155.555157] cpuid(0x06) : eax:0 ebx:0 ecx:0 edx:0
Whereas on XCP-NG 7.x:
[716094.720401] cpuid(0x06) : eax:7 ebx:1 ecx:8 edx:0
It could have been caused by recent kernel-level CPU mitigations, but setting mitigations=off for the Dom0 kernel has no effect.
There is a difference between the way kernel 4.4 (XCP-NG 7.x) read and interpreted register EAX and the way kernel 4.19 (XCP-NG 8) does.
# cpuid -r -1
Disclaimer: cpuid may not support decoding of all cpuid registers.
CPU:
0x00000000 0x00: eax=0x00000014 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
0x00000001 0x00: eax=0x000306d4 ebx=0x01100800 ecx=0x7ffafbbf edx=0xbfebfbff
0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c30000
0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
0x00000004 0x02: eax=0x1c004143 ebx=0x01c0003f ecx=0x000001ff edx=0x00000000
0x00000004 0x03: eax=0x1c03c163 ebx=0x02c0003f ecx=0x00000fff edx=0x00000006
0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
0x00000006 0x00: eax=0x00000077 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
0x00000007 0x00: eax=0x00000000 ebx=0x021c27ab ecx=0x00000000 edx=0x9c000400
0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000a 0x00: eax=0x07300403 ebx=0x00000000 ecx=0x00000000 edx=0x00000603
0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000001
0x0000000b 0x01: eax=0x00000004 ebx=0x00000004 ecx=0x00000201 edx=0x00000001
0x0000000c 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000001 edx=0x00000000
0x0000000d 0x00: eax=0x00000007 ebx=0x00000340 ecx=0x00000340 edx=0x00000000
0x0000000d 0x01: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
0x0000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x0000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000012 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x00000014 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000001 edx=0x00000000
0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000121 edx=0x2c100800
0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
0x80000003 0x00: eax=0x35692029 ebx=0x3532352d ecx=0x43205530 edx=0x40205550
0x80000004 0x00: eax=0x362e3120 ebx=0x7a484730 ecx=0x00000000 edx=0x00000000
0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
0x80000008 0x00: eax=0x00003027 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
0x80860000 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000001 edx=0x00000000
0xc0000000 0x00: eax=0x00000000 ebx=0x00000001 ecx=0x00000001 edx=0x00000000
lm-sensors 3.5.0 mentions a fix related to kernel 4.19 detection, could this provide a hint, if the actual commit is tracked down?
https://github.com/lm-sensors/lm-sensors/blob/master/CHANGES
I checked and tried the new version, but it did not change the output for me. I'm stumped by the difference between the raw cpuid output and the output of the kernel call cpuid_eax(0x06). Ref: https://elixir.bootlin.com/linux/v4.19/source/arch/x86/include/asm/processor.h#L626
@RMerl @TurtleFX Please fetch these kernel modules and run the commands below. I got them working on a host that was not reporting temperature earlier.
# cd /tmp/
# wget "https://gist.github.com/rushikeshjadhav/ef60707111b7b0fefe32c0c0e22effeb/raw/530e4f50eee20760f7fbab411c3287e6ea819b35/coretemp.ko"
# wget "https://gist.github.com/rushikeshjadhav/ef60707111b7b0fefe32c0c0e22effeb/raw/530e4f50eee20760f7fbab411c3287e6ea819b35/cpuid.ko"
# rmmod cpuid ; insmod cpuid.ko
# rmmod coretemp ; insmod coretemp.ko
# sensors-detect
# sensors
# lsmod | egrep 'cpuid|coretemp'
Success here :)
... snip ...
Intel digital thermal sensor... Success!
(driver `coretemp')
... snip ...
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +47.0°C (high = +105.0°C, crit = +105.0°C)
Same here. Installed these kernel modules:
# lsmod | egrep 'cpuid|coretemp'
coretemp 16384 0
cpuid 16384 0
Intel digital thermal sensor is detected and temperatures are shown:
# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +40.0°C (high = +105.0°C, crit = +105.0°C)
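For monitoring scripts, the same readings can be taken from the hwmon sysfs files that `sensors` itself reads, without parsing its text output. A minimal sketch, assuming the standard hwmon layout (a `name` file per chip, `tempN_input` in millidegrees Celsius, optional `tempN_label`); the function name and the `root` parameter are only for illustration and testing.

```python
from pathlib import Path


def read_hwmon_temps(driver="coretemp", root="/sys/class/hwmon"):
    """Return {label: degrees Celsius} for every temp input of `driver`."""
    readings = {}
    for name_file in sorted(Path(root).glob("hwmon*/name")):
        if name_file.read_text().strip() != driver:
            continue
        for input_file in sorted(name_file.parent.glob("temp*_input")):
            label_file = input_file.with_name(
                input_file.name.replace("_input", "_label")
            )
            # Fall back to the file name when the chip exposes no label.
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = input_file.name
            # hwmon reports temperatures in millidegrees Celsius.
            readings[label] = int(input_file.read_text()) / 1000.0
    return readings


if __name__ == "__main__":
    temps = read_hwmon_temps()
    if not temps:
        print("no coretemp readings found (is the module loaded?)")
    for label, celsius in temps.items():
        print(f"{label}: {celsius:.1f} C")
```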
Thanks @TurtleFX @RMerl for testing through.
Good job tracking this down.
Out of curiosity, is the issue upstream with Citrix, or upstream with kernel.org?
Looks like the upstream kernel. I'm checking with the kernel community, but it is possible that this is an edge case caused by Xen security.
We now know more about it: recent Xen versions override the thermal and power management information exposed to guests, and the control domain is also a (privileged) guest. The reason for the override is that it never really worked correctly: dom0 would get information from the wrong real CPUs.
Unfortunately that change removes useful features.
@rushikeshjadhav may be able to give more details.
Correct. There are two CPU flags that coretemp requires in order to function. Xen hides these from Dom0 because the functionality is not correctly implemented.
PTS is the Intel Package Thermal Sensor, which essentially reports one temperature for the whole socket instead of one per core. The MSR from which the temperature values are read is readable from any domain. So one option is a special-purpose VM with its vCPUs pinned to certain pCPUs that exposes the thermal data.
Another option is a modified coretemp driver RPM that just shows the package temperature from whichever CPU it happens to run on.
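To illustrate what reading the package temperature from the MSR involves, here is a hedged sketch. The MSR addresses and bit fields (IA32_PACKAGE_THERM_STATUS at 0x1B1 with the readout in bits 22:16, MSR_TEMPERATURE_TARGET at 0x1A2 with TjMax in bits 23:16) are taken from the Intel SDM as I understand it. The `read_msr` helper assumes the Linux `msr` module is loaded and the script runs as root; whether it returns useful values under Xen is not verified here. The raw values in the example are invented for illustration, not taken from the hosts in this thread.

```python
import os
import struct

MSR_TEMPERATURE_TARGET = 0x1A2     # bits 23:16 = TjMax in degrees Celsius
IA32_PACKAGE_THERM_STATUS = 0x1B1  # bits 22:16 = degrees below TjMax


def read_msr(cpu, address):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (needs root + msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)


def package_temp_celsius(temp_target, therm_status):
    """Package temperature = TjMax minus the digital readout."""
    tjmax = (temp_target >> 16) & 0xFF
    degrees_below = (therm_status >> 16) & 0x7F
    return tjmax - degrees_below


if __name__ == "__main__":
    # Invented raw values: TjMax of 105 and a readout 58 degrees below it.
    example_target = 105 << 16
    example_status = 58 << 16
    print(package_temp_celsius(example_target, example_status))
```

On a real host the two arguments would come from `read_msr(0, MSR_TEMPERATURE_TARGET)` and `read_msr(0, IA32_PACKAGE_THERM_STATUS)`. Because the value is per package, the CPU the read lands on matters less than it does for per-core readings.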
I think having a custom version of coretemp through an RPM package would be an acceptable compromise (as long as it's well documented for people setting up a new Xen server and looking into monitoring their server's health).
I've made a temporary RPM available at https://github.com/rushikeshjadhav/coretemp. It reports the package temperature. It's a kernel driver, so it is highly experimental.
Update: @rushikeshjadhav's kernel driver is now available in our repositories.
On XCP-ng 8.0 or 8.1:
yum install coretemp-module-alt
The actual package name is coretemp-module-alt (just found it by browsing through the repo).
Indeed! I've fixed my comment.
With XCP-NG 7.x, I was using lm-sensors to monitor the CPU temperature of my Qotom server (running on an Intel i5 5250 CPU). This was provided through the coretemp kernel module.
After upgrading to 8.0, I am no longer able to load that module, it returns "No such device", as if it wasn't detecting the CPU.
Output from /proc/cpuinfo: