mjbeverley / inxi

Automatically exported from code.google.com/p/inxi

Ignore wrong CPUTIN and use PECI 0 instead for sensor output #58

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

inxi -s

What is the expected output? What do you see instead?

The expected output is the correct CPU temperature of 43°C, not 127.5°C. inxi 
takes a wrong value from CPUTIN. The nct6775 kernel module notes report that 
various ASUS mainboards with the NCT6776F show an unreasonably high 
temperature, and the problem apparently also affects MSI mainboards with the 
NCT6779D (MSI H87M-G43). The advice there is to ignore CPUTIN and use PECI 0 
instead (see https://github.com/groeck/nct6775 under "Usage Notes").

What version of the product are you using? On what operating system?

inxi 2.1.20-00 (2014-04-08) on Arch Linux x86_64 with Kernel 3.14.1-1-ARCH

Please paste your inxi output below.

Sensors:   System Temperatures: cpu: 127.5C mobo: 38.0C gpu: 34C 
           Fan Speeds (in rpm): cpu: N/A fan-1: 0 fan-2: 718 fan-3: 0 fan-4: 0 fan-5: 0 

Please paste your 'cat /proc/cpuinfo' output below.

processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 832.000
cache size  : 6144 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
aes xsave avx f16c rdrand lahf_lm abm ida arat xsaveopt pln pts dtherm 
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm
bogomips    : 6402.30
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 832.000
cache size  : 6144 KB
physical id : 0
siblings    : 4
core id     : 1
cpu cores   : 4
apicid      : 2
initial apicid  : 2
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
aes xsave avx f16c rdrand lahf_lm abm ida arat xsaveopt pln pts dtherm 
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm
bogomips    : 6402.30
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 832.000
cache size  : 6144 KB
physical id : 0
siblings    : 4
core id     : 2
cpu cores   : 4
apicid      : 4
initial apicid  : 4
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
aes xsave avx f16c rdrand lahf_lm abm ida arat xsaveopt pln pts dtherm 
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm
bogomips    : 6402.30
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model       : 60
model name  : Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz
stepping    : 3
microcode   : 0x12
cpu MHz     : 832.000
cache size  : 6144 KB
physical id : 0
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 6
initial apicid  : 6
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat 
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp 
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc 
aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 
fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer 
aes xsave avx f16c rdrand lahf_lm abm ida arat xsaveopt pln pts dtherm 
tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep 
bmi2 erms invpcid rtm
bogomips    : 6402.30
clflush size    : 64
cache_alignment : 64
address sizes   : 39 bits physical, 48 bits virtual
power management:

please paste your 'cat /proc/meminfo' output below.

MemTotal:        8108100 kB
MemFree:         1818136 kB
MemAvailable:    4503396 kB
Buffers:          210356 kB
Cached:          2617660 kB
SwapCached:            0 kB
Active:          3644580 kB
Inactive:        2144496 kB
Active(anon):    2965384 kB
Inactive(anon):    67876 kB
Active(file):     679196 kB
Inactive(file):  2076620 kB
Unevictable:          16 kB
Mlocked:              16 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Dirty:               216 kB
Writeback:             0 kB
AnonPages:       2961168 kB
Mapped:           383892 kB
Shmem:             72220 kB
Slab:             224964 kB
SReclaimable:     182836 kB
SUnreclaim:        42128 kB
KernelStack:        5904 kB
PageTables:        63212 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    12442652 kB
Committed_AS:    9380104 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      156872 kB
VmallocChunk:   34359550460 kB
HardwareCorrupted:     0 kB
AnonHugePages:    823296 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      279000 kB
DirectMap2M:     8036352 kB
DirectMap1G:           0 kB

please paste your 'sensors' output below.

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +43.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +41.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +40.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +42.0°C  (high = +80.0°C, crit = +100.0°C)

nct6779-isa-0a00
Adapter: ISA adapter
in0:                    +0.88 V  (min =  +0.00 V, max =  +1.74 V)
in1:                    +1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                    +1.11 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                    +0.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                    +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                    +3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                    +3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                    +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                   +0.03 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                   +0.00 V  (min =  +0.00 V, max =  +0.00 V)
in12:                   +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                   +0.74 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                   +0.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     0 RPM  (min =    0 RPM)
fan2:                   713 RPM  (min =    0 RPM)
fan3:                     0 RPM  (min =    0 RPM)
fan4:                     0 RPM  (min =    0 RPM)
fan5:                     0 RPM  (min =    0 RPM)
SYSTIN:                 +38.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = CPU diode
CPUTIN:                +127.5°C  (high = +80.0°C, hyst = +75.0°C)  ALARM  sensor = CPU diode
AUXTIN0:                +23.0°C    sensor = thermistor
AUXTIN1:                +91.0°C    sensor = thermistor
AUXTIN2:               -128.0°C    sensor = thermistor
AUXTIN3:                -12.0°C    sensor = thermal diode
PECI Agent 0:           +43.0°C  
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
PCH_CHIP_TEMP:           +0.0°C  
PCH_CPU_TEMP:            +0.0°C  
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:           disabled

Original issue reported on code.google.com by inkasso....@gmail.com on 27 Apr 2014 at 6:18

GoogleCodeExporter commented 9 years ago
http://www.spinics.net/lists/lm-sensors/msg37308.html
"No. Problem is that PECI does not report an absolute temperature, but the
difference to Tjmax, which is the maximum CPU temperature (reported as critical
temperature by coretemp). The NCT6776F has a register which needs to be programmed
to that value. Usually that is done in the BIOS. Looks like the BIOS did not
program the correct value on your board."

http://www.motherboardpoint.com/exactly-pecl-agent-temp-opposed-cpu-temp-t166984.html

http://askubuntu.com/questions/373154/sensors-command-from-lm-sensors-keeps-reporting-the-same-temperature
http://en.wikipedia.org/wiki/Platform_Environment_Control_Interface

"From a control standpoint, the main difference between PECI and the previously 
used thermal monitoring methods is that PECI reports a negative value 
expressing the difference between the current temperature and the thermal 
throttle point (at which the CPU reduces speed or shuts down to prevent damage 
due to overheating) instead of the absolute temperature. For example, for a CPU 
with maximal temperature of 85 °C and a current temperature reading of 35 °C, 
the value reported by PECI would be −50 °C."
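
The arithmetic in the quoted passage can be sketched as below. This is only an illustration of the quoted example: the function name `peci_to_absolute` and its parameters are hypothetical, and this is not a description of how lm-sensors or the nct6775 driver actually does the conversion.

```python
def peci_to_absolute(tjmax_c: float, peci_offset_c: float) -> float:
    """Return the absolute temperature implied by a PECI reading.

    PECI reports a zero or negative offset from the throttle point (Tjmax),
    so the absolute temperature is Tjmax plus that offset.
    """
    if peci_offset_c > 0:
        raise ValueError("PECI offset is expected to be zero or negative")
    return tjmax_c + peci_offset_c


# Example from the quote: Tjmax of 85 C and a PECI reading of -50 C.
print(peci_to_absolute(85.0, -50.0))  # 35.0
```

Under this model, a Tjmax value that is programmed wrongly shifts the resulting absolute temperature by the same amount, which fits the earlier quote about the BIOS not programming the correct value.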

I don't believe there's a single solution to your issue. Some users report PECI 
being wrong (too cold), and then there's the statement above about what it 
actually reports. I assume lm-sensors takes that and processes it back to 
something closer to reality in terms of raw temp, not a diff of temps.

I can't fix one set of data for this without breaking another.

More data is required, sorry. Switching to PECI would fix your specific case 
and be more accurate there, but in other cases it would break things, and the 
fix would be wrong.

sensors are a total mess, always have been, and I assume always will be.

Not sure how to handle this one, to be honest.

I can see one case: if no temp has been set by the time it reads down to PECI, 
then use PECI. That would solve the problem of having no CPU temp at all when 
only PECI is listed.

This needs more data, not just specific cases where individual motherboard 
models fail to do their BIOS right, i.e. a solution that works for almost 
everything, or at least for more than most.

Note that of the samples I saw online, most PECI and CPUTIN values were so 
close as to be essentially the same, which argues against switching to a 
default PECI, particularly if the reports of PECI showing 10 degrees too cold 
are also correct.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 1:47

GoogleCodeExporter commented 9 years ago
http://lists.lm-sensors.org/pipermail/lm-sensors/2012-October/037532.html

"I have checked now and while on the one hand the BIOS report a CPU temperature
of 32C and a MB temperature of 29C, which sounds reasonable, on the other hand
PECI 0 reports 21C, even below room temperature! Note that other values like
AUXTIN and Core 0 to 3 look more compatible, in the range of 30 to 31C. So, it
looks like there is an error of over 10C in the value reported by PECI 0. Is
it the expected behavior?"

There you have a case where core 0 is right and PECI 0 is wrong.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 1:50

GoogleCodeExporter commented 9 years ago
http://lists.lm-sensors.org/pipermail/lm-sensors/2012-October/037565.html

"PECI is actually _very_ inaccurate for low temperatures, so this is not really
surprising."

It's a big mess, for sure. If I fix your issue, I will break unknown numbers of 
others, and some will remain somewhat the same.

It might be worth adding in an extra step or test in case there is no CPU in 
the sensors temp output.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 1:54

GoogleCodeExporter commented 9 years ago
I'm wondering whether to use PECI only as a last fallback, and to prefer Core 0 
as the master temp. I believe that would solve the issue, though I'll have to 
check through a lot of user data sets to really see. But that might do it, 
given PECI is an Intel thing, which generally should show Core 0 temps.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 1:57

GoogleCodeExporter commented 9 years ago
I have a tentative fix, update inxi like this, to get branch/one/inxi

inxi -! 11

make sure it updates to 2.1.23 patch 01-b1

Then show: inxi -s

Since the sensors stuff is already convoluted, I made it more so.

It already used a failsafe test of Core0 temp, so I added in two more test 
conditions to assign the temps, with conservative values used.

If the main CPU temp is something, and if PECI temp is something, and if CPU - 
PECI > 20, then use PECI. 

That will handle your specific broken motherboard sensors case.

Then, for the case where no CPU type temp is listed, and no core0 exists, use 
PECI. That is, PECI is the last choice unless the primary CPU temp is more than 
20 hotter than PECI.

This should, I believe, protect against the too-low PECI values on other systems 
(too low is dangerous too, because you don't see how hot your CPU actually 
is), and it offers another fallback in case core 0 is missing, temp1 / temp2 
are missing, and cpu is missing.
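
The two added test conditions can be sketched roughly as follows. This is a simplified illustration and not the actual inxi code: the function name `pick_cpu_temp`, its parameters, and the constant name are hypothetical, and the existing temp1 / temp2 handling is left out.

```python
from typing import Optional

# Threshold named in the comment: CPU more than 20 C hotter than PECI.
MAX_CPU_PECI_GAP_C = 20.0


def pick_cpu_temp(cpu: Optional[float],
                  core0: Optional[float],
                  peci: Optional[float]) -> Optional[float]:
    """Choose which reading to show as the CPU temperature (Celsius)."""
    # Condition 1: both readings exist and CPU is implausibly far above PECI.
    if cpu is not None and peci is not None and cpu - peci > MAX_CPU_PECI_GAP_C:
        return peci
    # Otherwise the labelled CPU temperature wins when present.
    if cpu is not None:
        return cpu
    # Existing failsafe: Core 0.
    if core0 is not None:
        return core0
    # Condition 2: PECI as the last choice (may be None if nothing exists).
    return peci


# Readings from this report: CPUTIN 127.5, Core 0 41.0, PECI Agent 0 44.0.
print(pick_cpu_temp(127.5, 41.0, 44.0))  # 44.0
```

With a sane CPU reading the PECI value is ignored, so a PECI value that reads too low on another board does not replace it unless the gap exceeds the threshold.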

I think this will work, not positive, which is why I put it in the branches/one 
inxi.

Show the output:

inxi -Is

which will show me you updated inxi correctly, and that the -s works.

Once it's confirmed to work, I'll put these changes to trunk/inxi

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 2:32

GoogleCodeExporter commented 9 years ago
Thank you very much for your quick response and all that effort to solve and 
explain that problem.

I see that it is very important to be careful that a possible solution for one 
user does not break it for others.

The patch you provided has been applied and inxi -Is shows:

Sensors:   System Temperatures: cpu: 44.0C mobo: 38.0C gpu: 35C 
           Fan Speeds (in rpm): cpu: N/A fan-1: 0 fan-2: 710 fan-3: 0 fan-4: 0 fan-5: 0 
Info:      Processes: 200 Uptime: 2:29 Memory: 2096.1/7918.1MB Client: Shell 
(bash) inxi: 2.1.23-2-b1 

sensors shows:

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +45.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +41.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:         +42.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:         +44.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:         +45.0°C  (high = +80.0°C, crit = +100.0°C)

nct6779-isa-0a00
Adapter: ISA adapter
in0:                    +0.88 V  (min =  +0.00 V, max =  +1.74 V)
in1:                    +1.01 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                    +3.39 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                    +1.11 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                    +0.90 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                    +1.06 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                    +3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                    +3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                    +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                   +0.29 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                   +0.00 V  (min =  +0.00 V, max =  +0.00 V)
in12:                   +1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                   +0.74 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                   +0.72 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     0 RPM  (min =    0 RPM)
fan2:                   710 RPM  (min =    0 RPM)
fan3:                     0 RPM  (min =    0 RPM)
fan4:                     0 RPM  (min =    0 RPM)
fan5:                     0 RPM  (min =    0 RPM)
SYSTIN:                 +38.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = CPU diode
CPUTIN:                +127.5°C  (high = +80.0°C, hyst = +75.0°C)  ALARM  sensor = CPU diode
AUXTIN0:                +23.0°C    sensor = thermistor
AUXTIN1:                +72.0°C    sensor = thermistor
AUXTIN2:               -128.0°C    sensor = thermistor
AUXTIN3:                -43.0°C    sensor = thermal diode
PECI Agent 0:           +44.0°C  
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
PCH_CHIP_TEMP:           +0.0°C  
PCH_CPU_TEMP:            +0.0°C  
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:           disabled

Did I understand correctly that core0 would only be used if CPUTIN was not 
listed (not even with wrong value)? I agree with you that PECI should only be a 
last fallback. If Core0 and CPUTIN were both available, we could evaluate both 
and when, for example, CPUTIN - core0 > 20 we use core0 instead. PECI fallback 
could then be used if there was no core0 to evaluate CPUTIN. But I would 
suggest to give core0 always a higher priority than PECI.
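
The ordering suggested here could look roughly like the sketch below. It only illustrates the proposal in this comment and is not inxi code: the function name `pick_cpu_temp_core0_first` and its parameters are hypothetical, and the 20 degree gap is the example value from the text.

```python
from typing import Optional


def pick_cpu_temp_core0_first(cputin: Optional[float],
                              core0: Optional[float],
                              peci: Optional[float],
                              max_gap_c: float = 20.0) -> Optional[float]:
    """Validate CPUTIN against core0, using PECI only when core0 is missing."""
    # core0 always has priority over PECI as the reference reading.
    reference = core0 if core0 is not None else peci
    if cputin is None:
        return reference
    # Replace CPUTIN only when it is implausibly far above the reference.
    if reference is not None and cputin - reference > max_gap_c:
        return reference
    return cputin


# Readings from the sensors output above: CPUTIN 127.5, Core 0 41.0, PECI 44.0.
print(pick_cpu_temp_core0_first(127.5, 41.0, 44.0))  # 41.0
```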

While testing this I noticed that PECI seems to be updated with a small delay.

Original comment by inkasso....@gmail.com on 28 Apr 2014 at 2:34

GoogleCodeExporter commented 9 years ago
This problem is very complex, and I want to thank you for alerting me to PECI, 
which was not handled at all as an option.

I'm testing this now, but another person has a system where it reports a bad:
temp1: <too low by 30 C> and a valid Core0, but then on one of my older 
systems, that fix breaks it, lol.

So the trick is for me to look through maybe 20 data sets. My expanding 
collection of inxi -xx@14 datasets helps me resolve these issues, generally by 
simply looking at a lot of systems' sensors outputs to see what patterns I can 
use or not use safely.

patch 2 is right for your system and the temp1 bad system, but wrong for one of 
my test systems. It's all about finding a balance, and what numbers to use.

Here's my current logic, roughly, skipping some very hard to explain parts:

if peci and if cpu temps exist, and if cpu temp - peci > 20, use peci

else use cpu temp.

if temp1 and temp2 exist, apply some weird logic, but use one of them

then it gets more complex in terms of guessing.

if after all these, the real temp variable is still null, then use core0temp if 
not null, else use cpupeci temp if exists.

Also, of course, I have to modify the factor of subtraction for c / f, which 
I'll do in patch3, that is, 20C diff is 36F diff.
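
The reason 20C becomes 36F rather than 68F is that a temperature difference only scales by 9/5; the +32 offset applies to absolute readings and cancels out in a subtraction. A small sketch of that, with hypothetical function names that are not taken from inxi:

```python
def c_to_f(temp_c: float) -> float:
    """Convert an absolute Celsius reading to Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def gap_threshold(unit: str, gap_c: float = 20.0) -> float:
    """Return the CPU-minus-PECI threshold in the unit being displayed."""
    if unit == "C":
        return gap_c
    if unit == "F":
        # Differences scale by 9/5 only; no +32 offset.
        return gap_c * 9.0 / 5.0
    raise ValueError("unit must be 'C' or 'F'")


print(gap_threshold("F"))  # 36.0
```

Comparing two readings that were both converted with `c_to_f` against `gap_threshold("F")` then gives the same decision as comparing the Celsius readings against 20.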

I'm not sure about trusting core0 because in the past, before these newer issues, 
core0 was the LEAST reliable temp, temp1/temp2/cpu temp were the most reliable, 
and on older mobos, that still applies.

I'll see if I can get this worked out this morning.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 5:52

GoogleCodeExporter commented 9 years ago
inxi 2.1.24 should close this issue. I have significantly tightened and 
expanded the cpu/mb temp handling. 

Sadly, with every set of problems fixed, it is guaranteed that at least some 
users will have their stuff fail. In some cases what they may see as failing 
might mean that inxi is now showing the right values, however. 

It's very hard to actually get this stuff solid, and it's basically not 
possible to get all cases right.

If you want in general to always get the right data, then set up the lm-sensors 
config file to use CPU: temp and MB: temp, which inxi will always use as 
absolute, with the PECI test added now.

However, I believe inxi now is right for more people than it was before. It's 
right on my systems now I believe as well.

With this I'm closing the issue, though feel free to confirm with inxi -s that 
everything works. In your case, it should work.

Original comment by inxi-...@techpatterns.com on 28 Apr 2014 at 8:40