ocerman / zenpower

Zenpower is a Linux kernel driver for reading temperature, voltage (SVI2), current (SVI2) and power (SVI2) on AMD Zen family CPUs.
GNU General Public License v2.0

AMD EPYC 7351P not working #14

Closed cyboerg42 closed 5 years ago

cyboerg42 commented 5 years ago

System: HPE ProLiant DL325 Gen10
Kernel: 5.0.15-1-pve x86_64
Processor: AMD EPYC 7351P 16-Core Processor (23/1/2)
Distro: Proxmox 6.0 (Debian Buster)

If you need more info, just send me a message.

dmesg error:

[341189.548252] ACPI Error: No handler for Region [POWR] (000000008241729d) [IPMI] (20181213/evregion-132)
[341189.549001] ACPI Error: Region IPMI (ID=7) has no handler (20181213/exfldio-265)
[341189.549680] No Local Variables are initialized for Method [_PMM]
[341189.549680] No Arguments are initialized for method [_PMM]
[341189.549682] ACPI Error: Method parse/execution failed \_SB.PMI0._PMM, AE_NOT_EXIST (20181213/psparse-531)
[341189.550404] ACPI Error: AE_NOT_EXIST, Evaluating _PMM (20181213/power_meter-338)

sensors with k10temp:

k10temp-pci-00d3
Adapter: PCI adapter
Tdie:         +39.8°C  (high = +70.0°C)
Tctl:         +39.8°C  

k10temp-pci-00c3
Adapter: PCI adapter
Tdie:         +44.2°C  (high = +70.0°C)
Tctl:         +44.2°C  

ixgbe-pci-c401
Adapter: PCI adapter
loc2:         +34.0°C  (high = +105.0°C, crit = +95.0°C)

k10temp-pci-00db
Adapter: PCI adapter
Tdie:         +44.2°C  (high = +70.0°C)
Tctl:         +44.2°C  

k10temp-pci-00cb
Adapter: PCI adapter
Tdie:         +40.4°C  (high = +70.0°C)
Tctl:         +40.4°C  

power_meter-acpi-0
Adapter: ACPI interface
power1:        0.00 W  (interval = 300.00 s)

sensors with zenpower:

ixgbe-pci-c401
Adapter: PCI adapter
loc2:         +32.0°C  (high = +105.0°C, crit = +95.0°C)

power_meter-acpi-0
Adapter: ACPI interface
power1:        0.00 W  (interval = 300.00 s)

cyboerg42 commented 5 years ago

acpidump > acpi_tables_dl325_g10_7351P.txt

acpi_tables_dl325_g10_7351P.txt

maybe this helps a bit :)

cyboerg42 commented 5 years ago

kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 0161004e
0005a010 = 01f70000
000598bc = 0fff00ff
0005994c = 00000000
00059954 = 00000a88
00059958 = 00000a84
0005995c = 00000a78
kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 01400008
0005a010 = 01f70000
000598bc = 0fff00ff
0005994c = 00000000
00059954 = 00000000
00059958 = 00000000
0005995c = 00000000
kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 00000000
0005a010 = 00000000
000598bc = 0fff00ff
0005994c = 00000000
00059954 = 00000000
00059958 = 00000000
0005995c = 00000000
kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 00000000
0005a010 = 00000000
000598bc = 0fff00ff
0005994c = 00000000
00059954 = 00000000
00059958 = 00000000
0005995c = 00000000

zenpower debug output
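
For anyone reading a dump like this later, here is a minimal Python sketch that splits it into one block per device and decodes the two registers at 0005a00c and 0005a010 into voltages. It assumes these are the SVI2 telemetry planes (core and SoC), that the VID sits in bits 16-23, and that each VID step is 6.25 mV below 1550 mV. The names `parse_blocks` and `plane_to_mv` are made up for this example, and the sample is shortened to two blocks taken from the dump above.

```python
import re

# Shortened sample: first and third block of the dump above.
DUMP = """\
kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 0161004e
0005a010 = 01f70000
kernel_smn_support = 1
0005a008 = 00000002
0005a00c = 00000000
0005a010 = 00000000
"""

def parse_blocks(text):
    """Split the dump into one {address: value} dict per device."""
    blocks = []
    for line in text.splitlines():
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "kernel_smn_support":
            blocks.append({})  # a new device block starts here
        elif blocks and re.fullmatch(r"[0-9a-f]{8}", key):
            blocks[-1][int(key, 16)] = int(value, 16)
    return blocks

def plane_to_mv(plane):
    """Assumed decoding: VID in bits 16-23, 1550 mV minus 6.25 mV per step."""
    vid = (plane >> 16) & 0xFF
    return 1550 - (625 * vid) // 100

for i, block in enumerate(parse_blocks(DUMP)):
    core = plane_to_mv(block[0x5A00C])
    soc = plane_to_mv(block[0x5A010])
    print(f"block {i}: core={core} mV soc={soc} mV")
```

Under those assumptions, an all-zero register decodes to 1550 mV, while 0161004e and 01f70000 decode to 944 mV and 7 mV. That lines up with the +1.55 V, +0.94 V and +0.01 V readings in the sensors output further down.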

cyboerg42 commented 5 years ago

Oh wow, now it's working. Sorry for bothering you... :)

zenpower-pci-00d3
Adapter: PCI adapter
SVI2_Core:    +1.55 V  
SVI2_SoC:     +1.55 V  
Tdie:         +30.8°C  (high = +70.0°C)
Tctl:         +30.8°C  
SVI2_P_Core:   0.00 W  
SVI2_P_SoC:    0.00 W  
SVI2_C_Core:  +0.00 A  
SVI2_C_SoC:   +0.00 A  

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:    +0.94 V  
SVI2_SoC:     +0.01 V  
Tdie:         +31.8°C  (high = +70.0°C)
Tctl:         +31.8°C  
SVI2_P_Core:  78.48 W  
SVI2_P_SoC:    0.00 W  
SVI2_C_Core: +83.14 A  
SVI2_C_SoC:   +0.00 A  

ixgbe-pci-c401
Adapter: PCI adapter
loc2:         +32.0°C  (high = +105.0°C, crit = +95.0°C)

zenpower-pci-00db
Adapter: PCI adapter
SVI2_Core:    +1.55 V  
SVI2_SoC:     +1.55 V  
Tdie:         +29.8°C  (high = +70.0°C)
Tctl:         +29.8°C  
SVI2_P_Core:   0.00 W  
SVI2_P_SoC:    0.00 W  
SVI2_C_Core:  +0.00 A  
SVI2_C_SoC:   +0.00 A  

zenpower-pci-00cb
Adapter: PCI adapter
SVI2_Core:    +1.15 V  
SVI2_SoC:     +0.01 V  
Tdie:         +31.5°C  (high = +70.0°C)
Tctl:         +31.5°C  
SVI2_P_Core:  13.15 W  
SVI2_P_SoC:    0.00 W  
SVI2_C_Core: +12.47 A  
SVI2_C_SoC:   +0.00 A  

power_meter-acpi-0
Adapter: ACPI interface
power1:        0.00 W  (interval = 300.00 s)

Best regards, and thanks for this awesome module!

ocerman commented 5 years ago

There have been some reports before that a restart is needed for some people. This was probably one of those cases.

I am happy that it works, even though I can see that some 0 values are being reported, which is probably not correct.
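
To spot the affected devices quickly, something like the following Python sketch could scan the text output of `sensors` and list every chip whose `SVI2_P_Core` reading is exactly zero. It is only an illustration: the function name `zero_core_chips` is made up, the sample is a shortened copy of the output above, and it assumes chips are separated by blank lines with the chip name on the first line.

```python
# Shortened copy of the sensors output posted above.
SAMPLE = """\
zenpower-pci-00d3
Adapter: PCI adapter
SVI2_Core:    +1.55 V
SVI2_P_Core:   0.00 W
SVI2_C_Core:  +0.00 A

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:    +0.94 V
SVI2_P_Core:  78.48 W
SVI2_C_Core: +83.14 A
"""

def zero_core_chips(text):
    """Return the names of chips that report 0 W on SVI2_P_Core."""
    chips = []
    chip = None
    for line in text.splitlines():
        if not line.strip():
            chip = None  # a blank line ends the current chip section
            continue
        if chip is None:
            chip = line.strip()  # first line of a section is the chip name
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        if label.strip() == "SVI2_P_Core" and fields:
            if float(fields[0]) == 0.0:
                chips.append(chip)
    return chips

print(zero_core_chips(SAMPLE))
```

On a live system the same function could be fed the captured output of the `sensors` command instead of the sample string.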

cyboerg42 commented 5 years ago

Hm, I just reloaded the kernel module, and that fixed it somehow. No restart needed.

I thought those were the disabled dies on the EPYC chip, but judging by the amount of L3 cache, I guess all dies are active. Weird.