ocerman / zenpower

Zenpower is a Linux kernel driver for reading temperature, voltage (SVI2), current (SVI2) and power (SVI2) on AMD Zen family CPUs.
GNU General Public License v2.0

Support for 19h family #39

Open tiburcillo opened 3 years ago

tiburcillo commented 3 years ago

Would it be a lot of work to add support for the Zen3 family? I love zenpower on my 2700x, would be cool if my 5600x would also be supported.

Thanks, t

gardotd426 commented 3 years ago

I was also wondering about this.

It currently doesn't work with the 5000 series, which isn't that surprising, but I was still hoping it would, since k10temp is just flat-out useless with Ryzen 5000 as well.

It worked perfectly with the 3800X on this same motherboard (X570 Taichi), which has the Nuvoton SuperIO chip.

JaffoS1 commented 3 years ago

Would be absolutely great if this worked with the 5000 series!

abucodonosor commented 3 years ago

k10temp supports Zen3 from kernel >=5.10.

gardotd426 commented 3 years ago

k10temp supports Zen3 from kernel >=5.10.

Yeah. It gives Tdie and Tctl temps. That's literally it.

zenpower gives detailed voltage and power draw readings. None of that is available in k10temp.

This is literally all you get for CPU in k10temp on 5.10 for Zen 3:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +32.8°C
Tdie:         +32.8°C

Pretty lackluster. There's a reason we're asking for zenpower support. I made my above comment while running 5.10, so I was already well aware of how well it "works" with 5.10.

abucodonosor commented 3 years ago

Oh, no Vcore or Isoc etc. for ZEN3 in 5.10?

Well, I can try to add that support, but I'm not really familiar with that code. I can look at what 5.10 did, add the IDs, and then change the logic in zenpower_probe(). However, I cannot guarantee that it is accurate or will work.

Give me some minutes to figure that :)

abucodonosor commented 3 years ago

@gardotd426

Are you willing to test this patch?

https://crazy.dev.frugalware.org/ZEN3-test.patch

gardotd426 commented 3 years ago

Yep. Tested it, no dice.

From skimming zenpower.c, it seems there's a lot of other areas where support would need to be added, just adding those few lines wouldn't seem to be enough (granted, my knowledge of how zenpower works is limited so this might not be the case).

But yeah, I get the exact same output as before.

zenpower-pci-00c3
Adapter: PCI adapter
Tdie:         +73.5°C  (high = +95.0°C)
Tctl:         +73.5°C

And that much worked without the patch, too (meaning that replacing k10temp w/zenpower gave me the same info just named as zenpower instead of k10temp).

abucodonosor commented 3 years ago

No, there is not much else, it just means the PLANE address is wrong for ZEN3 or the model IDs or both, and that includes the kernel itself. Someone with ZEN3 HW should report to lkml I guess.

There is no support whatsoever for fam 19h in zenpower before the patch, which means it got the defaults, and it seems to get the defaults even now with fam 19h added.

Btw, are you sure you ran rmmod zenpower before loading the patched one?

abucodonosor commented 3 years ago

@gardotd426

I think I missed something. In my patch, change data->zen3 = true; to data->zen2 = true; just to test something. The address and calculation look the same on both zen2 & zen3, so it should not really matter.

gardotd426 commented 3 years ago

I'm sure I loaded the right zenpower because I didn't even have it installed before this patch, I'd uninstalled it because it was useless, and was using k10temp. I rmmod-ed k10temp and loaded zenpower after installing. I'll try editing the patch and running again.

gardotd426 commented 3 years ago

Same result, unfortunately. If I knew exactly what was missing I'd bug the guys @ lkml

abucodonosor commented 3 years ago

@gardotd426

k10temp should have Vcore etc. I'll try to find out myself the right offsets for ZEN3, bc I think there is something missing even in mainline.

Unfortunately, I don't have a ZEN3 box yet, prices for a 5950x are way too insane right now :)

gardotd426 commented 3 years ago

Hahah yeah trust me I get it, I was going for the 5900X but you can't buy one anywhere (and I refuse to encourage scalpers), and the only way I could even get the 5800X @ MSRP was through a Newegg combo deal (they aren't selling them individually hardly at all) w/ a 500GB Samsung 980 Pro even though all three of my NVME slots are already taken up with 1GB NVMEs, so I just sold the 980 Pro on ebay for like 10 bucks less than I paid for it.

I still might get a 5900X later for the cores, but a 5800X is perfectly fine and in gaming it's pretty much the same as the 5900X and it definitely doesn't bottleneck my RTX 3090.

If you need help or testing or anything like that I'm happy to do it

abucodonosor commented 3 years ago

@gardotd426

Out of curiosity, what does the kernel report on the CPU?

Something like this should tell:

dmesg | grep CPU0: | grep smpboot
hattedsquirrel commented 3 years ago

Output for 5900X: [ 0.111779] smpboot: CPU0: AMD Ryzen 9 5900X 12-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)

gardotd426 commented 3 years ago

[ 0.109997] smpboot: CPU0: AMD Ryzen 7 5800X 8-Core Processor (family: 0x19, model: 0x21, stepping: 0x0)
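
For anyone who wants to pull the family and model out of that smpboot line programmatically, here is a minimal sketch in C. The helper name parse_smpboot is made up for illustration and is not part of zenpower or the kernel; it only assumes the line format shown in the two outputs above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from zenpower or the kernel): extract
 * family/model/stepping from one smpboot line as printed by dmesg.
 * Returns 0 on success, -1 if the "(family: ..." part is missing. */
int parse_smpboot(const char *line, unsigned int *family,
                  unsigned int *model, unsigned int *stepping)
{
    const char *p;

    assert(line && family && model && stepping);

    p = strstr(line, "(family:");
    if (!p)
        return -1;

    /* %x accepts an optional 0x prefix, so "0x19" parses as hex 19. */
    if (sscanf(p, "(family: %x, model: %x, stepping: %x)",
               family, model, stepping) != 3)
        return -1;

    return 0;
}
```

Fed the 5900X or 5800X line above, this should yield family 0x19 and model 0x21, which is the value the driver's switch statement has to match.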

abucodonosor commented 3 years ago

I think I see the bug :)

gardotd426 commented 3 years ago

?

abucodonosor commented 3 years ago

@gardotd426

give me a moment to create some theoretical patch just to see if it starts working.

gardotd426 commented 3 years ago

Alrighty

abucodonosor commented 3 years ago

?

Someone committed this with the stepping IDs :) But the data wants the model.

aqxa1 commented 3 years ago

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

EDIT:

SVI2_Core:     1.55 V
SVI2_SoC:      1.48 V
Tdie:         +44.6°C  (high = +95.0°C)
Tctl:         +44.6°C
Tccd1:        +39.8°C
Tccd2:        +38.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   17.56 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   15.87 A
gardotd426 commented 3 years ago

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

What did you do?

abucodonosor commented 3 years ago

@gardotd426

https://crazy.dev.frugalware.org/ZEN3-test2.patch

abucodonosor commented 3 years ago

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

EDIT:

SVI2_Core:     1.55 V
SVI2_SoC:      1.48 V
Tdie:         +44.6°C  (high = +95.0°C)
Tctl:         +44.6°C
Tccd1:        +39.8°C
Tccd2:        +38.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   17.56 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   15.87 A

Yes, it is broken in the kernel the same way.

I wondered why it pulls the default code at all; that is because the switch(...) data is wrong.

hattedsquirrel commented 3 years ago

@abucodonosor With your new patch, it now does something:

# sensors zenpower-*
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:    925.00 mV
Tdie:         +30.4°C  (high = +95.0°C)
Tctl:         +30.4°C
Tccd1:        +27.5°C
Tccd2:        +29.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:  543.90 mW
SVI2_C_Core:   0.00 A
SVI2_C_SoC:  882.00 mA
gardotd426 commented 3 years ago

Should we file a bug w/ the kernel?

On Mon, Dec 21, 2020 at 6:49 PM abucodonosor notifications@github.com wrote:

Yeah, just tried out your idea, and it's now working. Copy and paste is broken on Firefox Wayland for some reason right now, but there's a heap of data now.

EDIT:

SVI2_Core:     1.55 V
SVI2_SoC:      1.48 V
Tdie:         +44.6°C  (high = +95.0°C)
Tctl:         +44.6°C
Tccd1:        +39.8°C
Tccd2:        +38.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   17.56 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   15.87 A

Yes is broken in the kernel the same way.

I wondered why it pulls default code at all, that is bc the switch(...) data is wrong


abucodonosor commented 3 years ago

@gardotd426

Yes, and the fix is simple for the kernel, this:


diff --git a/drivers/hwmon/k10temp.c b/drivers/hwmon/k10temp.c
index a250481b5a97..0b4e61bf90f7 100644
--- a/drivers/hwmon/k10temp.c
+++ b/drivers/hwmon/k10temp.c
@@ -541,7 +541,7 @@ static int k10temp_probe(struct pci_dev *pdev, const struct pci_device_id *id)
                data->is_zen = true;

                switch (boot_cpu_data.x86_model) {
-               case 0x0 ... 0x1:       /* Zen3 */
+               case 0x21:      /* Zen3 */
                        data->show_current = true;
                        data->svi_addr[0] = F19H_M01_SVI_TEL_PLANE0;
                        data->svi_addr[1] = F19H_M01_SVI_TEL_PLANE1;

Someone may try and confirm k10temp working too :)
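
To make the effect of that one-line change concrete, here is a small user-space sketch of the two matches. The function names are invented for illustration; only the case labels come from the diff above. The kernel writes the old match as a GNU range (case 0x0 ... 0x1); two plain labels are used here as the portable equivalent.

```c
#include <assert.h>
#include <stdbool.h>

/* Old match from the diff: only models 0x0 and 0x1 count as Zen3.
 * Anything else falls through to the default branch, so no SVI
 * addresses would be set up for it. */
bool zen3_match_old(unsigned int model)
{
    switch (model) {
    case 0x0:
    case 0x1:
        return true;
    default:
        return false;
    }
}

/* Match as changed by the diff: model 0x21. */
bool zen3_match_patched(unsigned int model)
{
    switch (model) {
    case 0x21:
        return true;
    default:
        return false;
    }
}

/* The 5800X and 5900X in this thread both report model 0x21. */
void check_zen3_match(void)
{
    assert(!zen3_match_old(0x21));     /* old code: default branch */
    assert(zen3_match_patched(0x21));  /* patched code: Zen3 branch */
}
```

Note that this sketch mirrors the diff literally, so the patched match no longer covers models 0x0 and 0x1. Whether those should stay in the list is a separate question that this thread does not settle.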

abucodonosor commented 3 years ago

So some offset is (maybe) still wrong. It may come from the ZEN2 code; I need to check, but I can't see what it is right now.

SVI2_P_Core: 0.00 W
SVI2_C_Core: 0.00 A

Does this do something under load?

aqxa1 commented 3 years ago

k10temp working:

k10temp-pci-00c3
Adapter: PCI adapter
Vcore:         1.55 V
Vsoc:        975.00 mV
Tctl:         +53.2°C
Tdie:         +53.2°C
Tccd1:        +44.8°C
Tccd2:        +40.5°C
Icore:         0.00 A
Isoc:          4.96 A

Looks to be a bit less data than Zenpower, though.

abucodonosor commented 3 years ago

@aqxa1

Thx, so the Icore / SVI2_P_Core reading is wrong in both k10temp and zenpower. Probably a wrong offset.

@gardotd426 that should be reported to kernel people too.

I'll try to find the right one, but that is a pain with the current AMD documentation ;)

aqxa1 commented 3 years ago

And yeah, neither of those does anything for me under load (P_Core and C_Core), but P_SoC and C_SoC are both active.

hattedsquirrel commented 3 years ago

Is there a way we can verify the definition of F19H_M01H_SVI_TEL_PLANE0 and PLANE1?

No, under load Core remains at 0W and 0A but the values for SoC rise. From the reading I get, I'd guess that what is reported as SoC is actually the Core. I (foolishly) changed the definitions to

#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

and now the Core and SoC voltages at least give the impression of being somewhat in the right area. The SoC wattage and amperage seem plausible, but the Core still reports 0W and 0A.

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:   932.00 mV
SVI2_SoC:    994.00 mV
Tdie:         +30.2°C  (high = +95.0°C)
Tctl:         +30.2°C
Tccd1:        +28.2°C
Tccd2:        +28.2°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:    6.73 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:    6.77 A

Edit: my bad, it seems to work. Under load (one core) I get:

SVI2_Core:   963.00 mV
SVI2_SoC:    994.00 mV
Tdie:         +30.8°C  (high = +95.0°C)
Tctl:         +30.8°C
Tccd1:        +31.0°C
Tccd2:        +30.5°C
SVI2_P_Core:   4.44 W
SVI2_P_SoC:    5.56 W
SVI2_C_Core:   4.61 A
SVI2_C_SoC:    5.59 A
gardotd426 commented 3 years ago

Here's my output:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:      1.47 V
Tdie:         +34.5°C  (high = +95.0°C)
Tctl:         +34.5°C
Tccd1:        +43.0°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   10.76 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:    7.36 A
aqxa1 commented 3 years ago

Yeah, those changes look fairly accurate now:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.25 V
SVI2_SoC:    988.00 mV
Tdie:         +74.5°C  (high = +95.0°C)
Tctl:         +74.5°C
Tccd1:        +71.5°C
Tccd2:        +71.5°C
SVI2_P_Core: 132.50 W
SVI2_P_SoC:    6.06 W
SVI2_C_Core: 106.00 A
SVI2_C_SoC:    6.13 A

Values also look correct with k10temp, so definitely an oversight from the kernel devs.
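
One quick plausibility check on a reading like this is whether the reported power is close to voltage times current. The sketch below is only a sanity check with invented helper names; it does not claim to show how zenpower derives its numbers. Values read at slightly different instants or rounded for display can differ a little, hence the tolerance.

```c
#include <assert.h>
#include <stdlib.h>

/* Power in milliwatts from millivolts and milliamps, rounded to nearest. */
long power_mw(long millivolts, long milliamps)
{
    assert(millivolts >= 0 && milliamps >= 0);
    return (millivolts * milliamps + 500) / 1000;
}

/* Nonzero if the reported power is within tol_mw of V * I. */
int power_consistent(long mv, long ma, long reported_mw, long tol_mw)
{
    return labs(power_mw(mv, ma) - reported_mw) <= tol_mw;
}
```

For the Core values above, 1.25 V times 106.00 A is 132.5 W, which matches the reported SVI2_P_Core. The earlier SoC reading of 1.48 V, 15.87 A and 17.56 W is much further off, since the product is about 23.5 W.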

abucodonosor commented 3 years ago

Is there a way we can verify the definition of F19H_M01H_SVI_TEL_PLANE0 and PLANE1?

Well, I trusted 'AMD' people who committed that to the kernel itself. One may think they should know what they are doing but ...


#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

One can play with these right, but these are exactly the other way around for ZEN generic, PLANE0 is 0xc while PLANE1 is 0x10.
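
As a sketch of what "the other way around" means here: the two layouts use the same two offsets, only assigned to opposite planes. The base address below is a placeholder assumed for illustration, and the struct and function names are invented; only the offsets 0xC and 0x10 come from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder base address, assumed for illustration only. */
#define SVI_BASE_EXAMPLE 0x0005A000u

struct svi_planes {
    uint32_t plane0;
    uint32_t plane1;
};

/* Generic Zen layout as described above: PLANE0 = +0xC, PLANE1 = +0x10. */
struct svi_planes planes_zen_generic(void)
{
    struct svi_planes p = { SVI_BASE_EXAMPLE + 0xC, SVI_BASE_EXAMPLE + 0x10 };
    return p;
}

/* Layout tried in this thread for family 19h: the two offsets swapped. */
struct svi_planes planes_f19h_tried(void)
{
    struct svi_planes p = { SVI_BASE_EXAMPLE + 0x10, SVI_BASE_EXAMPLE + 0xC };
    return p;
}

/* Each plane of one layout points at the other plane of the other layout. */
void check_planes_swapped(void)
{
    struct svi_planes g = planes_zen_generic();
    struct svi_planes t = planes_f19h_tried();

    assert(t.plane0 == g.plane1);
    assert(t.plane1 == g.plane0);
}
```

So if a driver reads a swapped pair through the generic layout, whatever it labels Core is really the other plane's register, which would fit the earlier observation that the SoC values rose under load while Core stayed at zero.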

gardotd426 commented 3 years ago

Wait how did you guys fix the wattage readings?

abucodonosor commented 3 years ago

Yeah, those changes look fairly accurate now:

zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.25 V
SVI2_SoC:    988.00 mV
Tdie:         +74.5°C  (high = +95.0°C)
Tctl:         +74.5°C
Tccd1:        +71.5°C
Tccd2:        +71.5°C
SVI2_P_Core: 132.50 W
SVI2_P_SoC:    6.06 W
SVI2_C_Core: 106.00 A
SVI2_C_SoC:    6.13 A

Cool, then, for now, my patch should be at least a workaround for you guys.

Code needs a bit of refactoring but this is not my call.

Also, we found the bug in k10temp so, fixed 2 things while looking at this :)

Thx everyone for testing :)

abucodonosor commented 3 years ago

Wait how did you guys fix the wattage readings?

Seems to only work under load, or it needs a while to read something out.

But I've seen that on my EPYC box too: sometimes this is ZERO until the box is doing something. However, that is ZEN1 :)

aqxa1 commented 3 years ago

No worries, and thanks for looking into it.

hattedsquirrel commented 3 years ago

The values were a pure guess on my side, so don't trust them in any way. Under full load the core voltage rises, so that looks okay. But P_Core is reported as 73W while the system consumes 203W at the wall plug. So something still seems wrong. But thanks for your work so far.

gardotd426 commented 3 years ago

Mine only goes up to 30W under full load, so it's definitely not reading right :/

gardotd426 commented 3 years ago
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core:     1.55 V
SVI2_SoC:      1.43 V
Tdie:         +62.9°C  (high = +95.0°C)
Tctl:         +62.9°C
Tccd1:        +54.2°C
SVI2_P_Core:   0.00 W
SVI2_P_SoC:   24.44 W
SVI2_C_Core:   0.00 A
SVI2_C_SoC:   17.07 A

This is during a Geekbench benchmark while all cores were turboing at 4.8GHz (these chips are monsters), so yeah...

aqxa1 commented 3 years ago

You need to set these:

#define F19H_M01H_SVI_TEL_PLANE0            (F17H_M01H_SVI + 0x10)
#define F19H_M01H_SVI_TEL_PLANE1            (F17H_M01H_SVI + 0xC)

My core values do seem correct on my system (up to 150W). I'm not sure Core includes the full package (but I could be wrong), so the values could be reported lower than the full power usage.

hattedsquirrel commented 3 years ago

@aqxa1 What processor are you using? One or two CCDs?

aqxa1 commented 3 years ago

@hattedsquirrel 5900x, two CCD. Have been testing by re-compiling Mesa.

abucodonosor commented 3 years ago

@aqxa1

You are correct regarding the defines for PLANE{0,1}. I contacted someone who confirmed they are the same as for ZEN2, so both are wrong in mainline too.

Shall I create test3.patch ?

gardotd426 commented 3 years ago

If you don't mind that'd be great

hattedsquirrel commented 3 years ago

@aqxa1 Hm, ok. Same as mine. I can get it up to 74W reported for the cores while the plug power rises by 170W when loading up the cores. After subtracting conversion losses I'd expect the cores to consume between 120 and 140W, which would also match the PPT limit (142W for the whole package).
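
The back-of-the-envelope estimate above can be written down as a tiny sketch. The function name is invented, and the efficiency figure is an assumption standing in for combined PSU and VRM losses, not a measured value.

```c
#include <assert.h>

/* Estimate the DC power reaching the CPU from a rise in wall power.
 * efficiency is the assumed combined PSU + VRM efficiency, in (0, 1]. */
double cpu_power_estimate_w(double wall_delta_w, double efficiency)
{
    assert(efficiency > 0.0 && efficiency <= 1.0);
    return wall_delta_w * efficiency;
}
```

With the 170W rise from the comment and an assumed efficiency of 0.70 to 0.80, this gives roughly 119W to 136W, well above the 74W reported for the cores.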

abucodonosor commented 3 years ago

@gardotd426

https://crazy.dev.frugalware.org/ZEN3-test3.patch