Open tiburcillo opened 3 years ago
Okay I'm getting readings for all of them now, and it seems at least closer to being right. It's maxing out around 80W and it's a 105W chip (with PBO enabled) but that might just be the load I tested on it. I'll keep watching sensors, but either way it's way more info than it was. Thanks so much
It looks like the k10temp needs some more fixing.. ZEN2 worked by accident it seems.
I've made this patch for both, ZEN2 & ZEN3 Ryzen desktop CPUs.
Regarding the wrong core power reading:
I commented out this one line (line 630 with ZEN3-test3.patch applied)
//data->zen2 = true; /* the code need refactoring but calculation is the same */
and now my power reading matches those readings I get under Windows from HWiNFO and Ryzen Master
zenpower-pci-00c3
Adapter: PCI adapter
SVI2_Core: 1.29 V
SVI2_SoC: 994.00 mV
Tdie: +85.2°C (high = +95.0°C)
Tctl: +85.2°C
Tccd1: +84.5°C
Tccd2: +80.8°C
SVI2_P_Core: 125.82 W
SVI2_P_SoC: 7.89 W
SVI2_C_Core: 97.69 A
SVI2_C_SoC: 7.94 A
So, to me it looks like the calculation is done as in Zen/Zen+. But that is purely derived by observation. I couldn't find any documents from AMD regarding their registers and how this calculation is done. If someone knows a link, hit me up.
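As a quick plausibility check on the readings above, the reported power of each rail should be roughly voltage times current. A minimal sketch in Python using the values from the sensors output; the function name is made up for illustration, and this says nothing about how zenpower derives the numbers from the registers internally:

```python
def rail_power_w(voltage_v: float, current_a: float) -> float:
    """Electrical power in watts from voltage and current (P = V * I)."""
    return voltage_v * current_a


# Values copied from the sensors output above.
core_w = rail_power_w(1.29, 97.69)   # SVI2_Core * SVI2_C_Core
soc_w = rail_power_w(0.994, 7.94)    # SVI2_SoC * SVI2_C_SoC

print(f"core: {core_w:.2f} W (sensors reported 125.82 W)")
print(f"soc:  {soc_w:.2f} W (sensors reported 7.89 W)")
```

The small difference on the core rail is expected, since the displayed voltage is rounded to two decimals.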
@hattedsquirrel
I don't have any HW nor docs, so this is pure speculation based on existing data & register readouts. I can poke someone, but I'm not sure whether this is NDA material or something. TBH, it's really sad how AMD is treating Linux users ;(.
There is the report for the kernel folks:
Just FYI guys, it looks like Voltage etc is being removed from k10temp... The reason is weird, but yeah without any help from AMD, I can understand the decision. Probably we the consumers, have to flood AMD support centre with bug reports about the whole situation.
Ok, I figured what all this weirdness is about, even the ID's..
https://lkml.org/lkml/2020/12/21/780 https://lkml.org/lkml/2020/12/22/3
This really pisses me off.
"Sometimes people complain about the readings not being perfect so I figured I'll just completely remove all temp and voltage readings so you get nothing."
What the hell, that sounds like a 5 year old child, not a Linux kernel developer.
Unfortunately I don't know of any easy way to get in contact with AMD in any meaningful way. Customer support would be useless.
Confirmed working ok on a 5950x here. SVI2_P_Core seems halved though, as already mentioned.
All this code should be in the k10temp module though, and supported by AMD. Is there really no documentation on this stuff?
read the thread.
On Tue, Dec 22, 2020 at 5:50 AM Martin Schrodt notifications@github.com wrote:
https://crazy.dev.frugalware.org/ZEN3-test3.patch
@spheenik Read this entry: https://github.com/ocerman/zenpower/issues/39#issuecomment-749282228 You have to comment out the line that sets the multiplication coefficients to Zen2, so that the Zen1 ones are used instead.
I did, but I seem to have missed this:
> I don't have any HW nor docs, so this is pure speculation based on existing data & register readouts. I can poke someone, but I'm not sure whether this is NDA material or something. TBH, it's really sad how AMD is treating Linux users ;(.
Sorry.
It's kind of sad that AMD does not bring the official k10temp code up to something meaningful. Certainly seems like the documentation on their side is not readily distributable.
Anyway: Thank you all for the work done here. It's super nice to have CCD temperatures. Coming from a Threadripper 1920x I certainly missed that.
Could some of you comment again which CPU you are using and how accurate you think the power readings are with data->zen2 set to false and set to true? I lost the overview of who uses which setting on which CPU. If it is the same for all of us I could update @abucodonosor's patch.
For me, on a 5900X with data->zen2=false P_Core has a deviation <1% from the values I get with HWiNFO under Windows. P_SoC is identical within the measurement resolution.
@hattedsquirrel
I think I have now a good idea regarding PLANE0/1 registers. It looks like they are Server/Desktop/APU ( this one not sure ), except for ZEN1 which is a mess. However I have no idea about the formulas yet, that needs some experiments.
I've made a patch4 with updated code for NOT yet released EPYCs, bc that is the 'ZEN3' support the AMD people added mainline, just bc they released ZEN3 desktop and yeah they cannot be bothered to support these first.
Also, for you and others who want to test the ZEN 1/2 algo, I've added a zen1_calc module option so you don't need to recompile.
modprobe zenpower zen1_calc=1 , should give zen1 calculation, you can check in dmesg.
> Confirmed working ok on a 5950x here. SVI2_P_Core seems halved though, as already mentioned.
> All this code should be in the k10temp module though, and supported by AMD. Is there really no documentation on this stuff?
Unfortunately, there isn't any documentation and AMD itself is unwilling to either help existing projects, like this one or the mainline k10temp driver or write a zen based temps driver themselves.
Temperature support is not the only area where they are acting like this; see the cpufreq code as an example.
> This really pisses me off. "Sometimes people complain about the readings not being perfect so I figured I'll just completely remove all temp and voltage readings so you get nothing." What the hell, that sounds like a 5 year old child, not a Linux kernel developer. Unfortunately I don't know of any easy way to get in contact with AMD in any meaningful way. Customer support would be useless.
Well, the kernel people have no choice: they are flooded with bug reports and are unable to really 'fix' the code as long as AMD refuses to provide the needed data. Everything else is a wild guess, as you can see from this issue.
I'm more pissed about the response of the AMD devel.
See https://marc.info/?l=linux-hwmon&m=160797559810358&w=2
"Even though the results on EPYC servers seem to be correct, the readings of Vol/Amp are less reliable on some client platforms. This can be attributed to many factors, such as the design of power plane or the change of slope coefficient. It is better to remove the info of Vol/Amp from k10temp for now."
Full of s*it. I wait to see when they notify HWINFO, CPUz etc to remove that support cause is broken, also when they stop using this SW themselves in events or presentations.
IOW, they think the Linux community is kinda stupid, and their Linux customers too.
> modprobe zenpower zen1_calc=1 , should give zen1 calculation, you can check in dmesg.
Thanks, great work! As expected, with zen1_calc=1 the readings are very accurate for me.
In the kernel discussion I saw a comment about some calibration that should be needed. But honestly, from my point of view the ~1% accuracy I get this way is more than good enough. And certainly better than having no reading at all.
I'm also disappointed with how AMD handles this and how they add support for not yet released CPUs but can't do so for those that are already released. And then they even withdraw it all with the questionable reasoning you cited. Even on the Windows side there are wild discussions within the low-idle-power community about why the package power as reported by the PPT is always 10-15W higher than Core+SoC combined. Nobody gets less than 15W idle consumption, there is no insight into where the power goes, no information at all on how or if the chipset's power states can be influenced, the Ryzen Master software is extremely limited, and all AMD is willing to say is that there exist more parameters that can be adjusted, but it's all secret sauce and nobody may know. "Ask your mainboard manufacturer to tune the hidden settings specifically for your use case" is the undertone. Documentation from Intel on their CPUs hasn't always been the clearest either, but at least there is some documentation. For the user it is easier to tune the power consumption behaviour of their system as they see fit, with all the risks and benefits. And they are better at providing Linux kernel drivers. IMHO AMD is wasting the potential of their otherwise nice platform by keeping users from fully utilizing its possibilities. But sigh it is what it is...
@hattedsquirrel
I'm trying to convince the kernel folks to not drop support but use a module option for now until it gets sorted out. That way both sides should be happy for now, and the features can still get some testing in the kernel.
The thing is, without any support, we never get to perfection or near it, ever.
Waiting for AMD to help is like waiting for pigs to fly, right now.
@hattedsquirrel
> Nobody gets less than 15W idle consumption
Is that on x570 only or B550 too?
That's on B550 (and excludes graphics card and PSU losses of course). I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested I could measure the Rth of my heatsinks and make an estimation of how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)
@hattedsquirrel @ocerman
It looks like the k10temp voltage etc support will go away, PR to Linus is out.
However, Guenter is open to taking an amd_voltage driver or similar if someone is willing to maintain it. That could be a chance to get the bits into the kernel as a separate module.
> That's on B550 (and excludes graphics card and PSU losses of course). I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested I could measure the Rth of my heatsinks and make an estimation of how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)
I kind of am :) but no rush with that. Right now I'm thinking about a voltage driver for the kernel :)
> "Even though the results on EPYC servers seem to be correct, the readings of Vol/Amp are less reliable on some client platforms. This can be attributed to many factors, such as the design of power plane or the change of slope coefficient. It is better to remove the info of Vol/Amp from k10temp for now."
> Full of s*it. I wait to see when they notify HWINFO, CPUz etc to remove that support cause is broken, also when they stop using this SW themselves in events or presentations.
> IOW, they think the Linux community is kinda stupid, and their Linux customers too.
Hwinfo also doesn't report the correct values. That's why it has that Power Reporting Deviation value as some motherboard vendors have been found purposefully sending incorrect values to the CPU in order to get higher frequencies. https://www.hwinfo.com/forum/threads/explaining-the-amd-ryzen-power-reporting-deviation-metric-in-hwinfo.6456/
Maybe that's what caused the bad reporting?
@fr33-man
The problem you are referring to is a different problem. Yes, some vendors may have done some tricks, but that is a deviation of some +/-2%-5% on some motherboards. On Linux, they refuse to even add any support whatsoever, I mean stock specs etc. What you see here or in the mainline kernel is pure guesswork by the community.
No. On some boards it's a double digit number which can go as high as 20-50%. As the CPU gets its values from the VRM controller, that might be the issue. Please see the post that I linked.
> Here is an practical example recorded on MSI X570 Godlike motherboard, using the most recent 1.93 beta-bios version. For this bios version MSI has declared 280A reference current, when the correct value that produces near 100% result (i.e. no deviation) and also a matching power draw compared to other boards (same CPU and workload) is 300A. This means that the board allows 7.14% (300/280) higher power draw for the CPU than AMD specifications state. Compared to the worst violators (up to 50%) this is minor infraction, so MSI deserves a benefit of a doubt whenever this is intentional or a honest error.
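For anyone who wants to redo the arithmetic from the quoted example: the 7.14% figure is just the ratio of the correct reference current to the declared one, minus one. A small sketch in Python; the function name is made up here and only the 300A/280A numbers come from the quoted post:

```python
def excess_power_percent(correct_ref_a: float, declared_ref_a: float) -> float:
    """Extra power draw (in percent) allowed when a board declares a
    reference current that is lower than the correct one."""
    return (correct_ref_a / declared_ref_a - 1.0) * 100.0


# 300A is the correct value, 280A is what was declared in the example.
excess = excess_power_percent(300.0, 280.0)
print(f"{excess:.2f} % higher power draw than specified")
```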
@fr33-man
Even if the issue were 300% on the broken boards, it would be irrelevant.
Again, on Linux there aren't docs provided by AMD, all formulas to calculate various things are under NDA. IOW, the imperfection on Linux, as is now, comes from 'lack of correct formulas' for voltage, the power consumption etc.
The issue you are describing can be worked around once everything else is within the specs, which it isn't and won't really be as long as AMD refuses to provide open source docs. Please note that not even the sensor registers are really provided by AMD on Linux.
Now that I have access to the values of the PM table from the SMU I tried to align the zenpower with what the PM table reports.
Here are my results:
TL;DR;
Here is the image:
@abucodonosor
> That's on B550 (and excludes graphics card and PSU losses of course). I know that the X570 has a higher TDP rating, but so far I haven't seen any actual measurements for either of them in real-world scenarios with all unused ports disabled and L0s and L1 enabled. If you are interested I could measure the Rth of my heatsinks and make an estimation of how much thermal energy is dissipated by the CPU and chipset, but then we might want to move that discussion elsewhere ;-)
> I kind of am :) but no rush with that. Right now I'm thinking about a voltage driver for the kernel :)
B550 with nothing attached to SATA, USB or the PCIe ports (from my side, maybe some board-internal stuff) consumes about 2.3W. I'd attach an accuracy of +/-0.5W to that because the heat dissipation into the PCB is hard to nail down. Power did not change with processor CC6 state, btw. (as expected).
For those who aim for a low energy consumption system: Don't. Wait for the mobile versions. The desktop CPUs swallow much more power than you'd think. I wrote a waaaaay too long article about it here: https://hattedsquirrel.net/2020/12/power-consumption-of-ryzen-5000-series-cpus/
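For readers wondering how a heatsink Rth turns into a power figure like the chipset estimate above: in steady state the dissipated power is roughly the temperature rise over ambient divided by the thermal resistance. A rough sketch in Python; the function name and all numbers below are placeholders for illustration, not measurements from this thread:

```python
def dissipated_power_w(t_heatsink_c: float, t_ambient_c: float,
                       rth_k_per_w: float) -> float:
    """Steady-state power estimate: P = (T_heatsink - T_ambient) / Rth."""
    if rth_k_per_w <= 0:
        raise ValueError("thermal resistance must be positive")
    return (t_heatsink_c - t_ambient_c) / rth_k_per_w


# Placeholder values only: 55 degC heatsink, 25 degC ambient, Rth of 12 K/W.
estimate = dissipated_power_w(55.0, 25.0, 12.0)
print(f"estimated dissipation: {estimate:.1f} W")
```

Heat that leaves through the PCB instead of the heatsink is not captured by this, which is one reason such an estimate needs an error margin.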
How can the Vsoc be stock if you're running XMP?
Good point. I got V_SoC=1.0V with XMP off as well as with XMP profile 2 (3000 MHz). With XMP profile 1 (3600 MHz) V_SoC did indeed increase and so did P_SoC, of course. Right now I'm running V_Soc=0.9V @ 3000 MHz without any problems and no further tuning.
@hattedsquirrel
VDDIO_MEM_S3 = DRAM I/O Ring Power Supply, but there should be VDDIO_MEM_S3_SENSE pin for the voltage monitor.
Thank you for writing the article :-)
> Good point. I got V_SoC=1.0V with XMP off as well as with XMP profile 2 (3000 MHz). With XMP profile 1 (3600 MHz) V_SoC did indeed increase and so did P_SoC, of course. Right now I'm running V_Soc=0.9V @ 3000 MHz without any problems and no further tuning.
I would run @ 3200 MHz to measure because this is what AMD officially supports, so that should be 'the baseline', i.e. with Fabric Clock running @ 1600 MHz.
No, baseline is at 2133
Yes, there is a sense pin. I took that as an indication that they indeed expect a substantial current draw on that net. I didn't bother with mem speed too much since my main goal was to understand why my system uses >50W for web browsing while HWiNFO and the like only show 2+4W (Core+SoC). Once I understood that the CPU consumes so much more than what is shown to us, it became clear that I couldn't reach near-mobile-platform levels with only tuning voltages and frequencies. There must be more power-down modes involved on those platforms. As long as I can't access those, it won't become a "daily driver" PC but stay more of an "on demand renderer".
Also I find it "interesting" to see that for example AnandTech has an elaborate article on the efficiency of Ryzen 5000, but of course they rely on the power values they are able to obtain from software. Apparently they don't know about the additional power draw and thus are painting a better-than-real picture. The same applies to other review sites / channels. Hiding unpleasant statistics might be "normal marketing behaviour" from AMD's side, but still... can't hurt to have software that shows all values. (Don't get me wrong, I still love the speed and power of their 7nm CCDs, but the lack of transparency... not so much.)
Successfully tested patch 4 with @hattedsquirrel 's Zenmonitor patch on a 5800X. Thanks to all involved!
I'm in favor of turning this into a pull request.
Can "zen1_calc=1" be passed to the module when it's loaded via DKMS?
You can always pass options to any module when being loaded, it does not matter whether the module is created by DKMS, manually or included in the kernel.
Just do:
echo "options zenpower zen1_calc=1" > /etc/modprobe.d/zenpower.conf
modprobe will then automatically apply the options when loading the module.
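If you want to double-check what such a config line hands to the module, here is a tiny parser sketch in Python. It only handles the simple `options <module> key=value ...` form shown above, and the function name is made up for this example; it is not part of modprobe or zenpower:

```python
from typing import Dict, Optional, Tuple


def parse_options_line(line: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Parse a modprobe.d 'options' line into (module, {param: value}).

    Returns None for lines that are not 'options' lines (comments, blanks).
    """
    parts = line.split()
    if len(parts) < 2 or parts[0] != "options":
        return None
    # Only key=value pairs are kept; bare flags are ignored in this sketch.
    params = dict(p.split("=", 1) for p in parts[2:] if "=" in p)
    return parts[1], params


print(parse_options_line("options zenpower zen1_calc=1"))
```

Whether the option was actually applied is still best confirmed in dmesg, as mentioned earlier in the thread.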
can someone please write up or link to the directions for applying this patch? i have the patch file, but have no idea what to do with it.
I have zenpower and zen monitor installed already using the default instructions, and I just changed from a 3900X to a 5950X. I get temp readings out of Psensor, but I'd like to get zenmonitor working again.
just put it into the folder of the zenpower source code and then run it through the patch util inside the source code directory:
patch -p1 -i ZEN3-test4.patch
The -p1 strips one directory of the path at the beginning (since the paths in the file are given as a/zenpower.c and b/zenpower.c), the -i specifies the input file.
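To illustrate what the strip level does to the file names in the patch, here is a simplified sketch in Python. It mimics only the path-stripping idea (dropping leading slash-separated components) and ignores corner cases such as repeated slashes; the function name is made up for this example:

```python
def strip_components(path: str, count: int) -> str:
    """Drop the first `count` slash-separated components of a path,
    similar in spirit to what `patch -pN` does with file names."""
    parts = path.split("/")
    return "/".join(parts[count:])


# The patch refers to a/zenpower.c and b/zenpower.c; with -p1 both
# end up pointing at zenpower.c in the current directory.
print(strip_components("a/zenpower.c", 1))
print(strip_components("b/zenpower.c", 1))
```

That is why the command has to be run from inside the source directory: after stripping, the remaining name is looked up relative to where you are.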
thanks. I assume I'll have to recompile and reinstall then?
yes, of course.
Thanks, I got it working. I had to recompile zenmonitor with a patched source also.
To what extent is the SOC telemetry not trustworthy? I see some discussion about this above. It seems my reading for vSOC is a bit off I think. I’ve got it set to 0.95 in the BIOS, but zenmonitor (and hence zenpower) reports nearly 1.2V. Is this normal? On a 5950x.
If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case. But there is guesswork involved with the register addresses, so there is still room for things to go wrong. Did you set your voltages in the BIOS to fixed values, via the offset mode, or one of the other automated modes? Also, XMP can overwrite the V_SoC and I've seen 1.2V in conjunction with XMP.
While the voltage readings worked well on my 5900X, the power is pretty far off and also very temperature dependent. If you want something to compare your values to, you could give ryzen_monitor a try. That thing is also based on guesswork but uses the SMU as data source. So don't take those values as guaranteed either. Also, it has no lmsensors integration. But at least for me the SMU values correlated very well with physical measurements. Maybe it can help you estimate how accurate the zenpower readings are on your system.
Yes I have ram set to XMP, and yes I have the SOC set to a static value of 0.95. Not using an auto mode for that.
But XMP profiles don’t have SOC voltage in them. Just VDIMM (which it also sets to about 1.45v), so if XMP is somehow interfering with the SOC voltage, this sounds like a bug in the BIOS/AGESA.
Can you double-check with a different software or a multimeter whether your V_SoC really is 0.95V? 1.20V seems to be the default voltage on many mainboards for memory speeds above 3200MHz. If somehow the 1.2V got set in hardware, it would explain the zenpower reading.
i rebooted into windows to check with HWinfo, and it reports the same, so I guess it's reading the right value.
I also noticed that in the BIOS HW monitor section, it lists two SOC values, one around 1.2 that i see from software "CPU VDDCR_SOC", and another that matches my BIOS setting under the label "PREM_VDDCR_SOC"
@IanSteveC As the memory controller is integrated into the CPU package, increase in memory frequency causes an increase in NB/SoC voltage. It's not a bug.
> If zenpower reads the right registers the voltage reading is reasonably accurate. It certainly isn't off by 20% as in your case.
As you can see by this message @hattedsquirrel still didn't read Stilt's post on power deviation, as he knows more than somebody who tested at least 20 different motherboards #fact. I guess with all the bias in his article, he doesn't mind having some in his motherboard :rofl: Here's one paragraph from Stilt's post:
> In short: Some motherboard manufacturers intentionally declare an incorrect (too small) motherboard specific reference value in AGESA. Since AM4 Ryzen CPUs rely on telemetry sourced from the motherboard VRM to determine their power consumption, declaring an incorrect reference value will affect the power consumption seen by the CPU. For instance, if the motherboard manufacturer would declare 50% of the correct value, the CPU would think it consumes half the power than it actually does. In this case, the CPU would allow itself to consume twice the power of its set power limits, even when at stock. It allows the CPU to clock higher due to the effectively lifted power limits however, it also makes the CPU to run hotter and potentially negatively affects its life-span, same ways as overclocking does. The difference compared to overclocking or using AMD PBO, is that this is done completely clandestine and that in the past, there has been no way for most of the end-users to detect it, or react to it.
However, your CPU shouldn't get damaged since Ryzen has some foolproofing built into it. On a 2700X and ASRock X570 with a deviation of 60%, the frequency just drops as the temperature gets into the 80s.
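The mechanism from the quoted paragraph can be put into numbers with a toy model: if the board declares only a fraction of the correct reference value, the CPU sees its power scaled by that fraction, so it hits its limit only at limit / fraction of real power. A sketch in Python; the function names and the 100W limit are made-up illustration values, not AMD specifications:

```python
def power_seen_by_cpu(actual_power_w: float, declared_fraction: float) -> float:
    """Power the CPU believes it draws when the declared reference value
    is only `declared_fraction` of the correct one."""
    return actual_power_w * declared_fraction


def actual_power_at_limit(power_limit_w: float, declared_fraction: float) -> float:
    """Real power draw at which the CPU thinks it has reached its limit."""
    if declared_fraction <= 0:
        raise ValueError("declared_fraction must be positive")
    return power_limit_w / declared_fraction


# Board declares 50% of the correct value; 100 W is a hypothetical limit.
print(power_seen_by_cpu(200.0, 0.5))      # the CPU thinks it draws 100 W
print(actual_power_at_limit(100.0, 0.5))  # but it really draws 200 W
```

This only models the reported current and power; as discussed in the thread, it says nothing about the voltage readings.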
just FYI, I'm not using PBO/PBO2, and am doing manual settings.
I have removed the XMP setting (and manually set clocks/voltages/timings to what XMP does), and it has not made a difference to measured SOC values, so i guess it has nothing to do with XMP specifically.
i feel like if the BIOS exposes an option for me to set a static value for SoC voltage, and it does not honor that setting but instead does whatever it wants, that qualifies as a bug. how else are you to tweak the OC stability/settings without the ability to accurately set SoC voltage?
From my following of the posts on the OCN Ryzen and memory overclocking threads, you can't trust the AGESA to do the things it says it is doing.
And you can't trust that setting XMP values for the memory does what it says it is doing. The memory controller will autocorrect out of alignment memory timings and not display the changes in the BIOS for example.
@fr33-man Where is this post from? Is that the one from the HWiNFO forum? AFAIK this deviation applies to the reported current and therefore reported power. The reported voltage should be ok. At least that was my understanding. Also we are talking about the SoC here, not the core voltage/current/power.
@IanSteveC My BIOS offers two places to set the SoC voltage. One right on the front page and another one deep down in the menu structure after a disclaimer and next to tons of very specific settings. Maybe you have two places to adjust it, too.
The duplication of parameter settings in the BIOS is common. It's because the 32MB BIOSes have two separate compartmented sections, one for Matisse and one for Vermeer. So you often have a setting visible on the main pages and also buried deep in the overclocking sections.
Would it be a lot of work to add support for the Zen3 family? I love zenpower on my 2700x, would be cool if my 5600x would also be supported.
Thanks, t