Open KeithMyers opened 3 years ago
Thanks for the info. Apparently ryzen_smu got updated 5 days ago to a new version. I'll have to look into that later. We expect to see v.0.1.0.
Regarding your CPUs: The 3950X will not work out of the box right now. You'd have to create a pm_table mapping first. The 5950X, on the other hand, should just work (given the SMU driver version matches). I tested with the 5900X, which is essentially the same chip but with 4 cores permanently disabled.
Will continue to watch this repo for updates. Thanks for the quick reply.
I just checked in an update which now works for ryzen_smu v0.1.1 as well. You should now probably get a message about table version not supported.
If you are willing to provide pm_table dumps I can take a look and see how easy it is to guess the changes compared to the existing 3700X table.
You can create dumps by running the following script in bash (make sure you have read access to /sys/kernel/ryzen_smu_drv/pm_table and /sys/kernel/ryzen_smu_drv/pm_table_version first):
cat /sys/kernel/ryzen_smu_drv/pm_table_version | xxd -p > dump_pm_version
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_idle.bin
yes > /dev/null &
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_1Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_1Tb.bin
yes > /dev/null &
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_2Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_2Tb.bin
for i in {1..30}; do (yes > /dev/null &); done
sleep 0.5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_32Ta.bin
sleep 5
cat /sys/kernel/ryzen_smu_drv/pm_table > dump_32Tb.bin
killall yes
Then pack all dump_* files and attach the archive. Thanks.
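For anyone who wants to look through such dumps themselves, here is a rough sketch of how two of them could be compared. It assumes the pm_table dump is a packed array of 32-bit floats in host byte order; the helper name `diff_pm_dumps` is made up for this example and is not part of ryzen_monitor or ryzen_smu.

```shell
#!/bin/sh
# Hypothetical helper: list the float32 entries that differ between two dumps.
# Assumption: each dump is a packed array of 4-byte floats in host byte order.
diff_pm_dumps() {
    # cmp -l prints one line per differing byte (1-based byte number first);
    # it exits non-zero when the files differ, which is expected here.
    { cmp -l "$1" "$2" || true; } |
        awk '{ idx = int(($1 - 1) / 4); if (!(idx in seen)) { seen[idx] = 1; print idx } }' |
        while read -r idx; do
            # Decode the 4 bytes at this index from each file as one float.
            old="$(od -An -tf4 -j "$((idx * 4))" -N4 "$1" | tr -d ' ')"
            new="$(od -An -tf4 -j "$((idx * 4))" -N4 "$2" | tr -d ' ')"
            printf 'index %d (offset 0x%03x): %s -> %s\n' "$idx" "$((idx * 4))" "$old" "$new"
        done
}

# Example: what changed between idle and the first single-thread sample.
if [ -r dump_idle.bin ] && [ -r dump_1Ta.bin ]; then
    diff_pm_dumps dump_idle.bin dump_1Ta.bin
fi
```

Entries that move between idle and load (power, current, frequency) are the ones that help when guessing a mapping; entries that never change are probably limits or unused slots.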
OK, here is the archive of the dump* files. pm_dump.zip
Could you test this patch? https://hattedsquirrel.net/downloads/ryzen_3950x-01.patch
Your patch file is corrupted at the end.
keith@Serenity:~/Downloads/ryzen_monitor/src$ patch < ryzen_3950x-01.patch
patching file pm_tables.c
patching file pm_tables.h
patching file ryzen_monitor.c
Hunk #1 succeeded at 275 (offset -2 lines).
Hunk #2 succeeded at 303 with fuzz 2 (offset -2 lines).
Hunk #3 succeeded at 314 (offset -4 lines).
Hunk #4 FAILED at 474.
Hunk #5 FAILED at 490.
2 out of 5 hunks FAILED -- saving rejects to file ryzen_monitor.c.rej
Pull the newest commits, then try again. I checked in some changes yesterday. Sorry for not mentioning that.
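To avoid ending up with a half-applied patch and .rej files, the patch can be tested first. A minimal sketch, assuming GNU patch (which provides `--dry-run`); the function name `apply_if_clean` is invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: apply a patch only if a dry run succeeds for every hunk.
# Assumes GNU patch; run it from the directory the patch was made for (src/ here).
apply_if_clean() {
    if patch --dry-run < "$1" > /dev/null; then
        patch < "$1"
    else
        echo "patch does not apply cleanly; update the tree (e.g. git pull) and retry" >&2
        return 1
    fi
}

# Usage, with the file name from this thread:
#   cd ryzen_monitor/src && apply_if_clean ryzen_3950x-01.patch
```

Hunks that only need an offset or some fuzz still count as success, as in the first three hunks above; only hunks that fail outright stop the patch from being applied.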
Ok, much better. Works now.
╭───────────────────────────────────────────────┬────────────────────────────────────────────────╮
│                                     CPU Model │            AMD Ryzen 9 3950X 16-Core Processor │
│                           Processor Code Name │                                        Matisse │
│                                         Cores │                                             16 │
│                                     Core CCDs │                                              2 │
│                                     Core CCXs │                                              4 │
│                                 Cores Per CCX │                                              4 │
│                                SMU FW Version │                                       v46.67.0 │
│                                MP1 IF Version │                                            v11 │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭─────────┬────────────┬──────────┬─────────┬──────────┬─────────────┬─────────────┬─────────────╮
│ Core 0  │   4300 MHz |  5.843 W | 1.275 V |  66.72 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 1  │   4300 MHz |  6.275 W | 1.275 V |  72.93 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 2  │   4300 MHz |  5.881 W | 1.275 V |  67.52 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 3  │   4300 MHz |  6.287 W | 1.275 V |  73.19 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 4  │   4300 MHz |  5.805 W | 1.275 V |  66.22 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 5  │   4300 MHz |  6.225 W | 1.275 V |  72.45 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 6  │   4300 MHz |  5.481 W | 1.275 V |  65.69 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 7  │   4300 MHz |  5.775 W | 1.275 V |  71.73 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 8  │   4275 MHz |  5.385 W | 1.275 V |  70.92 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 9  │   4275 MHz |  4.990 W | 1.275 V |  64.93 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 10 │   4275 MHz |  5.665 W | 1.275 V |  70.95 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 11 │   4275 MHz |  5.058 W | 1.275 V |  63.72 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 12 │   4275 MHz |  5.639 W | 1.275 V |  72.14 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 13 │   4275 MHz |  5.797 W | 1.275 V |  68.86 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 14 │   4275 MHz |  5.814 W | 1.275 V |  73.11 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
│ Core 15 │   4275 MHz |  5.864 W | 1.275 V |  69.81 C | C0: 100.0 % | C1:   0.0 % | C6:   0.0 % │
╰─────────┴────────────┴──────────┴─────────┴──────────┴─────────────┴─────────────┴─────────────╯
╭── Core Statistics (Calculated) ───────────────┬────────────────────────────────────────────────╮
│              Highest Effective Core Frequency │                                       4300 MHz │
│                      Highest Core Temperature │                                        73.19 C │
│                          Highest Core Voltage │                                        1.275 V │
│                          Average Core Voltage │                                        0.000 V │
│                              Average Core CC6 │                                         0.00 % │
│                          Total Core Power Sum │                                      91.7840 W │
├── Reported by SMU ────────────────────────────┼────────────────────────────────────────────────┤
│                             Peak Core Voltage │                                        1.275 V │
│                                   Package CC6 │                                         0.00 % │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Electrical & Thermal Constraints ───────────┬────────────────────────────────────────────────╮
│                              Peak Temperature │                                        75.50 C │
│                               SoC Temperature │                                        37.55 C │
│                         Voltage from Core VRM │                   1.100 V | 1.442 V |  76.27 % │
│                                           PPT │                 174.971 W |   142 W | 123.22 % │
│                                     TDC Value │                 113.832 A |    95 A | 119.82 % │
│                                    TDC Actual │                  90.914 A |    95 A |  95.70 % │
│                                           EDC │                 139.999 A |   140 A | 100.00 % │
│                                           THM │                   74.18 C |    95 C |  78.09 % │
│                                           FIT │                         0 |     258 |   0.01 % │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Memory Interface ───────────────────────────┬────────────────────────────────────────────────╮
│                                  Coupled Mode │                                             ON │
│                        Fabric Clock (Average) │                                       1800 MHz │
│                                  Fabric Clock │                                       1800 MHz │
│                                  Uncore Clock │                                       1800 MHz │
│                                  Memory Clock │                                       1800 MHz │
│                                     cLDO_VDDM │                                       0.9504 V │
│                                     cLDO_VDDP │                                       0.9002 V │
│                                     cLDO_VDDG │                                       1.0477 V │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
╭── Power Consumption ──────────────────────────┬────────────────────────────────────────────────╮
│                          Total Core Power Sum │                                      91.7840 W │
│                               VDDCR_SOC Power │                                      19.3636 W │
│                               GMI2_VDDG Power │                                       8.7156 W │
│                                L3 Logic Power │                             0.517 W + 0.5365 W │
│                                L3 Logic Power │                + 0.386 W + 0.3332 W = 1.7727 W │
│                                 L3 VDDM Power │                             0.350 W + 0.3510 W │
│                                 L3 VDDM Power │                + 0.369 W + 0.3652 W = 1.4350 W │
│                                               │                                                │
│                               VDDIO_MEM Power │                                       8.6723 W │
│                           IOD_VDDIO_MEM Power │                                       0.0000 W │
│                                DDR_VDDP Power │                                       5.1823 W │
│                                   VDD18 Power │                                       0.8000 W │
│                                               │                                                │
│                     Calculated Thermal Output │                                     137.7255 W │
├── Additional Reports ─────────────────────────┼────────────────────────────────────────────────┤
│                              SoC Power (SVI2) │                1.094 V |  17.704 A |  19.364 W │
│                             Core Power (SVI2) │                1.275 V | 113.817 A | 145.117 W │
│                              Core Power (SMU) │                                      145.117 W │
│                            Socket Power (SMU) │                                     174.9525 W │
│                           Package Power (SMU) │                                          nan W │
╰───────────────────────────────────────────────┴────────────────────────────────────────────────╯
Okay, cool. Thanks for the help and the screenshot. It also pointed out a bug in the calculation of "Average Core Voltage", which I now fixed. I'll push all changes online now.
Ok, I'll pull the newest commit and test it for the missing average voltage value.
Was reading through the commit and noticed that you are limiting the application only to Ryzen parts.
Ever consider adding Epyc parts? You are hard coding a core limit of 16. My Epyc 7402P has 24 cores.
Would be nice to have the application usable on Epyc parts also.
All good. Average Core Voltage is now populated with actual value.
The only reason Epyc isn't supported right now is that I don't know anything about them. The first step would be to find out which SMN registers to read and to see if they differ from the Ryzen series. Those registers are read to find out how many CCDs there are and which cores are disabled. If you are brave enough, you can build and run the attached util and paste its output. (It also depends on the ryzen_smu kernel driver.) Maybe the registers look similar enough to the Ryzen series. smn_debug.tar.gz
I'll give it a shot. Glad to help developers with hardware testing.
Here is the smn_debug output from my AMD Epyc 7402P cpu.
ryzen_smu version string: 0.1.1
fam: 0x17
model: 0x31
logical_cores: 48
threads_per_core: 2
read 05d218: 02850a14, ret = OK
read 05d228: 95400000, ret = OK
read 05d258: 00000000, ret = OK
read 05d21c: 09120a14, ret = OK
read 05d22c: 0000002a, ret = OK
read 05d25c: 24401e81, ret = OK
read 30081800: 00000000, ret = OK
read 30081d98: 00000000, ret = OK
read 31081800: ffffffff, ret = OK
read 31081d98: ffffffff, ret = OK
read 32081800: ffffffff, ret = OK
read 32081d98: ffffffff, ret = OK
read 33081800: ffffffff, ret = OK
read 33081d98: ffffffff, ret = OK
read 34081800: 00000000, ret = OK
read 34081d98: 00000000, ret = OK
read 35081800: ffffffff, ret = OK
read 35081d98: ffffffff, ret = OK
read 36081800: ffffffff, ret = OK
read 36081d98: ffffffff, ret = OK
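To make output like this easier to compare between machines, the reads can be turned into address/value rows. A small sketch that relies only on the line format shown above ("read ADDR: VALUE, ret = STATUS"); the function name `parse_smn_reads` is hypothetical and not part of smn_debug.

```shell
#!/bin/sh
# Hypothetical helper: turn smn_debug "read" entries into "0xADDR 0xVALUE STATUS" rows.
# grep -o extracts each entry, so one-per-line and flattened input both work.
parse_smn_reads() {
    grep -oE 'read [0-9a-f]+: [0-9a-f]{8}, ret = [A-Za-z]+' |
        sed -E 's/^read ([0-9a-f]+): ([0-9a-f]{8}), ret = ([A-Za-z]+)$/0x\1 0x\2 \3/'
}

# Two reads copied from the output above.
sample='read 05d218: 02850a14, ret = OK read 31081800: ffffffff, ret = OK'
printf '%s\n' "$sample" | parse_smn_reads

# Count the reads that came back as all ones.
printf '%s\n' "$sample" | parse_smn_reads | awk '$2 == "0xffffffff"' | wc -l
```

With the rows in this shape, the output from a Ryzen and an Epyc machine can be sorted and diffed by address.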
Gave ryzen_monitor a what the hell shot on the Epyc.
ryzen_smu version string: 0.1.1
PM Tables are not supported on this platform.
Oh, that's unfortunate. The error message means that ryzen_smu doesn't know how to read the PM table from the SMU yet. I looked into the code, and the reason seems to be that they don't know which function number to call. Maybe you could reach out to them and see if they can get it going with your help. Once ryzen_smu can read the PM table, I'm positive we can get things working on my side as well.
I will do that. Zenpower module works with my 7402P. Zen Monitor also. But it does not work on the 7502 or 7642 with the higher core counts.
I can provide remote access to Epyc Rome and Milan if that's useful. Also, how do I contribute $ to fund further work here? (You should sign up for GitHub Sponsors?)
Can you check if ryzen_smu provides /sys/kernel/ryzen_smu_drv/pm_table and /sys/kernel/ryzen_smu_drv/pm_table_version on your machines? This underlying support needs to be in place before we can start implementing support on our end.
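A quick way to run that check, as a sketch: it uses the same two sysfs paths as the dump script earlier in this thread, and the helper name `check_smu_files` is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given files exist and are readable.
check_smu_files() {
    rc=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "ok: $f"
        elif [ -e "$f" ]; then
            echo "exists but not readable (try as root): $f"
            rc=1
        else
            echo "missing: $f"
            rc=1
        fi
    done
    return "$rc"
}

# SMU_DIR can be overridden for testing; the default is the driver's sysfs directory.
SMU_DIR="${SMU_DIR:-/sys/kernel/ryzen_smu_drv}"
check_smu_files "$SMU_DIR/pm_table" "$SMU_DIR/pm_table_version" ||
    echo "PM table support does not seem to be available here yet"
```

If both files show up as "ok", the dump script from earlier in the thread should work on that machine.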
@level1wendell I would like to have access to an Epyc server. How can I reach you?
Email is probably the best bet. Wendell at Level1Techs dot com
On a 7742, something catastrophic happens when loading ryzen_smu with kernel 5.11 from the pve repo (Proxmox). The kernel thinks every PCIe device wants vfio-pci as its driver, plus other nondeterministic behavior. Never seen anything like that!
Distro of choice? I'll prep the OS image for you as well, and we can do this on a dedicated machine where I can swap in both Rome and Milan parts.
Don't feel rushed. The hardware will be at your disposal whenever you need it, for however long it's needed to help further the project.
It seems close on these parts.
On Fri, Jun 11, 2021, 3:09 AM Patrick Schur @.***> wrote:
@level1wendell https://github.com/level1wendell I would like to have access to an Epyc server. How can I reach you?
@level1wendell You got an email. ;)
Just tried this new driver and monitor in advance for a friend who is getting a Ryzen 5950X.
I ran it against my Ryzen 3950X and got this error message.