ocerman / zenmonitor

Zen monitor is monitoring software for AMD Zen-based CPUs.
MIT License

GUI won't load when run as root with EPYC 7642 48-core #34

Open IanSteveC opened 3 years ago

IanSteveC commented 3 years ago

I did the polkit install on Ubuntu 20.04.1 on several of my systems, but one won't launch. It works fine when not run as root, but when I run it as root to get more sensor readings, it just hangs and no fields populate. When I try to interact with the GUI, I get a prompt that it's not responding and I have to force quit it.

I have two nearly identical systems: ASRock Rack EPYCD8 motherboards with the same DDR4-3200 RAM in both. One system runs an EPYC 7402P 24-core and the other a 7642 48-core. The installs were done at the same time, in exactly the same way, on both: I installed the zenpower driver, blacklisted the k10temp driver, and followed the Ubuntu instructions exactly. Everything works on the 24-core system, but the 48-core system just hangs when zenmonitor is launched as root. The 48-core system will load zenmonitor if it is not run as root.

Let me know what I can do to help if you need more info.
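
Since the setup described above depends on zenpower being loaded and k10temp being blacklisted, one quick way to compare the two machines is to list the hwmon device names the kernel exposes. The following is a minimal sketch, not part of zenmonitor; the function names are made up for illustration, and it assumes the drivers register under the hwmon names "zenpower" and "k10temp" in /sys/class/hwmon.

```python
from pathlib import Path


def loaded_hwmon_drivers(hwmon_root="/sys/class/hwmon"):
    """Return the name reported by each hwmon device, in device order."""
    names = []
    root = Path(hwmon_root)
    if not root.is_dir():
        return names
    for name_file in sorted(root.glob("hwmon*/name")):
        try:
            names.append(name_file.read_text().strip())
        except OSError:
            # Skip devices whose name file cannot be read.
            continue
    return names


def check_zenpower(names):
    """Summarize whether the zenpower/k10temp setup looks as intended.

    Assumption: the drivers appear under the names "zenpower" and "k10temp".
    """
    if "k10temp" in names:
        return "k10temp still loaded"
    if "zenpower" in names:
        return "ok"
    return "zenpower not loaded"


if __name__ == "__main__":
    found = loaded_hwmon_drivers()
    print(found)
    print(check_zenpower(found))
```

Running this on both the working 24-core box and the hanging 48-core box would at least confirm that the driver setup is the same on each.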

KeithMyers commented 3 years ago

I would run the bash script that collects the debug output for the zenpower driver and post the result to its issue thread. Make sure the driver is handling the CPU correctly there first, since zenmonitor depends on the zenpower driver for its input. https://github.com/ocerman/zenpower/issues/12

IanSteveC commented 3 years ago

done.

copied here:

ASRock Rack EPYCD8
EPYC 7642
Ubuntu 20.04.1 (linux kernel 5.4.0-45-generic)

debug data:
KERN_SUP: 1
NODE0; CPU0; N/CPU: 1
0005a008 = 00000002
0005a00c = 01f70000
0005a010 = 01760021
0005a014 = 01620074
000598bc = 0fff0fff
0005994c = 0fff0fff
00059954 = 00000ae6
00059958 = 00000ae6
0005995c = 00000ae4
00059960 = 00000ae2
00059964 = 00000af6
00059968 = 00000ae8
0005996c = 00000ae6
00059970 = 00000adc
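
To compare a dump like this against one taken on a working machine, the "address = value" pairs can be pulled into a dictionary and diffed. This is only a sketch under the assumption that every pair is written as two 8-digit hex numbers separated by " = ", as in the output above; `parse_dump` and `diff_dumps` are hypothetical helper names, and nothing here interprets what the registers mean.

```python
import re

# Matches pairs such as "0005a008 = 00000002" (8 hex digits on each side).
PAIR = re.compile(r"\b([0-9a-fA-F]{8}) = ([0-9a-fA-F]{8})\b")

# The debug data posted in this thread.
SAMPLE = (
    "KERN_SUP: 1 NODE0; CPU0; N/CPU: 1 "
    "0005a008 = 00000002 0005a00c = 01f70000 0005a010 = 01760021 "
    "0005a014 = 01620074 000598bc = 0fff0fff 0005994c = 0fff0fff "
    "00059954 = 00000ae6 00059958 = 00000ae6 0005995c = 00000ae4 "
    "00059960 = 00000ae2 00059964 = 00000af6 00059968 = 00000ae8 "
    "0005996c = 00000ae6 00059970 = 00000adc"
)


def parse_dump(text):
    """Return {address: value} as integers for every pair found in text."""
    return {int(addr, 16): int(value, 16) for addr, value in PAIR.findall(text)}


def diff_dumps(a, b):
    """Return {address: (value_in_a, value_in_b)} where the two dumps differ."""
    return {
        addr: (a.get(addr), b.get(addr))
        for addr in sorted(set(a) | set(b))
        if a.get(addr) != b.get(addr)
    }


if __name__ == "__main__":
    regs = parse_dump(SAMPLE)
    for addr, value in regs.items():
        print(f"{addr:08x} = {value:08x}")
```

Calling `diff_dumps(parse_dump(dump_a), parse_dump(dump_b))` with the text from two machines would list only the addresses whose values differ.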

abucodonosor commented 3 years ago

@IanSteveC

How do you run the app as root?

KeithMyers commented 3 years ago

Simple. Either install via the policy kit method described in the install docs or simply prefix the call with sudo.

sudo ./zenmonitor

When run as root, you get the additional package power readings, the power used by each core, and the effective clock frequency of each core.

Package power is the one I find most useful for seeing the effect of setting the cTDP, PPL and PPT values in the BIOS when you want to push beyond the stock power limits of a Zen CPU.

abucodonosor commented 3 years ago

@KeithMyers

I know what it does:).

But if that is about core count, it will trigger on any ZEN platform with more than X cores. I have a Naples Box with 32C/64T, I'll test that theory soon.

KeithMyers commented 3 years ago

Thanks. That would be appreciated by Ian, I'm sure. It should show whether this is something exclusive to the 7642 or, as I suspect, just poor programming for the display space.

IanSteveC commented 3 years ago

> @KeithMyers
>
> I know what it does:).
>
> But if that is about core count, it will trigger on any ZEN platform with more than X cores. I have a Naples Box with 32C/64T, I'll test that theory soon.

FYI, it works fine also on my 32C/64T Epyc 7502.

Won’t work on the 48-core 7642.

It doesn’t seem like the developer is actively working on any issues, though, since this has been open so long.

abucodonosor commented 3 years ago

@IanSteveC

Well then, unfortunately, I don't have anything bigger than 64T myself. I have access to some servers with a lot more cores, but none of them runs a GUI and I cannot install a whole desktop there to test that.

Out of curiosity, what does the zenmonitor window do on 64T? Does it get a scrollbar? If not, @KeithMyers may be correct.

IanSteveC commented 3 years ago

Yes, I get a scroll bar on both the 32-core (when run as root for extra stats) and on the 48-core (when not run as root). The number of “threads” doesn’t seem to matter, since the program only reports per-core stats and nothing about the extra threads.

The 32-core system lists 32 cores' worth of stats for power, frequency, and effective frequency, for a total of 109 line items in the whole GUI window.

The 48-core system is actually a dual 48-core system now, so 96C/192T. In the top stats, it lists Node0 and Node1 info, but only includes core frequency for 48 cores, I assume from Node0. This GUI takes 80 line items when not run as root. If you were to run as root and actually get full per-node stats for all cores, it would take 320 lines. A single-socket 7642 (which didn’t work previously) should need about 164 lines.
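
The line counts above can be sanity-checked with a little arithmetic. The sketch below assumes each core contributes three lines as root (power, frequency, effective frequency) and one line otherwise (frequency only), and treats whatever remains as fixed, non-per-core lines. That split is an assumption for illustration, not something taken from the zenmonitor source.

```python
# Assumed per-core line counts (see the lead-in; not from zenmonitor's code).
ROOT_STATS_PER_CORE = 3   # power, frequency, effective frequency
USER_STATS_PER_CORE = 1   # frequency only


def implied_fixed_lines(total_lines, cores, stats_per_core):
    """Lines left over once the per-core lines are subtracted."""
    return total_lines - cores * stats_per_core


# (label, total lines quoted in the thread, cores listed, lines per core)
REPORTS = [
    ("32-core, root (observed)", 109, 32, ROOT_STATS_PER_CORE),
    ("dual 48-core, non-root (observed)", 80, 48, USER_STATS_PER_CORE),
    ("dual 48-core, root (estimate)", 320, 96, ROOT_STATS_PER_CORE),
    ("single 48-core, root (estimate)", 164, 48, ROOT_STATS_PER_CORE),
]

if __name__ == "__main__":
    for label, total, cores, per_core in REPORTS:
        fixed = implied_fixed_lines(total, cores, per_core)
        print(f"{label}: {cores * per_core} per-core + {fixed} other = {total}")
```

Under these assumptions the per-core lines dominate: 96 of 109 on the 32-core system, and 288 of 320 in the dual-socket estimate.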