Open pktiuk opened 1 year ago
There is compatibility feedback here from @staaled on a Supermicro H13SSL-NT motherboard where he managed to configure smfc.
@staaled: How did you configure the CPU zone for AMD in order to read the temperature properly? Could you please share your config?
Sorry for the late response @petersulyok
Well... to expand a little on what I wrote at https://github.com/petersulyok/smfc/issues/19#issuecomment-1593015583
I use the k10temp kernel module for AMD CPUs instead of coretemp for Intel CPUs, and the rest is guesswork.
When running sensors (from lm-sensors):
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +39.0 C
Tccd1: +33.0 C
Tccd2: +32.9 C
Tccd3: +34.8 C
Tccd4: +34.4 C
So I just did a quick and dirty search for a hwmon temp1_label file containing Tctl in sysfs:
/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon13/temp1_label
and chucked this into smfc.conf under the CPU zone section:
hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
When running smfc this seems to expand properly and it matches the temperature reading from sensors and ipmi:
# systemctl status smfc -n 50 | grep hwmon
Jun 28 06:28:01 localhost smfc.service[9451]: hwmon_path = ['/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon13/temp1_input']
#
Please note this is only tested on a single socket EPYC Zen4(Genoa) CPU running a 6.2.0 kernel.
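The quick and dirty search for a label file containing Tctl could be sketched roughly as below. This is only an illustration: the function name `find_labelled_inputs` is made up for this example and is not part of smfc, and it assumes each `tempN_label` file has a sibling `tempN_input` file, as in the path shown above.

```python
import glob


def find_labelled_inputs(pattern, wanted="Tctl"):
    """Return temp*_input paths whose sibling temp*_label file reads `wanted`.

    `pattern` is a glob for label files, for example
    /sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp*_label
    """
    matches = []
    for label_path in sorted(glob.glob(pattern)):
        if not label_path.endswith("_label"):
            continue  # only label files can be mapped to an input file
        try:
            with open(label_path, encoding="ascii", errors="replace") as f:
                label = f.read().strip()
        except OSError:
            continue  # unreadable entry, skip it
        if label == wanted:
            # temp1_label -> temp1_input
            matches.append(label_path[: -len("_label")] + "_input")
    return matches
```

Calling it with the k10temp pattern from the docstring should return the same kind of path as the one placed in smfc.conf, on a system where the k10temp module is loaded.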
One observation I made is that hwmon13 is NOT stable, and may vary between reboots/changes to components etc., so I wouldn't recommend using anything like hwmon_path=/sys/class/hwmon/hwmon13/temp1_input
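To illustrate why the wildcard form survives a renumbered hwmon directory, here is a small sketch of the expansion. It assumes plain `glob.glob` behaviour; the log line above prints a Python-style list, but how smfc actually expands the path is not shown in this thread, and `resolve_hwmon_paths` is a hypothetical helper name.

```python
import glob


def resolve_hwmon_paths(pattern):
    """Expand a wildcard hwmon_path into the files that exist right now.

    Because the hwmonN number is matched by '*', the same pattern keeps
    working if the kernel assigns a different N after a reboot.
    """
    return sorted(glob.glob(pattern))
```

With the pattern from the config, this would return one path on a single k10temp device and several on a system exposing more than one, so the caller has to decide how to combine multiple readings.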
Full config I'm experimenting with now:
[Ipmi]
command=/usr/bin/ipmitool
fan_mode_delay=10
fan_level_delay=5
swapped_zones=1
[CPU zone]
enabled=1
count=1
temp_calc=1
steps=6
sensitivity=3.0
polling=2
min_temp=35.0
max_temp=70.0
min_level=10
max_level=100
hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
[HD zone]
enabled=0
FWIW I replaced my chassis fans with Noctua NF-A9x14's, and disabled the HD zone because I want those silent puppies running full speed (~2100 RPM), as the stock fan on the Dynatron J12 CPU cooler is a little 80mm monster which does 8000 RPM at full tilt and makes me wonder if I can use it as a siren for the burglar alarm... The min_level setting in the above config may not be very safe though.
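To get a feel for what these numbers mean, here is a rough sketch of a stepwise mapping from temperature to fan level. It is an assumption for illustration only (an evenly spaced linear mapping quantised to `steps`), not smfc's actual control function, and `fan_level` is a hypothetical name. The defaults are the values from the config above.

```python
def fan_level(temp, min_temp=35.0, max_temp=70.0,
              min_level=10, max_level=100, steps=6):
    """Map a temperature in degrees C to a fan level in percent.

    Assumed model: the range min_temp..max_temp is divided into `steps`
    equal intervals, and each interval raises the level by an equal share
    of min_level..max_level. Values outside the range are clamped.
    """
    if temp <= min_temp:
        return min_level
    if temp >= max_temp:
        return max_level
    temp_step = (max_temp - min_temp) / steps
    level_step = (max_level - min_level) / steps
    gain = round((temp - min_temp) / temp_step)  # nearest step index
    return int(round(min_level + gain * level_step))
```

Under this assumed model the fans sit at 10% for anything up to 35 C, which is where the concern about min_level comes from.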
@staaled thanks for sharing this!
I'm planning to add support for AMD CPUs to smfc as well, and I have some further questions:
k10temp module is visible on the /sys/devices/platform branch in hwmon?
README.md (e.g. Intel Speed Shift)?
I really appreciate your help.
We should perhaps create a separate issue for this, however just a quick response to your questions @petersulyok:
/sys/devices/platform/ for k10temp
/sys/module/k10temp/drivers/pci:k10temp/ (same as /sys/bus/pci/drivers/k10temp/), by looking at symlinks pointing to devices. (Ref https://docs.kernel.org/hwmon/k10temp.html they should all show up as pci devices.)
Putting some sample output here in case it helps:
root@localhost:/sys/module/k10temp/drivers/pci:k10temp# ls -l
total 0
lrwxrwxrwx 1 root root 0 Jun 28 15:11 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
--w------- 1 root root 4096 Jun 28 15:13 bind
lrwxrwxrwx 1 root root 0 Jun 28 15:11 module -> ../../../../module/k10temp
--w------- 1 root root 4096 Jun 28 15:13 new_id
--w------- 1 root root 4096 Jun 28 15:13 remove_id
--w------- 1 root root 4096 Jun 28 15:10 uevent
--w------- 1 root root 4096 Jun 28 15:13 unbind
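Looking at the symlinks in that driver directory can be sketched as follows. This is a hedged example: `bound_devices` is a hypothetical helper, and it simply assumes that every symlink whose name looks like a PCI address (domain:bus:device.function) is a device bound to the driver, which matches the listing above.

```python
import os
import re

# PCI address such as 0000:00:18.3 (domain:bus:device.function)
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def bound_devices(driver_dir="/sys/bus/pci/drivers/k10temp"):
    """Return the PCI addresses of the devices bound to a driver.

    Entries such as 'module' are symlinks too, so the name is checked
    against the PCI address pattern as well.
    """
    devices = []
    for name in sorted(os.listdir(driver_dir)):
        path = os.path.join(driver_dir, name)
        if os.path.islink(path) and PCI_ADDR.match(name):
            devices.append(name)
    return devices
```

Note that the result is a count of k10temp devices, which is not necessarily the number of CPU sockets.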
H13SSL-NT board one can find the manual on the resources page, MNL-2545.pdf, and look at pages 63-66 to find things such as SMT, Core Performance Boost, C-states, TDP Control, Package Power Limit Control etc. I would perhaps start by looking at the CPU specs and https://docs.kernel.org/admin-guide/pm/amd-pstate.html, but this might be out of scope for this project?
IPMI FULL MODE confirmed working on H13SSL-NT using the AST2600 BMC :)
140 RPM, the other settings failed, still not gotten around to playing more with the set_ipmi_threshold.sh script and figuring out working commands for ipmitool.
Currently everything in the IPMI webui reads N/A for Low NR, Low CT, High CT, High NR, except for Low CT=140 for my connected fans. Since none of my fans go below 200 RPM this is not much of a problem right now.
As a quick sidenote, fan_measurement.sh requires a much longer delay between changing fan levels to pick up the actual change, in the range of 10-15 seconds; otherwise, in my case, the fans are still speeding up or slowing down when the measurement is taken. A nice feature would also be to detect when it trips the Low CT point and the fans spin up to 100% automatically.
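One way to avoid a fixed delay would be to poll until the fan speed stops changing. The sketch below is only an idea, not how fan_measurement.sh works: `wait_until_stable` and its parameters are hypothetical, and `read_rpm` stands for whatever function returns the current RPM (for example by parsing ipmitool output).

```python
import time


def wait_until_stable(read_rpm, tolerance=50, interval=1.0,
                      timeout=20.0, sleep=time.sleep):
    """Poll read_rpm() until two consecutive readings differ by at most
    `tolerance` RPM, then return the last reading.

    Raises TimeoutError if the speed has not settled within `timeout`
    seconds of accumulated waiting.
    """
    previous = read_rpm()
    waited = 0.0
    while waited < timeout:
        sleep(interval)
        waited += interval
        current = read_rpm()
        if abs(current - previous) <= tolerance:
            return current
        previous = current
    raise TimeoutError("fan speed did not settle within %.0f s" % timeout)
```

The tolerance and timeout defaults are guesses and would need tuning per fan; a sudden jump to full speed after settling at a low level could also hint that a lower threshold was tripped.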
@petersulyok :
So a friend of mine has a dual socket SuperMicro H12 motherboard with 2x EPYC 7551, 5.4 kernel, that outputs:
ls -al /sys/module/k10temp/drivers/pci:k10temp/
total 0
drwxr-xr-x 2 root root 0 Feb 12 20:08 .
drwxr-xr-x 30 root root 0 Feb 12 20:08 ..
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:19.3 -> ../../../../devices/pci0000:00/0000:00:19.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1a.3 -> ../../../../devices/pci0000:00/0000:00:1a.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1b.3 -> ../../../../devices/pci0000:00/0000:00:1b.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1c.3 -> ../../../../devices/pci0000:00/0000:00:1c.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1d.3 -> ../../../../devices/pci0000:00/0000:00:1d.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1e.3 -> ../../../../devices/pci0000:00/0000:00:1e.3
lrwxrwxrwx 1 root root 0 Jul 3 12:08 0000:00:1f.3 -> ../../../../devices/pci0000:00/0000:00:1f.3
--w------- 1 root root 4096 Jul 3 12:08 bind
lrwxrwxrwx 1 root root 0 Jul 3 12:08 module -> ../../../../module/k10temp
--w------- 1 root root 4096 Jul 3 12:08 new_id
--w------- 1 root root 4096 Jul 3 12:08 remove_id
--w------- 1 root root 4096 Feb 12 20:08 uevent
--w------- 1 root root 4096 Jul 3 12:08 unbind
Apparently they all have a Tctl sensor, but no Tccd's
I have another single socket EPYC 7302 on a SuperMicro H12SSL-NT board running a 5.15 kernel:
ls -al /sys/module/k10temp/drivers/pci:k10temp/
total 0
drwxr-xr-x 2 root root 0 May 1 2022 .
drwxr-xr-x 34 root root 0 May 1 2022 ..
lrwxrwxrwx 1 root root 0 Jul 3 13:16 0000:00:18.3 -> ../../../../devices/pci0000:00/0000:00:18.3
--w------- 1 root root 4096 Jul 3 13:16 bind
lrwxrwxrwx 1 root root 0 Jul 3 13:16 module -> ../../../../module/k10temp
--w------- 1 root root 4096 Jul 3 13:16 new_id
--w------- 1 root root 4096 Jul 3 13:16 remove_id
--w------- 1 root root 4096 May 1 2022 uevent
--w------- 1 root root 4096 Jul 3 13:16 unbind
With standard sensors output:
k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +42.5 C
Tccd1: +39.8 C
Tccd3: +40.0 C
Tccd5: +42.2 C
Tccd7: +39.8 C
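The same label/value pairs can be read straight from sysfs instead of parsing sensors output. A minimal sketch, assuming the usual hwmon layout where `tempN_label` holds the name and `tempN_input` holds the value in millidegrees Celsius; `read_labelled_temps` is a hypothetical helper name.

```python
import glob
import os


def read_labelled_temps(hwmon_dir):
    """Return {label: degrees C} for every tempN_label/tempN_input pair
    found in one hwmon directory.

    Pairs that cannot be read or parsed are skipped.
    """
    temps = {}
    for label_path in sorted(glob.glob(os.path.join(hwmon_dir, "temp*_label"))):
        input_path = label_path[: -len("_label")] + "_input"
        try:
            with open(label_path) as f:
                label = f.read().strip()
            with open(input_path) as f:
                millidegrees = int(f.read().strip())
        except (OSError, ValueError):
            continue
        temps[label] = millidegrees / 1000.0
    return temps
```

This makes it easy to see which Tccd sensors a given CPU exposes, since the numbering is not necessarily contiguous, as the output above shows.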
Hi @pktiuk, did you manage to set up your system based on the sample here? I would appreciate hearing your feedback.
I haven't done this yet.
Unfortunately I don't have much time this month to set this up, but I will keep testing this in mind.
Let me know if you need any further help. The documentation of the latest v3.0.0 version contains recommendations for AMD users.
Hi @pktiuk and @staaled, I'm working on more generic support for AMD CPUs in smfc. Do you have access to a motherboard with multiple AMD CPUs? It would be a great help to get program output (udevadm) from such a system.
Unfortunately I don't have access to this kind of device. :/
I see there are only X10/X11 motherboards.
Would it take a lot of effort to implement support for the M12SWA-TF motherboard?