petersulyok / smfc

Super Micro Fan Control
GNU General Public License v3.0

H11SSL-i fan problem on proxmox #37

Closed petersulyok closed 1 month ago

petersulyok commented 4 months ago

@Xyz00777 reported a problem in the SMFC hardware compatibility issue (#19):

I'm trying to get it working on my H11SSL-i (ASPEED AST2500) with a Proxmox install. Because I'm not sure which fans are connected to which PWM header, I set the lower threshold to 500 and the upper threshold to 2000 for every fan in the config:

# This script must be executed by root.
if [ "$EUID" -ne 0 ]
then
    echo "ERROR: Please run as root"
    exit 1
fi

# Setup of the lower threshold limits of the fans.
# Edit the list of fans here (FAN1, FAN2, FAN3, FAN5, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the lower threshold values here (500, 500, 500)!
    ipmitool sensor thresh "FAN${i}" lower 500 500 500
done

# Setup of the upper threshold limits of the fans.
# Edit the list of fans here (FAN1, FAN2, FAN3, FAN5, FANA, FANB)!
for i in 1 2 3 5 A B;
do
    # Edit the upper threshold values here (2000, 2000, 2000)!
    ipmitool sensor thresh "FAN${i}" upper 2000 2000 2000
done

I have an Iceberg Thermal IceGALE Xtra (500-2500 rpm) and a Noctua NH-U9 TR4-SP3 (400-2000 rpm).

After I loaded the modules and ran install.sh, I started the service. It crashed with the fans at 100% speed, and journalctl showed this log:

May 31 03:07:18 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 03:07:18 ds9 smfc.service[11931]: Logging module was initialized with:
May 31 03:07:18 ds9 smfc.service[11931]:    log_level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    log_output = 2
May 31 03:07:18 ds9 smfc.service[11931]: Command line arguments:
May 31 03:07:18 ds9 smfc.service[11931]:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed config file = /opt/smfc/smfc.conf
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log level = 3
May 31 03:07:18 ds9 smfc.service[11931]:    parsed log output = 2
May 31 03:07:19 ds9 smfc.service[11931]: Ipmi module was initialized with:
May 31 03:07:19 ds9 smfc.service[11931]:    command = /usr/bin/ipmitool
May 31 03:07:19 ds9 smfc.service[11931]:    fan_mode_delay = 10
May 31 03:07:19 ds9 smfc.service[11931]:    fan_level_delay = 2
May 31 03:07:19 ds9 smfc.service[11931]:    swapped_zones = False
May 31 03:07:29 ds9 smfc.py[11931]: Traceback (most recent call last):
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1150, in <module>
May 31 03:07:29 ds9 smfc.py[11931]:     service.run()
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 1119, in run
May 31 03:07:29 ds9 smfc.py[11931]:     self.cpu_zone = CpuZone(self.log, self.ipmi, self.config)
May 31 03:07:29 ds9 smfc.py[11931]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 600, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     super().__init__(
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 395, in __init__
May 31 03:07:29 ds9 smfc.py[11931]:     self.build_hwmon_path(hwmon_path)
May 31 03:07:29 ds9 smfc.py[11931]:   File "/opt/smfc/smfc.py", line 632, in build_hwmon_path
May 31 03:07:29 ds9 smfc.py[11931]:     raise ValueError(self.ERROR_MSG_FILE_IO.format(path))
May 31 03:07:29 ds9 smfc.py[11931]: ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).
May 31 03:07:33 ds9 smfc.service[11931]: smfc terminated: all fans are switched back to the 100% speed.
May 31 03:07:33 ds9 systemd[1]: smfc.service: Main process exited, code=exited, status=1/FAILURE
May 31 03:07:33 ds9 systemd[1]: smfc.service: Failed with result 'exit-code'.

Please help, I don't want my fans to spin up every ~10 sec for 5 sec :(

petersulyok commented 4 months ago

Hi @Xyz00777,

Your problem is that the CPU temperature cannot be read from HWMON, as the log stated:

ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).

Based on the official Supermicro page, your board takes an AMD CPU, so you have to configure the proper file manually in the smfc config. You can find more information here; it will be something like this:

hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
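If you want to check which of the two patterns actually matches on your machine before editing the config, a small sketch like the one below can help. It is not taken from smfc; `read_hwmon_temp` is a hypothetical helper name, and it only assumes that the kernel's hwmon `temp*_input` files report millidegrees Celsius.

```python
import glob


def read_hwmon_temp(pattern: str) -> float:
    """Return the temperature in degrees Celsius from the first file matching pattern."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        # Same situation as in the log above: the wildcard pattern matched no file.
        raise ValueError(f"Cannot read file ({pattern}).")
    with open(matches[0], "r", encoding="utf-8") as f:
        # hwmon temp*_input files hold millidegrees Celsius as an integer.
        return int(f.read().strip()) / 1000.0


if __name__ == "__main__":
    for candidate in (
        # Path from the failing log (coretemp is the Intel driver).
        "/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input",
        # Path suggested above for AMD CPUs (k10temp driver).
        "/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input",
    ):
        try:
            print(candidate, "->", read_hwmon_temp(candidate), "C")
        except ValueError as err:
            print(err)
```

Whichever pattern prints a temperature is the one to put into hwmon_path=.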

Xyz00777 commented 4 months ago

The path is the same on my system. I uncommented it in /opt/smfc/smfc.conf and was able to start the service :)

Thank you very much! Can I provide any more information to help further development of this really awesome software? Or can we close the issue?

May 31 14:52:26 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 14:52:26 ds9 smfc.service[6241]: Logging module was initialized with:
May 31 14:52:26 ds9 smfc.service[6241]:    log_level = 3
May 31 14:52:26 ds9 smfc.service[6241]:    log_output = 2
May 31 14:52:26 ds9 smfc.service[6241]: Command line arguments:
May 31 14:52:26 ds9 smfc.service[6241]:    original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 14:52:26 ds9 smfc.service[6241]:    parsed config file = /opt/smfc/smfc.conf
May 31 14:52:26 ds9 smfc.service[6241]:    parsed log level = 3
May 31 14:52:26 ds9 smfc.service[6241]:    parsed log output = 2
May 31 14:52:27 ds9 smfc.service[6241]: Ipmi module was initialized with:
May 31 14:52:27 ds9 smfc.service[6241]:    command = /usr/bin/ipmitool
May 31 14:52:27 ds9 smfc.service[6241]:    fan_mode_delay = 10
May 31 14:52:27 ds9 smfc.service[6241]:    fan_level_delay = 2
May 31 14:52:27 ds9 smfc.service[6241]:    swapped_zones = False
May 31 14:52:37 ds9 smfc.service[6241]: CPU zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]:    ipmi zone = 0
May 31 14:52:37 ds9 smfc.service[6241]:    count = 1
May 31 14:52:37 ds9 smfc.service[6241]:    temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]:    steps = 6
May 31 14:52:37 ds9 smfc.service[6241]:    sensitivity = 3.0
May 31 14:52:37 ds9 smfc.service[6241]:    polling = 2.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_temp = 30.0
May 31 14:52:37 ds9 smfc.service[6241]:    max_temp = 60.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]:    max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]:    hwmon_path = ['/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon6/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]:    Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]:    0. [T:30.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]:    1. [T:35.0C - L:45%]
May 31 14:52:37 ds9 smfc.service[6241]:    2. [T:40.0C - L:56%]
May 31 14:52:37 ds9 smfc.service[6241]:    3. [T:45.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]:    4. [T:50.0C - L:78%]
May 31 14:52:37 ds9 smfc.service[6241]:    5. [T:55.0C - L:89%]
May 31 14:52:37 ds9 smfc.service[6241]:    6. [T:60.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]: HD zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]:    ipmi zone = 1
May 31 14:52:37 ds9 smfc.service[6241]:    count = 1
May 31 14:52:37 ds9 smfc.service[6241]:    temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]:    steps = 4
May 31 14:52:37 ds9 smfc.service[6241]:    sensitivity = 2.0
May 31 14:52:37 ds9 smfc.service[6241]:    polling = 10.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_temp = 32.0
May 31 14:52:37 ds9 smfc.service[6241]:    max_temp = 46.0
May 31 14:52:37 ds9 smfc.service[6241]:    min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]:    max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]:    hwmon_path = ['/sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon0/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]:    Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]:    0. [T:32.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]:    1. [T:35.5C - L:51%]
May 31 14:52:37 ds9 smfc.service[6241]:    2. [T:39.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]:    3. [T:42.5C - L:83%]
May 31 14:52:37 ds9 smfc.service[6241]:    4. [T:46.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]:    WARNING: Standby guard is disabled ([HD zone] count=1
May 31 14:52:37 ds9 smfc.service[6241]:    hd_names = ['/dev/disk/by-id/ata-Patriot_P210_512GB_P210IBCB23102410314']
May 31 14:52:37 ds9 smfc.service[6241]:    Standby guard is disabled
May 31 14:52:37 ds9 smfc.service[6241]:    hddtemp_path = /usr/sbin/hddtemp
May 31 14:52:39 ds9 smfc.service[6241]: CPU zone: new level > 32.4C > [T:30.0C/L:35%]
May 31 14:52:41 ds9 smfc.service[6241]: HD zone: new level > 30.0C > [T:32.0C/L:35%]

petersulyok commented 4 months ago

Maybe a hint: if you have only one SSD installed, you may disable the HD Zone and connect all fans to CPU Zone. Or do you have more hard disks?

Xyz00777 commented 4 months ago

I have 8 HDDs and 2 SSDs :D But I found one thing after restarting my server twice: every time it restarts, around the moment smfc starts, the fans ramp up completely even though smfc is running smoothly. I have to restart the smfc service once to let the fans go down again... 🤔

Xyz00777 commented 4 months ago

Correction: it looks like it took around three and a half minutes after system start for the fans to go down again:

May 31 15:41:52 ds9 smfc.service[2585]: CPU zone: new level > 37.6C > [T:40.0C/L:50%]
May 31 15:41:54 ds9 smfc.service[2585]: HD zone: new level > 35.0C > [T:32.0C/L:25%]
May 31 15:45:23 ds9 smfc.service[2585]: CPU zone: new level > 34.6C > [T:35.0C/L:37%]

petersulyok commented 4 months ago

i have 8 hdd and 2 ssd

They are not in the config currently. You have to list them in the hd_names= config parameter. I suggest removing the SSDs and keeping only the HDDs in the config.
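To collect the names for hd_names= without typing them by hand, something like the following sketch could list the /dev/disk/by-id entries of rotational drives only. This is an illustration, not part of smfc: `rotational_disks` is a hypothetical helper, and it assumes the standard Linux sysfs flag /sys/block/<device>/queue/rotational (1 for HDDs, 0 for SSDs).

```python
import os


def rotational_disks(by_id_dir="/dev/disk/by-id", sys_block_dir="/sys/block"):
    """Return by-id paths of whole disks that the kernel reports as rotational."""
    found = {}
    for name in sorted(os.listdir(by_id_dir)):
        if "-part" in name:
            continue  # skip partition links, keep whole disks only
        link = os.path.join(by_id_dir, name)
        device = os.path.basename(os.path.realpath(link))  # e.g. "sda"
        flag_file = os.path.join(sys_block_dir, device, "queue", "rotational")
        try:
            with open(flag_file, "r", encoding="utf-8") as f:
                is_hdd = f.read().strip() == "1"
        except OSError:
            continue  # no rotational flag for this entry, so skip it
        if is_hdd and device not in found:
            found[device] = link  # first name in sorted order wins (ata-* before wwn-*)
    return list(found.values())


if __name__ == "__main__":
    for path in rotational_disks():
        print(path)
```

The printed paths have the same /dev/disk/by-id/... form as the hd_names entry in the log above; please check the documentation for how multiple entries should be separated.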

it looks like it took around three and a half minutes after system start for the fans to go down again

Do not worry. This is a typical matter of fine-tuning your configuration. The fan level is controlled dynamically based on temperature, so a low temperature results in a low fan speed.

Please check and configure proper temperatures and fan levels for the CPU and HD zones. The default values in the configuration will not fit your system. Please take a look at the documentation; it is long, but it will help you create a proper configuration. I'm also happy to help you here.
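To get a feeling for how min_temp, max_temp, min_level, max_level and steps interact, here is a rough sketch that reproduces the "Temperature to level mapping" tables printed in the log earlier in this thread. It is inferred from that log output only and is not the actual smfc code; the function names are hypothetical, and picking the nearest step in `lookup` is an assumption.

```python
def build_mapping(steps, min_temp, max_temp, min_level, max_level):
    """Return a list of (temperature, level) pairs, one per step boundary."""
    mapping = []
    for i in range(steps + 1):
        temp = min_temp + i * (max_temp - min_temp) / steps
        # Integer division truncates, which matches the levels shown in the log.
        level = min_level + i * (max_level - min_level) // steps
        mapping.append((temp, level))
    return mapping


def lookup(mapping, temperature):
    """Pick the entry nearest to temperature, clamped to the ends of the table."""
    temp_step = mapping[1][0] - mapping[0][0]
    index = round((temperature - mapping[0][0]) / temp_step)
    index = max(0, min(len(mapping) - 1, index))
    return mapping[index]


if __name__ == "__main__":
    # CPU zone values from the log: steps=6, 30-60 C, 35-100 %.
    cpu = build_mapping(6, 30.0, 60.0, 35, 100)
    for i, (temp, level) in enumerate(cpu):
        print(f"{i}. [T:{temp}C - L:{level}%]")
    print(lookup(cpu, 32.4))
```

With the CPU zone values this prints the same seven rows as the log, and a reading of 32.4C lands on the 30.0C / 35% entry. Raising min_temp or lowering min_level shifts the whole table, which is the kind of tuning meant above.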

petersulyok commented 4 months ago

I was thinking about this:

at the moment smfc starts the fans ramp up completely even though smfc is running smoothly. I have to restart the smfc service once to let the fans go down again

You may reset the IPMI BMC (sometimes it has issues):

$ ipmitool mc reset cold

After the reset you have to define the threshold values again!
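Since the thresholds have to be set again after every BMC reset, it may be convenient to generate the commands in one place. The sketch below only builds the ipmitool argument lists and prints them; it assumes the three-value form of `ipmitool sensor thresh <sensor> lower|upper`, and `build_threshold_commands` is a hypothetical helper. The fan names and the 500/2000 values are the ones reported at the top of this issue.

```python
def build_threshold_commands(fans, lower, upper, ipmitool="/usr/bin/ipmitool"):
    """Return one ipmitool argument list per fan and threshold side."""
    commands = []
    for fan in fans:
        for side, values in (("lower", lower), ("upper", upper)):
            if len(values) != 3:
                raise ValueError(f"{side} thresholds need exactly three values")
            commands.append([ipmitool, "sensor", "thresh", fan, side, *map(str, values)])
    return commands


if __name__ == "__main__":
    fans = ["FAN1", "FAN2", "FAN3", "FAN5", "FANA", "FANB"]
    for cmd in build_threshold_commands(fans, (500, 500, 500), (2000, 2000, 2000)):
        # Printed for review; each list could be passed to subprocess.run(cmd, check=True) as root.
        print(" ".join(cmd))
```

Printing first makes it easy to review the commands before running them as root.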

Xyz00777 commented 4 months ago

I think this didn't really fix it, but when it happens I just restart the service, so it's okay for now :) Thanks! And I switched to the HDD temps.

Xyz00777 commented 1 month ago

I don't know what changed, but I had to reinstall the system a few days ago and now everything works fine, thanks :) You can close it (I can't close it).