Closed petersulyok closed 1 month ago
Hi @Xyz00777,
Your problem is that the CPU temperature cannot be read from HWMON, as the log stated:
ValueError: Cannot read file (/sys/devices/platform/coretemp.0/hwmon/hwmon*/temp1_input).
According to the official Supermicro page, you have an AMD CPU, so you have to configure the proper file manually in the smfc
config. You can find more information here; it will be something like this:
hwmon_path=/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input
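To check which file such a wildcard path actually resolves to on a given machine, a small sketch like the following may help. It is not smfc's own code: the helper names `resolve_hwmon_path` and `read_temperature_c` are made up for illustration, and it only assumes that the hwmon `temp1_input` file contains an integer in millidegrees Celsius.

```python
import glob


def resolve_hwmon_path(pattern: str) -> str:
    """Expand a wildcard hwmon path and return the first matching file.

    Hypothetical helper for illustration; not part of smfc.
    """
    matches = sorted(glob.glob(pattern))
    if not matches:
        # Same wording as the error in the log above.
        raise ValueError(f"Cannot read file ({pattern}).")
    return matches[0]


def read_temperature_c(path: str) -> float:
    """Read a hwmon temp*_input file (millidegrees Celsius) as degrees Celsius."""
    with open(path, encoding="ascii") as f:
        return int(f.read().strip()) / 1000.0


if __name__ == "__main__":
    # Pattern taken from the suggestion above (AMD k10temp driver).
    pattern = "/sys/bus/pci/drivers/k10temp/0000*/hwmon/hwmon*/temp1_input"
    try:
        path = resolve_hwmon_path(pattern)
        print(path, read_temperature_c(path))
    except (ValueError, OSError) as err:
        print(err)
```

If the pattern matches nothing, the k10temp module is probably not loaded or the CPU uses a different driver.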
The path is the same on my system. I uncommented it in /opt/smfc/smfc.conf and I was able to start it :)
Thank you very much! Can I provide you with (or do you need) any more information for further development of this really awesome software? Or can we close the issue?
May 31 14:52:26 ds9 systemd[1]: Started smfc.service - Super Micro Fan Control.
May 31 14:52:26 ds9 smfc.service[6241]: Logging module was initialized with:
May 31 14:52:26 ds9 smfc.service[6241]: log_level = 3
May 31 14:52:26 ds9 smfc.service[6241]: log_output = 2
May 31 14:52:26 ds9 smfc.service[6241]: Command line arguments:
May 31 14:52:26 ds9 smfc.service[6241]: original arguments: /opt/smfc/smfc.py -c /opt/smfc/smfc.conf -l 3
May 31 14:52:26 ds9 smfc.service[6241]: parsed config file = /opt/smfc/smfc.conf
May 31 14:52:26 ds9 smfc.service[6241]: parsed log level = 3
May 31 14:52:26 ds9 smfc.service[6241]: parsed log output = 2
May 31 14:52:27 ds9 smfc.service[6241]: Ipmi module was initialized with:
May 31 14:52:27 ds9 smfc.service[6241]: command = /usr/bin/ipmitool
May 31 14:52:27 ds9 smfc.service[6241]: fan_mode_delay = 10
May 31 14:52:27 ds9 smfc.service[6241]: fan_level_delay = 2
May 31 14:52:27 ds9 smfc.service[6241]: swapped_zones = False
May 31 14:52:37 ds9 smfc.service[6241]: CPU zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]: ipmi zone = 0
May 31 14:52:37 ds9 smfc.service[6241]: count = 1
May 31 14:52:37 ds9 smfc.service[6241]: temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]: steps = 6
May 31 14:52:37 ds9 smfc.service[6241]: sensitivity = 3.0
May 31 14:52:37 ds9 smfc.service[6241]: polling = 2.0
May 31 14:52:37 ds9 smfc.service[6241]: min_temp = 30.0
May 31 14:52:37 ds9 smfc.service[6241]: max_temp = 60.0
May 31 14:52:37 ds9 smfc.service[6241]: min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]: max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]: hwmon_path = ['/sys/bus/pci/drivers/k10temp/0000:00:18.3/hwmon/hwmon6/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]: Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]: 0. [T:30.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]: 1. [T:35.0C - L:45%]
May 31 14:52:37 ds9 smfc.service[6241]: 2. [T:40.0C - L:56%]
May 31 14:52:37 ds9 smfc.service[6241]: 3. [T:45.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]: 4. [T:50.0C - L:78%]
May 31 14:52:37 ds9 smfc.service[6241]: 5. [T:55.0C - L:89%]
May 31 14:52:37 ds9 smfc.service[6241]: 6. [T:60.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]: HD zone fan controller was initialized with:
May 31 14:52:37 ds9 smfc.service[6241]: ipmi zone = 1
May 31 14:52:37 ds9 smfc.service[6241]: count = 1
May 31 14:52:37 ds9 smfc.service[6241]: temp_calc = 1
May 31 14:52:37 ds9 smfc.service[6241]: steps = 4
May 31 14:52:37 ds9 smfc.service[6241]: sensitivity = 2.0
May 31 14:52:37 ds9 smfc.service[6241]: polling = 10.0
May 31 14:52:37 ds9 smfc.service[6241]: min_temp = 32.0
May 31 14:52:37 ds9 smfc.service[6241]: max_temp = 46.0
May 31 14:52:37 ds9 smfc.service[6241]: min_level = 35
May 31 14:52:37 ds9 smfc.service[6241]: max_level = 100
May 31 14:52:37 ds9 smfc.service[6241]: hwmon_path = ['/sys/class/scsi_disk/1:0:0:0/device/hwmon/hwmon0/temp1_input']
May 31 14:52:37 ds9 smfc.service[6241]: Temperature to level mapping:
May 31 14:52:37 ds9 smfc.service[6241]: 0. [T:32.0C - L:35%]
May 31 14:52:37 ds9 smfc.service[6241]: 1. [T:35.5C - L:51%]
May 31 14:52:37 ds9 smfc.service[6241]: 2. [T:39.0C - L:67%]
May 31 14:52:37 ds9 smfc.service[6241]: 3. [T:42.5C - L:83%]
May 31 14:52:37 ds9 smfc.service[6241]: 4. [T:46.0C - L:100%]
May 31 14:52:37 ds9 smfc.service[6241]: WARNING: Standby guard is disabled ([HD zone] count=1
May 31 14:52:37 ds9 smfc.service[6241]: hd_names = ['/dev/disk/by-id/ata-Patriot_P210_512GB_P210IBCB23102410314']
May 31 14:52:37 ds9 smfc.service[6241]: Standby guard is disabled
May 31 14:52:37 ds9 smfc.service[6241]: hddtemp_path = /usr/sbin/hddtemp
May 31 14:52:39 ds9 smfc.service[6241]: CPU zone: new level > 32.4C > [T:30.0C/L:35%]
May 31 14:52:41 ds9 smfc.service[6241]: HD zone: new level > 30.0C > [T:32.0C/L:35%]
Maybe a hint: if you have only one SSD installed, you may disable the HD Zone and connect all fans to CPU Zone. Or do you have more hard disks?
I have 8 HDDs and 2 SSDs :D But I found out one thing after I restarted my server two times: every time it restarts, at about the moment smfc starts, the fans ramp up completely even if smfc is running smoothly. I have to restart the smfc service one time to let the fans go down again... 🤔
Correction: it looks like it took around 3 and a half minutes after system start to let the fans go down again.
May 31 15:41:52 ds9 smfc.service[2585]: CPU zone: new level > 37.6C > [T:40.0C/L:50%]
May 31 15:41:54 ds9 smfc.service[2585]: HD zone: new level > 35.0C > [T:32.0C/L:25%]
May 31 15:45:23 ds9 smfc.service[2585]: CPU zone: new level > 34.6C > [T:35.0C/L:37%]
I have 8 HDDs and 2 SSDs
They are not in the config currently. You have to specify them in the hd_names=
config parameter. I suggest removing the SSDs and keeping only the HDDs in the config.
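To collect candidate values for hd_names=, a sketch like the one below can list the whole-disk entries under /dev/disk/by-id, the same kind of path the log shows for hd_names. The function name `list_disk_ids` and the `ata-` prefix filter are assumptions for illustration. It does not tell HDDs from SSDs, so the SSD entries still have to be removed by hand.

```python
import glob
import os


def list_disk_ids(by_id_dir: str = "/dev/disk/by-id", prefix: str = "ata-") -> list[str]:
    """Return whole-disk by-id paths, skipping partition entries (*-partN).

    Hypothetical helper for illustration; not part of smfc.
    """
    paths = sorted(glob.glob(os.path.join(by_id_dir, prefix + "*")))
    return [p for p in paths if "-part" not in os.path.basename(p)]


if __name__ == "__main__":
    # Print one candidate per line; pick the HDDs for hd_names= manually.
    for path in list_disk_ids():
        print(path)
```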
it looks like it took around 3 and a half minutes after system start to let the fans go down again
Do not worry. This is typical fine-tuning of your configuration. The fan level is controlled dynamically based on the temperature, meaning that a low temperature results in a low fan rotation speed.
Please check and configure the proper temperatures and fan levels for the fans in the CPU and HD zones. The default values in the configuration will not fit your system. Please take a look at the documentation; it is long, but it will help you create a proper configuration. I'm also happy to help you here.
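To see how min_temp, max_temp, min_level, max_level and steps interact, the following sketch reproduces the "Temperature to level mapping" tables from the log above. It is a reconstruction inferred from the logged values, not smfc's actual implementation: the function names are hypothetical, the nearest-step lookup is an assumption, and the sensitivity and polling parameters are ignored.

```python
def build_mapping(steps, min_temp, max_temp, min_level, max_level):
    """Build (temperature, fan level) pairs for steps + 1 evenly spaced points.

    Levels are truncated to integers, which matches the values in the log.
    """
    mapping = []
    for i in range(steps + 1):
        temp = min_temp + i * (max_temp - min_temp) / steps
        level = int(min_level + i * (max_level - min_level) / steps)
        mapping.append((temp, level))
    return mapping


def nearest_step(temp, mapping):
    """Pick the mapping entry whose temperature is closest to `temp` (assumption)."""
    return min(mapping, key=lambda step: abs(step[0] - temp))


if __name__ == "__main__":
    # CPU zone values from the log: steps=6, 30-60 C, 35-100 %.
    cpu = build_mapping(6, 30.0, 60.0, 35, 100)
    for i, (temp, level) in enumerate(cpu):
        print(f"{i}. [T:{temp:.1f}C - L:{level}%]")
```

With the CPU zone values this yields the same seven rows as the log, so changing min_temp or max_temp in the config shifts the whole table.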
I was thinking about this:
at about the moment smfc starts, the fans ramp up completely even if smfc is running smoothly. I have to restart the smfc service one time to let the fans go down again
You may reset the IPMI BMC (sometimes it has issues):
$ ipmitool mc reset cold
After the reset you should define the threshold values again!
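Redefining the thresholds can be scripted around `ipmitool sensor thresh <id> lower <lnr> <lcr> <lnc>`. The sketch below only builds and prints the commands by default. The sensor names FAN1 and FAN2 and the RPM values are placeholders, not values taken from this system, and the helper names are hypothetical; check `ipmitool sensor` for the real sensor names and choose values below the minimum speed of your fans.

```python
import subprocess


def lower_threshold_cmd(sensor: str, lnr: int, lcr: int, lnc: int) -> list[str]:
    """Build the ipmitool call that sets the three lower thresholds of a sensor.

    Order: lower non-recoverable, lower critical, lower non-critical.
    """
    return ["ipmitool", "sensor", "thresh", sensor, "lower",
            str(lnr), str(lcr), str(lnc)]


def apply(commands, dry_run=True):
    """Print the commands, or run them when dry_run is False (needs root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder sensor names and RPM values; adjust them to your board and fans.
    cmds = [lower_threshold_cmd(name, 0, 100, 200) for name in ("FAN1", "FAN2")]
    apply(cmds, dry_run=True)
```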
I think this didn't really fix it, but when it happens I just restart the service, so it's okay for now :) Thanks! And I switched to the HDD temps.
I don't know what changed, but I had to reinstall the system a few days ago and now everything works fine, thanks :) You can close it (I can't close it).
@Xyz00777 reported an issue in the SMFC hardware compatibility issue #19:
I'm trying to get it working on my H11SSL-i with an ASPEED AST2500 on a Proxmox install. Because I'm not sure which fans are connected to which PWM, I tried setting the lower limit to 500 and the upper limit to 2000 for every fan in the config.
I have an Iceberg Thermal IceGALE Xtra (500-2500 rpm) and a Noctua NH-U9 TR4-SP3 (400-2000 rpm).
After I loaded the modules and executed the install.sh file, I started the service and got this journalctl log; the service crashed with the fans at 100% speed.
Please help, I don't want my fans to spin up every ~10 sec for 5 sec :(