Open justindthomas opened 1 year ago
@jeff-yin Could you check this platform in master?
As I'm becoming more comfortable with SONiC, I wonder if the fact that I don't have a second PSU installed might be playing into this. I've noticed that sometimes small configuration changes can cause problems in modules that have different expectations.
I don't have a second PSU to plug in, but that might be something to investigate.
Usually the fans will go to 100% when a FAN module is removed. I don't think lacking a PSU would trigger this. There may be some missing thermal policy code for this platform. I've asked a couple of people at Dell to look into it. @arunlk-dell @vpsubramaniam
I checked last night to see that all 3 fan modules were running and they're all moving air. The LEDs on all of them are off.
The speed of the fans does not seem to be dependent on the PSU presence, but the display of the status does. I picked up a second PSU and the command show platform fan now works.
jdt@sonic:~$ sudo show platform fan
Drawer    LED    FAN            Speed    Direction    Presence    Status    Timestamp
--------  -----  -------------  -------  -----------  ----------  --------  -----------------
FanTray1  N/A    FanTray1-Fan1  57%      intake       Present     OK        20231014 02:51:38
FanTray2  N/A    FanTray2-Fan1  59%      intake       Present     OK        20231014 02:51:38
FanTray3  N/A    FanTray3-Fan1  58%      intake       Present     OK        20231014 02:51:38
N/A       N/A    PSU1 Fan       15%      intake       Present     OK        20231014 02:51:38
N/A       N/A    PSU2 Fan       15%      intake       Present     OK        20231014 02:51:39
So they aren't running at 100%, but they are running at a constant higher speed than the default software (Dell OS6, I believe) that came on the switch. Maybe that's normal? It seems like the temperatures could tolerate a less aggressive setting.
jdt@sonic:~$ sudo show platform temperature
Sensor                        Temperature    High TH    Low TH    Crit High TH    Crit Low TH    Warning    Timestamp
----------------------------  -------------  ---------  --------  --------------  -------------  ---------  -----------------
Front Panel PHY Temperature   30.687         75         0         N/A             N/A            False      20231014 02:51:39
Middle Fan Tray Temperature   23.312         75         0         N/A             N/A            False      20231014 02:51:39
Near Front Panel Temperature  29.25          75         0         N/A             N/A            False      20231014 02:51:39
Switch Near Temperature       29.75          75         0         N/A             N/A            False      20231014 02:51:39
Switch Rear Temperature       24.5           75         0         N/A             N/A            False      20231014 02:51:39
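To put a number on "could tolerate a less aggressive setting", here is a minimal sketch that computes how far each reading sits below its High TH value. The readings and the threshold of 75 are copied from the output above; the helper name headroom is only illustrative and not part of SONiC.

```python
# Readings and the High TH value are copied from the
# `show platform temperature` output in this comment.
readings = {
    "Front Panel PHY Temperature": 30.687,
    "Middle Fan Tray Temperature": 23.312,
    "Near Front Panel Temperature": 29.25,
    "Switch Near Temperature": 29.75,
    "Switch Rear Temperature": 24.5,
}
HIGH_TH = 75.0


def headroom(readings, high_th):
    """Return the margin between each reading and the high threshold."""
    return {name: round(high_th - temp, 3) for name, temp in readings.items()}


margins = headroom(readings, HIGH_TH)
for name, margin in margins.items():
    print(f"{name}: {margin} below High TH")
print("Smallest margin:", min(margins.values()))
```

Every sensor is more than 44 below its threshold in this snapshot, which is consistent with the impression that the fans are running harder than the temperatures require. A single snapshot says nothing about behavior under load, though.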
Also, show environment is still broken as described in the original issue.
@justindthomas I will be raising a pull request to fix the commands 'show environment' and 'show platform fan' by next week. For the fan speed, I will bring in the thermal manager changes sooner.
That's great, @arunlk-dell - thanks!
For the fan speed issues, I was able to tame them by adjusting these values:
/sys/bus/i2c/devices/7-002c/pwm1
/sys/bus/i2c/devices/7-002c/pwm2
/sys/bus/i2c/devices/7-002c/pwm3
By default, these are all set to 255, with /sys/bus/i2c/devices/7-002c/pwm#_enable set to 0, which results in the continuous ~58% speed for all of them. I changed the pwm values to 100 and the speeds dropped to between 20% and 30% and seem more varied, as if the system is properly responding to the changing temperature.
Does that parameter adjust the aggressiveness of the thermal algorithm? I experimented with changing pwm_enable to 1, 2, and 3, but only 0 and 3 seem to be enabled. And 3 sets the fans to 4% and triggers a fault indicator, so that's clearly not appropriate.
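For anyone who wants to repeat this experiment, here is a minimal sketch of reading and writing the three pwm files. The device directory is the one quoted above. The helper names are hypothetical, and the 0-255 range is an assumption based on the usual Linux hwmon convention, not something confirmed for this platform's driver.

```python
from pathlib import Path

# Device directory quoted in this comment; other platforms will differ.
EMC_DIR = Path("/sys/bus/i2c/devices/7-002c")


def pwm_to_percent(value):
    """Convert a raw PWM value to a duty-cycle percentage (assumes 0-255)."""
    return round(value * 100 / 255, 1)


def read_pwm(base=EMC_DIR, channels=(1, 2, 3)):
    """Return {channel: raw value} read from pwm1..pwm3 under base."""
    return {ch: int((Path(base) / f"pwm{ch}").read_text().strip())
            for ch in channels}


def set_pwm(value, base=EMC_DIR, channels=(1, 2, 3)):
    """Write the same raw value to every pwm file (needs root on real sysfs)."""
    if not 0 <= value <= 255:
        raise ValueError("PWM value must be between 0 and 255")
    for ch in channels:
        (Path(base) / f"pwm{ch}").write_text(f"{value}\n")
```

Calling set_pwm(100) as root would correspond to the change described above. Under the 0-255 assumption, pwm_to_percent(100) is a duty cycle of about 39%, which is not the same quantity as the speed percentage that show platform fan reports. Values written to sysfs normally do not survive a reboot, so this is a workaround rather than a fix.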
Here's the corrected platform_sensors.py file to make show environment work properly. I changed the print statements to add parentheses, and I specified the iso-8859-1 encoding for the eeprom output, since that seems to be what the switch generates. I also changed that first check_output at the top to specify text=True.
Should I submit a PR? I assume this belongs in platform-specific code somewhere.
#!/usr/bin/python
# This provides support for the following objects:
# * Onboard temperature sensors
# * FAN trays
# * PSU
import subprocess

output = ""
try:
    rc = 0
    output = subprocess.check_output('/usr/bin/sensors', text=True).splitlines()

    valid = False
    for line in output:
        if line.startswith('acpitz') or line.startswith('coretemp'):
            valid = True
        if valid:
            print(line)
            if line == '': valid = False

    print("Onboard Temperature Sensors:")
    idx = 0
    for line in output:
        if line.startswith('tmp75'):
            print('\t' + output[idx+2].split('(')[0])
        idx += 1

    print("\nFanTrays:")
    idx = 0
    found_emc = False
    for line in output:
        if line.startswith('emc'):
            found_emc = True
            with open('/sys/devices/platform/dell-n3248te-cpld.0/fan0_prs') as f:
                line = f.readline()
            present = int(line, 0)
            if present :
                print('\t' + 'FanTray1:')
                print('\t\t' + 'Fan Speed:' + (output[idx+2].split('(')[0]).split(':')[1])
                with open('/sys/devices/platform/dell-n3248te-cpld.0/fan0_dir') as f:
                    line = f.readline()
                dir = 'Intake' if line[:-1] == 'B2F' else 'Exhaust'
                print('\t\t' + 'Airflow:\t' + dir)
            else : print('\t' + 'FanTray1:\tNot Present')
            with open('/sys/devices/platform/dell-n3248te-cpld.0/fan1_prs') as f:
                line = f.readline()
            present = int(line, 0)
            if present :
                print('\t' + 'FanTray2:')
                print('\t\t' + 'Fan Speed:' + (output[idx+3].split('(')[0]).split(':')[1])
                with open('/sys/devices/platform/dell-n3248te-cpld.0/fan1_dir') as f:
                    line = f.readline()
                dir = 'Intake' if line[:-1] == 'B2F' else 'Exhaust'
                print('\t\t' + 'Airflow:\t' + dir)
            else : print('\t' + 'FanTray2:\tNot Present')
            with open('/sys/devices/platform/dell-n3248te-cpld.0/fan2_prs') as f:
                line = f.readline()
            present = int(line, 0)
            if present :
                print('\t' + 'FanTray3:')
                print('\t\t' + 'Fan Speed:' + (output[idx+4].split('(')[0]).split(':')[1])
                with open('/sys/devices/platform/dell-n3248te-cpld.0/fan2_dir') as f:
                    line = f.readline()
                dir = 'Intake' if line[:-1] == 'B2F' else 'Exhaust'
                print('\t\t' + 'Airflow:\t' + dir)
            else : print('\t' + 'FanTray3:\tNot Present')
        idx += 1
    if not found_emc :
        print('\t' + 'FanTray1:\tNot Present')
        print('\t' + 'FanTray2:\tNot Present')
        print('\t' + 'FanTray3:\tNot Present')

    print('\nPSUs:')
    idx = 0
    with open('/sys/devices/platform/dell-n3248te-cpld.0/psu0_prs') as f:
        line = f.readline()
    found_psu1 = int(line, 0)
    if not found_psu1 :
        print('\tPSU1:\tNot Present')
    with open('/sys/devices/platform/dell-n3248te-cpld.0/psu1_prs') as f:
        line = f.readline()
    found_psu2 = int(line, 0)
    for line in output:
        if line.startswith('dps460-i2c-10'):
            with open('/sys/devices/platform/dell-n3248te-cpld.0/psu0_status') as f:
                line = f.readline()
            status = int(line, 0)
            if not status :
                print('\tPSU1:\tNot OK')
                break
            with open('/sys/bus/i2c/devices/10-0056/eeprom', encoding='iso-8859-1') as f:
                line = f.readline()
            dir = 'Exhaust' if 'FORWARD' in line else 'Intake'
            print('\tPSU1:')
            print('\t\t' + output[idx+2].split('(')[0])
            print('\t\t' + output[idx+4].split('(')[0])
            print('\t\t' + output[idx+6].split('(')[0])
            print('\t\t' + output[idx+7].split('(')[0])
            print('\t\t' + output[idx+9].split('(')[0])
            print('\t\t' + output[idx+11].split('(')[0])
            print('\t\t' + output[idx+12].split('(')[0])
            print('\t\t' + output[idx+14].split('(')[0])
            print('\t\t' + output[idx+15].split('(')[0])
            print('\t\t' + 'Airflow:\t\t ' + dir)
        if line.startswith('dps460-i2c-11'):
            with open('/sys/devices/platform/dell-n3248te-cpld.0/psu1_status') as f:
                line = f.readline()
            status = int(line, 0)
            if not status :
                print('\tPSU2:\tNot OK')
                break
            print('\tPSU2:')
            with open('/sys/bus/i2c/devices/11-0056/eeprom', encoding='iso-8859-1') as f:
                line = f.readline()
            dir = 'Exhaust' if 'FORWARD' in line else 'Intake'
            print('\t\t' + output[idx+2].split('(')[0])
            print('\t\t' + output[idx+4].split('(')[0])
            print('\t\t' + output[idx+6].split('(')[0])
            print('\t\t' + output[idx+7].split('(')[0])
            print('\t\t' + output[idx+9].split('(')[0])
            print('\t\t' + output[idx+11].split('(')[0])
            print('\t\t' + output[idx+12].split('(')[0])
            print('\t\t' + output[idx+14].split('(')[0])
            print('\t\t' + output[idx+15].split('(')[0])
            print('\t\t' + 'Airflow:\t\t ' + dir)
        idx += 1
    if not found_psu2 :
        print('\tPSU2:\tNot Present')

except subprocess.CalledProcessError as err:
    print ("Exception when calling get_sonic_error -> %s\n" %(err))
    rc = err.returncode
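For context on why the encoding and text=True changes matter under Python 3, here is a small self-contained sketch. The byte strings are hypothetical stand-ins, not real EEPROM or sensors output from the switch, and the failure modes shown are a plausible explanation for the original errors rather than a confirmed diagnosis.

```python
# Hypothetical stand-in for the first EEPROM line: ASCII text followed
# by bytes that are not valid UTF-8.
raw = b"FORWARD\xff\xfe\x00"

# Decoding as UTF-8 (a common default for open() in text mode) fails
# on such bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ISO-8859-1 maps every byte value to a code point, so decoding cannot fail.
text = raw.decode("iso-8859-1")
airflow = "Exhaust" if "FORWARD" in text else "Intake"

# Without text=True, check_output returns bytes, and bytes.startswith()
# rejects a str argument.
line_bytes = b"acpitz-acpi-0"
try:
    line_bytes.startswith("acpitz")
    bytes_ok = True
except TypeError:
    bytes_ok = False

# With text=True the lines are str, so the same check works.
str_ok = line_bytes.decode("ascii").startswith("acpitz")

print(utf8_ok, airflow, bytes_ok, str_ok)
```

Decoding as ISO-8859-1 is a pragmatic choice here because the script only searches the line for the ASCII marker FORWARD. The non-text bytes come through as arbitrary characters, which is harmless as long as they are never printed or interpreted.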
PR submitted here: https://github.com/sonic-net/sonic-buildimage/pull/17508
Description
I'm new to SONiC and installed it on a Dell N3248TE-ON I received a couple of days ago. On initial boot, the switch actively managed the fans (i.e., the speed was constantly changing, presumably in response to load, but was fairly quiet on average).
After installing SONiC, the fans just run at a high (loud) speed constantly. Commands to show the fan status fail with Python errors.
Steps to reproduce the issue:
show platform fan
show environment
Describe the results you received:
show platform fan
show environment
Describe the results you expected:
Output of show version:
Output of show techsupport: techsupport.txt
Additional information you deem important (e.g. issue happens only occasionally):
The dump file is 31MB and GitHub rejects files over 25MB.