Closed cjohnsonj closed 4 years ago
Thank you for all this data. I see some issues that I may have to resolve and this will take some time.
What caught my eye is the ending:
/dev/sdz: Unknown USB bridge [0x13fe:0x4200 (0x100)]
Please specify device type with the -d option.
It seems that you have a USB hard drive attached which doesn't play ball with smartctl. Can you maybe post the output of smartctl -d ata /dev/sdz ?
In the meantime, I wonder if disconnecting /dev/sdz would temporarily solve your issue (if that's possible for you).
I noticed this error and realised I don't filter out USB devices as I have none connected to my servers. One USB hard drive I got worked fine and even reported SMART over USB. Fortunately I found an old USB hard drive that didn't and promptly storagefancontrol crashed.
Although this is a different issue, it seems related to your problem.
Would you mind testing out my new version?
You're right about the USB drive. I have a USB drive in this server as an alternate boot source, should the primary SSD fail. I don't expect any smart data from it. It's not really a "hard drive", it's just a small flash drive.
`smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.4.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
ATA device successfully opened
Use 'smartctl -a' (or '-x') to print SMART (and more) information
`
I did try removing the flash drive, and restarting the daemon. There's no change in the behavior though. For reference, this was NOT on the new version that you just posted. I'll try that in just a minute and let you know how things go.
Ok. I've tested the new version. Bad news. It crashes and burns now....
Graphite module not installed. Data will not be logged to Graphite.
Traceback (most recent call last):
  File "/usr/sbin/storagefancontrol", line 575, in <module>
    main()
  File "/usr/sbin/storagefancontrol", line 564, in main
    highest_temperature = temp_source.get_highest_temperature()
  File "/usr/sbin/storagefancontrol", line 243, in get_highest_temperature
    self.highest_temperature = max(results)
ValueError: max() arg is an empty sequence
Thanks for your patience, this error message is helpful to me. I’ll investigate further.
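For reference, the `ValueError` in the traceback is what Python's built-in `max()` raises when it is handed an empty sequence. A minimal sketch of one way to guard against it, assuming a hypothetical helper name `highest_reading` (this is not the actual fix in storagefancontrol):

```python
def highest_reading(readings):
    """Return the highest reading, or None when no device produced one."""
    # max() raises ValueError on an empty sequence unless a default is given.
    return max(readings, default=None)


print(highest_reading([31, 37, 29]))
print(highest_reading([]))
```

The caller would still have to decide what an absent reading means for the fan speed, for example falling back to a safe PWM value.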
I'm still working on your issue, but I discovered something else by accident which you should be aware of (if you aren't already):
3 of your hard drives are in bad shape. Two of them seem really bad, one is just starting to show some problems.
As the output below shows, thousands of sectors on two of them have been deemed bad and reallocated (replaced with spare ones). I'm not sure how long this has been going on or whether you are aware of it, but I would recommend replacing all three, or at least the two 'bad' ones.
Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 40
Reallocated_Sector_Ct 0x0033 098 098 036 Pre-fail Always - 2968
Reallocated_Sector_Ct 0x0033 073 073 036 Pre-fail Always - 1126
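To make the numbers above easier to read: the last column of each row is the raw count of reallocated sectors. A small sketch that pulls that column out of the three rows quoted here; the helper name `raw_value` is made up for illustration, and it assumes the raw value is a plain integer in the last field, as it is in these rows:

```python
def raw_value(attribute_row: str) -> int:
    """Return the last whitespace-separated field of an attribute row as an int."""
    return int(attribute_row.split()[-1])


rows = [
    "Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 40",
    "Reallocated_Sector_Ct 0x0033 098 098 036 Pre-fail Always - 2968",
    "Reallocated_Sector_Ct 0x0033 073 073 036 Pre-fail Always - 1126",
]

# Print the reallocated sector count for each of the three drives.
for row in rows:
    print(raw_value(row))
```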
I've used your test data and could not reproduce your problem. Have you found a work-around for your particular problem or are you willing to run a debug-version of storagefancontrol to see if we can determine the cause?
I haven't found a workaround and I am willing to run a debug version.
I have created two branches. The first branch is called python2.x. This is the original storagefancontrol you used before the python3 migration. You could probably switch to the old version and things should be ok.
I've also created the 'testing' branch. This branch is based on the 'new' python3 code we have been troubleshooting. This code will create a folder /tmp/storagefancontrol in which some data is written (feel free to inspect the contents).
If you want to run this testing / debug version, could you zip the /tmp/storagefancontrol output and share the zip file here?
The testing/debug version doesn't generate anything in /tmp for me to provide. I think that's because it's exiting before anything of substance can start to happen.. `Graphite module not installed. Data will not be logged to Graphite.
No temperature data retrieved through SMART. Exiting.
Just to sanity check things I whipped up a thing in bash to make sure that smartctl is there/working etc....
sda temp is ### this is my SSD that doesn't show temps
sdb temp is 31
sdc temp is 32
sdd temp is 32
sde temp is 32
sdf temp is ### this is an SAS drive that outputs temp, but you have to get it in a different way
sdg temp is 30
sdh temp is 31
sdi temp is ### this is the other SAS drive
sdj temp is 31
sdk temp is 32
sdl temp is 31
sdm temp is 31
sdo temp is 30
sdp temp is 32
sdq temp is 37
sdr temp is 30
sds temp is 31
sdt temp is 30
sdu temp is 31
sdv temp is 29
sdw temp is 29
sdx temp is 29
sdy temp is 28
sdz temp is ### this is my flash drive so again, no temp data here.
FILES=$(ls /sys/block/ | grep sd)
for disk in $FILES
do
#smartctl -a /dev/$disk
#smartctl -a /dev/$disk | grep Temperature_Celsius | tail -c3
#smartctl -a /dev/$disk | grep Temperature_Celsius | awk '{print $4}' | sed 's/^0*//'
#smartctl -a /dev/$disk | grep Temperature_Celsius | awk '{print $10}'
#hddtemp -n /dev/$disk 2>&1 | grep -P '(?<!\d)\d{2}(?!\d)'
echo $disk temp is $(smartctl -l scttempsts /dev/$disk | grep -oP 'Current Temperature:\K.*(?<!\d)\d{2}(?!\d)')
#echo ""
#echo ""
done
` I don't know how much, or if that helps you any. I thought I'd try to help somehow......
P.S. Thanks for the heads up about the smart status of those three drives. I had already ordered replacement drives for the three that were failing. It just took FOREVER for them to actually show up here. Anyway, they've all been replaced with new IronWolfs. P.P.S. Thanks again for your help with this problem.
Thank you for the information, it has helped me to understand what is going on. I didn't realise there were other drives that don't show a temperature at all, or that report it in a different manner.
I will do my best to fix this once and for all.
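As a rough idea of what coping with several output formats could look like, here is a sketch that tries a few patterns in turn and returns `None` when nothing matches. The sample lines in the comments are assumptions modelled on typical smartctl output (an ATA attribute row, the SAS/SCSI temperature line, and the SCT status line); the names `PATTERNS` and `parse_temperature` are hypothetical and this is not the parser used in storagefancontrol:

```python
import re

# Each pattern captures the temperature in degrees Celsius.
PATTERNS = [
    # ATA attribute row, e.g.
    # "194 Temperature_Celsius 0x0022 031 050 000 Old_age Always - 31 (Min/Max 20/45)"
    # Seven columns follow the attribute name before the raw value.
    re.compile(r"^\s*\d+\s+Temperature_Celsius\s+(?:\S+\s+){7}(\d+)", re.M),
    # SAS/SCSI drives, e.g. "Current Drive Temperature:     30 C"
    re.compile(r"^Current Drive Temperature:\s+(\d+)\s+C", re.M),
    # SCT status, e.g. "Current Temperature:                    31 Celsius"
    re.compile(r"^Current Temperature:\s+(\d+)\s+Celsius", re.M),
]


def parse_temperature(smart_output: str):
    """Return the first temperature found in the output, or None."""
    for pattern in PATTERNS:
        match = pattern.search(smart_output)
        if match:
            return int(match.group(1))
    return None
```

Returning `None` rather than a number keeps "no reading" clearly separate from a real temperature, so the caller can skip such drives explicitly.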
Yeah, it's a pain when the data that we want, isn't in the place, in the same format, every time. ;-)
I have updated the 'testing' branch with a new version that allows you to exclude the hard drives that don't output temperature data, or that output it in a format I can't parse yet.
The /etc/storagefancontrol file needs an extra line like this in the 'Smart' section of the ini file:
device_exclude = "sda sdb sdc"
For debugging purposes you should see something like this when starting storagefancontrol:
ansible@server:~$ sudo DEBUG=True ./storagefancontrol
'Device sda is excluded.'
'Device sdb is excluded.'
'Device sdc is excluded.'
['sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdi', 'sdj']
Temp: 36 | FAN: 19% | PWM: 120 | P=-12 | I=51 | D=-20 | Err=-4 |
I would recommend that you try this and add all hard drives that may cause issues due to their (lack of) output.
In a second phase I will actually improve temperature parsing, to cope with absent or non-standard output.
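To illustrate the idea behind `device_exclude`, here is a minimal sketch that turns the configured string into a set of names and drops those names from a device list. The function names `parse_exclude` and `filter_excluded` are hypothetical, and matching whole names against a set is an assumption of this sketch, not necessarily how the testing branch does it:

```python
def parse_exclude(value: str) -> set:
    """Turn a value such as '"sda sdb sdc"' into {'sda', 'sdb', 'sdc'}."""
    # Drop optional surrounding quotes, then split on whitespace.
    return set(value.strip().strip("\"'").split())


def filter_excluded(devices, excluded):
    """Keep only the devices whose exact name is not in the excluded set."""
    return [device for device in devices if device not in excluded]


excluded = parse_exclude('"sda sdb sdc"')
print(filter_excluded(["sda", "sdb", "sdc", "sdd", "sde"], excluded))
```

Matching whole names means that excluding `sda` would not accidentally exclude a longer name that merely contains it.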
I actually don't see any branch but master here anymore. Am I doing something wrong?
I can see them on github.com. Maybe use:
git fetch
git checkout testing
You should see the remote branches with git branch -r (see https://stackoverflow.com/questions/1783405/how-do-i-check-out-a-remote-git-branch)
I'm an idiot. I was looking at the lsidrivemap page. Pulling it down now... I'll let you know my findings shortly.
`[General]
polling_interval = 10
target_temperature = 37
mode = smart
[Pid]
P = 2
I = 1
D = 5
D_amplification = 0
I_start = 30
I_max = 100
I_min = 15
[Chassis]
fan_control_device = /sys/class/hwmon/hwmon4/pwm2
fan_control_enable = /sys/class/hwmon/hwmon4/pwm2_enable
pwm_min = 40
pwm_max = 255
pwm_safety = 75
[Smart]
device_filter = "sd"
device_exclude = "sdi sdz sdc"
usb_filter = false # set to true if you want to filter out USB devices
smart_workers = 30
[Controller]
megacli = /usr/sbin/megacli
ports_per_controller = 8
[Graphite]
enabled = 0
host = target.graphite.com
port = 2003
prefix = storagefancontrol

"""
This program controls the chassis fan speed through PWM based on the temperature
of the hottest hard drive in the chassis. It uses the IBM M1015 or LSI tool
'MegaCli' for reading hard drive temperatures.
"""
import os
import sys
import subprocess
import re
import time
import syslog
import multiprocessing as mp
import copyreg
import types
import configparser
import pprint

try:
    import graphitesend
except ModuleNotFoundError:
    print("Graphite module not installed. Data will not be logged to Graphite.")


def _reduce_method(meth):
    """
    This is a hack to work around the fact that multiprocessing
    can't operate on class methods by default.
    """
    return (getattr, (meth.__self__, meth.__func__.__name__))
class PID:
    """
    Discrete PID control
    Source: http://code.activestate.com/recipes/577231-discrete-pid-controller/

    This class calculates the appropriate fan speed based on the difference
    between the current temperature and the desired (target) temperature.
    """

    def __init__(self, P, I, D, Derivator, Integrator, \
                 Integrator_max, Integrator_min):
        """
        Generic initialisation of local variables.
        """
        self.Kp = P
        self.Ki = I
        self.Kd = D
        self.Derivator = Derivator
        self.Integrator = Integrator
        self.Integrator_max = Integrator_max
        self.Integrator_min = Integrator_min
        self.set_point = 0.0
        self.error = 0.0

    def update(self, current_value):
        """
        Calculate PID output value for given reference input and feedback
        Current_value = set_point - measured value (difference)
        """
        self.error = current_value - int(self.set_point)
        self.P_value = self.Kp * self.error
        self.D_value = self.Kd * ( self.error + self.Derivator)
        self.Derivator = self.error
        self.Integrator = self.Integrator + self.error
        if self.Integrator > self.Integrator_max:
            self.Integrator = self.Integrator_max
        elif self.Integrator < self.Integrator_min:
            self.Integrator = self.Integrator_min
        self.I_value = self.Integrator * self.Ki
        PID = self.P_value + self.I_value + self.D_value
        return PID

    def set_target_value(self, set_point):
        """
        Initilize the setpoint of PID
        """
        self.set_point = set_point

copyreg.pickle(types.MethodType, _reduce_method)


class Smart:
    """
    Uses SMART data from storage devices to determine the temperature
    of the hottest drive.
    """

    def __init__(self, loadDevices):
        """
        Init.
        """
        self.block_devices = ""
        self.device_filter = "(sd[a-z])"
        self.usb_filter = False
        self.highest_temperature = 0
        self.device_exclude = []
        if loadDevices:
            self.get_block_devices()
        self.smart_workers = 24

    def filter_disk_by_path_usb(self):
        disk_by_path = os.listdir('/dev/disk/by-path')
        filtered_list = []
        for item in disk_by_path:
            if 'usb' not in item and 'part' not in item:
                link = os.readlink("/dev/disk/by-path/" + item)
                link = os.path.basename(link)
                filtered_list.append(link)
        return filtered_list

    def filter_usb_devices(self, block_devices):
        valid_devices = self.filter_disk_by_path_usb()
        usb_filtered_devices = []
        if self.usb_filter == 'true':
            for device in block_devices:
                if device in valid_devices:
                    usb_filtered_devices.append(device)
        else:
            usb_filtered_devices = block_devices
        return usb_filtered_devices

    def filter_excluded_devices(self, block_devices):
        valid_devices = []
        for device in block_devices:
            if device not in self.device_exclude:
                valid_devices.append(device)
            else:
                if is_debug_enabled():
                    pprint.pprint(f"Device {device} is excluded.")
        return valid_devices

    def filter_block_devices(self, block_devices):
        """
        Filter out devices like 'loop, ram'.
        """
        devices = []
        for device in block_devices:
            if not re.search(self.device_filter, device):
                continue
            else:
                devices.append(device)
        return devices

    def get_block_devices(self):
        """
        Retrieve the list of block devices.
        By default only lists /dev/sd* devices.
        Configure the appropriate device filter with
        setting <object>.device_filter to some other value.
        """
        devicepath = "/sys/block"
        block_devices = os.listdir(devicepath)
        block_devices.sort()
        self.block_devices = self.filter_block_devices(block_devices)
        self.block_devices = self.filter_usb_devices(self.block_devices)
        # self.block_devices = ['sda', 'sdb', 'sdc', 'sdd', 'sde', 'sdf',\
        # 'sdg', 'sdh', 'sdi', 'sdj', 'sdk', 'sdl', 'sdm', 'sdn', 'sdo',\
        # 'sdp', 'sdq', 'sdr', 'sds', 'sdt', 'sdu', 'sdv']
        self.block_devices = self.filter_excluded_devices(self.block_devices)

    def get_smart_data_debug(self, device):
        try:
            child = subprocess.Popen(['cat', device
                                      ], stdout=subprocess.PIPE, \
                                     stderr=subprocess.PIPE)
        except OSError:
            print("Executing smartctl gave an error,")
            print("is smartmontools installed?")
            sys.exit(1)
        rawdata = child.communicate()
        # pprint.pprint(type(rawdata[0]))
        # pprint.pprint(child.returncode)
        if child.returncode == 1:
            return -2
        smartdata = rawdata[0]
        return smartdata

    def get_smart_data(self, device):
        """
        Call the smartctl command line utilily on a device to get the raw
        smart data output.
        """
        device = "/dev/" + device
        try:
            child = subprocess.Popen(['smartctl', '-a', \
                                      device], stdout=subprocess.PIPE, \
                                     stderr=subprocess.PIPE)
        except OSError:
            print("Executing smartctl gave an error,")
            print("is smartmontools installed?")
            sys.exit(1)
        rawdata = child.communicate()
        if child.returncode:
            if child.returncode == 1:
                return None
        smartdata = rawdata[0]
        destination = '/tmp/storagefancontrol'
        os.makedirs(destination, exist_ok=True)
        with open(destination + '/smartdata_' + os.path.basename(device), 'w') as f:
            f.write(smartdata.decode('utf-8'))
        return smartdata

    def get_parameter_from_smart(self, data, parameter, distance):
        """
        Retreives the desired value from the raw smart data.
        """
        regex = re.compile(parameter + '(.*)')
        match = regex.search(data)
        # DEBUG
        destination = '/tmp/storagefancontrol'
        os.makedirs(destination, exist_ok=True)
        with open(destination + '/regex_data', 'a') as f:
            f.write(str(match)+'\n')
        # /DEBUG
        if match:
            tmp = match.group(1)
            length = len(tmp.split(" "))
            if length <= distance:
                distance = length-1
            #
            # SMART data is often a bit of a mess, so this
            # hack is used to cope with this.
            #
            try:
                model = match.group(1).split(" ")[distance].split(" ")[1]
            except:
                model = match.group(1).split(" ")[distance+1].split(" ")[1]
            return str(model)
        return -10

    def get_temperature(self, device):
        """
        Get the current temperature of a block device.
        """
        result = self.get_smart_data(device)
        if isinstance(result, bytes):
            smart_data = self.get_smart_data(device).decode('utf-8')
            temperature = int(self.get_parameter_from_smart(smart_data, \
                'Temperature_Celsius', 10))
        else:
            temperature = -30 # should mean 'error'
        return temperature

    def get_highest_temperature(self):
        """
        Get the highest temperature of all the block devices in the system.
        Because retrieving SMART data is slow, multiprocessing is used
        to collect SMART data in parallel from multiple devices.
        """
        highest_temperature = 0
        pool = mp.Pool(processes=int(self.smart_workers))
        results = pool.map(self.get_temperature, self.block_devices)
        pool.close()
        try:
            self.highest_temperature = max(results)
        except ValueError:
            print(f"\nNo temperature data retreived through SMART. Exiting.\n")
            exit(1)
        return self.highest_temperature

class Controller:
    """
    Reading temperature data from IBM / LSI controllers.
    """
    def __init__(self):
        self.megacli = "/opt/MegaRAID/MegaCli/megacli"
        self.ports_per_controller = 8
        self.highest_temperature = 0

    def number_of_controllers(self):
        """
        Get the number of LSI HBAs on the system.
        In my case, I have 3 controllers with 8 drives each.
        """
        rawdata = subprocess.Popen(\
            [self.megacli,'-cfgdsply','-aALL'],\
            stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0]
        regex = re.compile('Adapter:.*')
        match = regex.findall(rawdata)
        return len(match)

    def get_drive_temp(self, controller, port):
        """
        Get the temperature from an individual drive through the megacli
        utility. The return value is a positive integer that specifies the
        temperature in Celcius.
        """
        rawdata = subprocess.Popen(\
            [self.megacli, '-pdinfo', '-physdrv', '[64:' +\
            str(port) +']', '-a' + str(controller)],\
            stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate()[0]
        regex = re.compile('Drive Temperature :(.*)')
        match = regex.search(rawdata)
        try:
            temp = match.group(1).split("C")[0]
            # Ugly hack: issue with some old WD drives
            # Controller reports 65C for them.
            if temp == "N/A":
                temp = "?"
            if int(temp) >= 60:
                temp = "?"
            return temp
        except(AttributeError):
            return ""
        except(IndexError):
            return ""

    def fetch_data(self):
        """
        Returns a two-dimentional list containing
        the temperature of each drive. The first dimension is the
        chassis. The second dimension is the drive.
        """
        drivearray = \
            [[0 for x in xrange(self.ports_per_controller)]\
            for x in xrange(self.number_of_controllers())]
        for controller in xrange(self.number_of_controllers()):
            for port in xrange(self.ports_per_controller):
                disk = self.get_drive_temp(controller, port)
                if len(disk) == 0:
                    disk = ""
                drivearray[controller][port] = disk
        return drivearray

    def get_highest_temperature(self):
        """
        Walks through the list of all the drives and compares
        all drive temperatures. The highest drive temperature
        is returned as an integer, representing degrees of Celcius.
        """
        data = self.fetch_data()
        temperature = 0
        for controller in data:
            for disk in controller:
                if disk > temperature:
                    temperature = disk
        self.highest_temperature = int(temperature)
        return self.highest_temperature

class FanControl:
    """
    The chassis object provides you with the option:
    Set the fan speed
    """
    def __init__(self):
        """
        Generic init method.
        """
        self.polling_interval = 30
        self.pwm_max = 255
        self.pwm_min = 100
        self.pwm_safety = 160
        self.fan_speed = 50
        self.fan_control_enable = ""
        self.fan_control_device = ""
        self.debug = False

    def get_pwm(self):
        """
        Return the current PWM speed setting.
        """
        PWM=""
        for device in self.fan_control_device:
            filename = device
            filehandle = open(filename, 'r')
            pwm_value = int(filehandle.read().strip())
            filehandle.close()
            PWM = PWM + " " + str(pwm_value)
        return PWM

    def set_pwm(self, value):
        """
        Sets the fan speed. Only allows values between
        pwm_min and pwm_max. Values outside these ranges
        are set to either pwm_min or pwm_max as a safety
        precaution.
        """
        self.enable_fan_control()
        for device in self.fan_control_device:
            filename = device
            pwm_max = self.pwm_max
            pwm_min = self.pwm_min
            value = pwm_max if value > pwm_max else value
            value = pwm_min if value < pwm_min else value
            filehandle = open(filename, 'w')
            filehandle.write(str(value))
            filehandle.close()

    def set_fan_speed(self, percent):
        """
        Set fan speed based on a percentage of full speed.
        Values are thus 1-100 instead of raw 1-255
        """
        self.fan_speed = percent
        one_percent = float(self.pwm_max) / 100
        pwm = percent * one_percent
        self.set_pwm(int(pwm))

    def enable_fan_control(self):
        """
        Tries to enable manual fan speed control."
        """
        for device in self.fan_control_enable:
            filename = device
            filehandle = open(filename, 'w')
            try:
                filehandle.write('1')
                filehandle.close()
            except IOError:
                message = "Error enabling fan control. Sufficient privileges?"
                print(message)
                sys.exit(1)

def is_debug_enabled():
    """
    Set debug if enabled.
    """
    try:
        debug = os.environ['DEBUG']
        if debug == "True":
            return True
        else:
            return False
    except (KeyError):
        return False


def log(temperature, chassis, pid, graphite):
    """
    Logging to syslog and terminal (export DEBUG=True).
    """
    P = str(pid.P_value)
    I = str(pid.I_value)
    D = str(pid.D_value)
    E = str(pid.error)
    TMP = str(temperature)
    PWM = str(chassis.get_pwm())
    PCT = str(chassis.fan_speed)
    all_vars = [TMP, PCT, PWM, P, I, D, E]
    formatstring = "Temp: {:2} | FAN: {:2}% | PWM: {:3} | P={:3} | I={:3} | "\
                   "D={:3} | Err={:3}|"
    msg = formatstring.format(*all_vars)
    syslog.openlog("SFC")
    syslog.syslog(msg)
    if is_debug_enabled():
        print(msg)
    if graphite['enabled']:
        dataset = [ ("temperature", TMP),
                    ("pwm", PWM),
                    ("fanspeed", PCT)]
        send_to_graphite(dataset, graphite)


def send_to_graphite(dataset, settings):
    g = graphitesend.init(graphite_server=settings['host'],
                          graphite_port=settings['port'],
                          asynchronous=False,
                          prefix=settings['prefix'])
    g.send_list(dataset)


def read_config():
    """ Main"""
    config_file = "/etc/storagefancontrol"
    conf = configparser.ConfigParser()
    conf.read(config_file)
    return conf


def get_pid_settings(config):
    """ Get PID settings """
    P = config.getint("Pid", "P")
    I = config.getint("Pid", "I")
    D = config.getint("Pid", "D")
    D_amplification = config.getint("Pid", "D_amplification")
    I_start = config.getint("Pid", "I_start")
    I_max = config.getint("Pid", "I_max")
    I_min = config.getint("Pid", "I_min")
    pid = PID(P, I, D, D_amplification, I_start, I_max, I_min)
    target_temperature = config.getint("General", "target_temperature")
    pid.set_target_value(target_temperature)
    return pid


def get_temp_source(config):
    """ Configure temperature source."""
    mode = config.get("General", "mode")
    if mode == "smart":
        temp_source = Smart(False)
        temp_source.device_filter = config.get("Smart", "device_filter")
        temp_source.device_exclude = config.get("Smart", "device_exclude")
        temp_source.usb_filter = config.get("Smart", "usb_filter")
        temp_source.get_block_devices()
        temp_source.smart_workers = config.getint("Smart", "smart_workers")
        return temp_source
    if mode == "controller":
        temp_source = Controller()
        temp_source.megacli = config.get("Controller", "megacli")
        temp_source.ports_per_controller = config.getint("Controller", \
            "ports_per_controller")
        return temp_source
    print("Mode not set, check config.")
    sys.exit(1)


def get_chassis_settings(config):
    """ Initialise chassis fan settings. """
    chassis = FanControl()
    chassis.pwm_min = config.getint("Chassis", "pwm_min")
    chassis.pwm_max = config.getint("Chassis", "pwm_max")
    chassis.pwm_safety = config.getint("Chassis", "pwm_safety")
    chassis.fan_control_enable = config.get( "Chassis", "fan_control_enable")
    chassis.fan_control_enable = chassis.fan_control_enable.split(",")
    chassis.fan_control_device = config.get("Chassis", "fan_control_device")
    chassis.fan_control_device = chassis.fan_control_device.split(",")
    return chassis


def get_graphite_settings(config):
    settings = {}
    settings['enabled'] = int(config.get("Graphite","enabled"))
    settings['host'] = str(config.get("Graphite","host"))
    settings['port'] = int(config.get("Graphite","port"))
    settings['prefix'] = str(config.get("Graphite","prefix"))
    if settings['enabled'] == 1:
        settings['enabled'] = True
    else:
        settings['enabled'] = False
    return settings


def main():
    """
    Main function. Contains variables that can be tweaked to your needs.
    Please look at the class object to see which attributes you can set.
    The pid values are tuned to my particular system and may require
    ajustment for your system(s).
    """
    config = read_config()
    graphite = get_graphite_settings(config)
    polling_interval = config.getfloat("General", "polling_interval")
    chassis = get_chassis_settings(config)
    pid = get_pid_settings(config)
    temp_source = get_temp_source(config)
    try:
        while True:
            highest_temperature = temp_source.get_highest_temperature()
            fan_speed = pid.update(highest_temperature)
            chassis.set_fan_speed(fan_speed)
            log(highest_temperature, chassis, pid, graphite)
            time.sleep(polling_interval)
    except (KeyboardInterrupt, SystemExit):
        chassis.set_pwm(chassis.pwm_safety)
        sys.exit(1)


if __name__ == "__main__":
    main()

Graphite module not installed. Data will not be logged to Graphite.
No temperature data retreived through SMART. Exiting. `
I'm sorry for how appalling the formatting is there. I can't figure out how to get the comment editor to format things the way they should be.
Thanks for the feedback, you were running the right code.
You should see something like this:
ansible@server:~$ sudo DEBUG=True ./storagefancontrol
Device sda will be monitored.
Device sdb will be monitored.
Device sdc will be monitored.
Device sdd will be monitored.
Device sde will be monitored.
Device sdf will be monitored.
Device sdg will be monitored.
Device sdh will be monitored.
Device sdi will be monitored.
Device sdj will be monitored.
Valid devices: ['sda', 'sdb', 'sdc', 'sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdi', 'sdj']
Size of result is: 10
Temperature reading: 35
Temperature reading: 36
Temperature reading: 33
Temperature reading: 37
Temperature reading: 34
Temperature reading: 36
Temperature reading: 34
Temperature reading: 32
Temperature reading: 34
Temperature reading: 35
Temp: 37 | FAN: 28% | PWM: 120 | P=-9 | I=52 | D=-15 | Err=-3 |
Size of result is: 10
Temperature reading: 35
Temperature reading: 36
Temperature reading: 33
Temperature reading: 37
Temperature reading: 34
Temperature reading: 36
Temperature reading: 34
Temperature reading: 32
Temperature reading: 34
Temperature reading: 35
Temp: 37 | FAN: 11% | PWM: 120 | P=-9 | I=50 | D=-30 | Err=-3 |
You don't have to copy/paste the code, just the output you get from running the command. Also, can you still check if any files are generated in /tmp/storagefancontrol ?
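As an aside on reading the status line: in the code pasted above, `update()` returns `P_value + I_value + D_value`, and the main loop passes that result to `set_fan_speed()`, so the FAN percentage in a status line should equal the sum of the P, I and D fields. A small sketch that checks this for one of the lines shown here; the helper `parse_status_line` is hypothetical and only exists for this illustration:

```python
import re


def parse_status_line(line: str) -> dict:
    """Extract the integer fields from a 'Temp: .. | FAN: ..% | ...' line."""
    fields = {}
    # Matches both 'Name: 12' and 'Name=-12' style fields.
    for key, value in re.findall(r"(\w+)\s*[:=]\s*(-?\d+)", line):
        fields[key] = int(value)
    return fields


line = "Temp: 37 | FAN: 28% | PWM: 120 | P=-9 | I=52 | D=-15 | Err=-3 |"
fields = parse_status_line(line)
print(fields["P"] + fields["I"] + fields["D"], fields["FAN"])
```

The PWM column is a separate matter: `set_pwm()` clamps the value between `pwm_min` and `pwm_max`, so it does not have to scale linearly with the FAN percentage.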
`root@openmediavault:/sbin# DEBUG=True ./storagefancontrol
Graphite module not installed. Data will not be logged to Graphite.
Valid devices: []
Size of result is: 0
No temperature data retreived through SMART. Exiting.
contents of /tmp after the above run....
total 4
drwxrwxrwt 10 root root 200 May 21 09:09 .
drwxrwxr-x 20 root root 4096 Apr 24 13:32 ..
drwxrwxrwt 2 root root 40 May 7 18:46 .font-unix
drwxrwxrwt 2 root root 40 May 7 18:46 .ICE-unix
drwx------ 2 root root 40 May 7 18:52 mc-root
drwx------ 3 root root 60 May 7 18:46 systemd-private-585ac1582ac2440abcf2352bf4d3d1d9-chrony.service-GcvXAQ
drwx------ 3 root root 60 May 7 18:46 systemd-private-585ac1582ac2440abcf2352bf4d3d1d9-systemd-resolved.service-FmWkmW
drwxrwxrwt 2 root root 40 May 7 18:46 .Test-unix
drwxrwxrwt 2 root root 40 May 7 18:46 .X11-unix
drwxrwxrwt 2 root root 40 May 7 18:46 .XIM-unix
`
Thank you, it seems that storagefancontrol can't actually find relevant devices in /sys/block for some reason. I've updated the code.
You should see the 'Raw block devices detected' part:
ansible@server:~$ sudo DEBUG=True ./storagefancontrol Raw block devices detected: ['loop0', 'loop1', 'loop2', 'loop3', 'loop4', 'loop5', 'loop6', 'loop7', 'md11', 'md12', 'md13', 'md14', 'md15', 'md6', 'sda', 'sdb', 'sdc', 'sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdi', 'sdj']
Your issue doesn't seem to be an issue with parsing the smart data, but an issue with actually finding the devices themselves. Maybe I made some kind of mistake with the filtering.
they're all there
lrwxrwxrwx 1 root root 0 May 21 10:40 md0 -> ../devices/virtual/block/md0
lrwxrwxrwx 1 root root 0 May 21 10:40 sda -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:0/end_device-0:0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx 1 root root 0 May 21 10:40 sdb -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:1/end_device-0:1/target0:0:1/0:0:1:0/block/sdb
lrwxrwxrwx 1 root root 0 May 21 10:40 sdc -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:2/end_device-0:2/target0:0:2/0:0:2:0/block/sdc
lrwxrwxrwx 1 root root 0 May 21 10:40 sdd -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:3/end_device-0:3/target0:0:3/0:0:3:0/block/sdd
lrwxrwxrwx 1 root root 0 May 21 10:40 sde -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:4/end_device-0:4/target0:0:4/0:0:4:0/block/sde
lrwxrwxrwx 1 root root 0 May 21 10:40 sdf -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:5/end_device-0:5/target0:0:5/0:0:5:0/block/sdf
lrwxrwxrwx 1 root root 0 May 21 10:40 sdg -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:6/end_device-0:6/target0:0:6/0:0:6:0/block/sdg
lrwxrwxrwx 1 root root 0 May 21 10:40 sdh -> ../devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/port-0:7/end_device-0:7/target0:0:7/0:0:7:0/block/sdh
lrwxrwxrwx 1 root root 0 May 21 10:40 sdi -> ../devices/pci0000:00/0000:00:1f.2/ata1/host1/target1:0:0/1:0:0:0/block/sdi
lrwxrwxrwx 1 root root 0 May 21 10:40 sdj -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:0/end_device-7:0/target7:0:0/7:0:0:0/block/sdj
lrwxrwxrwx 1 root root 0 May 21 10:40 sdk -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:1/end_device-7:1/target7:0:1/7:0:1:0/block/sdk
lrwxrwxrwx 1 root root 0 May 21 10:40 sdl -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:2/end_device-7:2/target7:0:2/7:0:2:0/block/sdl
lrwxrwxrwx 1 root root 0 May 21 10:40 sdm -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:3/end_device-7:3/target7:0:3/7:0:3:0/block/sdm
lrwxrwxrwx 1 root root 0 May 21 10:40 sdo -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:5/end_device-7:5/target7:0:5/7:0:5:0/block/sdo
lrwxrwxrwx 1 root root 0 May 21 10:40 sdp -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:6/end_device-7:6/target7:0:6/7:0:6:0/block/sdp
lrwxrwxrwx 1 root root 0 May 21 10:40 sdq -> ../devices/pci0000:00/0000:00:01.1/0000:02:00.0/host7/port-7:7/end_device-7:7/target7:0:7/7:0:7:0/block/sdq
lrwxrwxrwx 1 root root 0 May 21 10:40 sdr -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:0/end_device-8:0/target8:0:0/8:0:0:0/block/sdr
lrwxrwxrwx 1 root root 0 May 21 10:40 sds -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:1/end_device-8:1/target8:0:1/8:0:1:0/block/sds
lrwxrwxrwx 1 root root 0 May 21 10:40 sdt -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:2/end_device-8:2/target8:0:2/8:0:2:0/block/sdt
lrwxrwxrwx 1 root root 0 May 21 10:40 sdu -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:3/end_device-8:3/target8:0:3/8:0:3:0/block/sdu
lrwxrwxrwx 1 root root 0 May 21 10:40 sdv -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:4/end_device-8:4/target8:0:4/8:0:4:0/block/sdv
lrwxrwxrwx 1 root root 0 May 21 10:40 sdw -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:5/end_device-8:5/target8:0:5/8:0:5:0/block/sdw
lrwxrwxrwx 1 root root 0 May 21 10:40 sdx -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:6/end_device-8:6/target8:0:6/8:0:6:0/block/sdx
lrwxrwxrwx 1 root root 0 May 21 10:40 sdy -> ../devices/pci0000:00/0000:00:06.0/0000:03:00.0/host8/port-8:7/end_device-8:7/target8:0:7/8:0:7:0/block/sdy
lrwxrwxrwx 1 root root 0 May 21 10:40 sdz -> ../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.4/1-1.4:1.0/host9/target9:0:0/9:0:0:0/block/sdz
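The listing also shows one possible way to recognise a USB device: the link target for sdz passes through a `usb1` component, while the SATA and SAS drives do not. A minimal sketch of that check, assuming a hypothetical helper `is_usb_path` and that a path component of the form `usb<number>` is a reliable marker (this is different from the /dev/disk/by-path approach in the pasted code):

```python
from pathlib import PurePosixPath


def is_usb_path(link_target: str) -> bool:
    """Return True if a /sys/block link target passes through a usbN component."""
    return any(
        part.startswith("usb") and part[3:].isdigit()
        for part in PurePosixPath(link_target).parts
    )


sata = "../devices/pci0000:00/0000:00:1f.2/ata1/host1/target1:0:0/1:0:0:0/block/sdi"
flash = ("../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.4/1-1.4:1.0/"
         "host9/target9:0:0/9:0:0:0/block/sdz")
print(is_usb_path(sata), is_usb_path(flash))
```

On a live system the link target could be obtained with `os.readlink("/sys/block/sdz")`.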
ahhh now we're getting somewhere. NICE!
Graphite module not installed. Data will not be logged to Graphite.
Raw block devices detected: ['md0', 'sda', 'sdb', 'sdc', 'sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdi', 'sdj', 'sdk', 'sdl', 'sdm', 'sdo', 'sdp', 'sdq', 'sdr', 'sds', 'sdt', 'sdu', 'sdv', 'sdw', 'sdx', 'sdy', 'sdz']
Valid devices: []
Size of result is: 0
This is on Debian 10 (Buster).
I think I've found the cause thanks to your last output and the contents of your /etc/storagefancontrol file.
In your /etc/storagefancontrol, this line is the problem:
device_filter = "sd"
I can reproduce your problem exactly when the line reads like this.
It should read:
device_filter = sd
So no double quotes. I think I should do something to handle this case.
Could you try and test this fix with both the 'testing' and 'master' branch?
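To show why the quotes matter: Python's `configparser` keeps surrounding double quotes as part of the value, so the filter becomes the literal pattern `"sd"`, which never matches a name like `sda`. A small sketch that reproduces this and shows one possible way to tolerate quoted values; the `unquote` helper is hypothetical and not part of storagefancontrol:

```python
import configparser
import re

INI = """
[Smart]
device_filter = "sd"
"""

conf = configparser.ConfigParser()
conf.read_string(INI)
raw = conf.get("Smart", "device_filter")  # the quotes are part of the value


def unquote(value: str) -> str:
    """Strip one pair of matching surrounding quotes, if present."""
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


print(repr(raw))
print(re.search(raw, "sda"))           # no match: the pattern includes the quotes
print(re.search(unquote(raw), "sda"))  # matches once the quotes are stripped
```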
master branch after removing "sd"
and replacing with sd
.......
Graphite module not installed. Data will not be logged to Graphite.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "./storagefancontrol", line 207, in get_temperature
    smart_data = self.get_smart_data(device).decode('utf-8')
AttributeError: 'str' object has no attribute 'decode'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./storagefancontrol", line 558, in
and the same treatment applied to the testing branch:
Graphite module not installed. Data will not be logged to Graphite.
Raw block devices detected: ['md0', 'sda', 'sdb', 'sdc', 'sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdi', 'sdj', 'sdk', 'sdl', 'sdm', 'sdo', 'sdp', 'sdq', 'sdr', 'sds', 'sdt', 'sdu', 'sdv', 'sdw', 'sdx', 'sdy', 'sdz']
Device sda will be monitored.
Device sdb will be monitored.
Device sdc is excluded.
Device sdd will be monitored.
Device sde will be monitored.
Device sdf will be monitored.
Device sdg will be monitored.
Device sdh will be monitored.
Device sdi is excluded.
Device sdj will be monitored.
Device sdk will be monitored.
Device sdl will be monitored.
Device sdm will be monitored.
Device sdo will be monitored.
Device sdp will be monitored.
Device sdq will be monitored.
Device sdr will be monitored.
Device sds will be monitored.
Device sdt will be monitored.
Device sdu will be monitored.
Device sdv will be monitored.
Device sdw will be monitored.
Device sdx will be monitored.
Device sdy will be monitored.
Device sdz is excluded.
Valid devices: ['sda', 'sdb', 'sdd', 'sde', 'sdf', 'sdg', 'sdh', 'sdj', 'sdk', 'sdl', 'sdm', 'sdo', 'sdp', 'sdq', 'sdr', 'sds', 'sdt', 'sdu', 'sdv', 'sdw', 'sdx', 'sdy']
Size of result is: 22
Temperature reading: -10
Temperature reading: 32
Temperature reading: 32
Temperature reading: 33
Temperature reading: -10
Temperature reading: 31
Temperature reading: 32
Temperature reading: 32
Temperature reading: 33
Temperature reading: 32
Temperature reading: 31
Temperature reading: 31
Temperature reading: 33
Temperature reading: 37
Temperature reading: 31
Temperature reading: 32
Temperature reading: 31
Temperature reading: 31
Temperature reading: 30
Temperature reading: 30
Temperature reading: 29
Temperature reading: 29
Hi, do I understand correctly that the testing branch is now working OK for you and that the master branch is still failing?
i just did a re-pull of both branches. both master and testing are working correctly. EXCELLENT WORK, SIR!
i'm curious about something though. i'm concerned about what happens with the device exclude list upon a reboot. the disks could potentially get another /dev/sdX name. i can see the potential for a condition in which a device that i don't want to be excluded becomes excluded.
Good to hear! Thanks for sticking with me.
You raised a good point. If I parsed the SMART data in a better, more reliable manner, you probably wouldn’t need to exclude the ‘special’ storage devices. I need a failsafe option, so that’s something I still need to fix.
I do wonder if you still get errors if you remove the drives from the exclude list.
Btw: The quotes around the “sd” filter were present in my example config file, so this is my mistake.
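On the renaming concern: one possible way to make an exclude list survive reboots is to list the persistent symlink names that udev creates under /dev/disk/by-id and resolve them to the current kernel names at startup. This is only a sketch of that idea, not something storagefancontrol is known to do; `resolve_excluded` and the name `ata-EXAMPLE_SERIAL123` are invented for illustration, and the demo builds a fake directory tree so it runs without real disks.

```python
import os
import tempfile


def resolve_excluded(stable_names, by_id_dir="/dev/disk/by-id"):
    """Map persistent by-id names to current kernel names such as 'sdc'.

    Hypothetical helper: names without a matching symlink are skipped.
    """
    resolved = set()
    for name in stable_names:
        link = os.path.join(by_id_dir, name)
        if os.path.islink(link):
            resolved.add(os.path.basename(os.path.realpath(link)))
    return resolved


# Demo against a fake tree that mimics /dev and /dev/disk/by-id.
with tempfile.TemporaryDirectory() as root:
    by_id = os.path.join(root, "dev", "disk", "by-id")
    os.makedirs(by_id)
    # Stand-in for the device node /dev/sdc.
    open(os.path.join(root, "dev", "sdc"), "w").close()
    # udev-style relative symlink: by-id/<name> -> ../../sdc
    os.symlink("../../sdc", os.path.join(by_id, "ata-EXAMPLE_SERIAL123"))

    excluded = resolve_excluded(
        ["ata-EXAMPLE_SERIAL123", "ata-NOT_PRESENT"], by_id_dir=by_id
    )

print(excluded)  # {'sdc'}
```

With this approach the exclude list would name the disk itself rather than whatever /dev/sdX letter it happens to receive on a given boot.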
I don't get any errors if the exclude list is empty. It starts up and runs just fine. One thing that I was wondering is, I'm not certain exactly how you're pushing messages into syslog but on this server I've created /etc/rsyslog.d/00-sfc.conf with the following in it....
:syslogtag,contains,"SFC:" /var/log/fancontrol.log & stop
No matter what I cannot seem to get the program to stop logging to syslog and /var/log/fancontrol.log concurrently. I only want the messages from your program to go to /var/log/fancontrol.log exclusively. Am I doing something wrong with rsyslog or is there something else going on here?
Thanks for the feedback. That means that this issue is now fixed.
I've tested the code below (I used a different file name but contents are identical) and it works exactly as you would want. (Messages appear only in /var/log/fancontrol.log not in syslog)
root@server:/etc/rsyslog.d# cat 40-sfc.conf
:syslogtag,contains,"SFC:" /var/log/fancontrol.log
& stop
You could also try and use &~
https://serverfault.com/questions/798098/rsyslog-log-some-messages-only-to-specific-file
Does this work for you?
Debug output:
Apr 28 17:04:14 vault SFC: **Temp: 0** | FAN: -429% | PWM: 20 | P=-74 | I=15 | D=-370 | Err=-37|
My config file....
[General]
polling_interval = 10
target_temperature = 37
mode = smart
[Pid]
P = 2
I = 1
D = 5
D_amplification = 0
I_start = 30
I_max = 100
I_min = 15
[Chassis]
fan_control_device = /sys/class/hwmon/hwmon4/pwm2
fan_control_enable = /sys/class/hwmon/hwmon4/pwm2_enable
pwm_min = 20
pwm_max = 255
pwm_safety = 50
[Smart]
smart_workers = 26
[Controller]
megacli = /opt/MegaRAID/MegaCli/megacli
ports_per_controller = 8
[Graphite]
enabled = 0
host = target.graphite.com
port = 2003
prefix = storagefancontrol
smartctl output