jhoblitt / puppet-smartd

Manages the smartmontools package including the smartd daemon

Auto-generating the devices array programmatically #13

Closed solarkennedy closed 10 years ago

solarkennedy commented 10 years ago

What do you think of this?

https://gist.github.com/solarkennedy/7606943

I think the template approach is pretty good, but I find it difficult to read, and as you said, it is arguably a bug that it is not disable-able. But could be?

With this function:

class { 'smartd':
  devices     => hiera('smartd::devices', smartd_guess() ),
  devicescan  => hiera('smartd::devicescan', false),
}

Obviously this doesn't account for raid controllers yet, but it does skip the right things and disables devicescan so that smartd can actually run on mixed environments. (wouldn't it be nice if DEVICESCAN would gracefully skip things it didn't understand?)

I also like how this function can increment the SMART scan hours; this is probably also possible with the ERB template.

I don't know how best to reconcile the other possible behaviors. I suppose the function could take arguments that alter the scan pattern and behavior, but then it all gets pretty confusing....

In the end, I don't want to have to hand-configure anything. I want puppet to do its best to do this for me.

Either way, more facts are needed to do raid controllers. I can volunteer to write them for others (I feel like I have every kind over here), do you want me to follow the megaraid fact patterns that you have laid out?

And, do you think this function based approach has merit? Should I focus my effort into the template method instead?

jhoblitt commented 10 years ago

In the puppet DSL, functions are both parse order dependent and executed at catalog compile time. That means in a master-agent setup, the function runs [only] on the puppet master. Are you running a masterless setup? If not, this would need to be implemented as either facts or a type/provider that manages the smartd.conf file.

I think it would help to have a clearly defined problem. Could you say a little more about how DEVICESCAN isn't working for you? Is it finding devices that shouldn't be probed or missing devices that should be?

The facts in this module need TLC and tests. I have a branch somewhere where I started this but I never finished it. I certainly wouldn't cry if someone beat me to it.

solarkennedy commented 10 years ago

Yea the function works fine in masterless and master mode. I'm not sure why it wouldn't work? All the facts are available at the time it runs, that is all it needs.

Yes. My clearly defined problem is that DEVICESCAN sucks :) (although maybe I'm doing it wrong?) Take the simple example of a server with a raid card on /dev/sda. As-is, smartd will not start. It doesn't know how to figure out how many drives are behind it, etc. (although right now I can't even tell smartd to ignore it as my version doesn't support -d ignore....)

Either this function or the template can solve this problem, to the point where we don't need devicescan.

Either solution will require more facts to determine the disks behind raid controllers. Are there other ways of doing this? How are other people monitoring the smart health of disks behind raid controllers if DEVICESCAN can't do it? (without configuring smartd by hand.)

jhoblitt commented 10 years ago

I misunderstood your intent with a function, I thought you were proposing to probe for raid controllers that way. If it only operates on facts, a function could work. However, there's no need to use a function to operate on purely facts.
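
To make the "operates purely on facts" idea concrete, here is a minimal Ruby sketch of the selection logic, written as a plain method so it runs without Puppet. It is not the code from the gist. The method name `guess_smartd_devices`, the `RAID_VENDORS` list, and the fact names (`blockdevices`, `blockdevice_<name>_vendor`) are assumptions chosen for illustration.

```ruby
# Hypothetical helper: pick smartd device entries from a plain facts hash.
# The vendor list and the fact names used here are illustrative assumptions.
RAID_VENDORS = %w[LSI DELL 3ware Adaptec].freeze

def guess_smartd_devices(facts)
  names = facts.fetch('blockdevices', '').split(',')
  candidates = names.reject do |name|
    vendor = facts["blockdevice_#{name}_vendor"].to_s
    # Skip optical drives and anything presented by a known RAID vendor.
    name.start_with?('sr') || RAID_VENDORS.include?(vendor)
  end
  candidates.map { |name| "/dev/#{name}" }
end

facts = {
  'blockdevices'            => 'sda,sdb,sr0',
  'blockdevice_sda_vendor'  => 'LSI',
  'blockdevice_sdb_vendor'  => 'ATA',
  'blockdevice_sr0_vendor'  => 'Optiarc'
}
puts guess_smartd_devices(facts).inspect
```

Because the method only reads a hash, the same logic could live in a function, a template, or a fact; the sketch does not probe any hardware itself.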

smartd can only probe block devices that support smart commands. Since virtually all raid controller drivers don't support smart commands, and there's no standard way to discover devices behind them, there's no way for smartd/DEVICESCAN to auto discover them. Also keep in mind that not every disk in the world is behind a raid controller. That's arguably a more important use case than raid controller connected devices, as most raid controllers provide some form of management software that can do fault notification.

This module already supports automatic discovery of devices behind LSI megaraid controllers via facts. Here's an example of smartd.conf generated by this module on one of my servers. Adding support for additional raid controllers should be relatively easy and straightforward. I am willing to merge facts for other controllers into this module but I think it makes more sense for facts that require a software utility to live in a module that provides that utility. I am planning to eventually move the megaraid facts out of this module and into my megaraid module, which is needed to provide MegaCli.

# Managed by Puppet -- do not edit!
DEFAULT -m root -M daily
/dev/sdc -d sat+megaraid,116 -I 194 -I 199
/dev/sdc -d sat+megaraid,117 -I 194 -I 199
/dev/sdc -d sat+megaraid,120 -I 194 -I 199
/dev/sdc -d sat+megaraid,121 -I 194 -I 199
/dev/sdc -d sat+megaraid,122 -I 194 -I 199
/dev/sdc -d sat+megaraid,123 -I 194 -I 199
/dev/sdc -d sat+megaraid,124 -I 194 -I 199
/dev/sdc -d sat+megaraid,125 -I 194 -I 199
/dev/sdc -d sat+megaraid,126 -I 194 -I 199
/dev/sdc -d sat+megaraid,127 -I 194 -I 199
/dev/sdc -d sat+megaraid,128 -I 194 -I 199
/dev/sdc -d sat+megaraid,129 -I 194 -I 199
/dev/sdc -d sat+megaraid,131 -I 194 -I 199
/dev/sdc -d sat+megaraid,132 -I 194 -I 199
/dev/sdc -d sat+megaraid,133 -I 194 -I 199
/dev/sdc -d sat+megaraid,134 -I 194 -I 199
/dev/sdc -d sat+megaraid,135 -I 194 -I 199
/dev/sdc -d sat+megaraid,136 -I 194 -I 199
/dev/sdc -d sat+megaraid,137 -I 194 -I 199
/dev/sdc -d sat+megaraid,138 -I 194 -I 199
/dev/sdc -d sat+megaraid,139 -I 194 -I 199
/dev/sdc -d sat+megaraid,140 -I 194 -I 199
/dev/sdc -d sat+megaraid,141 -I 194 -I 199
/dev/sdc -d sat+megaraid,142 -I 194 -I 199
/dev/sdc -d sat+megaraid,143 -I 194 -I 199
/dev/sdc -d sat+megaraid,144 -I 194 -I 199
/dev/sdc -d sat+megaraid,145 -I 194 -I 199
/dev/sdc -d sat+megaraid,146 -I 194 -I 199
/dev/sdc -d sat+megaraid,147 -I 194 -I 199
/dev/sdc -d sat+megaraid,148 -I 194 -I 199
/dev/sdc -d sat+megaraid,149 -I 194 -I 199
/dev/sdc -d sat+megaraid,150 -I 194 -I 199
/dev/sdc -d sat+megaraid,151 -I 194 -I 199
/dev/sdc -d sat+megaraid,152 -I 194 -I 199
/dev/sdc -d sat+megaraid,153 -I 194 -I 199
/dev/sdc -d sat+megaraid,154 -I 194 -I 199
/dev/sdc -d sat+megaraid,155 -I 194 -I 199
/dev/sdc -d sat+megaraid,156 -I 194 -I 199
/dev/sdc -d sat+megaraid,157 -I 194 -I 199
/dev/sdc -d sat+megaraid,158 -I 194 -I 199
/dev/sdc -d sat+megaraid,159 -I 194 -I 199
/dev/sdc -d sat+megaraid,160 -I 194 -I 199
/dev/sdc -d sat+megaraid,161 -I 194 -I 199
/dev/sdc -d sat+megaraid,162 -I 194 -I 199
/dev/sdc -d sat+megaraid,163 -I 194 -I 199
/dev/sdc -d sat+megaraid,164 -I 194 -I 199
/dev/sdc -d sat+megaraid,165 -I 194 -I 199
/dev/sdc -d sat+megaraid,166 -I 194 -I 199
/dev/sdc -d sat+megaraid,167 -I 194 -I 199
/dev/sdc -d sat+megaraid,168 -I 194 -I 199
/dev/sdc -d sat+megaraid,169 -I 194 -I 199
/dev/sdc -d sat+megaraid,170 -I 194 -I 199
/dev/sdc -d sat+megaraid,171 -I 194 -I 199
/dev/sdc -d sat+megaraid,172 -I 194 -I 199
/dev/sdc -d sat+megaraid,173 -I 194 -I 199
/dev/sdc -d sat+megaraid,174 -I 194 -I 199
/dev/sdc -d sat+megaraid,175 -I 194 -I 199
/dev/sdc -d sat+megaraid,176 -I 194 -I 199
/dev/sdc -d sat+megaraid,177 -I 194 -I 199
/dev/sdc -d sat+megaraid,178 -I 194 -I 199
/dev/sdc -d sat+megaraid,179 -I 194 -I 199
/dev/sdc -d sat+megaraid,180 -I 194 -I 199
/dev/sdc -d sat+megaraid,181 -I 194 -I 199
/dev/sdc -d sat+megaraid,182 -I 194 -I 199
/dev/sdc -d sat+megaraid,183 -I 194 -I 199
/dev/sdc -d sat+megaraid,184 -I 194 -I 199
/dev/sdc -d sat+megaraid,185 -I 194 -I 199
/dev/sdc -d sat+megaraid,186 -I 194 -I 199
DEVICESCAN
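
As a rough illustration of how entries like the ones above can be produced from a list of device IDs, here is a short Ruby sketch. It is not the module's actual template. The method name `megaraid_lines` and its keyword arguments are hypothetical, and the device IDs passed in are just the first few from the listing.

```ruby
# Hypothetical renderer for smartd.conf entries behind a megaraid controller.
# block_device is the logical drive exposed by the controller; device_ids are
# the physical drive IDs (assumed to come from a fact in a real module).
def megaraid_lines(block_device, device_ids,
                   drive_type: 'sat+megaraid', options: '-I 194 -I 199')
  device_ids.map { |id| "#{block_device} -d #{drive_type},#{id} #{options}" }
end

lines = ['# Managed by Puppet -- do not edit!', 'DEFAULT -m root -M daily']
lines += megaraid_lines('/dev/sdc', [116, 117, 120])
lines << 'DEVICESCAN'
puts lines
```

Keeping `DEVICESCAN` as the last line mirrors the generated file above, so directly attached disks are still picked up after the explicit entries.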

solarkennedy commented 10 years ago

Yes. The megaraid stuff is good. I'll see what I can do with other raid cards as I get to them.

I guess I was mostly using this function so I could ignore drives that were not autodetected yet behind raid controllers. If I don't ignore them, smartd will not start. :( That is a blocker for me.

I'll work towards the template approach instead, and build other modules or use existing ones for more raid facts.

What do you think about the ability to stagger the scheduled smart checks? And with the megaraid, is it even possible to schedule smart checks on them? Is that a feature you want? (or does smart just use the default scan if specified?)

jhoblitt commented 10 years ago

I've never seen smartd exit in the manner you're describing. I just engineered a setup on a box with only block devices on a raid controller (+sr0) and nothing but DEVICESCAN in the smartd.conf and can't reproduce that behavior. What platform are you on and what version of smartmontools are you using? I wonder if you're observing a bug.

# lsscsi
[0:2:0:0]    disk    LSI      MR9261-8i        2.70  /dev/sda 
[0:2:1:0]    disk    LSI      MR9261-8i        2.70  /dev/sdb 
[4:0:0:0]    cd/dvd  Optiarc  DVD RW AD-7710H  1.01  /dev/sr0 
smartd 5.42 2011-10-20 r3458 [x86_64-linux-2.6.32-220.17.1.el6.x86_64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Opened configuration file /etc/smartd.conf 
Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Device: /dev/sda, opened
Device: /dev/sda, [LSI MR9261-8i 2.70], lu id: 0x600605b0032f5580166fb3ff10227465, 137 GB
Device: /dev/sda, IE (SMART) not enabled, skip device 
Try 'smartctl -s on /dev/sda' to turn on SMART features 
Device: /dev/sdb, opened
Device: /dev/sdb, [LSI MR9261-8i 2.70], lu id: 0x600605b0032f5580166fb3ff10233557, 1.65 TB
Device: /dev/sdb, IE (SMART) not enabled, skip device 
Try 'smartctl -s on /dev/sdb' to turn on SMART features 
Monitoring 0 ATA and 0 SCSI devices
smartd has fork()ed into background mode. New PID=8874. 

jhoblitt commented 10 years ago

RE: staggering smart checks. What's the use case? You can set the schedule per device entry, see the examples in smartd.conf(5).

solarkennedy commented 10 years ago

I definitely have servers that make smartd just die. I'll get some more output on monday.

I could set the schedule per device myself yes, but I would rather puppet do it. Seems like staggering the checks is a good idea, but I could be just paranoid.
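
For what it's worth, staggering could be sketched in Ruby roughly like this. The method name `staggered_schedules` and the one-hour offset per device are assumptions for illustration only. The `-s S/../.././HH` form follows the self-test schedule syntax (`T/MM/DD/d/HH`) described in smartd.conf(5), here meaning a short self-test every day at hour `HH`.

```ruby
# Hypothetical helper: give each device a daily short self-test at a
# different hour. Hours wrap around at 24, so more than 24 devices will
# share hours again.
def staggered_schedules(devices, start_hour: 1)
  devices.each_with_index.map do |device, index|
    hour = (start_hour + index) % 24
    format('%s -s S/../.././%02d', device, hour)
  end
end

puts staggered_schedules(['/dev/sda', '/dev/sdb', '/dev/sdc'])
```

Each returned string is a complete device entry, so the same list could be fed to a template or passed in as the devices array.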

jhoblitt commented 10 years ago

@solarkennedy I've observed older versions of smartmontools exiting when none of the devices found support smart commands. I'm going to close this ticket out on the assumption that's what you're seeing but please feel free to reopen it with new information.