omega8cc / boa

Barracuda Octopus Aegir 5.4.0
https://omega8.cc/compare

Include smartmontools package #876

Open macmladen opened 8 years ago

macmladen commented 8 years ago

I recently had a drive failure. Fortunately, I was on RAID 1, so nothing was lost.

However, I discovered that smartctl was not installed. I think it is very useful; moreover, it could easily be run from cron to report drive failures.

I suggest adding it to the standard install package list, and I recommend running it in some frequent (hourly?) test procedure.

pricejn2 commented 8 years ago

The package can be added to any BOA installation by setting _EXTRA_PACKAGES="smartmontools" in .barracuda.cnf. That said, including it in the standard install packages could easily be justified as well.

As for adding a regular test procedure, that could get problematic, since some servers have RAID controllers that add complexity and variability to smartctl commands. (I'm thinking of MegaRAID in particular, where a command looks more like smartctl -d megaraid,13 -a /dev/sda.)
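To illustrate the variability mentioned above, here is a minimal sketch of a helper that builds the smartctl command line for a plain disk versus a disk behind a MegaRAID controller. The function name `smartctl_cmd` and its arguments are hypothetical; only the two command shapes come from this thread and from smartctl's `-d megaraid,N` device type.

```shell
#!/bin/sh
# Hypothetical helper: print the smartctl command for a device.
#   $1 = block device (e.g. /dev/sda)
#   $2 = optional MegaRAID physical disk ID (e.g. 13)
smartctl_cmd() {
  dev="$1"
  megaraid_id="${2:-}"
  if [ -n "$megaraid_id" ]; then
    # Disk behind a MegaRAID controller: the device type must be given.
    printf '%s\n' "smartctl -d megaraid,${megaraid_id} -a ${dev}"
  else
    # Directly attached disk: the plain form is usually enough.
    printf '%s\n' "smartctl -a ${dev}"
  fi
}
```

For example, `smartctl_cmd /dev/sda 13` prints the MegaRAID form quoted above. Other controllers need different `-d` values, which is why a single hard-coded cron command would not work everywhere.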

The omega8cc folks may have more thoughts here.

omega8cc commented 8 years ago

@macmladen Thank you for the suggestion. It is a good idea; however, as @pricejn2 pointed out, it may not work on all systems, especially within a VM instance, depending on the host system and its hardware configuration and restrictions. For example, it will not work on Linux VServer based VM instances, and it will not work on HP machines with Adaptec RAID cards, even outside of a VM.

That said, smartmontools provides the smartd daemon, which comes with the DEVICESCAN directive. It is able to detect supported hardware and drives, so you don't need to guess how to run smartctl to make it work.

We could simply attempt to run smartd and watch its output. If it fails, we know that we can't use it: it will fail to start with the message "Unable to monitor any SMART enabled devices".
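A rough sketch of that check, assuming the output of smartd has already been captured into a string. The function name `smartd_can_monitor` is hypothetical; the only thing it relies on is the failure message quoted above.

```shell
#!/bin/sh
# Hypothetical helper: decide from captured smartd output whether
# smartd found anything it can monitor.
#   $1 = text captured from smartd (stdout and stderr)
# Returns 0 if usable, 1 if smartd reported no monitorable devices.
smartd_can_monitor() {
  case "$1" in
    *"Unable to monitor any SMART enabled devices"*) return 1 ;;
    *) return 0 ;;
  esac
}

# Possible usage (assumes smartd is installed; "-q onecheck" makes it
# register devices, check them once, and exit):
#   out="$(smartd -q onecheck 2>&1)"
#   smartd_can_monitor "$out" || echo "smartd not usable on this host"
```

Matching on the message text keeps the sketch independent of exit codes, at the cost of breaking if a future smartd release rewords the message.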

It comes with its own alerting, so we would just need to make sure it sends its messages to the correct email address rather than to root.

We would need to build it from source, though, because the versions included in distribution packages are often a few years old and can't detect the drives and RAID configurations in current use.
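As a sketch of the alerting side: in smartd.conf, the `-m` directive on a DEVICESCAN line sets the address that warnings are mailed to. The helper name `smartd_devicescan_line` and the example address are placeholders, and where smartd.conf lives depends on how smartd was installed.

```shell
#!/bin/sh
# Hypothetical helper: print a smartd.conf line that lets smartd
# auto-detect devices and mail warnings to the given address.
#   $1 = alert recipient (placeholder, e.g. admin@example.com)
smartd_devicescan_line() {
  printf 'DEVICESCAN -m %s\n' "$1"
}

# Possible usage: review the line, then add it to smartd.conf by hand.
#   smartd_devicescan_line admin@example.com
```

Adding `-M test` to the same line should make smartd send a single test email at startup, which is a quick way to confirm that mail actually leaves the host.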

macmladen commented 8 years ago

I found out that mdadm (the software RAID tool) was able to notify me, but it wasn't configured to do so, and the failure mail was left on the local host.

Just a side note for those who are using software RAID.
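For anyone who wants to check this on their own server, here is a minimal sketch that prints the address mdadm is configured to notify, using the `MAILADDR` keyword from mdadm.conf. The function name `mdadm_mailaddr` is hypothetical, and the path to mdadm.conf varies by distribution (commonly /etc/mdadm/mdadm.conf or /etc/mdadm.conf), so it is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print the first MAILADDR value found in an
# mdadm.conf file. Prints nothing if no MAILADDR line is present.
#   $1 = path to mdadm.conf
mdadm_mailaddr() {
  awk '$1 == "MAILADDR" { print $2; exit }' "$1"
}

# Possible usage:
#   mdadm_mailaddr /etc/mdadm/mdadm.conf
# If this prints "root" or nothing, alerts will likely stay in a
# local mailbox unless root's mail is forwarded elsewhere.
```

After changing the address, `mdadm --monitor --scan --oneshot --test` should generate a test alert for each array, so delivery can be verified before a real failure happens.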