librenms / librenms-agent

LibreNMS Agent & Scripts
GNU General Public License v2.0

Replaced mdadm script with a newer, more flexible version #401

Closed · Trae32566 closed this 2 years ago

Trae32566 commented 2 years ago

This updates the mdadm SNMP script to be cleaner and more robust (it was previously failing with specific RAID configurations). Here is output from an assorted set of RAID 0s:

[root@stor01a-rh8 ~]# /etc/snmp/mdadm 
{"data":[{"name":"md124","level":"raid0","size":"18003119308800","disc_count":"3","hotspare_count":"0","device_list":["sdc","sdd","sde"],"missing_devices_list":[],"state":"clean","action":"0","degraded":"0","sync_speed":"0","sync_completed":"0"},{"name":"md125","level":"raid0","size":"12002079539200","disc_count":"2","hotspare_count":"0","device_list":["sda","sdb"],"missing_devices_list":[],"state":"clean","action":"0","degraded":"0","sync_speed":"0","sync_completed":"0"},{"name":"md126","level":"raid0","size":"1860725374976","disc_count":"2","hotspare_count":"0","device_list":["nvme0n1p4","nvme1n1p2"],"missing_devices_list":[],"state":"clean","action":"0","degraded":"0","sync_speed":"0","sync_completed":"0"},{"name":"md127","level":"raid0","size":"137571074048","disc_count":"2","hotspare_count":"0","device_list":["nvme0n1p3","nvme1n1p1"],"missing_devices_list":[],"state":"clean","action":"0","degraded":"0","sync_speed":"0","sync_completed":"0"}],"error":"0","errorString":"","version":"1"}

And a separate RAID 1 (and with --debug):

[root@kvm01-rh8 ~]# /etc/snmp/mdadm  --debug
{"data":[{"name":"md127","level":"raid1","size":"2000398778368","disc_count":"2","hotspare_count":"0","device_list":["nvme0n1","nvme1n1"],"missing_devices_list":[],"state":"clean","action":"idle","degraded":"0","sync_speed":"0","sync_completed":"none"}],"error":"0","errorString":"","version":"1"}
{ "data": [{
        "name": "md127",
        "level": "raid1",
        "size": "2000398778368",
        "disc_count": "2",
        "hotspare_count": "0",
        "device_list": [
            "nvme0n1",
            "nvme1n1"
        ],
        "missing_devices_list": [

        ],
        "state": "clean",
        "action": "idle",
        "degraded": "0",
        "sync_speed": "0",
        "sync_completed": "none"
    }],
    "error": "0",
    "errorString": "",
    "version": "1"
}
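For reference, the per-array fields above correspond to standard md sysfs attributes; a minimal collection sketch for a single array (attribute names follow the usual /sys/block/mdX/md layout and are assumptions about, not a copy of, what the script reads):

# Sketch: gather the reported fields for one array from sysfs
md=md127
level=$(cat "/sys/block/${md}/md/level")
disc_count=$(cat "/sys/block/${md}/md/raid_disks")
state=$(cat "/sys/block/${md}/md/array_state")
action=$(cat "/sys/block/${md}/md/sync_action" 2>/dev/null || echo 0)   # raid0 has no sync_action
degraded=$(cat "/sys/block/${md}/md/degraded" 2>/dev/null || echo 0)
size=$(( $(cat "/sys/block/${md}/size") * 512 ))                        # sysfs size is in 512-byte sectors
devices=( "/sys/block/${md}/slaves/"* )
devices=( "${devices[@]##*/}" )                                         # component devices, e.g. nvme0n1 nvme1n1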
murrant commented 2 years ago

Seems reasonable, but we have no way of testing. Might be worth bumping the version on the script to 2.

murrant commented 2 years ago

I guess I can test at least one thing :) With no md devices present, running the script produces the following, when it should output an error string instead:

cat: '/sys/block/md[[:digit:]]*/md/level': No such file or directory
cat: '/sys/block/md[[:digit:]]*/size': No such file or directory
./test: line 33: ( * 1024) / 2: syntax error: operand expected (error token is "* 1024) / 2")
{"data":[],"error":"0","errorString":null,"version":"1"}
Trae32566 commented 2 years ago

I'll take care of that and the version bump; I was looking at adding some error handling anyway, since there is none currently. Thanks!

Trae32566 commented 2 years ago

@murrant I believe this should be fixed now, along with error handling and stricter JSON syntax verification, though I've had to add a dependency on jq to properly generate the JSON. From an arbiter storage node (no md storage):

[root@stor01c-rh8 ~]# ./mdadm
{"data":[],"error":2,"errorString":"mdadm array not found!","version":"2.0.0"}

From a storage node with no jq:

[root@stor01b-rh8 ~]# ./mdadm
{"data":[],"error":1,"errorString":"jq_missing!","version":"2.0.0"}

Working storage node:

[root@stor01a-rh8 snmp]# ./mdadm
{"data":[{"name":"md124","level":"raid0","size":12002079539200,"disc_count":2,"hotspare_count":0,"device_list":["sdc","sde"],"missing_devices_list":[],"state":"clean","action":"0","degraded":0,"sync_speed":0,"sync_completed":0},{"name":"md125","level":"raid0","size":18003119308800,"disc_count":3,"hotspare_count":0,"device_list":["sda","sdb","sdd"],"missing_devices_list":[],"state":"clean","action":"0","degraded":0,"sync_speed":0,"sync_completed":0},{"name":"md126","level":"raid0","size":1860725374976,"disc_count":2,"hotspare_count":0,"device_list":["nvme0n1p4","nvme1n1p2"],"missing_devices_list":[],"state":"clean","action":"0","degraded":0,"sync_speed":0,"sync_completed":0},{"name":"md127","level":"raid0","size":137571074048,"disc_count":2,"hotspare_count":0,"device_list":["nvme0n1p3","nvme1n1p1"],"missing_devices_list":[],"state":"clean","action":"0","degraded":0,"sync_speed":0,"sync_completed":0}],"error":0,"errorString":"","version":"2.0.0"}      

Working kvm host:

[root@kvm02-rh8 ~]# /etc/snmp/mdadm
{"data":[{"name":"md127","level":"raid1","size":2000398778368,"disc_count":2,"hotspare_count":0,"device_list":["nvme0n1","nvme1n1"],"missing_devices_list":[],"state":"clean","action":"idle","degraded":0,"sync_speed":0,"sync_completed":0}],"error":0,"errorString":"","version":"2.0.0"}
murrant commented 2 years ago

Not too sure about the jq dependency; I don't think it is installed by default. You should at least update the docs if that is the case.

SourceDoctor commented 2 years ago

Nice rewrite! Nevertheless, a small fix for "sync_completed":

https://github.com/librenms/librenms-agent/pull/409 ;)