sensu-plugins / sensu-plugins-disk-checks

This plugin provides native disk instrumentation for monitoring and metrics collection, including: health, usage, and various metrics.
http://sensu-plugins.io
MIT License

check-smart-status does not use overrides when --device is used #110

Closed: gg3nx closed this issue 5 years ago

gg3nx commented 5 years ago

I ran into this issue while trying to check the status of physical disks in a RAID array on a 3ware controller.

JSON file:

{
  "smart": {
    "attributes": [
      { "id": 1, "name": "Raw_read_Error_Rate", "read": "left16bit" },
      { "id": 5, "name": "Reallocated_Sector_Ct" },
      { "id": 10, "name": "Spin_Retry_Count" },
      { "id": 184, "name": "End-to-End_Error" },
      { "id": 187, "name": "Reported_Uncorrect" },
      { "id": 188, "name": "Command_Timeout" },
      { "id": 193, "name": "Load_Cycle_Count", "warn_max": 300000, "crit_max": 600000 },
      { "id": 194, "name": "Temperature_Celsius", "read": "right16bit", "crit_min": 20, "warn_min": 10, "warn_max": 40, "crit_max": 50 },
      { "id": 196, "name": "Reallocated_Event_Count" },
      { "id": 197, "name": "Current_Pending_Sector" },
      { "id": 198, "name": "Offline_Uncorrectable" },
      { "id": 199, "name": "UDMA_CRC_Error_Count" },
      { "id": 201, "name": "Unc_Soft_read_Err_Rate", "read": "left16bit" },
      { "id": 230, "name": "Life_Curve_Status", "crit_min": 100, "warn_min": 100, "warn_max": 100, "crit_max": 100 }
    ]
  },
  "hardware": {
    "devices": [
      { "path": "3ware0", "override": "/dev/twl0 -d 3ware,0" },
      { "path": "3ware1", "override": "/dev/twl0 -d 3ware,1" },
      { "path": "3ware2", "override": "/dev/twl0 -d 3ware,2" },
      { "path": "3ware3", "override": "/dev/twl0 -d 3ware,3" }
    ]
  }
}

RAID volume is /dev/sda.

Output of lsblk -nro NAME,TYPE:

sda disk
sda1 part
sda2 part
sda5 part

When check-smart-status.rb uses lsblk to find devices, it finds only sda, which has no override in the JSON file. When --device 3ware0,3ware1,3ware2,3ware3 is passed instead, the overrides from the JSON file are not applied.

As a quick fix, I modified find_devices to check for overrides when the --device option is used, and it seems to work:

./check-smart-status.rb -j /etc/sensu-plugins-disk-checks-smart.json --debug on --device 3ware0,3ware1,3ware2,3ware3
smartctl -H -A -v 1,raw48 -v 5,raw48 -v 10,raw48 -v 184,raw48 -v 187,raw48 -v 188,raw48 -v 193,raw48 -v 194,raw48 -v 196,raw48 -v 197,raw48 -v 198,raw48 -v 199,raw48 -v 201,raw48 -v 230,raw48 /dev/twl0 -d 3ware,0

smartctl -H -A -v 1,raw48 -v 5,raw48 -v 10,raw48 -v 184,raw48 -v 187,raw48 -v 188,raw48 -v 193,raw48 -v 194,raw48 -v 196,raw48 -v 197,raw48 -v 198,raw48 -v 199,raw48 -v 201,raw48 -v 230,raw48 /dev/twl0 -d 3ware,1

smartctl -H -A -v 1,raw48 -v 5,raw48 -v 10,raw48 -v 184,raw48 -v 187,raw48 -v 188,raw48 -v 193,raw48 -v 194,raw48 -v 196,raw48 -v 197,raw48 -v 198,raw48 -v 199,raw48 -v 201,raw48 -v 230,raw48 /dev/twl0 -d 3ware,2

smartctl -H -A -v 1,raw48 -v 5,raw48 -v 10,raw48 -v 184,raw48 -v 187,raw48 -v 188,raw48 -v 193,raw48 -v 194,raw48 -v 196,raw48 -v 197,raw48 -v 198,raw48 -v 199,raw48 -v 201,raw48 -v 230,raw48 /dev/twl0 -d 3ware,3

SmartCheckStatus CRITICAL: Overall health check failed on /dev/twl0 -d 3ware,3
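
For illustration, the idea behind the quick fix could look roughly like the sketch below. This is not the plugin's actual find_devices code: resolve_devices is a hypothetical helper, and falling back to /dev/<name> when no override is configured is an assumption made for the example. It only shows how names passed via --device might be mapped to the "override" strings from the "hardware" section of the JSON file.

```ruby
require 'json'

# Hypothetical helper (not part of the plugin): map each name given to
# --device onto its "override" string from hardware.devices in the config.
# Assumption: names without an override fall back to /dev/<name>.
def resolve_devices(device_arg, config)
  devices = config.fetch('hardware', {}).fetch('devices', [])

  # Build a lookup table: "path" => "override"
  overrides = devices.each_with_object({}) do |dev, map|
    map[dev['path']] = dev['override'] if dev['override']
  end

  device_arg.split(',').map do |name|
    overrides.fetch(name, "/dev/#{name}")
  end
end

# Trimmed copy of the "hardware" section from the JSON file above.
config = JSON.parse(<<~CONFIG)
  {
    "hardware": {
      "devices": [
        { "path": "3ware0", "override": "/dev/twl0 -d 3ware,0" },
        { "path": "3ware1", "override": "/dev/twl0 -d 3ware,1" }
      ]
    }
  }
CONFIG

# Each resolved string is what would be appended to the smartctl command.
puts resolve_devices('3ware0,3ware1', config)
```

With a lookup like this, the override is applied regardless of whether the device name came from lsblk or from --device, which is the behaviour the debug output above shows after the change.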