netdata / netdata

Architected for speed. Automated for easy. Monitoring and troubleshooting, transformed!
https://www.netdata.cloud
GNU General Public License v3.0

[Feat]: support for Broadcom (ex LSI, ex Avago) RAID/HBA controllers (storcli) #14097

Closed: k0ste closed this issue 5 months ago

k0ste commented 1 year ago

Problem

Currently, netdata supports only MegaCli.

Description

MegaCli seems outdated: the last release I can find is from 2014. The current tool for all Broadcom RAID/HBA controllers is storcli.

Importance

must have

Value proposition

  1. The user has an LSI/Broadcom HBA or RAID controller installed
  2. The user can monitor the controller card

Proposed implementation

  1. ./storcli64 show nolog: get the list of controllers
  2. ./storcli64 /c0 show all nolog: get all info about a controller (drives, enclosures, temperature), which is easy to parse

Output from controllers of various vendors: PERCH330Adapter.log, PERCH330Mini.log, QS3216.log, ROMB-QS-3516B.log
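
A minimal sketch of the two steps proposed above, assuming storcli64 is on PATH (the hypothetical STORCLI_BIN variable overrides it) and that the plain-text 'show' output contains a line such as "Number of Controllers = 2", the same line the firmware-tool regex later in this thread matches. The function names are illustrative only, not part of netdata or storcli.

```bash
# Sketch: list Broadcom controllers with storcli, then dump each one's details.
# STORCLI_BIN is a hypothetical override; it defaults to storcli64 on PATH.
STORCLI_BIN="${STORCLI_BIN:-storcli64}"

# Read 'storcli show' text on stdin and print the controller count.
# Assumes a line like "Number of Controllers = 2"; exits 1 if none is found.
parse_controller_count() {
  awk -F'=' '/^Number of Controllers[ \t]*=/ {
      gsub(/[ \t\r]/, "", $2)
      print $2
      found = 1
    }
    END { if (!found) exit 1 }'
}

# Query every controller; storcli numbers controllers from zero.
dump_all_controllers() {
  local count id
  count="$("${STORCLI_BIN}" show nolog | parse_controller_count)" || return 1
  for (( id = 0; id < count; id++ )); do
    echo "=== /c${id} ==="
    "${STORCLI_BIN}" "/c${id}" show all nolog
  done
}
```

Parsing the text output this way depends on exact line formats, so it breaks easily if a firmware or storcli release changes the wording.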

ilyam8 commented 1 year ago

Hi, @k0ste.

I did some quick googling:

More questions:

k0ste commented 1 year ago

Does it support JSON-formatted output? If yes, then it's really easy to parse.

Ha-ha. It's enterprise! 😈

Needed permissions?

Yes.

Does x stand for the controller number? So 'show nolog' is needed to get all controller IDs?

Yes, and controller numbering starts at zero. In our internal firmware upgrade tool, the functions look like this:

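# VENDOR_BIN, VENDOR, YELLOW and RESET are set elsewhere in the tool
function get_device_count() {
  local device_count

  # 'storcli show' prints e.g. "Number of Controllers = 2";
  # [0-9]+ also matches counts of 10 or more
  device_count="$(${VENDOR_BIN} show | \
    gawk 'match($0, /^(Number of Controllers\s=\s)([0-9]+)$/, a) {print a[2]}')"

  # gawk exits 0 even when nothing matches, so test for empty output instead
  if [[ -z "${device_count}" ]]
    then
      # write to stderr so the message is not captured by "$(get_device_count)"
      echo -e "${YELLOW}Failed to get ${VENDOR} controllers count${RESET}" >&2
      return 1
  else
      echo "${device_count}"
  fi
}

function outdated_firmware() {
  local device_count
  local device
  local device_model

  # a failure inside $(...) cannot exit this shell, so check the status here
  if ! device_count="$(get_device_count)"
    then
      exit 1
  fi

  if [[ "${device_count}" -gt 0 ]]
    then
      DEVICES=()
      # storcli controller IDs start from zero, so iterate 0..count-1
      for device in $(seq 0 "$(( device_count - 1 ))")
        do
          # get_device_model and check_device_firmware_version are
          # defined elsewhere in the tool
          device_model="$(get_device_model "${device}")"
          if check_device_firmware_version "${device}" "${device_model}"
            then
              DEVICES+=("${device}")
          fi
        done
    else
      echo -e "No ${VENDOR} controllers found"
      exit 0
  fi
}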

This seems to be the official doc. It is from 2013 and I couldn't find (CTRL-F) anything about what nolog is.

By default, without nolog, storcli logs every query to storcli.log:

Note:
     1. Use 'page[=x]'as the last option in all the commands to set the page break.
        X=lines per page. E.g. 'storcli help page=10'
     2. Use 'nolog' option to disable debug logging. E.g. 'storcli show nolog'
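
Since storcli writes storcli.log unless 'nolog' is given, a collector could route every call through one wrapper. A sketch with hypothetical names (storcli_nolog, STORCLI_BIN). Because the help text says 'page[=x]' must be the last option, the wrapper refuses paged queries instead of appending 'nolog' after them.

```bash
# Hypothetical wrapper: run storcli with debug logging to storcli.log disabled.
# STORCLI_BIN defaults to storcli64 on PATH.
storcli_nolog() {
  local arg
  for arg in "$@"; do
    # 'page[=x]' must be the last option, so 'nolog' cannot be appended after it
    case "${arg}" in
      page|page=*)
        echo "storcli_nolog: 'page' is not supported" >&2
        return 2
        ;;
    esac
  done
  "${STORCLI_BIN:-storcli64}" "$@" nolog
}
```

For example, 'storcli_nolog /c0 show all' runs 'storcli64 /c0 show all nolog'.
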

k0ste commented 1 year ago

OMG!!! In libstorage I found that it's possible to set the J flag for JSON output 🤩

[root@host]# ./storcli64 /c0 show all nolog J | jq '.Controllers[]."Response Data".HwCfg."Ctrl temperature(Degree Celsius)"'
45
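
The jq filter above extends to several controllers. A sketch assuming jq is installed and the JSON layout matches this sample; it reads storcli JSON on stdin and prints one "cN temperature" line per controller. Taking the controller ID from "Command Status".Controller is an assumption about storcli's JSON that this thread does not show, so the sketch falls back to the array position when that field is absent. ctrl_temperatures is a hypothetical name.

```bash
# Sketch: print "c<ID> <temperature>" for each controller in storcli JSON on stdin,
# e.g. from: storcli64 /cALL show all nolog J | ctrl_temperatures
# Requires jq. "Command Status".Controller is an assumed field; the array
# index is used when it is missing. Controllers without a reading are skipped.
ctrl_temperatures() {
  jq -r '(.Controllers // [])
    | to_entries[]
    | (.value."Command Status".Controller // .key) as $id
    | .value."Response Data".HwCfg."Ctrl temperature(Degree Celsius)" as $t
    | select($t != null)
    | "c\($id) \($t)"'
}
```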

🔝

k0ste commented 1 year ago

Samples of the JSON output: show.json.txt, c0_show_all.json.txt

ilyam8 commented 5 months ago

Hey, @k0ste. I plan to implement this. Can you please share the output of

storcli64 /cALL show all J nolog

and for MegaRAID controllers

storcli64 /cALL/eALL/sALL show all J nolog

I see that node-exporter-textfile-collector-scripts/storcli.py uses these two commands.
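
With /cALL, one JSON document covers every controller, so a parser should check each controller's command status before trusting its data. A sketch assuming jq is installed and assuming storcli reports the per-controller result under "Command Status" (Status, Description, Controller); those field names are not shown in this thread. failed_controllers is a hypothetical name.

```bash
# Sketch: read storcli JSON on stdin and list controllers whose command failed.
# Requires jq. The "Command Status" field names are assumptions; a missing
# status counts as a failure.
# Returns 0 if all succeeded, 1 if any failed, 2 if jq could not parse the input.
failed_controllers() {
  local failed
  failed="$(jq -r '(.Controllers // [])
    | to_entries[]
    | (.value."Command Status" // {}) as $st
    | select($st.Status != "Success")
    | ($st.Controller // .key) as $id
    | ($st.Description // "no description") as $why
    | "c\($id): \($why)"')" || return 2
  if [[ -z "${failed}" ]]; then
    return 0
  fi
  printf '%s\n' "${failed}"
  return 1
}
```

Distinguishing "controller reported a failure" (1) from "output was not JSON at all" (2) lets a collector decide whether to drop one controller's metrics or the whole run.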

k0ste commented 5 months ago

Quoting @ilyam8: "Hey, @k0ste. I plan to implement this. Can you please share..."

It seems that @bocmanpy is already working on this as part of our contribution to netdata.

ilyam8 commented 5 months ago

I recently moved the adaptec (arcconf) and megacli collectors to Go.

A user on Discord shared the output of those 2 commands with me. I will add an initial version tomorrow and you guys can improve/update it (PRs) when you have time.

ilyam8 commented 5 months ago

Added an initial version in #17454. I will add more metrics and alarms by Monday.