ncsa / xcat-tools

Useful tools for xCAT
BSD 3-Clause "New" or "Revised" License

SVCPLAN-5154: Add a firmware version check for Dell nodes #61

Closed. inwho closed this 4 months ago

inwho commented 5 months ago

https://jira.ncsa.illinois.edu/browse/SVCPLAN-5154

Adding a bash script that takes (or prompts for) a node range and uses racadm and dmidecode to output each node's BIOS version, iDRAC version, and hardware model.
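
As an illustration of the "takes (or prompts for)" behavior, here is a minimal sketch. It is not the code from this PR, and the helper name `get_noderange` is hypothetical. It uses the first argument when one is given and otherwise prompts on stderr and reads one line from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not taken from the PR): use the first argument as the
# node range, or prompt on stderr and read one line from stdin.
get_noderange() {
    local range=${1:-}
    if [ -z "$range" ]; then
        printf 'Please enter a node range: ' >&2
        read -r range
    fi
    printf '%s\n' "$range"
}

# In a real script this would be: noderange=$(get_noderange "$@")
noderange=$(get_noderange "slurmd")
echo "Running scan on noderange: $noderange"
```

Writing the prompt to stderr keeps it out of the captured value when the function is called inside `$(...)`.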

The script has been tested on mg-adm01; some sample output is below (named dellcheck to mark it as a draft copy).

[root@mg-adm01 scripts]# ./dellcheck.sh 
Please enter a node range: test
Running scan on noderange: test
       Nodename            Bios            iDRAC               Model                
__________________________________________________________________________
      mgportal2            2.18.1          2.85.85.85          R730
          mgrs2            2.14.1          7.10.30.00          R7515
       mgsched2            2.14.1          7.10.30.00          R7515
       mgtest01            2.14.1          7.10.30.00          R7525
       mgtest02            2.14.1          7.10.30.00          R7515
       mgtest03            2.14.1          7.10.30.00          R7515
[root@mg-adm01 scripts]# ./dellcheck.sh slurmd
Running scan on noderange: slurmd 
       Nodename            Bios            iDRAC               Model                
__________________________________________________________________________
          mg001            2.21.0          7.00.00.171         C6420
          mg002            2.21.0          7.00.00.171         C6420
          mg003            2.21.0          7.00.00.171         C6420
          mg004            2.21.0          7.00.00.171         C6420

billglick commented 5 months ago

Another possible enhancement... though it is probably not worth the effort. It might be useful to do some more error handling to make it clear why certain fields are blank.

In the following case, this particular noderange does not have any bmc* attributes set because they are VMs:

[root@mf-adm03 Dell]# ./firmware_version_check.sh VM@WEBNODE
Running scan on noderange: VM@WEBNODE 
       Nodename            Bios            iDRAC               Model                
__________________________________________________________________________
ERROR unknown ip for 'mforgeweb3'
     mforgeweb3                                                Virtual
ERROR unknown ip for 'mforgeweb4'
     mforgeweb4                                                Virtual
ERROR unknown ip for 'mforgeweb5'
     mforgeweb5                                                Virtual

inwho commented 4 months ago

I've added the suggested changes: timeouts, dynamic sizing of tables (based on the longest name in the noderange), and a more reliable way of grabbing version numbers.
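
For reference, dynamic sizing of the name column could look roughly like the sketch below. It is an assumption about the approach, not the code in this PR; `longest_name_width` and `print_row` are hypothetical names, and the widths of the other columns are arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical helper: width of the longest node name, but never narrower
# than the "Nodename" header (8 characters).
longest_name_width() {
    local max=8 name
    for name in "$@"; do
        if [ "${#name}" -gt "$max" ]; then
            max=${#name}
        fi
    done
    echo "$max"
}

# Hypothetical helper: $1 = name column width, then nodename, bios, idrac, model.
# '%*s' takes the field width from the argument list and right-aligns the name.
print_row() {
    local width=$1
    shift
    printf '%*s    %-12s %-16s %s\n' "$width" "$1" "$2" "$3" "$4"
}

width=$(longest_name_width mgportal2 mgrs2 mgsched2)
print_row "$width" "Nodename" "Bios" "iDRAC" "Model"
print_row "$width" "mgrs2" "2.14.1" "7.10.30.00" "R7515"
```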

I agree the script is quite slow. I tried using the '-f <filter>' flag to grab just the BIOS and iDRAC versions, but it didn't seem to do much better. I'm not sure how to make it faster unless racadm has some parallel option I'm unaware of.
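
One option that does not depend on racadm itself is to run the per-node queries as shell background jobs and wait for them all. This is only a sketch: `query_node` is a hypothetical stand-in that echoes a line so the example runs without any BMC access, whereas the real script would call racadm there.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the per-node query; the real script would run
# its racadm calls here instead of echoing.
query_node() {
    local node=$1
    echo "$node queried"
}

# Start one background job per node, then wait for all of them to finish.
# Completion order is not guaranteed, so callers should sort the output.
scan_parallel() {
    local node
    for node in "$@"; do
        query_node "$node" &
    done
    wait
}

scan_parallel mg001 mg002 mg003 mg004 | sort
```

This starts every job at once, so a very large node range would probably need batching to limit the number of simultaneous connections.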

I could possibly add some error handling (I will probably have to find a cluster with VMs that I can test on). Maybe a check to see whether racadm getversion returns an empty string? On a related note, there is no sanitization of the node range. If the range is invalid (e.g. because of a typo), nodels returns all nodes instead. I don't think it is a big deal, since the script declares which range it is running against, but if you think more explanatory text (either here or in general) would help, I can add it.
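
A rough sketch of the empty-string check, assuming the version strings have already been captured into variables. `field_or_reason` is a hypothetical helper, and the "no-bmc" label is just an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value if it is non-empty, otherwise print a
# short reason so a blank table cell explains itself.
field_or_reason() {
    local value=$1 reason=$2
    if [ -z "$value" ]; then
        echo "$reason"
    else
        echo "$value"
    fi
}

bios=""           # e.g. nothing came back because the node has no BMC
idrac="7.10.30.00"
echo "bios:  $(field_or_reason "$bios" "no-bmc")"
echo "idrac: $(field_or_reason "$idrac" "no-bmc")"
```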

billglick commented 4 months ago

I'm fine to not bother with any of the other suggested enhancements for this pull request. Feel free to merge.