thomas-krenn / check_lsi_raid

Monitoring plugin to check MegaRAID controllers
GNU General Public License v3.0

Error: invalid controller number, controller not found! after replacing the motherboard. #33

Closed Fever-Wits closed 2 years ago

Fever-Wits commented 2 years ago

Hello,

I had to replace my motherboard with a Supermicro MBD-X8DT6-A-IS018. Since then, running the script fails:


# ./check_lsi_raid
Error: invalid controller number, controller not found!

storcli does see the controller:

# ./storcli64 show
CLI Version = 007.1804.0000.0000 Apr 09, 2021
Operating system = Linux 5.4.140-1-pve
Status Code = 0
Status = Success
Description = None

Number of Controllers = 1
Host Name = nasa
Operating System = Linux 5.4.140-1-pve
StoreLib IT Version = 07.1803.0200.0000
StoreLib IR3 Version = 16.14-0

System Overview:
===============

----------------------------------------------------------------------------------
Ctl Model                   Ports PDs DGs DNOpt VDs VNOpt BBU sPR DS EHS ASOs Hlth
----------------------------------------------------------------------------------
  0 AVAGOMegaRAIDSAS9361-8i     8  16   3     0   3     0 Opt Off -  Y      5 Opt
----------------------------------------------------------------------------------

Ctl = Controller Index | DGs = Drive groups | VDs = Virtual drives | Fld = Failed
PDs = Physical drives | DNOpt = Array NotOptimal | VNOpt = VD NotOptimal | Opt = Optimal
Msng = Missing | Dgd = Degraded | NdAtn = Need Attention | Unkwn = Unknown
sPR = Scheduled Patrol Read | DS = DimmerSwitch | EHS = Emergency Spare Drive
Y = Yes | N = No | ASOs = Advanced Software Options | BBU = Battery backup unit / CV
Hlth = Health | Safe = Safe-mode boot | CertProv-Certificate Provision mode
Chrg = Charging | MsngCbl = Cable Failure

What should I look for, and how can I solve the problem?

Regards,

Fever-Wits commented 2 years ago

Hello, after a friend helped me debug the script, it turned out that /opt/MegaRAID/storcli/storcli64 /c0 show time fails for me:

# /opt/MegaRAID/storcli/storcli64 /c0 show time
CLI Version = 007.1804.0000.0000 Apr 09, 2021
Operating system = Linux 5.4.140-1-pve
Controller = 0
Status = Failure
Description = None

Detailed Status:
===============

------------------------------------------------------
Ctrl Status Ctrl_Prop Value ErrMsg               ErrCd
------------------------------------------------------
   0 Failed Time      -     CTRL_TIME_GET failed    49
------------------------------------------------------

After a little digging, I adjusted the clock on the RAID controller, and everything works fine now.
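For anyone hitting the same error, the diagnosis above can be sketched as a small shell helper. This is only an illustration, not part of check_lsi_raid: the function names `storcli_status` and `controller_time_ok` are made up for this example, and the storcli path is the one from the output above, so adjust it for your installation.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_lsi_raid.
# Path as shown in this issue; override via the STORCLI environment variable.
STORCLI="${STORCLI:-/opt/MegaRAID/storcli/storcli64}"

# Print the value of the first "Status = ..." line read from stdin,
# e.g. "Success" or "Failure".
storcli_status() {
    sed -n 's/^Status = //p' | head -n 1
}

# Return 0 only if "show time" reports Status = Success for the given
# controller index (defaults to 0).
controller_time_ok() {
    ctl="${1:-0}"
    status=$("$STORCLI" "/c${ctl}" show time 2>/dev/null | storcli_status)
    [ "$status" = "Success" ]
}
```

If `controller_time_ok 0` fails with CTRL_TIME_GET as above, the controller clock needs to be set. The thread does not say which command was used for that; the StorCLI reference documents `storcli64 /c0 set time=systemtime` for copying the host time to the controller, so treat that as one likely option and check it against your StorCLI version first.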