Closed bdunne closed 9 years ago
@logicminds I was running into some issues with the command retry logic with the various Dell and IBM servers in my lab. I have a wide variety of driver support on these systems and noticed that the different vendors return different errors too. Rather than adding to the error list, I thought this would be a much more maintainable approach to the command retry logic. What do you think?
These look like some good changes. Slightly hesitant on gutting find fixes, although it was really only used to switch drivers. However, the potential to fix other errors was at least possible. Curious what kind of machines you were running specifically? Any IPMI 1.0 or IPMI 1.5 systems? What was the exact problem you were having?
@logicminds As far as I could tell, #find_fix and the rest of the retry logic were only used to change the driver version. Are there other potential problems that you think it may be used to solve?
The machines include Dell PowerEdge 2900 (IPMI 1.0), Dell PowerEdge R410 (IPMI 1.0 & 1.5), Dell PowerEdge R420 (IPMI 1.5) and IBM X3550 (IPMI 1.0 & 1.5). I ran into most of the problems on the 2900 and the R420. I would request the "chassis power status" and get a nil response instead of "on" or "off". This happened because the error response returned by the BMC or iDRAC did not match any of the known error codes.
Yeah, about a year ago I made the default driver 2.0, which would have caused this problem. Although if you explicitly pass the 1.0 or 1.5 driver, it might have run cleanly. Do you have any log files?
Why is your hardware so old?
@logicminds The old hardware is just a test environment. It's on borrowed time, but as long as it keeps running, I don't see any reason to replace it. Some of the new iDRAC7 controllers have dropped support for "lan15", so if I run with "lan15" I get:
D, [2015-08-06T22:43:44.377792 #8386] DEBUG -- Rubyipmi: Get Session Challenge command failed
Error: Unable to establish LAN session
Unable to get Chassis Power Status
Are there other errors unrelated to the driver that you are concerned about catching?
These are the documented errors
So if the server supports 2.0 it likely won't support 1.5 because of terrible security issues with 1.5 and lower.
Also, can you rebase against the development branch and reissue the PR against development?
@brandondunne I cherry-picked a few changes. And I'll merge those in shortly. No need to rebase. I am going to wait to implement some of the other things.
Matching error strings is high maintenance and prone to error. Different vendors return different errors. Instead, if the command fails with the current driver, try a different one until there are none left.
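
To illustrate the approach described in the commit message, here is a minimal sketch of driver fallback in Ruby. It is not the actual Rubyipmi code: `run_with_driver_fallback`, `CommandFailed`, and `fake_bmc` are hypothetical names invented for this example, and the driver list is an assumption. The only idea taken from the discussion is that a failed command is retried with the next driver, with no attempt to interpret the vendor's error text.

```ruby
# Hypothetical error type for this sketch; not part of Rubyipmi.
class CommandFailed < StandardError; end

# Runs the block once per driver, in order, and returns the first
# successful result. Error messages are recorded but never matched
# against a list of known strings.
def run_with_driver_fallback(drivers)
  failures = {}
  drivers.each do |driver|
    begin
      return yield(driver)
    rescue CommandFailed => e
      failures[driver] = e.message
    end
  end
  # Every driver was tried and none worked.
  raise CommandFailed, "all drivers failed: #{failures.inspect}"
end

# Stand-in for a real BMC call (assumption): this fake controller
# rejects "lan15" and answers "on" for any other driver.
fake_bmc = lambda do |driver|
  raise CommandFailed, "Unable to establish LAN session" if driver == "lan15"
  "on"
end

status = run_with_driver_fallback(%w[lan15 lan20]) do |driver|
  fake_bmc.call(driver)
end
puts status
```

Because the loop only cares whether the command succeeded, a vendor that returns a previously unseen error message is handled the same way as one that returns a documented error.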