logicminds / rubyipmi

Command line wrapper for ipmitool and freeipmi
GNU Lesser General Public License v2.1

Fix error retry logic #27

bdunne closed 9 years ago

bdunne commented 9 years ago

Matching error strings is high maintenance and prone to error. Different vendors return different errors. Instead, if the command fails with the current driver, try a different one until there are none left.
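
A rough sketch of the idea, not the actual patch: the method name `run_with_driver_fallback`, the `DRIVERS` list and its order, and the `fake_bmc` stand-in are all made up for illustration. The point is only that a failed command moves on to the next driver without inspecting the error text.

```ruby
# Hypothetical driver names, in the order they would be tried (an assumption).
DRIVERS = %w[lan20 lan15 auto].freeze

# Yields each driver in turn until the block returns a truthy result.
# Returns [driver, result], or nil once every driver has failed.
def run_with_driver_fallback(drivers = DRIVERS)
  drivers.each do |driver|
    result = yield(driver)
    return [driver, result] if result
  end
  nil
end

# Stand-in for shelling out to ipmitool/freeipmi: only "lan15" succeeds here.
fake_bmc = ->(driver) { driver == "lan15" ? "on" : nil }

driver, status = run_with_driver_fallback { |d| fake_bmc.call(d) }
puts "#{driver}: #{status}"
```

With this shape there is no vendor-specific error list to maintain. The trade-off is that a failure unrelated to the driver (bad credentials, for example) still cycles through every driver before giving up.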

bdunne commented 9 years ago

@logicminds I was running into some issues with the command retry logic with the various Dell and IBM servers in my lab. I have a wide variety of driver support on these systems and noticed that the different vendors return different errors too. Rather than adding to the error list, I thought this would be a much more maintainable approach to the command retry logic. What do you think?

logicminds commented 9 years ago

These look like some good changes. Slightly hesitant on gutting find fixes, although it was really only used to switch drivers. However, the potential to fix other errors was at least possible. Curious what kind of machines you were running specifically? Any IPMI 1.0 or IPMI 1.5 systems? What was the exact problem you were having?

bdunne commented 9 years ago

@logicminds As far as I could tell, #find_fix and the rest of the retry logic were only used to change the driver version. Are there other potential problems that you think they may be used to solve?

The machines include Dell PowerEdge 2900 (IPMI 1.0), Dell PowerEdge R410 (IPMI 1.0 & 1.5), Dell PowerEdge R420 (IPMI 1.5) and IBM X3550 (IPMI 1.0 & 1.5). I ran into most of the problems on the 2900 and the R420. I would request the "chassis power status" and get a nil response instead of "on" or "off". It was due to the error response that the BMC or iDRAC returned not matching any of the known error codes.

logicminds commented 9 years ago

Yea, about a year ago I made the default driver 2.0, which would have caused this problem. Although if you explicitly pass the 1.0 or 1.5 driver, it might have run cleanly. Do you have any log files?

Why is your hardware so old?

bdunne commented 9 years ago

@logicminds The old hardware is just a test environment. It's on borrowed time, but as long as it keeps running, I don't see any reason to replace it. Some of the new iDRAC7 controllers have dropped support for "lan15", so if I run with "lan15" I get:

D, [2015-08-06T22:43:44.377792 #8386] DEBUG -- Rubyipmi: Get Session Challenge command failed
Error: Unable to establish LAN session
Unable to get Chassis Power Status

Are there other errors unrelated to the driver that you are concerned about catching?

logicminds commented 9 years ago

These are the documented errors

So if the server supports 2.0 it likely won't support 1.5 because of terrible security issues with 1.5 and lower.

logicminds commented 9 years ago

Also, can you rebase against the development branch and reissue the PR against development?

logicminds commented 9 years ago

@brandondunne I cherry-picked a few changes. And I'll merge those in shortly. No need to rebase. I am going to wait to implement some of the other things.