ytti / oxidized

Oxidized is a network device configuration backup tool. It's a RANCID replacement!
Apache License 2.0

Failed getting to prompt on HP Procurve switches #1705

Closed mincebert closed 1 year ago

mincebert commented 5 years ago

[Branching from thread 1607.]

Some of my HP ProCurve switches are failing to back up...

What do the ~/oxidized/logs/ip-ssh and/or ip-telnet files (or whatever your folder structure is) show?

They show the login banner saying "HP J..." ending with "ESC[1;15rESC[1;1H". (I'm using 'less' to view this so the escape sequences show up rather than being interpreted by my terminal.)

The working one continues on from there saying "ESC[24;1HPress any key to continueESC[15;1H..." and then the rest of the session. So I think it's failing before it gets to the prompt.

Have you tried to ssh or telnet from the server to the device? Does that work correctly?

Yup - it works fine.

Is debug enabled within oxidized? If so, disable all devices in router.db by putting a # in front of them, except for one of the ones with an issue. Run oxidized manually from the command prompt as the user you normally run it as. What does the log say?

W, [2019-02-13T19:16:39.489158 #16914] WARN -- : 172.30.66.15 raised IOError with msg "closed stream"
W, [2019-02-13T19:16:45.733439 #16914] WARN -- : 172.30.66.16 raised Oxidized::PromptUndetect with msg "unable to detect prompt: (?-mix:^\r?([\w\s.-]+# )$)"

... this message is probably slightly misleading as it's really saying "I didn't get to the prompt" when it actually failed somewhere above that.

As a suggestion for a general improvement: it may be worth dumping out the last 100 or so bytes read on the SSH session to show where it failed, to help work out which bit it's stuck on.
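A minimal sketch of what such a dump could look like, in Ruby since that is what Oxidized is written in. The helper name `tail_for_log` and its `limit` parameter are hypothetical and not part of Oxidized; the idea is only to show the tail of whatever was read, with control characters made visible.

```ruby
# Hypothetical helper, not an Oxidized API: return a printable view of the
# last `limit` bytes of a session buffer so escape sequences show up as \e...
def tail_for_log(buffer, limit = 100)
  # byteslice returns nil when the buffer is shorter than `limit` bytes,
  # so fall back to the whole buffer in that case.
  tail = buffer.byteslice(-limit, limit) || buffer
  # String#inspect escapes control characters instead of emitting them raw.
  tail.inspect
end

puts tail_for_log("HP J9729A\e[1;15r\e[1;1H")
```

With output like this in the warning message, it would be visible at a glance whether the session stopped right after the banner or somewhere later.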

I've had a look at the escape sequences around this and they look the same across the working and not working versions.

I've attached the logs for the two switches and also the output of 'ssh SWITCH | tee -o ...', then taken that and put a newline before the 'ESC[' and a '$' at the end of the lines to highlight those and it looks the same to me.

Some of the hp switches are comware and not procurve and require a slightly different set of commands/prompts.

Indeed, but I don't have any of those.

I've tried that. The key to troubleshooting with oxidized is to watch the logs, both ssh/telnet and oxidized itself. Pretty straightforward errors from what I have seen/experienced.

Yes. I think the HP might be more complicated than some other platforms due to its extensive use of VT100 escape sequences and the need to strip them out with lots of 'expect' substitutions.

Is there anything explaining how oxidized matches things? The 'expect' entries often match things anchored with '^' and '$' but often the text coming in is not on complete lines. Does it read things into a buffer character by character and then run the regexps after each one? If not, I don't understand how it matches the end and start of the string.
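On the anchoring question, one relevant fact is that in Ruby regexps `^` and `$` match at line boundaries and also at the very start and end of the string, so `$` can match at the end of a buffer that does not end in a newline. The sketch below assumes a read loop that appends each received chunk to a buffer and re-tests the whole buffer; `chunks_until_match` is a hypothetical stand-in for illustration, not Oxidized's actual input code.

```ruby
# The prompt regexp quoted in the log messages above.
PROMPT = /^\r?([\w\s.-]+# )$/

# Hypothetical read loop: append each chunk to a buffer, re-run the regexp
# on the whole buffer, and return the index of the chunk that completed a match.
def chunks_until_match(chunks, regexp)
  buffer = +""
  chunks.each_with_index do |chunk, i|
    buffer << chunk
    return i if buffer.match?(regexp)
  end
  nil
end

chunks = ["\e[24;1HPress any key", " to continue\r\n", "switch", "# "]
puts chunks_until_match(chunks, PROMPT).inspect # prints 3
```

The match only succeeds once the final `"# "` arrives: `^` anchors after the preceding newline and `$` anchors at the current end of the buffer, so no complete line is needed.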

Thanks for your help!

not-working.log working.log

cdshow commented 5 years ago

It is possible you are right and this is related to the other issue but I didn't want to confuse things as the other issue was related to operator vs manager login prompts and permissions of what can be done from each type of account.

I hadn't opened the issue yet, but I do have the same thing you do on a few of my newest switches, so I know what you are talking about and had done some research on it. I am almost sure it is related to this: https://community.hpe.com/t5/Aruba-ProVision-based/How-to-get-correct-output-when-telnet-to-HP-ProCurve/td-p/5322661#.XGwaHOhKgh4

It may also be related to: https://github.com/ytti/oxidized/issues/356

I am not sure how to filter out the escape sequences, but I did get a little further by logging into the switch manually and setting the terminal type to vt100. To do that, type the following when logged in as a "manager": terminal type vt100, then write mem, and then reboot, as the change only takes effect after a restart. There is still a bunch of junk between the outputs, BUT the commands are being sent correctly and most of the data is saved into oxidized. hp-2530-escape-sequences.txt

Hopefully the hard set vt100 helps you as well. I have just been dealing with that in my output between commands. I would love to clean it up but have other more pressing issues. Perhaps ytti or someone can weigh in on how to clean up the escape characters correctly.

I think it is as I thought, we are in the same boat. Without hard coding the vt100 it sends the commands but nothing is returned to oxidized.
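For the clean-up question, a minimal sketch of stripping the `ESC[...` sequences seen in these logs. It assumes only standard CSI sequences (ESC, `[`, parameter bytes, optional intermediate bytes, one final byte) need removing; `CSI` and `strip_csi` are hypothetical names, and this is not the set of substitutions the procurve model in Oxidized actually uses.

```ruby
# CSI sequence: ESC [ , parameter bytes 0x30-0x3F, intermediate bytes
# 0x20-0x2F, then a single final byte 0x40-0x7E (for example "H" or "r").
CSI = /\e\[[0-?]*[ -\/]*[@-~]/

# Hypothetical helper: remove every CSI sequence from the captured text.
def strip_csi(text)
  text.gsub(CSI, "")
end

puts strip_csi("\e[24;1HPress any key to continue\e[15;1H")
```

Other escape forms that are not CSI (ESC followed by a single letter, for instance) would pass through untouched and would need their own pattern.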

cdshow commented 5 years ago

Actually I think it is related to: https://github.com/ytti/oxidized/pull/746 But that "says" it is fixed... Hmm. I'll look more later.

cdshow commented 5 years ago

Wanna see something interesting? Use cat to view the -ssh log file rather than using less.... It displays correctly. I'm not sure what to make of that. It has to do with the character escape sequence detection regex I think. I don't know the solution.

mincebert commented 5 years ago

Hello. Excuse the delay - I was out of the office yesterday (amusingly at HPE in London, where they were talking about switches, but I don't think the right people were there for this, but I should have asked!).

Anyway... it displays fine if I use 'cat', yes - that interprets the VT100 escape sequences. However, as the problem appears to be with how the VT100 escape sequences are stripped out by the 'expect' lines in oxidized, I was trying to parse those by hand to see where they might be going wrong: listing the output in 'less' shows the escape character and following sequences.

I did attach the raw output but something seems to have gone wrong and it didn't appear.

I haven't tried forcing the VT100 terminal type as I can't easily restart the affected switches. However, I've got a spare one with the same model (HP J9729A 2920-48G-POE+), so I'll try things on that. I'll also test it out running the different OS releases (the broken one is running WB.15.14.0007 and the working one is running WB.15.18.0013: maybe an upgrade is the simplest solution, but it seems better to find out why).

I'll report back.

cdshow commented 5 years ago

I'm running an HP J9775A 2530-48G Switch with software revision YA.16.02.0012. I believe that is the newest revision. The escape sequence issue is present with that config. I think it may have to do with SSH version compatibility. I executed an ssh -vX and here is the relevant section:

debug1: Local version string SSH-2.0-OpenSSH_7.6p1 Ubuntu-4ubuntu0.2
debug1: Remote protocol version 2.0, remote software version Mocana SSH 5.8
debug1: no match: Mocana SSH 5.8

Is there something that needs to be done to add a Remote Protocol match on the latest Ubuntu?

cdshow commented 5 years ago

Do you have the same error when you perform an ssh -vX to your switch manually? If so, I'm wondering if the links below are related, as they mention a bug in Mocana SSH. I can't find a lot more information on this, so I'm moving on for now. If anyone finds a solution please post here/let me know! Found this: https://www.reddit.com/r/linuxmint/comments/2e9je1/problem_with_ssh_on_mint_17/ I don't know how to implement that within oxidized so I may be "stuck".... The reddit post also led me to this: https://bugzilla.mindrot.org/show_bug.cgi?id=2116

mincebert commented 5 years ago

I've tried a spare switch running WB.15.14.something and it failed in the same way. Upgrading to 15.16.0021 resolved the issue. Upgrading to 16.08.0001 (the latest, as I write this) still worked.

All of the switches we have issues with at present are running 15.14.something or earlier, so I'm going to set them all to upgrade to 15.16.0021 overnight tonight. We'll see if that fixes things. [I can't go straight to 16.something at the moment as that's not supported, but I'll do that at a later date.]

We also have some ProCurve 2615s running slightly older releases so I've set those to upgrade, too.

I think it's not worth debugging these, if the problem is resolved with a software upgrade. However, if I get anything still failing after this, I'll debug them further.

acederlund commented 5 years ago

I'm having the same/similar issue on a Procurve 2510G-24 with software Y.11.12. Only telnet is supported on this software version.

Oxidized can log on and authenticate to the switch, but it seems to have problems after logging on: oxidized[19051]: X.X.X.X raised Net::ReadTimeout (rescued Timeout::Error) with msg "timed out while waiting for more data"

Checking the X.X.X.X-telnet file, I can see that it could log on and receives the prompt, but it seems to time out waiting for data. Managing the switch normally via telnet from the Oxidized server works just fine.

acederlund commented 5 years ago

Also, I'm experiencing the same issue on a ProCurve E5406zl with firmware K.15.06.0008.

SSH is working on this one, but there I get this error: oxidized[18441]: X.X.X.X raised Oxidized::PromptUndetect with msg "YYYYY switchname> not matching configured prompt (?-mix:^\r?([\w\s.-]+# )$)"

The switch itself does not require any username, just a password. Telnet just gives timeout (like the 2510G-24).

EDIT: I solved this issue by adding a username for logging in, and now it works just fine.
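The error text above can be reproduced by testing the configured prompt regexp against the two prompt styles. This is only a sketch of the regexp's behaviour, under the assumption that the `>` prompt is the operator-level prompt and `#` the manager-level one; the constant name `CONFIGURED_PROMPT` is hypothetical.

```ruby
# The prompt regexp from the error message; it requires a literal "# ".
CONFIGURED_PROMPT = /^\r?([\w\s.-]+# )$/

["switchname# ", "switchname> "].each do |prompt|
  # match? returns true only when the prompt ends in "# ".
  puts "#{prompt.inspect}: #{prompt.match?(CONFIGURED_PROMPT)}"
end
```

A session that lands at a `>` prompt therefore never satisfies the regexp, which is consistent with the login change fixing it.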

martinohansen commented 5 years ago

We're having the issue on HPE 2530 running version: YA.15.17.0009. Our log shows:

not matching configured prompt (?-mix:^\r?([\w\s.-]+# )$)"

@mincebert did you manage to solve the issue on all devices with the mentioned versions?

acederlund commented 5 years ago

I've seen this issue on ProCurve 2626 as well with software H.10.83, it can log on but after logging on and getting prompt it's just a timeout: raised Net::ReadTimeout (rescued Timeout::Error) with msg "timed out while waiting for more data"

acederlund commented 5 years ago

After manually adding the changes to procurve.rb that @deajan had done in PR 1866, I can confirm my switches are working properly! Nice work @deajan!