ytti / oxidized

Oxidized is a network device configuration backup tool. It's a RANCID replacement!
Apache License 2.0

AOSW Model not pulling entire configuration of Instant Controller #1000

Closed tyler-8 closed 6 years ago

tyler-8 commented 7 years ago

I'm using the https://github.com/ytti/oxidized/blob/master/lib/oxidized/model/aosw.rb model and it doesn't appear to be grabbing the entire config. Looking through the diffs, it appears to grab varying amounts of the configuration each time. Sometimes it makes it further into the config than other times. I've verified there's no paging issue (show running-config just dumps the entire config in one screen). So I'm not sure what to tweak.

Is the session being closed too early? Is there a setting I can tweak that would improve this behavior? It doesn't appear to be a connection speed or stability issue either. I have other devices (IOS/Procurve) that are being pulled successfully from the same location, it's only the AOSW device.

I'm happy to provide any other information to help narrow in on the cause. Thanks in advance!

laf commented 7 years ago

You should enable debugging and post the output of the device's debug file.

tyler-8 commented 7 years ago

@laf Appreciate the reply. I turned on debug and the entire config was pulled via SSH and the session closed with an exit successfully. So it would seem that the entire config is not making it into the local git repo? However, if I download the config through Oxidized, it's incomplete.

laf commented 7 years ago

Can you post it? Just trim the config so no sensitive info is in.

tyler-8 commented 7 years ago

I can't, I'm afraid (it's a dev environment and I can only access it through a funky web client with no copy/paste ability right now).

I have manually combed the debug output and, aside from some parse errors for non-existent commands, I don't see any errors. A screenshot of the parse errors is attached: aoswoutput

laf commented 7 years ago

Is that not the device telling you those commands aren't running?

tyler-8 commented 7 years ago

@laf The commands that return the Parse error are expected based on the AOSW model - it's just attempting them without failing the whole workflow, but the show running-config does return correct output as shown in the screenshot - I've cut the full output of the running config in the screenshot.

laf commented 7 years ago

Ok but this isn't showing the oxidized part to see what's going on. Can you not run a cli pastebin client to upload the text?

tyler-8 commented 7 years ago

I'm using the debug config outlined here: https://github.com/ytti/oxidized#debugging and all of the files it generates only contain the SSH outputs from the devices, even the devices that are working correctly in oxidized. The last line I see on the debug file for example is test_vc#exit. Is there another way to enable debugging for Oxidized's processing?

laf commented 7 years ago

No I think that's it but not 100% sure.

Sorry I'm out of ideas :(

tyler-8 commented 7 years ago

@laf I appreciate you trying nonetheless - I'm not a Ruby guy otherwise I'd be approaching this differently. I'll continue to tinker as I have time until I or someone else figures it out.

chipgwyn commented 7 years ago

Curious, are you running CentOS 7? I'm running into the same issue. It seems to only affect Aruba devices; everything else works fine. I've also tried changing the output from git to file and the issue remains. I have several other devices that have MUCH larger configs, so it doesn't feel like an SSH buffer issue.

tyler-8 commented 7 years ago

@chipgwyn Yes, running CentOS7. It's definitely not an SSH issue because the debug output shows the entire contents of the configuration. It's only when Oxidized is taking that data and parsing it somehow that it's getting cut short. I've looked through the config to see if there were some sort of special characters and there isn't anything of concern there.

chipgwyn commented 7 years ago

I tried paring down all the stuff in aosw.rb to only the "show running-config" command; the issue still exists.

tyler-8 commented 7 years ago

@chipgwyn Good validation there, that was going to be my next step. I'm beginning to wonder if something from one of these functions is causing the issue:


```ruby
    out = []
    cfg.each_line do |line|
      out << line.rstrip
    end
    out = out.join "\n"
    out << "\n"
  end

  def clean cfg
    out = []
    cfg.each_line do |line|
      # drop the temperature, fan speed and voltage, which change each run
      next if line.match /Output \d Config/i
      next if line.match /(Tachometers|Temperatures|Voltages)/
      next if line.match /((Card|CPU) Temperature|Chassis Fan|VMON1[0-9])/
      next if line.match /[0-9]+\s+(RPMS?|m?V|C)/i
      out << line.strip
    end
    out = comment out.join "\n"
    out << "\n"
  end
```

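One way to rule these functions in or out might be to run the same line filters on their own against a captured debug file. The sketch below is a standalone approximation of the `clean` logic quoted above: `filter_lines` is a hypothetical name, Oxidized's `comment` helper is left out, and the sample text is made up for illustration.

```ruby
# Standalone approximation of the filters in `clean` (assumption: the
# `comment` helper from Oxidized is omitted so this runs without the gem).
def filter_lines(cfg)
  cfg.each_line.reject do |line|
    line.match?(/Output \d Config/i) ||
      line.match?(/(Tachometers|Temperatures|Voltages)/) ||
      line.match?(/((Card|CPU) Temperature|Chassis Fan|VMON1[0-9])/) ||
      line.match?(/[0-9]+\s+(RPMS?|m?V|C)/i)
  end.map(&:strip)
end

# Hypothetical sample text; replace with the contents of a debug capture.
SAMPLE = <<~TEXT
  hostname test_vc
  Card Temperature 45 C
  vlan 10
  Chassis Fan 1 5000 RPM
  end
TEXT

kept = filter_lines(SAMPLE)
puts "kept #{kept.size} of #{SAMPLE.lines.size} lines"
```

If the number of kept lines only ever differs from the input by the sensor lines, the filters are probably not what cuts the config short.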
chipgwyn commented 7 years ago

The confounding thing is it's not consistent. Where it breaks in the file seems to be different and it doesn't do it every time. I was thinking maybe it's something to do with the number of devices. I ran oxidized with only three devices: two controllers and one normal Cisco switch. The Cisco pulled perfectly every time; the controllers seemed kind of random.

I think my next step is to check the file that's saved with the ssh debug and see if there are any odd characters that may throw things off. Perhaps a LF/CRLF issue or something... Very odd.
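A check along those lines could be sketched as below. It flags any line that contains a carriage return or another character outside printable ASCII and tab. `odd_character_lines` is a hypothetical helper and the sample capture is invented; for a real file, reading it with `File.binread` would avoid encoding errors.

```ruby
# Return [line_number, inspected_line] pairs for lines that contain
# carriage returns or other characters outside printable ASCII and tab.
def odd_character_lines(text)
  flagged = text.each_line.with_index(1).select do |line, _number|
    # delete_suffix keeps a trailing "\r" visible; String#chomp would hide it
    line.delete_suffix("\n").match?(/[^\t\x20-\x7E]/)
  end
  flagged.map { |line, number| [number, line.inspect] }
end

# Hypothetical capture: line 1 ends in CRLF, line 3 contains an escape code.
SAMPLE_CAPTURE = "hostname test_vc\r\nvlan 10\nname\e[0m x\n"

odd_character_lines(SAMPLE_CAPTURE).each do |number, shown|
  puts "line #{number}: #{shown}"
end
```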

tyler-8 commented 7 years ago

I've observed the same behavior. Here are a couple screenshots of the "changes" that oxidized sees, even though they aren't actually changing, it just fails to parse at different parts of the config.

aosw1 aosw2
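To pin down where a stored copy stops relative to the full SSH debug capture, a small comparison like the following might help. It is only a sketch: `shared_prefix_length` is a hypothetical helper, and the two strings stand in for the debug file and the config Oxidized saved.

```ruby
# Count how many leading lines two texts share, ignoring trailing whitespace.
def shared_prefix_length(full, stored)
  full_lines = full.lines.map(&:rstrip)
  stored_lines = stored.lines.map(&:rstrip)
  full_lines.zip(stored_lines).take_while { |a, b| a == b }.size
end

# Hypothetical stand-ins for the debug capture and the stored config.
FULL_CAPTURE = "hostname test_vc\nvlan 10\nvlan 20\nend\n"
STORED_COPY  = "hostname test_vc\nvlan 10\n"

cut = shared_prefix_length(FULL_CAPTURE, STORED_COPY)
puts "stored copy matches the first #{cut} lines"
puts "first missing line: #{FULL_CAPTURE.lines[cut].inspect}"
```

Running it after several polls would show whether the cut always falls on the same kind of line.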

chipgwyn commented 7 years ago

I see this has the 'bug' label added to it. Anyone identified the issue? I'd be happy to provide any output or troubleshooting data, even do an interactive session if needed. Just let me know what data is needed.

jdenoy commented 6 years ago

Same problem here using the firewareos model, tested with both file and git output. The timeout has been raised to 300 sec, but output from the device takes about 40 sec in total. It seems the app is not processing the full length of the data returned from the commands it runs.

laf commented 6 years ago

If someone can provide access to a device then I can probably take more of a look?

chipgwyn commented 6 years ago

@laf I can escort you into a device for a bit, help gather data and such. I'll hit you in IRC and can work out the details. Probably will be Monday or Tuesday before I can arrange something. Thanks!

laf commented 6 years ago

Sounds good, I'm on the gitter channel so just highlight me when you're ready.

laf commented 6 years ago

So it looks like changing `prompt /^\(?.+\)?\s?[#>]/` to `prompt /^\(?.+\)?\s[#>]/` works for @chipgwyn

@tyler-8 Can you check if that works for you? Basically it makes the space between something # mandatory rather than optional. I don't know these devices to know if that's correct or not.
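A rough illustration of why the optional space could matter: with `\s?`, any line that has a character followed by `#` or `>` looks like a prompt, so a read loop that stops at the first prompt match can stop inside the config. The code below is a simplified model of an expect-style loop, not Oxidized's actual code; `read_until_prompt`, the chunk contents, and the `(host) #` prompt are all assumptions made for the example.

```ruby
OPTIONAL_SPACE  = /^\(?.+\)?\s?[#>]/  # prompt before the change
MANDATORY_SPACE = /^\(?.+\)?\s[#>]/   # prompt suggested above

# Append chunks to a buffer and stop as soon as the prompt pattern matches.
def read_until_prompt(chunks, prompt)
  buffer = +""
  chunks.each do |chunk|
    buffer << chunk
    break if buffer.match?(prompt)
  end
  buffer
end

# Hypothetical device output, split the way network reads might split it.
CHUNKS = [
  "show running-config\n",
  "snmp-server community pass#word\n", # '#' with no space in front of it
  "end\n",
  "(host) #"                           # hypothetical controller prompt
].freeze

truncated = read_until_prompt(CHUNKS, OPTIONAL_SPACE)
complete  = read_until_prompt(CHUNKS, MANDATORY_SPACE)

puts truncated.lines.size # 2: stopped at the line containing "pass#word"
puts complete.lines.size  # 4: read through to the real prompt
```

In this model the cut-off point depends on how the output happens to be split into chunks, which would be consistent with the truncation moving around between runs, though that is a guess rather than something confirmed in this thread.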

@vppencilsharpener @thanegill You've made changes to this model before, does that look correct now to you?

thanegill commented 6 years ago

I've been experiencing this same issue. Looks like this commit 4e6fc650d326e146558627fd6f13ac301fe24450 may be the breaking change. I'll change my prompt to the one that @laf suggested and do some testing.

laf commented 6 years ago

I thought the user who had committed that had closed their github account but they haven't.

@vppencilsharpener can you take a look at this and see if the proposed change above will break things for you?

@thanegill Be great if you can report back on your findings.

thanegill commented 6 years ago

@laf That new prompt seems to be working. I haven't seen any config flapping since making the change (2 days ago), where it used to happen almost every time.

vppencilsharpener commented 6 years ago

I apologize for being late to the party on this one. I had actually given up on my installation until yesterday.

Any chance we can get more information on what devices are being used by others? I'm wondering if the IAP is different enough that it needs to be handled in a special way or if the problem could be elsewhere.

It looks like making the space mandatory breaks the compatibility with the Instant APs (IAP), or at least firmware version 6.5.4.3_61959. The prompt on my device looks like this: APHostname#

Debug output complains about the prompt.

D, [2018-04-02T12:40:26.357941 #3773] DEBUG -- : lib/oxidized/input/ssh.rb: Connecting to IAPControllerIP
D, [2018-04-02T12:40:27.101196 #3773] DEBUG -- : lib/oxidized/input/ssh.rb: expecting [/^\(?.+\)?\s[#>]/] at IAPControllerIP
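The mismatch can be reproduced outside Oxidized by checking each prompt pattern from this thread against sample prompts. This is only a sketch: `prompt_matrix` is a hypothetical helper, `APHostname#` is the IAP prompt quoted above, and `(host) #` is an assumed controller-style prompt added for comparison.

```ruby
# Prompt patterns quoted in this thread.
PATTERNS = {
  "optional space"  => /^\(?.+\)?\s?[#>]/,
  "mandatory space" => /^\(?.+\)?\s[#>]/
}.freeze

# "(host) #" is an assumption; swap in prompts copied from real devices.
SAMPLE_PROMPTS = ["APHostname#", "(host) #"].freeze

# Map each prompt to a hash of pattern name => whether it matches.
def prompt_matrix(patterns, prompts)
  prompts.to_h do |prompt|
    [prompt, patterns.transform_values { |re| prompt.match?(re) }]
  end
end

prompt_matrix(PATTERNS, SAMPLE_PROMPTS).each do |prompt, results|
  puts "#{prompt.inspect}: #{results}"
end
```

Adding prompts from other firmware versions to `SAMPLE_PROMPTS` would show which devices each pattern covers.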

I had previously seen the problem with "changes" that were not really changes.