ytti / oxidized

Oxidized is a network device configuration backup tool. It's a RANCID replacement!
Apache License 2.0

Creating new model -- Unsure where this is failing #2303

Closed spurgelaurels closed 3 years ago

spurgelaurels commented 3 years ago

I'm a bit confused here, because I don't know which part of this is failing. I'm trying to connect to some Aerohive APs to pull their configs, which meant having to create a new model. The rb file is pasted below, and is quite simple.

Unfortunately, the logs look like it's working, but then report a failure with no detail, so I'm not sure where to go from here. Back in the RANCID days, you could invoke clogin/hlogin to test connectivity first, and I don't know if Oxidized has something like that.

W, [2021-05-14T12:53:28.107607 #1259] WARN -- : /ap-basement status fail, retry attempt 1

config

---
username: admin
password: "Password"
model: Hiveos
resolve_dns: false
interval: 3600
use_syslog: false
debug: true
threads: 30
timeout: 20
retries: 3
prompt: !ruby/regexp /^([\w.@-]+[#>]\s?)$/
rest: 127.0.0.1:8888
next_adds_job: false

vars: {}

groups: {}
models: {}
pid: "/root/.config/oxidized/pid"

crash:
  directory: "/root/.config/oxidized/crashes"
  hostnames: false

stats:
  history_size: 10

input:
  default: ssh
  debug: true
  ssh:
    secure: false
  ftp:
    passive: true
  utf8_encoded: true

output:
  default: file
  file:
    directory: "/root/.config/oxidized/configs"

source:
  debug: true
  default: csv
  csv:
    file: "/root/.config/oxidized/router.db"
    delimiter: !ruby/regexp /:/
    map:
      ip: 0
      name: 1
      model: 2
    gpg: false

model_map:
  juniper: junos
  cisco: ios

router.db (There are several more entries redacted)

172.20.0.60:ap-front:Hiveos

/var/lib/gems/2.5.0/gems/oxidized-0.28.0/lib/oxidized/model/hiveos.rb

class Hiveos < Oxidized::Model

  cmd 'show running-config'

end

Log Output from oxidized

D, [2021-05-14T12:53:25.959723 #1259] DEBUG -- : resolving DNS for ap-greathall...
D, [2021-05-14T12:53:25.959735 #1259] DEBUG -- : IPADDR 172.20.0.65
D, [2021-05-14T12:53:25.959759 #1259] DEBUG -- : node.rb: resolving node key 'model', with passed global value of '' and node value 'hiveos'
D, [2021-05-14T12:53:25.959774 #1259] DEBUG -- : node.rb: setting node key 'model' to value 'Hiveos' from global
D, [2021-05-14T12:53:25.959803 #1259] DEBUG -- : node.rb: returning node key 'model' with value 'hiveos'
D, [2021-05-14T12:53:25.959818 #1259] DEBUG -- : node.rb: resolving node key 'input', with passed global value of 'ssh' and node value ''
D, [2021-05-14T12:53:25.959830 #1259] DEBUG -- : node.rb: returning node key 'input' with value 'ssh'
D, [2021-05-14T12:53:25.959846 #1259] DEBUG -- : node.rb: resolving node key 'output', with passed global value of 'file' and node value ''
D, [2021-05-14T12:53:25.959865 #1259] DEBUG -- : node.rb: returning node key 'output' with value 'file'
D, [2021-05-14T12:53:25.959880 #1259] DEBUG -- : node.rb: resolving node key 'username', with passed global value of '' and node value ''
D, [2021-05-14T12:53:25.959898 #1259] DEBUG -- : node.rb: setting node key 'username' to value 'admin' from global
D, [2021-05-14T12:53:25.959916 #1259] DEBUG -- : node.rb: returning node key 'username' with value 'admin'
D, [2021-05-14T12:53:25.959941 #1259] DEBUG -- : node.rb: resolving node key 'password', with passed global value of '' and node value ''
D, [2021-05-14T12:53:25.959954 #1259] DEBUG -- : node.rb: setting node key 'password' to value 'Password' from global
D, [2021-05-14T12:53:25.959972 #1259] DEBUG -- : node.rb: returning node key 'password' with value 'Password'
I, [2021-05-14T12:53:25.960009 #1259]  INFO -- : lib/oxidized/nodes.rb: Loaded 5 nodes
D, [2021-05-14T12:53:26.106626 #1259] DEBUG -- : lib/oxidized/core.rb: Starting the worker...
Puma starting in single mode...
* Version 3.11.4 (ruby 2.5.1-p57), codename: Love Song
* Min threads: 0, max threads: 16
* Environment: development
* Listening on tcp://127.0.0.1:8888
Use Ctrl-C to stop
D, [2021-05-14T12:53:27.106801 #1259] DEBUG -- : lib/oxidized/worker.rb: Jobs running: 0 of 1 - ended: 0 of 5
D, [2021-05-14T12:53:27.107240 #1259] DEBUG -- : lib/oxidized/worker.rb: Added /ap-basement to the job queue
D, [2021-05-14T12:53:27.107269 #1259] DEBUG -- : lib/oxidized/worker.rb: 1 jobs running in parallel
D, [2021-05-14T12:53:27.107305 #1259] DEBUG -- : lib/oxidized/job.rb: Starting fetching process for ap-basement at 2021-05-14 12:53:27 UTC
D, [2021-05-14T12:53:27.107428 #1259] DEBUG -- : lib/oxidized/job.rb: Config fetched for ap-basement at 2021-05-14 12:53:27 UTC
W, [2021-05-14T12:53:28.107607 #1259]  WARN -- : /ap-basement status fail, retry attempt 1
D, [2021-05-14T12:53:28.107660 #1259] DEBUG -- : lib/oxidized/worker.rb: Jobs running: 0 of 1 - ended: 0 of 5
D, [2021-05-14T12:53:28.107747 #1259] DEBUG -- : lib/oxidized/worker.rb: Added /ap-basement to the job queue
D, [2021-05-14T12:53:28.107829 #1259] DEBUG -- : lib/oxidized/worker.rb: 1 jobs running in parallel
D, [2021-05-14T12:53:28.107794 #1259] DEBUG -- : lib/oxidized/job.rb: Starting fetching process for ap-basement at 2021-05-14 12:53:28 UTC
D, [2021-05-14T12:53:28.107896 #1259] DEBUG -- : lib/oxidized/job.rb: Config fetched for ap-basement at 2021-05-14 12:53:28 UTC
W, [2021-05-14T12:53:29.108176 #1259]  WARN -- : /ap-basement status fail, retry attempt 2
spurgelaurels commented 3 years ago

So I tested with the linuxgeneric model, and oxidized was able to log in and execute cat /etc/hostname, but it fails without an error right after that. The session log doesn't even end with a proper newline.

root@0eec47dfa1d3:~/.config/oxidized/logs# cat 172.20.1.63-ssh
Last login: Fri May 14 10:29:01 2021 from 172.17.0.7
[admin@alembic ~]$ cat /etc/hostname
alembic.slough.ca
[admin@alembic ~]$ root@0eec47dfa1d3:~/.config/oxidized/logs#

The only logs I can see in oxidized's output:

D, [2021-05-14T14:29:43.049949 #1367] DEBUG -- : lib/oxidized/job.rb: Config fetched for alembic at 2021-05-14 14:29:43 UTC
W, [2021-05-14T14:29:43.224447 #1367]  WARN -- : /alembic status no_connection, retries exhausted, giving up
davama commented 3 years ago

Not a Ruby expert, but most models I've seen follow this generic outline:

cat bla.rb

class BLA < Oxidized::Model
  prompt /^([\w.@()-]+[#>]\s?)$/
  comment  '! '

  cmd 'show running-config'

  cfg :telnet do
    username /^login:/
    password /^Password:/
  end

  cfg :ssh, :telnet do
    post_login 'terminal length 0'
    post_login 'terminal width 0'
    pre_logout 'logout'
  end
end
spurgelaurels commented 3 years ago

Not a ruby expert but at least most model i see have this generic outline:

Made the change suggested, and now I can see debug SSH logs where it's negotiating between client and server. Even got as far as outputting logs into the logs directory now. Seems it's dying as soon as the prompt is present, but now at least I think I can work with the model file to fix or troubleshoot!
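Since the session seems to stall at the prompt, one quick sanity check is to run the prompt regex from the outline above against lines copied from the device sessions in this thread. This is only a rough sketch in plain Ruby, outside of Oxidized: the `prompt?` helper and the `SAMPLES` list are hypothetical names made up for illustration, and a match here does not prove Oxidized itself will behave the same way.

```ruby
# Prompt regex copied from the generic model outline earlier in this thread.
PROMPT = /^([\w.@()-]+[#>]\s?)$/

# Sample lines taken from the session transcripts posted in this thread.
SAMPLES = [
  "AP-FRONT#",
  "AP-BASEMENT#",
  "AP-FRONT#show ver",
  "[admin@alembic ~]$ "
].freeze

# Hypothetical helper (not part of Oxidized): true when the line
# matches the prompt regex.
def prompt?(line, regex = PROMPT)
  regex.match?(line)
end

# Print one row per sample so mismatches are easy to spot.
SAMPLES.each do |line|
  puts format("%-24s %s", line.inspect, prompt?(line) ? "match" : "no match")
end
```

If a line you expect to be a prompt shows up as "no match", the model's prompt regex is a reasonable first place to look.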

spurgelaurels commented 3 years ago

Okay, so half of my APs aren't working and are getting SSH "connection reset by peer" errors. The other half are fine. I realized that the hardware models are slightly different: one is an AP130, the other is an AP370. Commands are the same for both, and the prompt is the same for both. The only visible difference is that the copyright banner is split across two lines on one model.

admin@172.20.0.60's password:
Last login: Fri May 14 11:36:45 2021 from 172.20.1.63
Copyright (c) 2006-2020 Extreme Networks, Inc.
AP-FRONT#show ver
Copyright (c) 2006-2020 Extreme Networks, Inc.

Version:            HiveOS 10.0r10b build-254127
Build time:         Thu Jan 21 09:41:43 UTC 2021
Build cookie:       2101210141-254127
Platform:           AP130
Bootloader ver:     v0.0.4.42
TPM ver:            v1.2.66.4
Uptime:             1 weeks, 4 days, 1 hours, 17 minutes, 0 seconds
AP-FRONT#
admin@172.20.0.61's password:
Last login: Fri May 14 11:37:51 2021 from 172.20.1.63
Extreme Networks, Inc.
Copyright (c) 2006-2020
AP-BASEMENT#show ver
Extreme Networks, Inc.
Copyright (c) 2006-2020

Version:            HiveOS 6.5r14 build-255963
Build time:         Mon Mar  1 10:37:16 UTC 2021
Build cookie:       2103010237-255963
Platform:           AP370
Bootloader ver:     v1.0.3.41
TPM ver:            v1.2.35.8
Uptime:             1 weeks, 3 days, 23 hours, 59 minutes, 26 seconds
AP-BASEMENT#
spurgelaurels commented 3 years ago

As I dug deeper into this, I realized that the 3 non-working APs were running an older version of OpenSSH (5.9). Debug logs show the host disconnected our SSH attempts, likely because it only supports older key exchange algorithms or ciphers.

Unfortunately, using SSH from the command line on my oxidized host connects just fine, and debug logs don't show anything useful.

I've added the following to my config file, and they seem to connect now! If you're encountering this as well, check the docs and you'll see you can apply this with a model / vars map to different devices as needed. (I only have one device type, so a global var is fine.)

vars:
  ssh_kex: diffie-hellman-group-exchange-sha256