napalm-automation / napalm

Network Automation and Programmability Abstraction Layer with Multivendor support
Apache License 2.0

NXOS configuration replace does not work via an SSH proxy #762

Closed iamroddo closed 5 years ago

iamroddo commented 6 years ago

I created a checkpoint file on a Nexus 9K running 7.0(3)I7(4) and copied it to my script host with SCP. I then changed the hostname on the Nexus and ran the example script from http://napalm.readthedocs.io/en/latest/tutorials/samples.html#load-replace-configuration against this Nexus 9K, using the downloaded checkpoint file as the configuration. The hostname of the Nexus 9K was not reverted, and the script failed with the error socket.error: [Errno 110] Connection timed out.
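For readers following along, the replace workflow described above can be sketched roughly as follows. This is a minimal sketch, not the sample script from the NAPALM docs: `replace_with_review` and `connect_nxos` are hypothetical helper names, while `get_network_driver`, `load_replace_candidate`, `compare_config`, `commit_config`, `discard_config`, and the `ssh_config_file` optional argument are the NAPALM names already used in this thread. It only shows the order of calls and does not address the timeout reported here.

```python
def replace_with_review(device, checkpoint_path, commit=False):
    """Load a full replacement config, compute the diff, then commit or discard.

    `device` is assumed to be an already-opened NAPALM driver instance.
    """
    device.load_replace_candidate(filename=checkpoint_path)
    diff = device.compare_config()
    if commit and diff:
        device.commit_config()
    else:
        # Dry run, or nothing to change: drop the candidate config.
        device.discard_config()
    return diff


def connect_nxos(hostname, username, password, ssh_config_file=None):
    """Hypothetical helper: open an nxos_ssh session (napalm imported lazily)."""
    from napalm import get_network_driver  # third-party dependency

    optional_args = {}
    if ssh_config_file:
        optional_args["ssh_config_file"] = ssh_config_file
    driver = get_network_driver("nxos_ssh")
    device = driver(hostname, username, password, optional_args=optional_args)
    device.open()
    return device
```

With a reachable device, one would call `connect_nxos(...)`, pass the result to `replace_with_review(device, "test-checkpoint")` to inspect the diff without committing, and finish with `device.close()`.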

When I change the script to use "load_merge_candidate" instead, it gets further: the hostname on the Nexus 9K is reverted, but the script still times out.

Running the command rollback running-config file bootflash:test-checkpoint on the Nexus 9K rolls back the config without error.

configure replace bootflash:///test-checkpoint fails.

I set up a virtual NXOS in CML, created a checkpoint file, modified the hostname of the virtual device, and tried to apply the checkpoint file with the napalm CLI tool, once with strategy = replace and once with merge.

With napalm --user cisco --password cisco --vendor nxos_ssh --optional_args 'ssh_config_file="/home/vagrant/.ssh/cml-config"' 10.255.0.47 configure cml-checkpoint --strategy merge, the error napalm.base.exceptions.MergeConfigException: Socket is closed appeared.

With napalm --user cisco --password cisco --vendor nxos_ssh --optional_args 'ssh_config_file="/home/vagrant/.ssh/cml-config"' 10.255.0.47 configure cml-checkpoint --strategy replace, the error TimeoutError: [Errno 110] Connection timed out appeared.

Should a checkpoint file generated by an NXOS device be usable in a Napalm load_replace_candidate operation? Does such a checkpoint file need to be modified in order to be valid?

The Napalm Ansible host is running Xenial64 on Vagrant with Napalm installed using pip.

vagrant@ansible:~$ pip list
DEPRECATION: The default format will switch to columns in the future. You can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning.
asn1crypto (0.24.0)
bcrypt (3.1.4)
blinker (1.3)
certifi (2018.4.16)
cffi (1.11.5)
chardet (3.0.4)
cloud-init (18.2)
command-not-found (0.3)
configobj (5.0.6)
cryptography (2.2.2)
diffios (0.0.9)
future (0.16.0)
idna (2.7)
Jinja2 (2.10)
jsonpatch (1.10)
jsonpointer (1.9)
jtextfsm (0.3.1)
junos-eznc (2.1.8)
language-selector (0.1)
lxml (4.2.3)
MarkupSafe (1.0)
napalm (2.3.1)
napalm-ansible (0.9.1)
ncclient (0.6.0)
netaddr (0.7.19)
netmiko (2.1.1)
oauthlib (1.0.3)
paramiko (2.4.1)
pip (9.0.3)
prettytable (0.7.2)
pyasn1 (0.4.3)
pycparser (2.18)
pycurl (7.43.0)
pyeapi (0.8.2)
pygobject (3.20.0)
pyIOSXR (0.53)
PyJWT (1.3.0)
PyNaCl (1.2.1)
pynxos (0.0.3)
pyserial (3.4)
python-apt (1.1.0b1+ubuntu0.16.4.1)
python-debian (0.1.27)
python-systemd (231)
PyYAML (3.12)
requests (2.19.1)
scp (0.11.0)
setuptools (39.2.0)
six (1.11.0)
ssh-import-id (5.5)
textfsm (0.4.1)
ufw (0.35)
urllib3 (1.23)
virtualenv (15.0.1)
wheel (0.31.1)
You are using pip version 9.0.3, however version 10.0.1 is available. You should consider upgrading via the 'pip install --upgrade pip' command.
vagrant@ansible:~$

ktbyers commented 6 years ago

@iamroddo Your first error is an error actually reaching the device.

You need to post your ansible playbook.

What are you using an ssh_config_file for? You will need to post your SSH config file:

With napalm --user cisco --password cisco --vendor nxos_ssh --optional_args 'ssh_config_file="/home/vagrant/.ssh/cml-config"'

iamroddo commented 6 years ago

I don't think the issue is with the initial connection to the device, since get_facts succeeds when I run the napalm CLI with the debug flag. I also discovered that "show accounting log" shows which commands were run, and that during the merge operation the last successful command was the one before the banner motd. When I removed the banner motd from both the checkpoint file and the running configuration of the device, I still had issues with both replace and merge. In the merge operation, the last command in the accounting log was about 7 minutes after the first one. Is this expected? Could there be a timeout? Is it possible that something in the checkpoint file disrupts the SSH session?

The SSH config used (~/.ssh/cml-config):

```
Host mgmt-lxc
  HostName 10.17.238.146
  IdentityFile ~/.ssh/id_rsa
  User d069683

Host 10.255.0.*
  ProxyCommand ssh -F ~/.ssh/cml-config -W %h:%p mgmt-lxc
```
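
In this config, every connection to a 10.255.0.* address is tunnelled through mgmt-lxc: OpenSSH substitutes the %h and %p tokens in ProxyCommand with the target host and port before running the command. As a rough illustration of that substitution (the function name `expand_proxy_command` is made up for this sketch and only handles the %h, %p, and %% tokens), the command that ends up running for 10.255.0.47 can be reconstructed like this:

```python
def expand_proxy_command(template, host, port=22):
    """Expand the %h, %p and %% tokens of an OpenSSH ProxyCommand string."""
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "h":
                out.append(host)          # target hostname
            elif nxt == "p":
                out.append(str(port))     # target port
            elif nxt == "%":
                out.append("%")           # literal percent sign
            else:
                out.append(ch + nxt)      # leave other tokens untouched
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


command = expand_proxy_command(
    "ssh -F ~/.ssh/cml-config -W %h:%p mgmt-lxc", "10.255.0.47"
)
print(command)
```

The point of the sketch is that any additional SSH or SCP connection to the device has to run the same proxy command, otherwise it tries to reach 10.255.0.47 directly.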

Replace operation accounting log on NXOS device

```
Core-1# sh accounting log

Sat Jul  7 09:41:26 2018:type=update:id=console0:user=cisco:cmd=clear accounting log (SUCCESS)
Sat Jul  7 09:41:38 2018:type=start:id=10.255.0.59@pts/11:user=cisco:cmd=
Sat Jul  7 09:41:42 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=terminal length 0 (SUCCESS)
Sat Jul  7 09:41:46 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=dir bootflash:/ (SUCCESS)
Sat Jul  7 09:41:47 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=dir bootflash:/cml-checkpoint (SUCCESS)
Sat Jul  7 09:42:01 2018:type=stop:id=10.255.0.59@pts/11:user=cisco:cmd=shell terminated because the ssh session closed
Core-1#
```

The accounting log from the merge operation

```
Sat Jul  7 08:26:42 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; version 7.0(3)I6(1) (SUCCESS)
Sat Jul  7 08:26:42 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; hostname Core-1 (SUCCESS)
Sat Jul  7 08:26:43 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; class-map type network-qos c-nq1 (FAILURE)
Sat Jul  7 08:26:44 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; class-map type network-qos c-nq2 (FAILURE)
Sat Jul  7 08:26:46 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; class-map type network-qos c-nq3 (FAILURE)
Sat Jul  7 08:26:47 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; class-map type network-qos c-8q-nq1 (FAILURE)
<snip snip>
Sat Jul  7 08:33:17 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; interface loopback0 ; ip address 10.0.139.241/32 (SUCCESS)
Sat Jul  7 08:33:18 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; line console (SUCCESS)
Sat Jul  7 08:33:18 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; line vty (SUCCESS)
Sat Jul  7 08:33:19 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; ip route 0.0.0.0/0 10.0.128.1 (REDIRECT)
Sat Jul  7 08:33:19 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; ip route 0.0.0.0/0 10.0.128.1 (SUCCESS)
Sat Jul  7 08:33:19 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; no priority-flow-control override-interface mode off (SUCCESS)
Sat Jul  7 08:33:29 2018:type=stop:id=10.255.0.59@pts/11:user=cisco:cmd=shell terminated because the ssh session closed
```
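
To check how long the session ran and which commands failed without reading the log by eye, the accounting lines can be parsed with a small script. This is a sketch that assumes the `timestamp:type=...:id=...:user=...:cmd=... (STATUS)` layout seen in the excerpts above; the layout is inferred from this thread rather than from Cisco documentation, and the names `parse_line` and `summarize` are made up for the example.

```python
import re
from datetime import datetime

# Layout inferred from the log excerpts in this thread (an assumption).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})"
    r":type=(?P<type>\w+):id=(?P<id>[^:]*):user=(?P<user>[^:]*):cmd=(?P<cmd>.*)$"
)
STATUS_RE = re.compile(r"\s*\((SUCCESS|FAILURE|REDIRECT)\)$")


def parse_line(line):
    """Return a dict for one accounting line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%a %b %d %H:%M:%S %Y")
    status = STATUS_RE.search(entry["cmd"])
    entry["status"] = status.group(1) if status else None
    if status:
        # Strip the trailing "(STATUS)" marker from the command text.
        entry["cmd"] = entry["cmd"][: status.start()]
    return entry


def summarize(lines):
    """Report entry count, elapsed seconds, and the commands marked FAILURE."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    if not entries:
        return None
    elapsed = (entries[-1]["ts"] - entries[0]["ts"]).total_seconds()
    failed = [e["cmd"] for e in entries if e["status"] == "FAILURE"]
    return {"entries": len(entries), "elapsed_seconds": elapsed, "failed": failed}


sample = [
    "Sat Jul  7 08:26:42 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; hostname Core-1 (SUCCESS)",
    "Sat Jul  7 08:26:43 2018:type=update:id=10.255.0.59@pts/11:user=cisco:cmd=configure terminal ; class-map type network-qos c-nq1 (FAILURE)",
    "<snip snip>",
    "Sat Jul  7 08:33:29 2018:type=stop:id=10.255.0.59@pts/11:user=cisco:cmd=shell terminated because the ssh session closed",
]
print(summarize(sample))
```

Lines that do not match the assumed layout, such as the snip marker, are skipped rather than treated as errors.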

iamroddo commented 6 years ago

Accidentally closed

ktbyers commented 6 years ago

@iamroddo Can you re-state your issue?

Your original issue was about replace (merge and get_facts are very different from the replace operation).

Please post your error message and your Ansible playbook.

ktbyers commented 6 years ago

Note, replace config on nxos_ssh will not work via an SSH proxy.

iamroddo commented 6 years ago

I wanted to get replace working. The details on merge were to provide background. Is there any chance that replace will work with nxos_ssh via an SSH proxy at some point?

ktbyers commented 6 years ago

@iamroddo Feel free to submit a PR on it.

ktbyers commented 5 years ago

This is a bit of a hard problem, or at least harder than it might initially appear, because the file transfer uses Secure Copy (and because Netmiko SCP needs to support Cisco IOS and consequently needs to open a completely separate SCP channel). This is NX-OS, but Netmiko needs to support both, so its pattern is the lowest common denominator (i.e. one that supports Cisco IOS).
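
To make the "separate SCP channel" point concrete: a second SSH connection opened only for the file transfer reaches the device through the jump host only if it is given the same proxy socket as the first one. The sketch below uses Paramiko and the scp package directly to show what a proxy-aware transfer could look like. It is not Netmiko's or NAPALM's code, the helper names `open_proxied_client` and `scp_put` are hypothetical, and it has not been tested against NX-OS.

```python
import os


def open_proxied_client(hostname, username, password, ssh_config_path, port=22):
    """Open an SSH connection that honours ProxyCommand from an SSH config file."""
    import paramiko  # third-party dependency, imported lazily

    config = paramiko.SSHConfig()
    with open(os.path.expanduser(ssh_config_path)) as handle:
        config.parse(handle)
    host_config = config.lookup(hostname)

    sock = None
    if "proxycommand" in host_config:
        # Run the configured proxy command and use it as the transport socket.
        sock = paramiko.ProxyCommand(host_config["proxycommand"])

    client = paramiko.SSHClient()
    # Lab convenience only: accepts unknown host keys without verification.
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(
        host_config.get("hostname", hostname),
        port=port,
        username=username,
        password=password,
        sock=sock,
        look_for_keys=False,
        allow_agent=False,
    )
    return client


def scp_put(client, local_path, remote_path):
    """Copy a local file to the device over the given SSH connection."""
    from scp import SCPClient  # third-party dependency, imported lazily

    scp_client = SCPClient(client.get_transport())
    try:
        scp_client.put(local_path, remote_path)
    finally:
        scp_client.close()
```

Each call to `open_proxied_client` starts its own proxy process, so a control connection and a transfer connection would each be tunnelled independently.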

If anyone wants to work on this, let me know.

I am going to close this for now.