Closed MichalObs97 closed 9 months ago
What happens if you switch the Nornir runner to serial (i.e. disable threads)?
Serial runner works fine! Nornir is able to log into the device and run the command. The problem seems to occur only when threading is used, and it gets worse with higher num_workers.
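For anyone repeating this test, here is a minimal sketch of switching between the serial and threaded runners. It assumes Nornir 3.x, where the runner is selected with a `runner` dict using the plugin names "serial" and "threaded"; the helper name `runner_config` and the file name `config.yaml` are hypothetical.

```python
def runner_config(num_workers: int) -> dict:
    """Build a Nornir 3.x style runner setting (assumed plugin names: 'serial', 'threaded')."""
    if num_workers <= 1:
        # One host at a time, no threads involved
        return {"plugin": "serial"}
    # Thread pool with the requested number of workers
    return {"plugin": "threaded", "options": {"num_workers": num_workers}}


print(runner_config(1))
print(runner_config(50))

# Usage (requires nornir; "config.yaml" is a hypothetical file name):
# from nornir import InitNornir
# nr = InitNornir(config_file="config.yaml", runner=runner_config(1))
```

Running the same task with the serial runner and then with increasing worker counts should help show whether the failures depend on concurrency.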
DEBUG:paramiko.transport:=== Key exchange agreements ===
DEBUG:paramiko.transport:Kex: diffie-hellman-group-exchange-sha1
DEBUG:paramiko.transport:HostKey: rsa-sha2-512
DEBUG:paramiko.transport:Cipher: aes256-ctr
DEBUG:paramiko.transport:MAC: hmac-sha2-256
DEBUG:paramiko.transport:Compression: none
DEBUG:paramiko.transport:=== End of kex handshake ===
DEBUG:paramiko.transport:Got server p (8192 bits)
DEBUG:paramiko.transport:kex engine KexGex specified hash_algo <built-in function openssl_sha1>
DEBUG:paramiko.transport:Switch to new keys ...
DEBUG:paramiko.transport:Got EXT_INFO: {'server-sig-algs': b'ssh-ed25519,ssh-rsa,rsa-sha2-256,rsa-sha2-512,ssh-dss'}
DEBUG:paramiko.transport:Adding ssh-rsa host key for XXX
DEBUG:paramiko.transport:userauth is OK
INFO:paramiko.transport:Auth banner: b'****...
Are you using AAA for authentication (TACACS+/RADIUS)?
Yes sir, I am using TACACS to authenticate
@MichalObs97 Yes, that is likely your issue, i.e. your TACACS+ server is unable to keep up with the threaded authentications and starts to cause problems.
One possible solution is to introduce a small random delay before each Netmiko connection is created.
I appreciate your explanation! I have two follow up questions.
Thank you so much for your help!
@MichalObs97 On question #2, don't know.
On question #1, something like:
import random
import time

from nornir_netmiko import netmiko_send_command


def netmiko_task_w_sleep(task):
    # Sleep a random time between 0 and 1 second (you would have to experiment here)
    time.sleep(random.random())
    task.run(task=netmiko_send_command, command_string="show ip int brief")
And then you would call this task from your main Nornir section.
You might need to multiply that sleep by a larger value (i.e. to introduce more random delay between threads starting). Let me know if this helps or not.
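As a rough sketch of what scaling the sleep could look like, the helper below multiplies the random value by a configurable maximum. The names `pick_delay`, `jittered_sleep`, and `max_delay` are hypothetical, and a suitable maximum would have to be found by experiment.

```python
import random
import time


def pick_delay(max_delay: float) -> float:
    """Return a random delay in the range [0, max_delay) seconds."""
    return random.random() * max_delay


def jittered_sleep(max_delay: float) -> float:
    """Sleep for a random delay below max_delay and return the delay used."""
    delay = pick_delay(max_delay)
    time.sleep(delay)
    return delay


# Example: a very small maximum so the demo finishes quickly
used = jittered_sleep(0.05)
print(f"slept {used:.3f} seconds")
```

Inside a Nornir task, `jittered_sleep(10)` would take the place of `time.sleep(random.random())`, spreading connection attempts over roughly ten seconds.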
Hi, I ran some tests for the solution you provided me with. I added two random times back to back to introduce more random wait time:
def paging(task):
    time.sleep(random.randint(1, 10))
    time.sleep(random.randint(1, 10))
    task.run(task=netmiko_send_command, command_string='terminal length 0', enable=True)
    print(esc('31') + f'{task.host}')
After running the code, my results were very similar to the ones at the beginning: 15/152 successful on Dell OS9 and 46/47 successful on Dell OS6.
From the log file I see that the errors for OS9 are, once again, mostly: paramiko.ssh_exception.SSHException: No existing session
I recorded 129 errors like that (some switches might be down, so nearly all of the failures are No existing session).
Note: This was run with num_workers set to 50. I also ran it with 20 threads, with a result of 60/152, and with 5 threads, with a result of 142/152.
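To make the trend easier to see, the Dell OS9 figures reported above can be turned into success rates. This is only arithmetic on the numbers already given; the helper name `success_rate` is hypothetical.

```python
# Dell OS9 results from the runs above: num_workers -> (successful, total)
results = {50: (15, 152), 20: (60, 152), 5: (142, 152)}


def success_rate(ok: int, total: int) -> float:
    """Return the percentage of hosts that succeeded."""
    return 100 * ok / total


for workers, (ok, total) in results.items():
    print(f"num_workers={workers}: {ok}/{total} = {success_rate(ok, total):.1f}%")
```

The rate rises from about 10% at 50 workers to about 93% at 5 workers, which is consistent with the failures being tied to concurrency.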
@MichalObs97 Did you ever work this out? Are you seeing errors/issues on your TACACS+ server? I am still inclined to think it is a TACACS+ bottleneck.
Hi, I did not really work this out with netmiko. I changed my script to use scrapli instead, on the same devices. With scrapli I am able to use 100 threads and the devices work, which wouldn't match your theory. I must say that we did have a problem with TACACS+ as well, but it has been patched and now works well. Unfortunately, netmiko (and paramiko) are still showing the same errors, so I am not really sure what's up :(
Update: I tried scrapli and paramiko and it shows the same error, so it points towards something in Paramiko not liking Dell_OS9.
@MichalObs97 Okay, thanks for the additional explanation. It seems like it is in some way similar to this issue:
https://github.com/ktbyers/netmiko/pull/2688
https://github.com/paramiko/paramiko/issues/2005
But you had already set conn_timeout to a larger value so that doesn't really explain it (and you saw the same issue with paramiko-scrapli so that would be odd as timeout would likely be different).
It would be interesting to see/know if Netmiko 4.0.0 fixed this issue, but totally understand if testing that doesn't work for you.
@ktbyers I tried it with netmiko 4.0.0 and still the same behavior.
Hi guys, it seems like I have the same issue. I am using Nornir to connect to my devices, get some data via Scrapli, parse the results, and send files to the devices via netmiko_file_transfer, which fails.
Sample code:
def copy_rsa_keys_to_device(task: Task, dry_run: bool) -> None:
    task.run(
        task=netmiko_file_transfer,
        source_file=f'/tmp/{rsa_key_file}',
        dest_file=f'/disk0:/{rsa_key_file}'
    )

targets = nr.filter(usage=usage, type='r2')
targets.run(task=copy_rsa_keys_to_device, dry_run=args.dry_run)
...
Error:
[04/20/2022 03:25:11 PM][nornir.core.task][ERROR] Host 'lab-vars-r2n1-sk': task 'netmiko_file_transfer' failed with traceback:
Traceback (most recent call last):
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/base_connection.py", line 935, in establish_connection
self.remote_conn_pre.connect(**ssh_connect_params)
File "/home/mdieska/.local/lib/python3.8/site-packages/paramiko/client.py", line 368, in connect
self._auth(
File "/home/mdieska/.local/lib/python3.8/site-packages/paramiko/client.py", line 691, in _auth
raise saved_exception
File "/home/mdieska/.local/lib/python3.8/site-packages/paramiko/client.py", line 684, in _auth
self._transport.auth_interactive_dumb(username)
File "/home/mdieska/.local/lib/python3.8/site-packages/paramiko/transport.py", line 1520, in auth_interactive_dumb
return self.auth_interactive(username, handler, submethods)
File "/home/mdieska/.local/lib/python3.8/site-packages/paramiko/transport.py", line 1493, in auth_interactive
raise SSHException('No existing session')
paramiko.ssh_exception.SSHException: No existing session
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir/core/task.py", line 99, in start
r = self.task(self, **self.params)
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir_netmiko/tasks/netmiko_file_transfer.py", line 24, in netmiko_file_transfer
net_connect = task.host.get_connection(CONNECTION_NAME, task.nornir.config)
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir/core/inventory.py", line 494, in get_connection
self.open_connection(
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir/core/inventory.py", line 546, in open_connection
conn_obj.open(
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir_netmiko/connections/netmiko.py", line 59, in open
connection = ConnectHandler(**parameters)
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/ssh_dispatcher.py", line 326, in ConnectHandler
return ConnectionClass(*args, **kwargs)
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/cisco/cisco_xr.py", line 10, in __init__
return super().__init__(*args, **kwargs)
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/base_connection.py", line 350, in __init__
self._open()
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/base_connection.py", line 355, in _open
self.establish_connection()
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/cisco/cisco_xr.py", line 14, in establish_connection
super().establish_connection(width=511, height=511)
File "/home/mdieska/.local/lib/python3.8/site-packages/netmiko/base_connection.py", line 980, in establish_connection
raise NetmikoTimeoutException(msg)
netmiko.ssh_exception.NetmikoTimeoutException: Paramiko: 'No existing session' error: try increasing 'conn_timeout' to 10 seconds or larger.
(task.py:115)
[04/20/2022 03:25:11 PM][nornir.core.task][ERROR] Host 'lab-vars-r2n1-sk': task 'copy_rsa_keys_to_device' failed with traceback:
Traceback (most recent call last):
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir/core/task.py", line 99, in start
r = self.task(self, **self.params)
File "pnb_aaa_config.py", line 313, in copy_rsa_keys_to_device
task.run(
File "/home/mdieska/.local/lib/python3.8/site-packages/nornir/core/task.py", line 174, in run
raise NornirSubTaskError(task=run_task, result=r)
nornir.core.exceptions.NornirSubTaskError: Subtask: netmiko_file_transfer (failed)
Can you try using the develop branch here and Netmiko 4.0.0 and see if it fixes the issue?
Note, this might only be a temporary fix if you are using NAPALM at all, as NAPALM currently only supports Netmiko 3.4.0 (though I do have a pull request in the NAPALM repository that you could potentially use for that as well).
@m1009d
Hi @ktbyers, unfortunately no :(
I also tried with paramiko_ng-2.8.10 instead of paramiko 2.10.3, but with the same result.
[04/20/2022 09:54:30 PM][nornir.core.task][DEBUG] Host 'lab-vars-r2n1-sk': running task 'netmiko_file_transfer' (task.py:98)
[04/20/2022 09:54:30 PM][paramiko.transport][DEBUG] starting thread (client mode): 0xbb0d2b50 (transport.py:1697)
[04/20/2022 09:54:30 PM][paramiko.transport][DEBUG] Local version/idstring: SSH-2.0-paramiko_2.8.10 (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Remote version/idstring: 'SSH-2.0-Cisco-2.0' (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][INFO] Connected (version 2.0, client Cisco-2.0) (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] kex follows? False
kex algos: ['ecdh-sha2-nistp521', 'ecdh-sha2-nistp384', 'ecdh-sha2-nistp256', 'diffie-hellman-group14-sha1']
server key: ['ecdsa-sha2-nistp521', 'rsa-sha2-512', 'rsa-sha2-256', 'ssh-rsa']
client encrypt: ['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-gcm@openssh.com', 'aes256-gcm@openssh.com']
server encrypt: ['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-gcm@openssh.com', 'aes256-gcm@openssh.com']
client mac: ['hmac-sha2-512', 'hmac-sha2-256', 'hmac-sha1']
server mac: ['hmac-sha2-512', 'hmac-sha2-256', 'hmac-sha1']
client lang: b''
server lang: b''
client compress: ['none']
server compress: ['none'] (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Kex agreed: ecdh-sha2-nistp256 (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] HostKey agreed: ecdsa-sha2-nistp521 (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Cipher agreed: aes128-ctr (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] MAC agreed: hmac-sha2-256 (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Compression agreed: none (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] kex engine KexNistp256 specified hash_algo <built-in function openssl_sha256> (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Switch to new keys ... (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Adding ecdsa-sha2-nistp521 host key for 10.235.42.1: 8615d22180cdd3bfe994f3e21fd73a9a (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] userauth is OK (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][INFO] Auth banner: b'\n\n****************************************************************************\n .\n****************************************************************************\n\n\n' (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Authentication type (none) not permitted. (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Allowed methods: ['password', 'publickey', 'keyboard-interactive'] (transport.py:1697)
[04/20/2022 09:54:32 PM][paramiko.transport][DEBUG] Trying discovered key 67c7cddec6c06dc9fb5e15fff06b2024 in /home/test/.ssh/id_rsa (transport.py:1697)
[04/20/2022 09:55:02 PM][paramiko.transport][DEBUG] Trying discovered key 67c7cddec6c06dc9fb5e15fff06b2024 in /home/test/.ssh/id_rsa (transport.py:1697)
[04/20/2022 09:55:02 PM][paramiko.transport][DEBUG] EOF in transport thread (transport.py:1697)
[04/20/2022 09:55:02 PM][paramiko.transport][DEBUG] Trying interactive auth (transport.py:1697)
[04/20/2022 09:55:02 PM][nornir.core.task][ERROR] Host 'lab-vars-r2n1-sk': task 'netmiko_file_transfer' failed with traceback:
Traceback (most recent call last):
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/base_connection.py", line 1022, in establish_connection
self.remote_conn_pre.connect(**ssh_connect_params)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/paramiko/client.py", line 368, in connect
self._auth(
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/paramiko/client.py", line 691, in _auth
raise saved_exception
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/paramiko/client.py", line 684, in _auth
self._transport.auth_interactive_dumb(username)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/paramiko/transport.py", line 1520, in auth_interactive_dumb
return self.auth_interactive(username, handler, submethods)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/paramiko/transport.py", line 1493, in auth_interactive
raise SSHException('No existing session')
paramiko.ssh_exception.SSHException: No existing session
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir/core/task.py", line 99, in start
r = self.task(self, **self.params)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir_netmiko/tasks/netmiko_file_transfer.py", line 24, in netmiko_file_transfer
net_connect = task.host.get_connection(CONNECTION_NAME, task.nornir.config)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir/core/inventory.py", line 494, in get_connection
self.open_connection(
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir/core/inventory.py", line 546, in open_connection
conn_obj.open(
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir_netmiko/connections/netmiko.py", line 59, in open
connection = ConnectHandler(**parameters)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/ssh_dispatcher.py", line 344, in ConnectHandler
return ConnectionClass(*args, **kwargs)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/base_connection.py", line 434, in __init__
self._open()
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/base_connection.py", line 439, in _open
self.establish_connection()
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/cisco/cisco_xr.py", line 11, in establish_connection
super().establish_connection(width=width, height=height)
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/netmiko/base_connection.py", line 1067, in establish_connection
raise NetmikoTimeoutException(msg)
netmiko.exceptions.NetmikoTimeoutException: Paramiko: 'No existing session' error: try increasing 'conn_timeout' to 15 seconds or larger.
(task.py:115)
[04/20/2022 09:55:02 PM][nornir.core.task][ERROR] Host 'lab-vars-r2n1-sk': task 'copy_rsa_keys_to_device' failed with traceback:
Traceback (most recent call last):
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir/core/task.py", line 99, in start
r = self.task(self, **self.params)
File "pnb_aaa_config.py", line 323, in copy_rsa_keys_to_device
task.run(
File "/home/test/git/PNB/PNB_internal/PNB_Scripts/pnb-config-audit/r2_accounts/venv/lib/python3.8/site-packages/nornir/core/task.py", line 174, in run
raise NornirSubTaskError(task=run_task, result=r)
nornir.core.exceptions.NornirSubTaskError: Subtask: netmiko_file_transfer (failed)
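One detail in the debug log above is that the "Trying discovered key" line appears twice, at 09:54:32 PM and 09:55:02 PM, with EOF right after the second one. A small sketch for measuring that gap from the log timestamps follows; the format string is inferred from the log lines themselves.

```python
from datetime import datetime

# Timestamp layout used in the log lines above, e.g. "04/20/2022 09:54:32 PM"
FMT = "%m/%d/%Y %I:%M:%S %p"

first_try = datetime.strptime("04/20/2022 09:54:32 PM", FMT)
second_try = datetime.strptime("04/20/2022 09:55:02 PM", FMT)

# Time spent between the two "Trying discovered key" log entries
gap = (second_try - first_try).total_seconds()
print(f"gap between key attempts: {gap:.0f} seconds")
```

The gap is 30 seconds, which happens to equal the conn_timeout of 30 in the connection options posted later in the thread. Whether that timeout is what ends the attempt is an assumption, not something the log confirms.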
@m1009d How are you trying to authenticate?
Are you trying to use SSH keys or a username/password?
Hi @ktbyers , yes I am using ssh keys
In [15]: nr.inventory.groups['lab'].connection_options['netmiko'].dict()
Out[15]:
{'extras': {'use_keys': True,
  'ssh_strict': False,
  'timeout': 60,
  'session_timeout': 60,
  'auth_timeout': 60,
  'blocking_timeout': 60,
  'banner_timeout': 60,
  'global_delay_factor': 3,
  'conn_timeout': 30,
  'session_log': '/tmp/my_out.txt',
  'ssh_config_file': '/home/test/.ssh/config',
  'key_file': '/home/test/.ssh/id_rsa'},
 'hostname': None,
 'port': None,
 'username': None,
 'password': None,
 'platform': None}
@m1009d Does your setup work if you don't use threading? i.e. if you set the Nornir runner to serial, does it work or still fail?
This is not being actively worked on so closing.
Hi!
Recently I encountered an issue with netmiko and nornir which I cannot seem to get rid of. I have searched through all the posts here, but nothing seems to help. Here is the issue.
I have a hosts file with 150 devices, all of them Dell_OS9 switches. I created a script that runs a basic task:
paging
In this configuration, when I run the script, I get a bunch of
paramiko.ssh_exception.SSHException: No existing session
and
netmiko.ssh_exception.NetmikoTimeoutException: Paramiko: 'No existing session' error: try increasing 'conn_timeout' to 10 seconds or larger.
As you can see, I have conn_timeout set to 30 seconds, but the log file still suggests increasing the timeout... I also noticed that this happens when I run the script with num_workers: 50, but when I go down to 5, I start to see some succeed, some fail with No existing session, and some fail with
paramiko.ssh_exception.SSHException: Negotiation failed.
When I turn on logging I can see a lot of
DEBUG:paramiko.transport:Got server p (8192 bits)
lines one after another, but the connection never gets established. It looks to me like it stops on that
Got server p (8192 bits)
line, after which the connection is closed and a new one is opened... I also caught lines like
INFO:paramiko.transport:Disconnect (code 2): % Error: Login failure.
and
ERROR:paramiko.transport:Exception (client): Error reading SSH protocol banner[Errno 9] Bad file descriptor
before that, which seemed like they could be connected to my problem.
This is an example of how the log for one device looks in the log file:
I have the newest versions of nornir, netmiko and paramiko installed, and I have been experimenting with the various timeouts, but nothing seems to help me figure out where this problem is coming from...
Any idea what it could be? Thank you in advance.