ktbyers / netmiko

Multi-vendor library to simplify Paramiko SSH connections to network devices
MIT License

Telnet - expecting NetmikoTimeoutException but not getting one #2988

Open Toxic-Waste- opened 1 year ago

Toxic-Waste- commented 1 year ago

Hi all,

apologies for making a second ticket in 2 days. I am working on a project to improve the exception handling, and this is the second time I have run into weird behavior (I think).

To simplify the test, I want to get decent output when a device is offline, so I created this small test:

import netmiko

router3 = {
        'device_type': 'huawei_telnet',
        'host': '<snip - offline IP>',
        'username': '<snip>',
        'password': '<snip>',
        'session_log': '/home/<snip>/devops/logs/test/netmiko_session.log'
    }

try:
    c800_connection = netmiko.ConnectHandler(**router3, conn_timeout=10)
except netmiko.NetmikoAuthenticationException:
    print("Login Failure")
except netmiko.NetmikoTimeoutException:
    print("Router connection timed out")
except Exception as e:
    print("general error " + str(e))

When I run this script against a router that is online, I correctly get the NetmikoAuthenticationException:

<snip>@jumphost:~/devops/test-development/network/projects/firmware_tool$ python3 test.py
Login Failure

However, when I run the script against an IP that is offline, I would expect a NetmikoTimeoutException, but I get the general exception:

<snip>@jumphost:~/devops/test-development/network/projects/firmware_tool$ python3 test.py
general error timed out

I would expect that after 10 seconds, netmiko would raise a NetmikoTimeoutException. Or am I implementing this incorrectly? The behavior seems to be the same whether I use huawei_telnet or cisco_ios_telnet.

ktbyers commented 1 year ago

@Toxic-Waste- Can you get rid of your exception handling and let the original exception happen (so I can see the stack trace and line numbers).

Alternatively, you can just add the following:

except Exception as e:
    print("general error " + str(e))
    raise

So that the original exception is re-raised.

ktbyers commented 1 year ago

You also might want to look into this Netmiko 4.x feature since it is similar to the pattern you are trying to implement.

https://pynet.twb-tech.com/blog/netmiko/ConnLogOnly.html

Toxic-Waste- commented 1 year ago

@ktbyers, that is where I got my inspiration, but I will have to see how to implement it :) I will run into both exceptions from time to time, so I want to log them properly.

I added the raise and ran the command again:

Traceback (most recent call last):
  File "/home/<snip>/devops/test-development/network/projects/firmware_tool/test.py", line 29, in <module>
    c800_connection = netmiko.ConnectHandler(**router3, conn_timeout=10)
  File "/home/<snip>/.local/lib/python3.10/site-packages/netmiko/ssh_dispatcher.py", line 365, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/home/<snip>/.local/lib/python3.10/site-packages/netmiko/base_connection.py", line 439, in __init__
    self._open()
  File "/home/<snip>/.local/lib/python3.10/site-packages/netmiko/base_connection.py", line 444, in _open
    self.establish_connection()
  File "/home/<snip>/.local/lib/python3.10/site-packages/netmiko/base_connection.py", line 1029, in establish_connection
    self.remote_conn = telnetlib.Telnet(
  File "/usr/lib/python3.10/telnetlib.py", line 218, in __init__
    self.open(host, port, timeout)
  File "/usr/lib/python3.10/telnetlib.py", line 235, in open
    self.sock = socket.create_connection((host, port), timeout)
  File "/usr/lib/python3.10/socket.py", line 845, in create_connection
    raise err
  File "/usr/lib/python3.10/socket.py", line 833, in create_connection
    sock.connect(sa)
TimeoutError: timed out

ktbyers commented 1 year ago

The general issue is you are using telnet and I haven't done much work on making telnet behavior consistent with SSH behavior (since it is only used very infrequently and really shouldn't be used).

So yep, it is raising a TimeoutError from the socket connect inside telnetlib as opposed to a NetmikoTimeoutException (though I probably should change that for both SSH and telnet, as it really isn't the most logical exception for the case where nothing answered at ip_address/port).
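A small check of what kind of exception the traceback above ends in. The `TimeoutError` raised from `socket.create_connection` is Python's built-in exception, a subclass of `OSError`, so it can be caught without importing anything from telnetlib or Netmiko. On Python 3.10 and later, `socket.timeout` is an alias of the same class. This only illustrates the standard-library hierarchy and says nothing about Netmiko's own exception classes.

```python
import socket

# TimeoutError is a built-in exception and a subclass of OSError.
print(issubclass(TimeoutError, OSError))    # True

# socket.timeout is also an OSError subclass (an alias of TimeoutError
# on Python 3.10+), so one except clause covers both spellings there.
print(issubclass(socket.timeout, OSError))  # True
```

One consequence is that an `except OSError:` clause would also catch this timeout, which may be broader than intended.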

ktbyers commented 1 year ago

@Toxic-Waste- ConnLogOnly code should just be:

router3 = {
        'device_type': 'huawei_telnet',
        'host': '<snip - offline IP>',
        'username': '<snip>',
        'password': '<snip>',
        'session_log': '/home/<snip>/devops/logs/test/netmiko_session.log'
    }

c800_connection = netmiko.ConnLogOnly(**router3)
if c800_connection is None:
    # decide what to do if the connection completely failed
    pass

Toxic-Waste- commented 1 year ago

I get that, sorry for making it more difficult. Phasing out telnet is also on the to-do list, but that will probably also be done with a Netmiko script, so it is a bit of a catch-22 ;D

I will try the ConnLogOnly approach as well, thanks for the example! :)

ktbyers commented 1 year ago

@Toxic-Waste- No worries at all, but you can also just keep doing what you are doing (it is just that some of the exceptions are going to fall into your 'except Exception as e:' category).

Or alternatively, you could catch the specific TimeoutError exception that is raised in the nothing-at-that-IP-address case and handle it explicitly.

except TimeoutError:
    # whatever you want to do in this case
    pass

Note, you would probably need to temporarily get rid of your general exception handler (i.e. your except Exception as e code) to see what the specific exception is and which library is raising it.
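A minimal sketch of that pattern, assuming the built-in `TimeoutError` seen in the traceback earlier in the thread. The helper name `open_or_none` and the injected `connect` callable are hypothetical and only there so the example runs without a device. In a real script `connect` would be `netmiko.ConnectHandler`, and the Netmiko-specific `except` clauses from the original script would sit alongside the `TimeoutError` one.

```python
import logging
from typing import Any, Callable, Optional

log = logging.getLogger("connect_demo")


def open_or_none(connect: Callable[..., Any], **device: Any) -> Optional[Any]:
    """Call connect(**device); return None if nothing answers at the address."""
    host = device.get("host")
    try:
        return connect(**device)
    except TimeoutError:
        # The "nothing at that IP address" case, handled explicitly.
        log.warning("No answer from %s", host)
        return None
    except Exception:
        # Anything unexpected: log the full traceback, then re-raise so the
        # original exception type and line numbers stay visible.
        log.exception("Unexpected failure connecting to %s", host)
        raise
```

Re-raising in the catch-all keeps the log entry without hiding which library raised the error.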

Toxic-Waste- commented 1 year ago

Yeah, I will get rid of that except Exception as e. I figured it was a smart way to show all errors in the log, but the reality is that I actually have some issues with my scripts because of it.

Is there any way to raise the TimeoutError faster? I tried to do this with the conn_timeout=10 parameter, but since the proper exception isn't raised there (or it's difficult to do so), it doesn't seem to be honored, and the timeout sometimes takes up to 1-2 minutes. Not the worst thing in the world, but it would be handy to make it shorter.

This also gives me an extra push to get our project to migrate away from telnet on the rails :)

Thanks for the feedback!

ktbyers commented 1 year ago

Try setting the timeout=10 argument to ConnectHandler. It isn't used in many places, but I think it controls how long telnet will wait when trying to connect.

We probably should flag that as a bug and switch that to conn_timeout as that is more logical.
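A rough sketch of applying that suggestion, assuming (as described above) that the telnet connect currently follows `timeout` rather than `conn_timeout`. The helper `with_connect_limit` is hypothetical and not part of Netmiko. It copies the device dictionary and sets both arguments so the limit would apply whichever one the connect path reads.

```python
from typing import Any, Dict


def with_connect_limit(device: Dict[str, Any], seconds: int = 10) -> Dict[str, Any]:
    """Return a copy of the device dict with both timeout arguments set."""
    params = dict(device)             # leave the caller's dict untouched
    params["timeout"] = seconds       # reportedly what telnet honors today
    params["conn_timeout"] = seconds  # the argument one would expect to apply
    return params


# Usage (requires netmiko and a device definition such as router3):
#     conn = netmiko.ConnectHandler(**with_connect_limit(router3, seconds=10))
```

Since `timeout` may also be read elsewhere in Netmiko, lowering it could affect more than the initial connect.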

Toxic-Waste- commented 1 year ago

This works like a charm, so I will use it like this for now. Thanks for the help! :)

Don't hesitate to let me know if I can help test something