napalm-automation / napalm-base

Apache License 2.0
32 stars 48 forks source link

Add always_alive decorator #247

Closed mirceaulinic closed 7 years ago

mirceaulinic commented 7 years ago

Following discussion from https://github.com/napalm-automation/napalm-ios/pull/152

Defining this decorator which checks the status of the connection before trying to execute the actual method. If the transport layer is unusable, it will (re)open the session. It checks for the private property auto_reconnect which is set in the driver from the optional args (i.e. self.auto_reconnect = optional_args['auto_reconnect'] etc.). If not defined, will default it to False behavior and does not execute the alive check. Similarly, if the is_alive method is not defined, it will consider the connection as up and doesn't try to perform anything.

Usage example:

@always_alive
def get_lldp_neighbors(self):

I have been using a slightly different version of this decorator in production for almost one year, without seeing any issues.

@ktbyers and @dbarrosop I know you'll have objections, Kirk probably very strong, but please let's try to be constructive and provide something useful for the users.

kbirkeland commented 7 years ago

A LBYL approach is not ideal, at least the napalm-ios case. The first method to be called after the remote host closes the socket will still fail. The client will still think the connection is active until the next read or write after the close.

mirceaulinic commented 7 years ago

Hrm, @kbirkeland are you saying that is_alive still returns True even when the device killed the SSH connection? If that's the case, do you see any deterministic & lightweight implementation for is_alive to return the real status of the SSH session?

dbarrosop commented 7 years ago

I think this is a bad idea. For starters, using a decorator and leaving up to the implementor is just going to cause different behavior between drivers and even within the same driver. So, even from an implementation perspective I think this is not a good way of doing this.

Second, napalm is a library, it's not a tool. We don't have to solve use cases, we have to provide the means for people to solve their use cases. If salt requires a long connection we have to provide the tools for salt to detect the problem and correct it. We don't have to solve the problem for them. If tomorrow some other tool requires some other behavior we can not keep just solving each and all use cases as it's going to lead to buggy code and tech debt. So I personally think it's salt who should correct this. We give them an exception and/or a boolean they can check and reopen the connection if it's broken. I am sure this is easy to solve on whatever code it's calling napalm, it's just a try...except... block.

provide something useful for the users

Agreed, we should try to return a consistent exception if the session dies so they can catch it and handle it in any way they want; logging, reopening it, crashing the application, whatever. It's up to them. I don't want us to decide for them.

mirceaulinic commented 7 years ago

So I personally think it's salt who should correct this. We give them an exception and/or a boolean they can check and reopen the connection if it's broken. I am sure this is easy to solve on whatever code it's calling napalm, it's just a try...except... block.

I agree on this; both flag and exception sound good to me. I will submit a PR to introduce a new Exception that we can raise. I think the first driver using it would be napalm-ios -- @kbirkeland @ktbyers are you OK with that?

ktbyers commented 7 years ago

Yes, I agree with @dbarrosop that this should really be outside of NAPALM and that we should strive to provide consistent behavior across devices (i.e. exceptions and ways of checking status of connection).

@mirceaulinic Sounds good to me.

We can also potentially try to incorporate Exception into Netmiko so that we consistently raise things at that level.

kbirkeland commented 7 years ago

@mirceaulinic I'm good with this as well.

NAPALM looks like it already has a ConnectionException - would that be the correct one to use here?

mirceaulinic commented 7 years ago

@kbirkeland Let's create a different one, say ConnectionClosedError - ConnectionException is raised when the initial connection failed (during open). (yeah, great documentation, right?) :-)