napalm-automation / napalm-ios

Apache License 2.0
31 stars 40 forks source link

Allow parallel requests over the same SSH session #120

Closed mirceaulinic closed 7 years ago

mirceaulinic commented 7 years ago

Opening this PR for initial discussion. Ref: #119

Introducing a simple mutex mechanism to virtually lock the channel in the current SSH session. During this time, all subsequent requests will be waiting to be served. If there are any requests waiting more than the timeout specified during open(), CommandTimeoutException is raised. Request from other SSH sessions are not affected by this. This should not affect at all the existing behaviour, but it will widen the scope of the library.

I have tested on a 2811 and I confirm it works as expected.

mirceaulinic commented 7 years ago

@MatthiasGabriel most probably this branch will need some adjustments after Kirk's review. Meanwhile, would you be able to test it and confirm it works properly on your device? Thanks!

MatthiasGabriel commented 7 years ago

@mirceaulinic Thank you so much for your effort. Yea I'm able to test it on my device. I'll write another comment when I'm done with it.

MatthiasGabriel commented 7 years ago

@mirceaulinic First the good news. I got the parallel tasks working as well for the most cases, except some of the net.traceroute calls because they take to long and then the following tasks run into the timeout... totally intended of course :) I'll discuss this issue with my co-worker but I think we can live with that for the moment or we have to increase the timeout or decrease the timeout of the traceroute but thats another problem :)

One additional thing I'd like to mention is the codesnippet

def _lock_ssh_session(self, start):
        while (not self._ssh_session_locker.acquire(False)
            and not self._timeout_exceeded(start, 'Waiting to acquire the SSH session!')):
                # will wait here till the SSH channel is free again and ready to receive new requests
                # if stays too much, _timeout_exceeded will raise CommandTimeoutException
                pass  # do nothing, just wait
        return True  # ready to go now

Wouldn't it be a little bit nicer if we'd use the time.sleep method e.g. time.sleep(0.1) or even time.sleep(0.01) instead of pass just to safe some cpu-cycles?

mirceaulinic commented 7 years ago

I got the parallel tasks working as well for the most cases

Great!

except some of the net.traceroute calls because they take to long and then the following tasks run into the timeout... totally intended of course :) I'll discuss this issue with my co-worker but I think we can live with that for the moment or we have to increase the timeout or decrease the timeout of the traceroute but thats another problem :)

Yes, unfortunately that will be the case when you'll issue few parallel requests over the same session and one takes very long. Again, this is because we don't have available any mechanism to assign an unique ID and then able to identify what reply from the device corresponds to what command. So some of them will simply need to be excluded and the user re-try.

Wouldn't it be a little bit nicer if we'd use the time.sleep method e.g. time.sleep(0.1) or even time.sleep(0.01) instead of pass just to safe some cpu-cycles?

Yes, thanks for suggestion!

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 66.342% when pulling 6520e75e7729738a7c1a77f31829abfb5ea8b4d6 on fix-119 into b123c8b26a663fe4d6b32b92ae70882ae47b8e88 on develop.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 66.342% when pulling 6520e75e7729738a7c1a77f31829abfb5ea8b4d6 on fix-119 into b123c8b26a663fe4d6b32b92ae70882ae47b8e88 on develop.

coveralls commented 7 years ago

Coverage Status

Coverage increased (+0.2%) to 66.342% when pulling 6520e75e7729738a7c1a77f31829abfb5ea8b4d6 on fix-119 into b123c8b26a663fe4d6b32b92ae70882ae47b8e88 on develop.

ktbyers commented 7 years ago

@mirceaulinic Since this is screen-scraping, we can probably increase the time.sleep(1) to larger increments.

I will need to experiment with this some.

The main question that I have is--should this locking mechanism be rolled into the Netmiko instead of being in NAPALM and then have NAPALM call to the Netmiko lock.

We also probably need to consider whether there are any operations like SCP that also need to lock.

ktbyers commented 7 years ago

Per conversation with @mirceaulinic I think we are going to integrate this directly into Netmiko.