Latency issue causes various errors - increased global_delay_factor

sincerywaing commented 7 years ago

Description of Issue/Question

High latency (>350 ms) causes deployment failures. We had to raise global_delay_factor to 7 to fix the issue.

Did you follow the steps from https://github.com/napalm-automation/napalm#faq

[* ] Yes
[ ] No

Setup

High-latency environment

napalm-ios version

(Paste verbatim output from pip freeze | grep napalm-ios between quotes below)

0.7.0

IOS version

(Paste verbatim output from show version between quotes below)

any ios

Steps to Reproduce the Issue

Run a deployment in a high-latency environment.

Error Traceback

(Paste the complete traceback of the exception between quotes below)

If latency is <150 ms, there is no issue.

If latency is >300 ms, or even >350 ms, various errors occur. Increasing global_delay_factor to a large value helps.

An error occurred in dynamically determining remote file system: dir -sw1#
dis-sw1#
dis-sw1#

or

Unexpected output from check_file_exists

sincerywaing commented 7 years ago

The issue can be reproduced using netmiko alone, and it seems to be related to delay_factor:

In [13]: output = conn.send_command('dir', delay_factor=2)

In [14]: print output
Directory of flash:/

    2  -rwx        1756   Jan 2 2006 08:02:04 +08:00  vlan.dat
    3  -rwx    14570368  Nov 25 1994 10:50:17 +08:00  c2960s-universalk9-mz.150-2.SE8.bin
    5  -rwx        4120   Jul 7 2017 12:02:08 +08:00  multiple-fs
    4  -rwx        3825   Jul 7 2017 12:02:08 +08:00  private-config.text
    6  -rwx           0   Jul 7 2017 15:01:27 +08:00  merge_config.txt
  587  -rwx        7952   Jul 7 2017 12:02:08 +08:00  config.text
    7  drwx         512   Mar 1 1993 08:14:58 +08:00  c2960s-universalk9-mz.122-55.SE5
  588  -rwx        9332   Jul 7 2017 15:01:22 +08:00  candidate_config.txt

57931776 bytes total (27957248 bytes free)

In [15]: output = conn.send_command('dir')

In [16]: print output
-sw1#

However, setting global_delay_factor=2 does not fix the issue during load_merge_candidate, which still fails with:

Unexpected output from check_file_exists

or

'NoneType' object has no attribute 'group'

Raising global_delay_factor to 7 fixed the issue.

Is there a way to dynamically adjust this?

sincerywaing commented 7 years ago

Using a ping function to measure the latency and dividing the result by 50 to determine global_delay_factor looks like a good solution. Closing this one for now.
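For anyone landing here later, a rough sketch of that approach, not the exact code I ran. Assumptions: the latency is in milliseconds, the result is rounded up, and it is clamped between 1 and 10 (the cap is arbitrary). The helper names `measure_latency_ms` and `delay_factor_from_latency` are made up for illustration, and the TCP connect time to the SSH port stands in for a real ICMP ping so that no root privileges are needed.

```python
import math
import socket
import time


def measure_latency_ms(host, port=22, samples=3, timeout=5.0):
    """Return the median TCP connect time to host:port in milliseconds.

    Stand-in for a ping: one TCP handshake takes roughly one round trip.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection is closed as soon as the handshake completes
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]


def delay_factor_from_latency(latency_ms, ms_per_unit=50.0, minimum=1, maximum=10):
    """Map a latency in ms to a delay factor: latency / 50, rounded up.

    The rounding and the minimum/maximum clamp are assumptions of this sketch.
    """
    factor = math.ceil(latency_ms / ms_per_unit)
    return max(minimum, min(maximum, factor))
```

With these assumptions, a 350 ms link maps to a factor of 7, which matches the value that worked above. The computed factor would then be supplied wherever global_delay_factor is already being set for the connection.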
