napalm-automation / napalm-ansible

Apache License 2.0

Netmiko4 read_timeout default is too short for IOS commit_config operation #197

Closed jifox closed 2 years ago

jifox commented 2 years ago

Hello,

I get an error when executing the following task after 1 minute, 36.05 seconds, which is far less than the timeout, conn_timeout and read_timeout values. So far I have run into this only when commit_changes is True in the playbook.

Because I'm not sure which arguments are passed to netmiko, I duplicated the settings in the netmiko dict.

Even though auto_rollback_on_error is true, the new switch config is applied after the error is displayed.

I updated the python libraries and this is currently installed:

pip list | grep napalm
napalm                  4.0.0
napalm-ansible          1.1.0

In the shell I also used these export statements to ensure that the Ansible timeouts are set:

export ANSIBLE_PERSISTENT_CONNECT_TIMEOUT=300
export ANSIBLE_PERSISTENT_COMMAND_TIMEOUT=300

- name: Set Configuration - Check-Mode if do_commit is not defined
  napalm_install_config:
    config_file: "{{ managed_config_dest }}"
    commit_changes: "{{ is_commit_now }}"
    replace_config: true
    get_diffs: true
    diff_file: "{{ managed_config_dest }}.diff"
    provider: "ios"
    timeout: 200
    optional_args: 
      debug: true
      netmiko: 
        timeout: 180
        conn_timeout: 180
        auth_timeout: 30  # Timeout to wait for authentication response
        banner_timeout: 20
        read_timeout: 200
      timeout: 180
      conn_timeout: 180
      auth_timeout: 30  # Timeout to wait for authentication response
      banner_timeout: 20
      read_timeout: 200
      auto_rollback_on_error: "{{ auto_rollback_on_error }}"
  register: result
  tags: [print_action]

Playbook output after 1 minute, 36.05 seconds (96.05 s):

# Set Configuration - Check-Mode if do_commit is not defined **************************************************************
  * ATKPACSE002                             - FAILED!!! -----------------------------------------------------------------
    cannot install config: 
    Pattern not detected: '(?:[>##]\\s*$|.*all username(redacted) in output.

    Things you might try to fix this:
    1. Explicitly set your pattern using the expect_string argument.
    2. Increase the read_timeout to a larger value.

    You can also look at the Netmiko session_log or debug log for more information.

The pattern should be correct, because the napalm_get_facts module runs without problems earlier in this playbook.

How do I increase the read timeout?

ktbyers commented 2 years ago

@jifox

Which version of Netmiko do you have installed?

I assume the device is Cisco IOS or IOS-XE and that you have your ansible_network_os set to ios in Ansible inventory?

jifox commented 2 years ago

@ktbyers

Yes, the device is a C9300 IOS-XE device and ansible_network_os is set to ios.

pip list | grep netmiko
netmiko                 4.1.2

python --version
Python 3.9.10

ansible --version
ansible [core 2.12.8]
  config file = /home/ansible/net-automation/ansible.cfg
  configured module search path = ['/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/napalm_ansible']
  ansible python module location = /home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/ansible
  ansible collection location = /home/ansible/net-automation/collections
  executable location = /home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/bin/ansible
  python version = 3.9.10 (main, Mar  7 2022, 07:16:32) [GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
  jinja version = 3.1.2
  libyaml = True

ktbyers commented 2 years ago

Do you get the same error if you do the following?

- name: Set Configuration - Check-Mode if do_commit is not defined
  napalm_install_config:
    config_file: "{{ managed_config_dest }}"
    commit_changes: true         # I am assuming you are committing the change
    replace_config: true       
    get_diffs: true
    diff_file: "{{ managed_config_dest }}.diff"
  register: result
  tags: [print_action]

jifox commented 2 years ago

No change, the same error occurs. I also had to add provider: "ios" to the parameters.

ktbyers commented 2 years ago

Maybe look at the Netmiko session_log and see if it gives you more information on what is failing. It is probably:

- name: Set Configuration - Check-Mode if do_commit is not defined
  napalm_install_config:
    config_file: "{{ managed_config_dest }}"
    commit_changes: true         # I am assuming you are committing the change
    replace_config: true       
    get_diffs: true
    diff_file: "{{ managed_config_dest }}.diff"
    platform: "ios"        # shouldn't really need to do this, but that is a separate issue
    optional_args:
        session_log: "output.txt"
  register: result
  tags: [print_action]

Only run this against a single device. I think this will create a file named "output.txt" in the directory that you run Ansible from. Just let me know if this doesn't work.

If it works, it would be interesting to see what is going on in the "output.txt" file (by "work" I mean this file is created and contains the session contents).
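Once the session log exists, the part of interest is whatever the device printed after the last configure replace command. A minimal Python sketch for pulling that out, assuming the log is a plain-text file; the helper names output_after_command and inspect_log are hypothetical and are not part of Netmiko or NAPALM:

```python
def output_after_command(session_log_text, command):
    """Return the session-log lines that follow the last line containing command."""
    lines = session_log_text.splitlines()
    last_index = None
    for index, line in enumerate(lines):
        if command in line:
            last_index = index
    if last_index is None:
        # The command never appears in the log.
        return []
    return lines[last_index + 1:]


def inspect_log(path="output.txt", command="configure replace"):
    """Print what the device sent back after the last matching command."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    for line in output_after_command(text, command):
        print(line)
```

Calling inspect_log() from the directory Ansible was run in prints the tail of the session, which shows whether the device prompt ever came back after the command.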

jifox commented 2 years ago

Here are the results

output.txt

ktbyers commented 2 years ago

And did you still get an error the last time it ran? The output looks like it all ran properly?

jifox commented 2 years ago

Yes, the same error

ktbyers commented 2 years ago

Okay, can you run your playbook with -vvv so it outputs a full exception stack trace?

jifox commented 2 years ago

First run with -vvv: no error. Second run started.

jifox commented 2 years ago

Second run: no error.

jifox commented 2 years ago

I'll try again without -vvv and see if the error occurs again.

jifox commented 2 years ago

Without the -vvv parameter the error is there again. Strange.

ktbyers commented 2 years ago

How long does this command take to execute (if you execute it manually on the CLI)?

Make sure the flash:/candidate_config.txt file is the right file (has the right contents) before running the command below, as this command is going to load that file into the running-config.

configure replace flash:/candidate_config.txt force revert trigger error

jifox commented 2 years ago

I have no idea why specifying -vvv would prevent the error.

jifox commented 2 years ago

I tried again with just -v: no error. Without -v: also no error now.

jifox commented 2 years ago

Without -v: error.

jifox commented 2 years ago

I'm now testing with -vvv a few times.

jifox commented 2 years ago

Got the traceback now:

# Set Configuration - Check-Mode if do_commit is not defined **************************************************************
  * ATKPACSE002                             - FAILED!!! -----------------------------------------------------------------
    cannot install config: 
    Pattern not detected: '(?:[>##]\\s*$|.*all username(redacted) in output.

    Things you might try to fix this:
    1. Explicitly set your pattern using the expect_string argument.
    2. Increase the read_timeout to a larger value.

    You can also look at the Netmiko session_log or debug log for more information.

    File "/tmp/ansible_napalm_install_config_payload_6nugjdr6/ansible_napalm_install_config_payload.zip/ansible/modules/napalm_install_config.py", line 325, in main
      File "/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/napalm/ios/ios.py", line 559, in commit_config
        output = self._commit_handler(cmd)
      File "/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/napalm/ios/ios.py", line 467, in wrapper
        return f(self, *args, **kwargs)
      File "/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/napalm/ios/ios.py", line 484, in _commit_handler
        output = self.device.send_command(cmd, expect_string=patterns)
      File "/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/netmiko/utilities.py", line 592, in wrapper_decorator
        return func(self, *args, **kwargs)
      File "/home/ansible/.cache/pypoetry/virtualenvs/net-automation-qUL_uJSX-py3.9/lib/python3.9/site-packages/netmiko/base_connection.py", line 1721, in send_command
        raise ReadTimeout(msg)

# STATS *******************************************************************************************************************
ATKPACSE002    : ok=190 changed=1   failed=1    unreachable=0   rescued=0   ignored=0

ktbyers commented 2 years ago

@jifox Okay, let me look at it.
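The traceback ends in send_command(cmd, expect_string=patterns) raising ReadTimeout, and the call passes no read_timeout, so the library default applies. The sketch below is a simplified illustration of that kind of read-until-pattern loop, not Netmiko's actual implementation: the ReadTimeout class here is a local stand-in, make_slow_device is a hypothetical fake device, and the 10-second default is an assumption chosen for illustration.

```python
import re
import time


class ReadTimeout(Exception):
    """Local stand-in for Netmiko's ReadTimeout exception (illustration only)."""


def read_until_pattern(read_chunk, pattern, read_timeout=10.0, loop_delay=0.01):
    """Poll read_chunk() until pattern matches the accumulated output.

    Raises ReadTimeout if read_timeout seconds pass without a match.
    """
    deadline = time.monotonic() + read_timeout
    output = ""
    while time.monotonic() < deadline:
        output += read_chunk()
        if re.search(pattern, output, flags=re.MULTILINE):
            return output
        time.sleep(loop_delay)
    raise ReadTimeout(f"Pattern not detected: {pattern!r} in output.")


def make_slow_device(busy_seconds, prompt="switch#"):
    """Hypothetical device that stays silent for busy_seconds, then prints its prompt."""
    ready_at = time.monotonic() + busy_seconds
    sent = False

    def read_chunk():
        nonlocal sent
        if not sent and time.monotonic() >= ready_at:
            sent = True
            return "\n" + prompt
        return ""

    return read_chunk
```

With this model, a device that needs longer than read_timeout to return its prompt always fails, regardless of how large the connection-level timeout values are, because those are not consulted inside the loop.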

ktbyers commented 2 years ago

It would still be interesting to know this:

How long does this command take to execute (if you execute it manually on the CLI)?

Make sure the flash:/candidate_config.txt file is the right file (has the right contents) before running the command below, as this command is going to load that file into the running-config.

configure replace flash:/candidate_config.txt force revert trigger error

jifox commented 2 years ago

I will enable the ansible.posix.profile_tasks callback; then it should display the duration.

jifox commented 2 years ago

Set Configuration - Check-Mode if do_commit is not defined **

So the command's duration is 32 seconds.

ktbyers commented 2 years ago

Try (I added read_timeout_override into optional_args below):

- name: Set Configuration - Check-Mode if do_commit is not defined
  napalm_install_config:
    config_file: "{{ managed_config_dest }}"
    commit_changes: true         # I am assuming you are committing the change
    replace_config: true       
    get_diffs: true
    diff_file: "{{ managed_config_dest }}.diff"
    platform: "ios"        # shouldn't really need to do this, but that is a separate issue
    optional_args:
        read_timeout_override: 90.0
  register: result
  tags: [print_action]

Let me know if that works.
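For anyone calling NAPALM directly instead of through Ansible, the same setting can be passed in optional_args. This is a rough sketch, assuming NAPALM 4.x with Netmiko 4.x and that read_timeout_override in optional_args is forwarded to Netmiko the same way as in the task above; build_optional_args and replace_config are hypothetical helper names, and the connection details are placeholders:

```python
def build_optional_args(read_timeout_override=90.0, session_log=None):
    """Build the optional_args dict used in the playbook task above."""
    optional_args = {"read_timeout_override": float(read_timeout_override)}
    if session_log is not None:
        # Optional: also record the session for troubleshooting.
        optional_args["session_log"] = session_log
    return optional_args


def replace_config(hostname, username, password, config_file, optional_args):
    """Replace the running config from config_file and return the diff."""
    # Imported lazily so the helper above can be used without NAPALM installed.
    from napalm import get_network_driver

    driver = get_network_driver("ios")
    device = driver(
        hostname=hostname,
        username=username,
        password=password,
        optional_args=optional_args,
    )
    device.open()
    try:
        device.load_replace_candidate(filename=config_file)
        diff = device.compare_config()
        device.commit_config()
        return diff
    finally:
        device.close()
```

The 90.0 value mirrors the task above; it only needs to exceed the time the device takes to finish the configure replace command.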

jifox commented 2 years ago

I'm still testing...

jifox commented 2 years ago

It looks like that solved the problem.

@ktbyers Thanks for the support.

ktbyers commented 2 years ago

Okay, let's leave this issue open as this should be fixed in NAPALM.

jifox commented 2 years ago

The device was a stack of five Catalyst 9300-48P chassis.

ktbyers commented 2 years ago

Fixed here:

https://github.com/napalm-automation/napalm/pull/1744