napalm-automation / napalm-ansible

Apache License 2.0

napalm_install_config timeout with Arista HTTPS API #165

Closed lvrfrc87 closed 4 years ago

lvrfrc87 commented 4 years ago

As per the title, here is the provider with tuned timers to increase the timeout. It works fine with napalm_get_facts. Please note that I am running in a virtualenv.

eos_auth:
  persistent_command_timeout: 180
  persistent_connect_timeout : 180
  timeout: 180
  hostname: l1a.r5b1.ams7.nee.tmcs
  username: username
  dev_os: eos
  password: password
  optional_args:
    port: 443
    transport: https

Here is the playbook:

      - name: diff between running-conf/render and create backup of running-conf.
        napalm_install_config:
          config_file: './configurations/{{ inventory_hostname }}/renders/{{ config_file_name }}.cfg'
          commit_changes: False
          replace_config: False
          get_diffs: True
          archive_file: './configurations/{{ inventory_hostname }}/backups/{{ config_file_name }}.bak'
          diff_file: './configurations/{{ inventory_hostname }}/diff/{{ config_file_name }}.diff'
          provider: "{{ eos_auth }}"

And here are the logs:


            "archive_file": null,
            "candidate_file": null,
            "commit_changes": false,
            "config": null,
            "config_file": "./configurations/l1a.r5b1.ams7.nee.tmcs/renders/20200204_225450.cfg",
            "dev_os": "eos",
            "diff_file": "./configurations/l1a.r5b1.ams7.nee.tmcs/diff/20200204_225450.diff",
            "get_diffs": true,
            "hostname": "l1a.r5b1.ams7.nee.tmcs",
            "optional_args": {
                "port": 443,
                "transport": "https"
            },
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
            "persistent_command_timeout": 180,
            "persistent_connect_timeout": 180,
            "provider": {
                "dev_os": "eos",
                "hostname": "l1a.r5b1.ams7.nee.tmcs",
                "optional_args": {
                    "port": 443,
                    "transport": "https"
                },
                "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER",
                "persistent_command_timeout": 180,
                "persistent_connect_timeout": 180,
                "timeout": 180,
                "username": "prd2204.svc"
            },
            "replace_config": false,
            "timeout": 60,
            "username": "prd2204.svc"
        }
    }
}

MSG:

cannot load config: Socket error during eAPI connection: The read operation timed out

Versions: Ansible 2.8.8, napalm 2.5.0, napalm-ansible 1.0.0, pyeapi 0.8.3

ktbyers commented 4 years ago

These won't be used at all and are not supported by napalm-ansible:

  persistent_command_timeout: 180
  persistent_connect_timeout : 180

You would have to look in the optional_args for EOS on NAPALM and see what NAPALM and PyEAPI support.

I took a quick look at NAPALM, and it looks like we support passing in generic PyEAPI arguments:

https://github.com/napalm-automation/napalm/blob/develop/napalm/eos/eos.py#L155

So it really is a question of what PyEAPI supports in the context in which we are using it.

lvrfrc87 commented 4 years ago

@ktbyers Thanks for your reply. I'll dig into the PyEAPI library. I noticed, though, that NAPALM supports timeout=self.timeout. Even though I set it to 180 seconds, the connection gets disconnected after 30 seconds sharp.

lvrfrc87 commented 4 years ago

If I am reading the code right, the Arista library (PyEAPI) supports a timeout argument that defaults to 60 (timeout=60). This might explain why the default timeout of 30 seconds is overridden to 60 (?)

https://github.com/arista-eosplus/pyeapi/blob/2d14de25c73bcd02e52c1473578f44e179ddbf90/pyeapi/client.py#L393

That said, even when passing timeout: 180 in optional_args, the connection still lasts 60 seconds at most:

  optional_args:
    port: 80
    transport: http
    timeout: 180

I have also tried using http and setting the following in ansible.cfg, with no luck:

[persistent_connection]
connect_timeout = 180
command_timeout = 180

lvrfrc87 commented 4 years ago

I have managed to override the timeout by adding timeout: 180 under the task. After further troubleshooting, I can see that the HTTPS connection is established; however, no packets are sent or received. The request and byte counters do not change between the two samples below:

   User              Requests       Bytes in       Bytes out    Last hit
----------------- -------------- -------------- --------------- --------------
   prd2204.svc       456            530905         3856176      12 seconds ago
   User              Requests       Bytes in       Bytes out    Last hit
----------------- -------------- -------------- --------------- ---------------
   prd2204.svc       456            530905         3856176      171 seconds ago
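
For reference, the task-level override mentioned at the start of this comment might look roughly like the sketch below. It reuses the task from the first comment and assumes that "under the task" means passing timeout as an argument to napalm_install_config. The logs above show a top-level "timeout": 60 even though the provider carried "timeout": 180, which suggests the provider value was not the one taking effect.

```yaml
      - name: diff between running-conf/render and create backup of running-conf.
        napalm_install_config:
          config_file: './configurations/{{ inventory_hostname }}/renders/{{ config_file_name }}.cfg'
          commit_changes: False
          replace_config: False
          get_diffs: True
          archive_file: './configurations/{{ inventory_hostname }}/backups/{{ config_file_name }}.bak'
          diff_file: './configurations/{{ inventory_hostname }}/diff/{{ config_file_name }}.diff'
          provider: "{{ eos_auth }}"
          # Assumption: the override is set here, as a module argument,
          # instead of relying on the timeout inside the provider dict.
          timeout: 180
```

With this in place the module invocation in the verbose output should report "timeout": 180 at the top level instead of 60.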

lvrfrc87 commented 4 years ago

With the support of Arista (see https://github.com/arista-eosplus/pyeapi/issues/184), it seems that we have found the issue:

"Napalm playbook worked with no issue until I enabled authorisation via TACACS+. The way how the validation works in this case, every single command has to be validated at the TACACS+ server. This means intensive communication between the switch and server and it takes a lot of time.

In my lab, in order to validate your configuration ansible spent 3 min 55 sec. I guess your thoughts around napalm timeouts were correct, just needed a bit more tweaking. To back up my theory, you can do a packet capture on the interface through which TACACS+ server is available and track the progress that way.

On the other hand, initial spin of your playbook executes in 23sec - I guess you might look into the options of either having local user just for napalm or disable authorisation for specific command type/users (if possible, although I didn't do a research on this one yet)."

However:

"Not sure what Napalm is doing in the background but doing config-replace with eos_config and other modules works just fine. This is what I am doing now and it works great. I will raise with Napalm guys and see what they say."

      - name: render base.j2 template.
        local_action: template src="base.j2" dest="./configurations/{{ inventory_hostname }}/renders/{{ config_file_name }}.cfg"

      - name: diff the running-config against a master config.
        eos_config:
          diff_against: intended
          intended_config: "{{ lookup('file', './configurations/{{ inventory_hostname }}/renders/{{ config_file_name }}.cfg') }}"

      - name: backup running-config.
        eos_config:
          backup: yes
          backup_options:
            filename: "{{ config_file_name }}.bak"
            dir_path: "./configurations/{{ inventory_hostname }}/backups/"

      # TO DO - Delete old files in Arista
      - name: copy via scp rendered file into Arista.
        delegate_to: 127.0.0.1
        command: scp -i ~/.ssh/id_rsa.pub ./configurations/{{ inventory_hostname }}/renders/{{ config_file_name }}.cfg prd2204.svc@{{ inventory_hostname }}:/tmp/

      - name: config-replace.
        eos_command:
          commands:
            - configure session {{ config_file_name }}
            - configure replace file:/tmp/{{ config_file_name }}.cfg

      - name: save config.
        eos_config:
          save_when: always

ktbyers commented 4 years ago

@FedericoOlivieri I guess I am not following. Is there more to do here, or are you just saying that 'AAA command authorization' broke the automation and there are no follow-up actions?

Yes, you could do something like the above using eos_config (i.e. basically re-implement the NAPALM patterns using the ansible-core modules in some way).

lvrfrc87 commented 4 years ago

@ktbyers When I use config replace with NAPALM on Arista with aaa authorization commands enabled, every single command sent by NAPALM is checked against AAA, so a config replace takes 5 minutes or more. The same config replace using eos_config with AAA enabled takes just a few seconds.

ktbyers commented 4 years ago

@FedericoOlivieri Okay, I think it is because they SCP the file and then load the copied file, as opposed to adding the commands into the configure session directly (which is what we do).

I think the implication of this is that AAA would be bypassed in the file load mechanism? That is potentially a bit of a security issue on Arista's part... (i.e. it is fast because they are not actually evaluating the individual configuration commands and are thus bypassing your AAA). I guess bypass in a sense...you would still have to be authorized for the configure replace from a file.

Anyways, I don't think we would change that in NAPALM: using Secure Copy generally causes a set of other issues, and AAA authorization is generally not used (though it is definitely not a fringe case either).

Is your set of configuration changes very large?

FWIW, using AAA-authorization is a big pain for automation (i.e. it will probably cause you meaningful automation pain in the long-run).

lvrfrc87 commented 4 years ago

I do a full config replace. We worked around it by disabling the command-authorization side of AAA.