napalm-automation / napalm-salt

Modules for event-driven network automation and orchestration using Salt
Apache License 2.0
127 stars 36 forks source link

Grains missing/lost when keep_alive is False and connection is closed. #69

Open TheBirdsNest opened 3 years ago

TheBirdsNest commented 3 years ago

Opening this as I believe it to be a bug/undesirable state.

When refresh grains or any sync activity on the proxy (saltutil.sync_all) grains populated by Napalm are reset to 'None' when the connection to the device has closed. This occurs when the proxy setting keep_alive:False is applied.

Instead, what should happen is if a grains refresh is requested, the connection should be restarted and grains re-collected.

proxy.sls:

proxy:
  global_delay_factor: 5
  proxytype: napalm
  driver: ios
  always_alive: False

Workflow:

/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    model:
        C891-24X/K9

/run # salt 'c40988df-1a86-4ff0-bf8d-e018cc6c55bd' saltutil.refresh_grains
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    True

/run # salt 'c40988df-1a86-4ff0-bf8d-e018cc6c55bd' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    None

Proxy log:

[ERROR   ] Cannot execute "cli" on <redacted> as Salt. Reason: Socket is closed!
[ERROR   ] Traceback (most recent call last):
  File "/usr/lib/python3.9/site-packages/napalm/ios/ios.py", line 205, in _send_command
    output = self.device.send_command(command)
  File "/usr/lib/python3.9/site-packages/netmiko/utilities.py", line 500, in wrapper_decorator
    return func(self, *args, **kwargs)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 1475, in send_command
    prompt = self.find_prompt(delay_factor=delay_factor)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 1179, in find_prompt
    self.write_channel(self.RETURN)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 459, in write_channel
    self._write_channel(out_data)
  File "/usr/lib/python3.9/site-packages/netmiko/base_connection.py", line 417, in _write_channel
    self.remote_conn.sendall(write_bytes(out_data, encoding=self.encoding))
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 846, in sendall
    sent = self.send(s)
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 801, in send
    return self._send(s, m)
  File "/usr/lib/python3.9/site-packages/paramiko/channel.py", line 1198, in _send
    raise socket.error("Socket is closed")
OSError: Socket is closed

Please note I applied the fix for the bug noted here: https://github.com/saltstack/salt/issues/60025

Versions Report:

/run # salt-proxy --versions-report
Salt Version:
          Salt: 3003

Dependency Versions:
          cffi: 1.14.5
      cherrypy: 18.6.1
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.0.1
       libgit2: 1.1.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.2
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: 2.20
      pycrypto: 3.10.1
  pycryptodome: 3.10.1
        pygit2: 1.4.0
        Python: 3.9.5 (default, May 12 2021, 20:44:22)
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 19.0.2
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: alpine 3.14.0
        locale: utf-8
       machine: x86_64
       release: 5.4.72-microsoft-standard-WSL2
        system: Linux
       version: Alpine Linux 3.14.0

Note: This is a lab setup but the same is seen in our production setup with the same version but running on CentOS.

Many thanks!

TheBirdsNest commented 3 years ago

P.S - the reason for us enabling keep_alive:False is due to a Cisco CVE where the implementation of SSH Global Requests is causing a FSM exception and crashing devices.

TheBirdsNest commented 3 years ago

I believe I have found the issue in salt.utilts.napalm package..

I can do a PR to the SaltStack repo if thats the right place for it?

>>> import napalm.base as napalm_base
>>> from napalm_base.exceptions import ConnectionClosedException
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'napalm_base'
>>>
>>>
>>>
>>> from napalm.base.exceptions import ConnectionClosedException

It appears naplam_base is never loaded (correctly?) so the HAS_CONN_CLOSED_EXC_CLASS which causes this reconnect clause to be ignored in the call() function:

elif retry and HAS_CONN_CLOSED_EXC_CLASS and isinstance(error, ConnectionClosedException):

Tested my manually importing:

from napalm.base.exceptions import ConnectionClosedException and its working well now:

/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    C891-24X/K9

-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run # salt -I 'realm:WSC' saltutil.refresh_grains
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    True

-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run # salt -I 'realm:WSC' grains.get model
c40988df-1a86-4ff0-bf8d-e018cc6c55bd:
    C891-24X/K9

-------------------------------------------
Summary
-------------------------------------------
# of minions targeted: 1
# of minions returned: 1
# of minions that did not return: 0
# of minions with errors: 0
-------------------------------------------
/run #
TheBirdsNest commented 3 years ago

I created a related bug in the SaltStack project as I guess that is the right place for it: https://github.com/saltstack/salt/issues/60581 Feel free to close this one if needed otherwise I will on its resolution in the SaltStack repo.

I'm aiming on completing a PR for this.