saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

Infinite loop: salt-proxy with napalm driver when it cannot connect to the 'proxy device' #48017

Closed: blefeuvr closed this issue 6 years ago

blefeuvr commented 6 years ago

Description of Issue/Question

I hesitated a lot between filing this here or in the napalm-salt GitHub repository, but I will start here. The setup is quite simple. I have a proxy pillar, cisco.sls:

proxy:
  proxytype: napalm
  driver: ios
  host: <host>
  username: <user>
  password: <password>
  multiprocessing: False

A pillar top file:

  'cisco1':
    - cisco

Then I try to launch the proxy minion process for cisco1 from a minion:

salt-proxy --proxyid=cisco1 -l debug

And here comes the issue: if there is a problem somewhere, such as the proxy being unable to connect to the device, the process enters an infinite loop. It tries to refresh grains, fails because the device is not connected, and then starts everything over.

Part of the debug output:

[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/proxy
[DEBUG   ] Please install 'virt-what' to improve results of the 'virtual' grain.
[DEBUG   ] Unable to derive osmajorrelease from osrelease_info '(u'proxy',)'. The osmajorrelease grain will not be set.
[DEBUG   ] LazyLoaded napalm.get_device
[ERROR   ] Cannot execute "get_facts" on [unspecified hostname] as . Reason: not connected!
[ERROR   ] Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/utils/napalm.py", line 158, in call
    raise Exception('not connected')
Exception: not connected

[DEBUG   ] dummy proxy __virtual__() called...
[INFO    ] nxos proxy __virtual__() called...
[DEBUG   ] rest_sample proxy __virtual__() called...
[INFO    ] ssh_sample proxy __virtual__() called...
[DEBUG   ] Could not LazyLoad napalm.grains: 'napalm.grains' is not available.
[DEBUG   ] Setting up NAPALM connection
[CRITICAL] Unexpected error while connecting to salt.lan.edificom.ch
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 991, in _connect_minion
    yield minion.connect_master(failed=failed)
  File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 1182, in connect_master
    yield self._post_master_init(master)
  File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/usr/lib/python2.7/dist-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "/usr/lib/python2.7/dist-packages/tornado/gen.py", line 1069, in run
    yielded = self.gen.send(value)
  File "/usr/lib/python2.7/dist-packages/salt/minion.py", line 3578, in _post_master_init
    proxy_init_fn(self.opts)
  File "/usr/lib/python2.7/dist-packages/salt/proxy/napalm.py", line 183, in init
    NETWORK_DEVICE.update(salt.utils.napalm.get_device(opts))
  File "/usr/lib/python2.7/dist-packages/salt/utils/napalm.py", line 332, in get_device
    network_device.get('DRIVER').open()
  File "/usr/local/lib/python2.7/dist-packages/napalm/ios/ios.py", line 125, in open
    **self.netmiko_optional_args)
  File "/usr/local/lib/python2.7/dist-packages/netmiko/ssh_dispatcher.py", line 178, in ConnectHandler
    return ConnectionClass(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/netmiko/base_connection.py", line 207, in __init__
    self.establish_connection()
  File "/usr/local/lib/python2.7/dist-packages/netmiko/base_connection.py", line 693, in establish_connection
    raise NetMikoTimeoutException(msg)
NetMikoTimeoutException: Connection to device timed-out: cisco_ios 10.255.225.212:22
[DEBUG   ] Connecting to master. Attempt 1 of 1

Versions Report

minion :

pip freeze
asn1crypto==0.24.0
backports-abc==0.5
bcrypt==3.1.4
certifi==2018.1.18
cffi==1.11.5
chardet==3.0.4
check-mk-web-api==1.3
cryptography==2.1.4
enum34==1.1.6
future==0.16.0
futures==3.2.0
idna==2.6
ipaddress==1.0.17
Jinja2==2.10
jtextfsm==0.3.1
junos-eznc==2.1.8
keyring==10.6.0
keyrings.alt==3.0
lxml==4.2.1
Mako==1.0.7
MarkupSafe==1.0
msgpack==0.5.6
napalm==2.3.1
napalm-base==1.0.0
napalm-fortios==0.4.1
napalm-ios==0.8.1
napalm-iosxr==0.5.6
napalm-junos==0.12.1
napalm-panos==0.5.2
napalm-pluribus==0.5.1
napalm-vyos==0.1.5
ncclient==0.5.3
netaddr==0.7.19
netmiko==2.1.1
pan-python==0.13.0
paramiko==2.4.1
ply==3.11
psutil==5.4.2
pyasn1==0.4.3
pycparser==2.18
pycrypto==2.6.1
pyeapi==0.8.2
pyfg==0.50
pygobject==3.26.1
pyIOSXR==0.53
PyNaCl==1.2.1
pynxos==0.0.3
pyOpenSSL==17.5.0
pyPluribus==0.3.1
pyserial==3.4
python-apt==1.6.0
python-dateutil==2.6.1
pyxdg==0.25
PyYAML==3.12
pyzmq==16.0.2
requests==2.18.4
requests-toolbelt==0.8.0
salt==2018.3.0
scp==0.11.0
SecretStorage==2.3.1
singledispatch==3.4.0.3
six==1.11.0
systemd-python==234
textfsm==0.4.1
tornado==4.5.3
urllib3==1.22
VyattaConfParser==0.5.1
xmltodict==0.11.0

master :

Salt Version:
           Salt: 2018.3.0

Dependency Versions:
           cffi: 1.11.5
       cherrypy: 3.5.0
       dateutil: 2.4.2
      docker-py: Not Installed
          gitdb: 0.6.4
      gitpython: 1.0.1
          ioflo: Not Installed
         Jinja2: 2.10
        libgit2: 0.24.0
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: 1.0.3
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: 2.18
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: 0.24.0
         Python: 2.7.12 (default, Dec  4 2017, 14:50:18)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: 0.9.0
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-127-generic
         system: Linux
        version: Ubuntu 16.04 xenial

gtmanfred commented 6 years ago

I am not sure this is a bug; I think this is the expected behavior, especially when the proxy cannot connect, because it may be able to connect eventually. The device could simply be down or rebooting, for example.

Likewise, if napalm wasn't installed and you then pip install it, the proxy would try to connect without you having to restart the system.

There is also a scenario where the proxy minion starts up before the pillars have been rendered, and it takes a try or two before the proxy minion gets the information it needs to connect.

So in my opinion, this is by design.

@cro @mirceaulinic opinions?

Thanks, Daniel

cro commented 6 years ago

I agree, I think it's not a bug. It is ugly, granted, but these devices can be recalcitrant and the safest thing to do to try to get a reconnect is what we are doing right now.

blefeuvr commented 6 years ago

OK, thanks a lot for the answer. That makes things clearer for me; at least I know this is how it is supposed to work. I'll close this. Thanks again for your quick response.

TheBirdsNest commented 4 years ago

We have an issue where a significant number of devices are down at one time (these devices are used ad hoc and are turned off when not required). This overloads our Salt master as the proxy processes keep restarting. Any ideas to resolve this scenario?
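
One general way to reduce the load from many processes retrying at once is capped exponential backoff with jitter. The sketch below only illustrates that technique in plain Python. It is not Salt's or NAPALM's actual reconnect logic, and every name in it (`connect_with_backoff`, `base`, `cap`, `max_tries`) is hypothetical. Whether something like this can be wired into a proxy minion is an open assumption.

```python
import random
import time


def connect_with_backoff(connect, base=1.0, cap=300.0, max_tries=None,
                         sleep=time.sleep, rng=random.random):
    """Call connect() until it succeeds, waiting longer after each failure.

    Hypothetical helper for illustration; not part of Salt or NAPALM.
    `sleep` and `rng` are injectable so the behaviour can be tested.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except Exception:
            attempt += 1
            # Give up and re-raise once the optional retry limit is reached.
            if max_tries is not None and attempt >= max_tries:
                raise
            # Delay doubles per failure (exponent bounded to avoid float
            # overflow) and is capped at `cap` seconds.
            delay = min(cap, base * 2 ** min(attempt - 1, 30))
            # Full jitter: wait a random fraction of the delay so that many
            # proxies do not all retry at the same moment.
            sleep(delay * rng())
```

With the defaults, a device that stays off is retried at most about once every five minutes after the first few attempts, instead of in a tight loop, and leaving `max_tries` as `None` keeps the "retry until the device comes back" behaviour described above.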