saltstack / salt

manage.present still reports `lost` minion #43936

Closed oeuftete closed 6 years ago

oeuftete commented 6 years ago

Description of Issue/Question

manage.present, manage.alived, etc. (the docs are unhelpful on the distinctions, #22386) continue to report the presence of a minion that is lost. I believe this is a distinct issue from #33466, where, as best I can tell, there was never a salt/presence/change event for the lost minion. In any case, that issue appears to be marked as fixed in 2017.7 with zmq transport, which is what I'm using.

Setup

A very vanilla master configuration with presence_events: True and log_level: debug. Four minions (manager[12], worker[12]) with pre-seeded keys. All brought up with vagrant.

Steps to Reproduce Issue

Start the hosts. Once everything is up, remove a minion instance with prejudice. (I used vagrant halt worker2.)

Looking at the salt/presence events on the master:

2017-10-05 17:31:44,739 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:31:44.739073', 'present': []}
2017-10-05 17:32:44,917 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/change; data = {'new': ['worker1', 'worker2', 'manager2', 'manager1'], '_stamp': '2017-10-05T17:32:44.917237', 'lost': []}
[...]
2017-10-05 17:36:45,830 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:36:45.830761', 'present': ['worker1', 'worker2', 'manager2', 'manager1']}
2017-10-05 17:37:46,147 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/change; data = {'new': [], '_stamp': '2017-10-05T17:37:46.147167', 'lost': ['worker2']}
2017-10-05 17:37:46,148 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:37:46.148408', 'present': ['worker1', 'manager2', 'manager1']}
2017-10-05 17:38:46,369 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:38:46.369541', 'present': ['worker1', 'manager2', 'manager1']}
2017-10-05 17:39:46,596 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:39:46.596123', 'present': ['worker1', 'manager2', 'manager1']}
[...]

salt-run --out text -l quiet manage.present
['manager1', 'manager2', 'worker1', 'worker2']
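
To make the mismatch explicit, here is a minimal sketch that replays the salt/presence events from the log excerpt above and derives the set of minions the event stream considers present. The regular expression and the `replay_presence` helper are my own illustration, not part of Salt; the three log lines are copied from the excerpt.

```python
import ast
import re

# Captures the event tag and the data dict from a master debug log line.
EVENT_RE = re.compile(r"Sending event: tag = (salt/presence/\w+); data = (\{.*\})\s*$")


def replay_presence(lines):
    """Return the set of minions present after replaying the logged events."""
    present = set()
    for line in lines:
        match = EVENT_RE.search(line)
        if not match:
            continue  # not a presence event line
        tag = match.group(1)
        data = ast.literal_eval(match.group(2))
        if tag == "salt/presence/change":
            # A change event lists minions gained and lost since the last check.
            present |= set(data["new"])
            present -= set(data["lost"])
        elif tag == "salt/presence/present":
            # A present event carries the full current list.
            present = set(data["present"])
    return present


LOG = [
    "2017-10-05 17:36:45,830 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:36:45.830761', 'present': ['worker1', 'worker2', 'manager2', 'manager1']}",
    "2017-10-05 17:37:46,147 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/change; data = {'new': [], '_stamp': '2017-10-05T17:37:46.147167', 'lost': ['worker2']}",
    "2017-10-05 17:37:46,148 [salt.utils.event ][DEBUG   ][13452] Sending event: tag = salt/presence/present; data = {'_stamp': '2017-10-05T17:37:46.148408', 'present': ['worker1', 'manager2', 'manager1']}",
]

print(sorted(replay_presence(LOG)))  # ['manager1', 'manager2', 'worker1']
```

The event stream drops worker2, while the manage.present output above still lists it.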

Versions Report

Salt Version:
           Salt: 2017.7.1

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.8
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 2.7.12 (default, Nov 19 2016, 06:48:10)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.2.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.1.4

System Versions:
           dist: Ubuntu 16.04 xenial
         locale: UTF-8
        machine: x86_64
        release: 4.4.0-83-generic
         system: Linux
        version: Ubuntu 16.04 xenial

Ch3LL commented 6 years ago

Just to be clear, what transport are you using?

Could you possibly be running into: https://github.com/saltstack/salt/issues/38367 ?

oeuftete commented 6 years ago

I'm using the default transport, so I guess that's zeromq (not zmq as I said in the summary).

It does sound like I'm running into #38367, though I never would have found that myself... thanks! If I'm reading that one right, the fact that include_localhost=True is still in the argument list in the manage runner (see https://github.com/saltstack/salt/blob/v2017.7.1/salt/runners/manage.py#L250) suggests that it is almost certainly the same issue.
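
As a rough illustration of why an include_localhost=True argument could produce this symptom, here is a toy model. It is not Salt's code: the function `toy_connected_ids`, the matching logic, and the IP addresses are all hypothetical, and it assumes that presence is decided by matching each minion's cached IPv4 addresses against the addresses the master sees as connected. Under that assumption, letting the loopback address take part in the match makes every cached minion look present, because every minion has 127.0.0.1.

```python
def toy_connected_ids(cached_ipv4, connected_addrs, include_localhost=False):
    """Toy presence check (hypothetical, not Salt's implementation).

    A minion counts as present if any of its cached IPv4 addresses is in
    the set of addresses considered connected.
    """
    addrs = set(connected_addrs)
    if include_localhost:
        # Assumption: the loopback address joins the match set.
        addrs.add("127.0.0.1")
    present = set()
    for minion_id, ipv4_list in cached_ipv4.items():
        for ip in ipv4_list:
            if ip == "127.0.0.1" and not include_localhost:
                continue  # ignore loopback unless explicitly included
            if ip in addrs:
                present.add(minion_id)
                break
    return present


# Hypothetical cached addresses; worker2 has been halted, so only
# worker1's address is still connected.
CACHE = {
    "worker1": ["127.0.0.1", "10.0.0.11"],
    "worker2": ["127.0.0.1", "10.0.0.12"],
}
CONNECTED = {"10.0.0.11"}

print(sorted(toy_connected_ids(CACHE, CONNECTED)))
# ['worker1']
print(sorted(toy_connected_ids(CACHE, CONNECTED, include_localhost=True)))
# ['worker1', 'worker2']
```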

oeuftete commented 6 years ago

I can confirm that #38367 is the problem I'm seeing. I have a branch I'll try to get in order for a pull request: https://github.com/oeuftete/salt/tree/fix-manage-runner-presence.

oeuftete commented 6 years ago

As far as I'm concerned this can be closed, as it was fixed by #43994.