saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
14.14k stars 5.47k forks source link

[BUG] Salt-minion pillars don't always refresh when they should #57180

Closed edgan closed 2 years ago

edgan commented 4 years ago

Description Salt-minion pillars don't always refresh when they should.

This is trickery to reproduce and seems to involve a race condition. The steps to reproduce have to be executed very fast.

I am pretty sure from testing that this is caused by changes in this PR, https://github.com/saltstack/salt/pull/54942 . I tried reverted the patches and the issue went away.

Steps to Reproduce the behavior Locally:

cd salt-pillars
mkdir test
echo "base:
  '*':
    - test" > top.sls
echo 'test' > test/init.sls ; git add test/init.sls ; git commit -m 'Testing' ; git push

Master:

/usr/bin/salt-run git_pillar.update ; /usr/bin/salt-run fileserver.update ; salt `hostname -f` state.apply base
test git@github.com:org/salt-grains:
    None
test git@github.com:org/salt-pillars:
    True
True
fqdn:
    Data failed to compile:
----------
    Pillar failed to render with the following messages:
----------
    SLS 'test' does not render to a dictionary

Locally:

echo 'test: a' > test/init.sls ; git add test/init.sls ; git commit -m 'Testing' ; git push

Master:

/usr/bin/salt-run git_pillar.update ; /usr/bin/salt-run fileserver.update ; salt `hostname -f` state.apply base
test git@github.com:org/salt-grains:
    None
test git@github.com:org/salt-pillars:
    True
True
fqdn:
    Data failed to compile:
----------
    Pillar failed to render with the following messages:
----------
    SLS 'test' does not render to a dictionary
salt `hostname -f` saltutil.refresh_pillar
fqdn:
    True
salt]$ salt `hostname -f` state.apply base
fqdn:
----------
          ID: hosts_file
    Function: file.managed
        Name: /etc/hosts
      Result: True
     Comment: File /etc/hosts is in the correct state
     Started: 17:16:12.353157
    Duration: 52.312 ms
     Changes:   

Summary for fqdn
-------------
Succeeded: 1
Failed:     0
-------------
Total states run:     1
Total run time:    1.230 s

Expected behavior The error on the first state.apply is expected. There is a pillar syntax error. The error on the second state.apply is not expected, because the syntax has been fixed. The git_pillar.update has pulled in new pillars as we can see by the True for salt-pillars. When we then refresh_pillar for that minion we get the correct result we should have gotten on the second state.apply.

Versions Report

Salt Version:
           Salt: 3000.2

Dependency Versions:
           cffi: Not Installed
       cherrypy: unknown
       dateutil: 2.6.1
      docker-py: 2.5.1
          gitdb: Not Installed
      gitpython: Not Installed
         Jinja2: 2.10
        libgit2: 0.26.0
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: 1.4.4
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: 3.4.7
         pygit2: 0.26.2
         Python: 3.6.9 (default, Apr 18 2020, 01:56:04)
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 16.0.2
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.5.3
            ZMQ: 4.2.5

System Versions:
           dist: Ubuntu 18.04 bionic
         locale: UTF-8
        machine: x86_64
        release: 5.3.0-1017-aws
         system: Linux
        version: Ubuntu 18.04 bionic
edgan commented 4 years ago

I have seen this go both ways, false positives and false negatives.Which makes sense, because it is a caching problem. What makes it really dangerous is you could merge bad pillars, even with testing, because you got a false positive.

garethgreenaway commented 4 years ago

@saltstack/team-core Any ideas on this one?

colttt commented 3 years ago

I guess it is somehow related to #31907 .. I've the same issue with salt, edit a pillar, push to git, did a refresh (also restart the minion) but it sill doesn't show the (new) pillar value..

salt -V
Salt Version:
          Salt: 3003

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.7.3
     docker-py: Not Installed
         gitdb: 2.0.5
     gitpython: 2.1.11
        Jinja2: 2.10
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: 2.6.1
  pycryptodome: 3.6.1
        pygit2: Not Installed
        Python: 3.7.3 (default, Jan 22 2021, 20:04:44)
  python-gnupg: Not Installed
        PyYAML: 3.13
         PyZMQ: 17.1.2
         smmap: 2.0.5
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.1

System Versions:
          dist: debian 10 buster
        locale: UTF-8
       machine: x86_64
       release: 4.19.0-16-amd64
        system: Linux
       version: Debian GNU/Linux 10 buster

when I run the salt minion in debug mode I got this: refresh pillar:

[INFO    ] User sudo_BOFH Executing command saltutil.refresh_pillar with jid 20210622082604472251
[DEBUG   ] Command details {'fun': 'saltutil.refresh_pillar', 'arg': [], 'tgt': 'salt-minion-1.bofh.home.de', 'jid': '20210622082604472251', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082604472251 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622082604472251 with PID 25374
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded saltutil.refresh_pillar
[DEBUG   ] LazyLoaded event.fire
[DEBUG   ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pull.ipc
[DEBUG   ] Sending event: tag = pillar_refresh; data = {'_stamp': '2021-06-22T08:26:04.556245'}
[DEBUG   ] Minion of 'salt-test' is handling event tag 'pillar_refresh'
[DEBUG   ] Refreshing modules. Notify=False
[DEBUG   ] Grains refresh requested. Refreshing grains.
[DEBUG   ] Reading configuration from /etc/salt/minion
[DEBUG   ] Including configuration from '/etc/salt/minion.d/_schedule.conf'
[DEBUG   ] Reading configuration from /etc/salt/minion.d/_schedule.conf
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] Minion return retry timer set to 9 seconds (randomized)
[INFO    ] Returning information for job: 20210622082604472251
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': True, 'retcode': 0, 'jid': '20210622082604472251', 'fun': 'saltutil.refresh_pillar', 'fun_args': []}
[DEBUG   ] Override  __utils__: <module 'salt.loaded.int.grains.zfs' from '/usr/lib/python3/dist-packages/salt/grains/zfs.py'>
[DEBUG   ] /etc/resolv.conf: The domain and search keywords are mutually exclusive.
[DEBUG   ] Unable to resolve address fe80::250:56ff:fe8c:1936: [Errno 1] Unknown host
[DEBUG   ] Elapsed time getting FQDNs: 0.10307979583740234 seconds
[DEBUG   ] LazyLoaded zfs.is_supported
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Refreshing pillar.
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Re-using AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] LazyLoaded config.merge
[DEBUG   ] LazyLoaded mine.update
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082604472251 cleaned up
[DEBUG   ] salt.crypt.get_rsa_key: Loading private key
[DEBUG   ] Loaded minion key: /etc/salt/pki/minion/minion.pem
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] Refreshing matchers.
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Refreshing beacons.
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] SaltEvent PUB socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pub.ipc
[DEBUG   ] SaltEvent PULL socket URI: /var/run/salt/minion/minion_event_41ef1aaeeb_pull.ipc
[DEBUG   ] Sending event: tag = /salt/minion/minion_pillar_refresh_complete; data = {'complete': True, '_stamp': '2021-06-22T08:26:05.363962'}
[DEBUG   ] Closing IPCMessageClient instance
[DEBUG   ] Minion of 'salt-test' is handling event tag '/salt/minion/minion_pillar_refresh_complete'

ask for pillars: (pillar.items)

[DEBUG   ] LazyLoaded glob_match.match
[INFO    ] User sudo_BOFH Executing command pillar.item with jid 20210622082832819002
[DEBUG   ] Command details {'fun': 'pillar.item', 'arg': [], 'tgt': 'salt-minion-1*', 'jid': '20210622082832819002', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082832819002 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622082832819002 with PID 25467
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded pillar.item
[DEBUG   ] Minion return retry timer set to 8 seconds (randomized)
[INFO    ] Returning information for job: 20210622082832819002
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': {}, 'retcode': 0, 'jid': '20210622082832819002', 'fun': 'pillar.item', 'fun_args': []}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622082832819002 cleaned up

ask for specific pillar:

[INFO    ] User sudo_BOFH Executing command pillar.items with jid 20210622083204983233
[DEBUG   ] Command details {'fun': 'pillar.items', 'arg': ['roles'], 'tgt': 'salt-minion-1*', 'jid': '20210622083204983233', 'ret': '', 'tgt_type': 'glob', 'user': 'sudo_BOFH'}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622083204983233 added
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] LazyLoaded jinja.render
[DEBUG   ] LazyLoaded yaml.render
[DEBUG   ] Using pkg_resources to load entry points
[DEBUG   ] Using pkg_resources to load entry points
[INFO    ] Starting a new job 20210622083204983233 with PID 25579
[DEBUG   ] Could not LazyLoad direct_call.allow_missing_func: 'direct_call.allow_missing_func' is not available.
[DEBUG   ] LazyLoaded pillar.items
[DEBUG   ] Minion return retry timer set to 6 seconds (randomized)
[INFO    ] Returning information for job: 20210622083204983233
[DEBUG   ] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506', 'aes')
[DEBUG   ] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'salt-minion-1.bofh.home.de', 'tcp://172.16.xxx.xx:4506')
[DEBUG   ] Connecting the Minion to the Master URI (for the return server): tcp://172.16.xxx.xx:4506
[DEBUG   ] Trying to connect to: tcp://172.16.xxx.xx:4506
[DEBUG   ] Closing AsyncZeroMQReqChannel instance
[DEBUG   ] minion return: {'success': True, 'return': {'roles': ''}, 'retcode': 0, 'jid': '20210622083204983233', 'fun': 'pillar.items', 'fun_args': ['roles']}
[DEBUG   ] Subprocess ProcessPayload-Job-20210622083204983233 cleaned up

actual behavior:

salt "salt-minion-1*" pillar.items roles
salt-minion-1.bofh.home.de:
    ----------
    roles:

expected behavior:

salt "salt-minion-1*" pillar.items roles
salt-minion-1.bofh.home.de:
    ----------
    roles: webserver