saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

[BUG] salt-minion freeze #58159

Open nnovikov opened 4 years ago

nnovikov commented 4 years ago

Description

The Salt minion gets stuck after a service restart or a server reboot. The Salt master returns 'Minion did not return. [No response]' when I run test.ping against it.

On the minion:

    strace -p 5057
    strace: Process 5057 attached
    futex(0x7f2d70714970, FUTEX_WAIT_PRIVATE, 2, NULL

Setup

Debian GNU/Linux 9.11 (stretch)

    ii  salt-common  2019.2.5+ds-1  all  shared libraries that salt requires for all packages
    ii  salt-minion  2019.2.5+ds-1  all  client package for salt, the distributed remote execution system

Versions Report

salt-minion --versions-report
Salt Version:
           Salt: 2019.2.5

Dependency Versions:

           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: 2.5.3
      docker-py: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.9.4
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: Not Installed
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.5.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.5.3 (default, Sep 27 2018, 17:25:39)
   python-gnupg: Not Installed
         PyYAML: 3.12
          PyZMQ: 16.0.2
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.4.3
            ZMQ: 4.2.1

System Versions:
           dist: debian 9.11
         locale: UTF-8
        machine: x86_64
        release: 4.9.0-11-amd64
         system: Linux
        version: debian 9.11

nnovikov commented 4 years ago

    (gdb) py-bt
    Traceback (most recent call first):
      File "/usr/lib/python3.7/threading.py", line 296, in wait
        waiter.acquire()
      File "/usr/lib/python3.7/threading.py", line 552, in wait
        signaled = self._cond.wait(timeout)
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 648, in wait
        self._event.wait(timeout)
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 651, in get
        self.wait(timeout)
      File "/usr/lib/python3.7/multiprocessing/pool.py", line 268, in map
        return self._map_async(func, iterable, mapstar, chunksize).get()
      File "/usr/lib/python3/dist-packages/salt/modules/network.py", line 2111, in fqdns
        results = pool.map(_lookup_fqdn, addresses)
      File "/usr/lib/python3/dist-packages/salt/grains/core.py", line 2371, in fqdns
        opt = __salt__["network.fqdns"]()
      File "/usr/lib/python3/dist-packages/salt/loader.py", line 825, in grains
        ret = funcs[key]()
      File "/usr/lib/python3/dist-packages/salt/minion.py", line 1242, in __init__
        self.opts["grains"] = salt.loader.grains(opts)
      File "/usr/lib/python3/dist-packages/salt/minion.py", line 1066, in _create_minion_object
        jid_queue=jid_queue,
      File "/usr/lib/python3/dist-packages/salt/minion.py", line 1101, in _spawn_minions
        jid_queue=self.jid_queue,
      File "/usr/lib/python3/dist-packages/salt/minion.py", line 1162, in tune_in
        self._spawn_minions()
      File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 352, in _real_start
        self.minion.tune_in()
      File "/usr/lib/python3/dist-packages/salt/cli/daemons.py", line 340, in start
        self._real_start()
      File "/usr/lib/python3/dist-packages/salt/scripts.py", line 155, in minion_process
        minion.start()
      File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
        self._target(*self._args, **self._kwargs)
      File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
        self.run()
      File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 74, in _launch
        code = process_obj._bootstrap()
      File "/usr/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
        self._launch(process_obj)
      File "/usr/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
        return Popen(process_obj)
      File "/usr/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "/usr/lib/python3.7/multiprocessing/process.py", line 112, in start
        self._popen = self._Popen(self)
      File "/usr/lib/python3/dist-packages/salt/scripts.py", line 224, in salt_minion
        process.start()
      File "/usr/bin/salt-minion", line 11, in <module>
        load_entry_point('salt==3001.1', 'console_scripts', 'salt-minion')()
    (gdb)
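
Reading the trace above, the main thread is parked in `pool.map`, called from `network.fqdns` while grains are loaded in the minion's `__init__`. As the `pool.py` frames show, `pool.map` is `_map_async(...).get()` with no timeout, so a single lookup that never returns would block startup indefinitely. The sketch below is not Salt's code: `lookup_fqdn`, `resolve_all`, the pool size, and the 5-second limit are hypothetical names and values, and the use of a thread pool is an assumption. It only illustrates how bounding the wait lets the caller continue.

```python
import multiprocessing.pool
import socket


def lookup_fqdn(address):
    """Reverse-resolve one address; return None when resolution fails."""
    try:
        return socket.gethostbyaddr(address)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return None


def resolve_all(addresses, lookup=lookup_fqdn, timeout=5.0):
    """Resolve addresses in a thread pool, giving up after `timeout` seconds.

    Hypothetical helper: returns an empty list when the lookups do not
    finish in time instead of waiting forever.
    """
    pool = multiprocessing.pool.ThreadPool(4)
    try:
        # map_async(...).get(timeout=...) bounds the wait that a plain
        # pool.map(...) would leave unbounded.
        return pool.map_async(lookup, addresses).get(timeout=timeout)
    except multiprocessing.TimeoutError:
        return []
    finally:
        # Stop handing out work; pool worker threads are daemon threads,
        # so a stuck lookup does not keep the process alive.
        pool.terminate()
```

With this shape, a caller such as `resolve_all(["127.0.0.1"])` gets either a list of names or an empty list once the limit passes, rather than blocking on the pool.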

Martchus commented 1 year ago

This may be the same as the stale issue https://github.com/saltstack/salt/issues/55710. I have also noticed this problem for a while, most recently today. In my case the process is also stuck on a futex wait, see https://progress.opensuse.org/issues/131249#note-7. Unfortunately I haven't managed to install the required debug info on our system to produce a useful backtrace. We are currently using version 3006.0 as provided by openSUSE Leap 15.4 on Python 3.6.15. I have seen this problem on x86_64 and ppc64le hosts (so likely the architecture doesn't matter).

martchus@grenache-1:~> salt-minion --versions-report
Salt Version:
          Salt: 3006.0

Python Version:
        Python: 3.6.15 (default, Sep 23 2021, 15:41:43) [GCC]

Dependency Versions:
          cffi: 1.13.2
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 2.10.1
       libgit2: Not Installed
  looseversion: 1.0.2
      M2Crypto: 0.38.0
          Mako: Not Installed
       msgpack: 0.5.6
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 21.3
     pycparser: 2.17
      pycrypto: 3.9.0
  pycryptodome: Not Installed
        pygit2: Not Installed
  python-gnupg: Not Installed
        PyYAML: 5.4.1
         PyZMQ: 17.1.2
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.2.3

System Versions:
          dist: opensuse-leap 15.4 n/a
        locale: UTF-8
       machine: ppc64le
       release: 5.14.21-150400.24.63-default
        system: Linux
       version: openSUSE Leap 15.4 n/a
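
For the case where gdb debug info is not available, Python's standard `faulthandler` module can write the Python-level traceback of every thread when the process receives a signal. This is a generic sketch, not something the minion is known to do: `enable_stack_dump` and the log path are hypothetical, and the handler has to be registered inside the process before it hangs, so it cannot help with a process that is already stuck.

```python
import faulthandler
import os
import signal
import tempfile


def enable_stack_dump(signum=signal.SIGUSR1, path=None):
    """Dump all thread tracebacks to `path` whenever `signum` arrives.

    Hypothetical helper. Returns the open file object and its path; the
    file must stay open for as long as the handler is registered.
    """
    if path is None:
        path = os.path.join(
            tempfile.gettempdir(), "stack-dump-%d.log" % os.getpid()
        )
    log = open(path, "w")
    # faulthandler writes from the signal handler itself, so the dump
    # still works while the main thread is blocked on a lock.
    faulthandler.register(signum, file=log, all_threads=True)
    return log, path
```

After calling `enable_stack_dump()` early in startup, sending `kill -USR1 <pid>` to the process appends one traceback per thread to the log file, which shows the blocked Python frame without needing debug symbols.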