saltstack / salt


When running the master in debug mode and exit, it gives a Traceback in 2016.3.3 #35612

Closed frogunder closed 6 years ago

frogunder commented 8 years ago

Description of Issue/Question

When the master is run in debug mode and then stopped, it prints a traceback on exit.

2016-08-19 21:18:51,338 [salt.log.setup   ][ERROR   ][25305] An un-handled exception was caught by salt's global exception handler:
OSError: [Errno 3] No such process
Traceback (most recent call last):
  File "/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib64/python2.7/multiprocessing/util.py", line 315, in _exit_function
    p._popen.terminate()
  File "/usr/lib64/python2.7/multiprocessing/forking.py", line 171, in terminate
    os.kill(self.pid, signal.SIGTERM)
OSError: [Errno 3] No such process

Setup

(Please provide relevant configs and/or SLS files (Be sure to remove sensitive info).)

Steps to Reproduce Issue

Run the master in debug mode: salt-master -ldebug
Exit out of it with CTRL-C.

Versions Report

[root@li229-131 ~]# salt --versions-report
Salt Version:
           Salt: 2016.3.3

Dependency Versions:
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
         Jinja2: 2.7.2
        libgit2: Not Installed
        libnacl: Not Installed
       M2Crypto: 0.21.1
           Mako: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.7
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
         Python: 2.7.5 (default, Jun 17 2014, 18:11:42)
   python-gnupg: Not Installed
         PyYAML: 3.11
          PyZMQ: 15.3.0
           RAET: Not Installed
          smmap: Not Installed
        timelib: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.1.4

System Versions:
           dist: centos 7.0.1406 Core
        machine: x86_64
        release: 4.6.3-x86_64-linode70
         system: Linux
        version: CentOS Linux 7.0.1406 Core

tmehlinger commented 8 years ago

This has been happening for a long time and it's not specific to debug mode. If you look in the master log when you stop the master, you'll see this happens all the time.

It's happening because of the shenanigans done to juggle signal handlers in the process module (https://github.com/saltstack/salt/blob/develop/salt/utils/process.py#L697). Basically, every spawned process gets assigned SIG_DFL, which for SIGINT, SIGTERM, and SIGQUIT terminates the process. Furthermore, the master calls os.setsid() when it daemonizes, which makes it and all of its children members of one process group, so a signal delivered to that group reaches every child as well as the parent. The end result is that the children die before the master can clean up.

The fix for this would be to assign a custom handler (one that ignores signals, communicates receipt of a signal to the parent process, does some cleanup, or some combination thereof).
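
A minimal sketch of the "ignore signals in the child" option, assuming Python 3 on a POSIX system with the fork start method. The names `_worker` and `run_and_interrupt` are hypothetical and this is not Salt's code; it only illustrates a child that survives SIGINT so the parent can shut it down itself.

```python
import multiprocessing
import os
import signal
import time


def _worker(ready):
    # Hypothetical worker: ignore SIGINT so a ^C does not kill it
    # before the parent has had a chance to shut it down in order.
    signal.signal(signal.SIGINT, signal.SIG_IGN)
    ready.set()
    while True:
        time.sleep(0.1)


def run_and_interrupt():
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    ready = ctx.Event()
    child = ctx.Process(target=_worker, args=(ready,))
    child.start()
    ready.wait(5)  # wait until the child has installed its handler

    # Simulate the SIGINT that a ^C would deliver to the child.
    os.kill(child.pid, signal.SIGINT)
    time.sleep(0.3)
    survived = child.is_alive()

    # The parent now decides when the child stops.
    child.terminate()  # sends SIGTERM, which the child does not ignore
    child.join(5)
    return survived, child.exitcode


if __name__ == "__main__":
    print(run_and_interrupt())
```

In this sketch the parent stays in charge of shutdown: the child is still alive after the SIGINT and only exits once the parent sends SIGTERM and joins it.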

cachedout commented 7 years ago

We did have this happening for a while but I can't replicate this against the 2016.3.3 tag at all. Is this happening consistently or just on occasion?

tmehlinger commented 7 years ago

It's consistent in 2016.3.2. It's also pretty easy to reproduce: just fire up the master in the foreground and then ^C it.

I can give 2016.3.3 a shot myself and let you know what happens.

tmehlinger commented 7 years ago

I just tested against a fresh 2016.3.3 install in Vagrant on Ubuntu 14.04. Still happens.

^C[DEBUG   ] ZeroMQReqServerChannel received a SIGINT. Exiting
[DEBUG   ] ZeroMQReqServerChannel received a SIGINT. Exiting
[DEBUG   ] EventPublisher received a SIGINT. Exiting
[DEBUG   ] Maintenance received a SIGINT. Exiting
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 862, in emit
[DEBUG   ] MWorker received a SIGINT. Exiting
[DEBUG   ] MWorker received a SIGINT. Exiting
[DEBUG   ] MWorker received a SIGINT. Exiting
[DEBUG   ] MWorker received a SIGINT. Exiting
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 862, in emit
[DEBUG   ] Reactor received a SIGINT. Exiting
    stream.write(ufs % msg)
    stream.write(ufs % msg)
IOError: [Errno 0] Error
Logged from file process.py, line 657
IOError: [Errno 0] Error
Logged from file process.py, line 657
[INFO    ] Some processes failed to respect the KILL signal: Process: <Process(ReqServer, started)> (Pid: 8958)
[INFO    ] kill_children retries left: 3
[DEBUG   ] Reactor received a SIGTERM. Exiting
[WARNING ] Master received a SIGINT. Exiting.
[INFO    ] The Salt Master is shut down
The salt master is shutdown. Master received a SIGINT. Exited.Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.7/multiprocessing/util.py", line 321, in _exit_function
    p._popen.terminate()
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 171, in terminate
    os.kill(self.pid, signal.SIGTERM)
OSError: [Errno 3] No such process
Error in sys.exitfunc:
[ERROR   ] An un-handled exception was caught by salt's global exception handler:
OSError: [Errno 3] No such process
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.7/multiprocessing/util.py", line 321, in _exit_function
    p._popen.terminate()
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 171, in terminate
    os.kill(self.pid, signal.SIGTERM)
OSError: [Errno 3] No such process
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "/usr/lib/python2.7/multiprocessing/util.py", line 321, in _exit_function
    p._popen.terminate()
  File "/usr/lib/python2.7/multiprocessing/forking.py", line 171, in terminate
    os.kill(self.pid, signal.SIGTERM)
OSError: [Errno 3] No such process

You can see pretty clearly that SIGINT is propagating to child processes.
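
The `OSError: [Errno 3] No such process` at the end of that output is what `os.kill` raises when the target pid no longer exists. Here is a small sketch of tolerating that case, assuming Python 3 on a POSIX system. `terminate_quietly` is a hypothetical helper for illustration, not something in Salt or multiprocessing, and not necessarily how this should be fixed upstream.

```python
import errno
import os
import signal
import subprocess
import sys


def terminate_quietly(pid):
    """Send SIGTERM to pid; return False if the process is already gone."""
    try:
        os.kill(pid, signal.SIGTERM)
    except OSError as exc:
        # ESRCH is errno 3, "No such process": nothing left to terminate.
        if exc.errno != errno.ESRCH:
            raise
        return False
    return True


if __name__ == "__main__":
    # Start a child that exits immediately, then reap it with wait().
    proc = subprocess.Popen([sys.executable, "-c", "pass"])
    proc.wait()
    # The pid is gone now, so the helper reports it instead of raising.
    print(terminate_quietly(proc.pid))
```

Any other error from `os.kill` (for example a permissions problem) is re-raised, so only the "already dead" case is swallowed.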

stale[bot] commented 6 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.