saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Install Salt from the Salt package repositories here:
https://docs.saltproject.io/salt/install-guide/en/latest/
Apache License 2.0

[BUG] Minion process did not stop properly on PKI failure #57264

Closed 1 year ago; reported by afischer-opentext-com

afischer-opentext-com commented 4 years ago

Description

One of our Salt minions ran into the following issue today.

2020-05-14 07:29:14,601 [salt.crypt       :725 ][CRITICAL][2332] The Salt Master has rejected this minion's public key!
To repair this issue, delete the public key for this minion on the Salt Master and restart this minion.
Or restart the Salt Master in open mode to clean out the keys. The Salt Minion will now exit.

Given how that Salt environment is set up, this situation may be expected, and a restart would fix the failure.

The problem is that the Python process did not exit completely. A test.ping to the minion no longer worked, but scsm.exe and the Python process kept running. As a result, the configured automatic restart never took effect.

The log message is written immediately before a Python sys.exit() call, so the cause is not directly visible. Are there known shutdown hooks in the Python process that could have led to this situation? We would like to analyse this further but need a hint on where to begin.
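One generic Python behaviour matches these symptoms. It is offered as an assumption, not as a diagnosis of Salt's code. `sys.exit()` only raises `SystemExit`, and at shutdown the interpreter waits for every non-daemon thread to finish.

The sketch below launches a throwaway child interpreter that starts a non-daemon thread. The child script and the `child_exits` helper are hypothetical and not part of Salt. The child then calls either `sys.exit()` or `os._exit()`, and only the `os._exit()` child actually terminates.

```python
import subprocess
import sys

# Hypothetical child script: not Salt code, just a minimal reproduction of the pattern.
CHILD = r'''
import os, sys, threading
mode = sys.argv[1]
stop = threading.Event()
# A non-daemon background thread that never finishes, standing in for any leftover worker.
threading.Thread(target=stop.wait, daemon=False).start()
if mode == "sys_exit":
    sys.exit(77)  # raises SystemExit; shutdown then blocks waiting for the non-daemon thread
else:
    os._exit(77)  # terminates immediately, skipping atexit handlers and thread joins
'''

def child_exits(mode, timeout=3.0):
    """Return the child's exit code, or None if it was still alive after `timeout` seconds."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD, mode])
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child hung during shutdown; kill it so the demo itself does not hang.
        proc.kill()
        proc.wait()
        return None
```

A related case behaves similarly. If `sys.exit()` is raised in a worker thread rather than the main thread, it ends only that thread and the process keeps running.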

Steps to Reproduce the behavior

Unclear

Expected behavior

The minion process should always properly exit in case of a fatal/critical situation.
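One common defensive pattern can enforce this expected behaviour. It is sketched here as an assumption and is not taken from Salt. The idea is to arm a daemon watchdog timer right before calling `sys.exit()`. If normal shutdown then stalls, `os._exit()` ends the process after a grace period. `arm_exit_watchdog` and `fatal_exit` are hypothetical names.

```python
import logging
import os
import sys
import threading

def arm_exit_watchdog(code, grace_seconds=10.0, hard_exit=os._exit):
    """Start a daemon timer that calls hard_exit(code) after grace_seconds.

    If normal interpreter shutdown finishes first, the timer dies with the process.
    If shutdown stalls (e.g. a non-daemon thread never returns), the timer fires.
    """
    timer = threading.Timer(grace_seconds, hard_exit, args=(code,))
    timer.daemon = True  # the watchdog itself must never block shutdown
    timer.start()
    return timer

def fatal_exit(code, grace_seconds=10.0):
    """Hypothetical helper: flush logs, arm the watchdog, then request a normal exit."""
    # os._exit() skips logging's atexit flush, so flush the root handlers up front.
    for handler in logging.getLogger().handlers:
        handler.flush()
    arm_exit_watchdog(code, grace_seconds)
    sys.exit(code)
```

The `hard_exit` parameter exists so the timer can be exercised without killing the caller. In real use the default `os._exit` is the point: unlike `sys.exit()`, it cannot be caught or delayed by lingering threads.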

Screenshots

[Screenshot 2020-05-14_10h04_34 (image not included), annotated with:]

  1. Log message
  2. Running Windows process
  3. Configured restart behavior
  4. Python process in Process Explorer

Versions Report

C:\salt>salt-call --versions-report
Salt Version:
           Salt: 2018.3.4

Dependency Versions:
           cffi: 1.10.0
       cherrypy: 10.2.1
       dateutil: 2.6.1
      docker-py: Not Installed
          gitdb: 2.0.5
      gitpython: 2.1.3
          ioflo: Not Installed
         Jinja2: 2.9.6
        libgit2: Not Installed
        libnacl: 1.6.1
       M2Crypto: Not Installed
           Mako: 1.0.6
   msgpack-pure: Not Installed
 msgpack-python: 0.4.8
   mysql-python: Not Installed
      pycparser: 2.17
       pycrypto: 2.6.1
   pycryptodome: Not Installed
         pygit2: Not Installed
         Python: 3.5.3 (v3.5.3:1880cb95a742, Jan 16 2017, 16:02:32) [MSC v.1900 64 bit (AMD64)]
   python-gnupg: 0.4.1
         PyYAML: 3.12
          PyZMQ: 16.0.3
           RAET: Not Installed
          smmap: 2.0.5
        timelib: 0.2.4
        Tornado: 4.5.1
            ZMQ: 4.1.6

System Versions:
           dist:
         locale: cp1252
        machine: AMD64
        release: 2016Server
         system: Windows
        version: 2016Server 10.0.14393 SP0 Multiprocessor Free

DmitryKuzmenko commented 4 years ago

@afischer-opentext-com, thank you for the report. When the master answers that the minion's key is rejected, the minion prints that message and exits after 10-20 seconds. Since you have auto-restart configured, the minion is restarted and the same thing happens again. Sorry, it is not quite clear to me whether this is a fresh install or an upgraded minion, so I can't say how the minion key ended up in the rejected state. Did you check the salt-key output on the master? Can you confirm that the minion key is in the rejected state? Also, what happens if you accept it with salt-key --accept <minion-id> --include-rejected?

Ch3LL commented 1 year ago

Closing due to inactivity. If this is still an issue on the latest version of Salt, please open a new issue with the details.