saltstack / salt


[BUG] Shutting down salt-minion when updating via pkg.install from salt-master #64990

Closed MAH69IK closed 1 year ago

MAH69IK commented 1 year ago

Description

Hello!

I have several servers, all running Debian 11. I'm trying to update the salt-minions from 3005.1+ds-1 to 3005.2+ds-1 (the problem has also occurred before with other versions). My salt-master is 3005.2+ds-1. When I run pkg.install only_upgrade=True salt-minion from the salt-master (the same happens with pkg.install only_upgrade=True pkgs='["salt-common", "salt-minion"]'), the salt-minion shuts down.

This is the state before the update:

$ sudo systemctl status salt-minion.service 
● salt-minion.service - The Salt Minion
     Loaded: loaded (/lib/systemd/system/salt-minion.service; enabled; vendor preset: enabled)
     Active: active (running)

salt-minion is active, good.

These are the processes during the update:

$ ps fauxww | grep [s]alt
root      382051  0.0  0.2  45088 19400 ?        Ss   Feb16   0:00 /usr/bin/python3 /usr/bin/salt-minion
root      382055  0.1  0.6 522060 53384 ?        Sl   Feb16 264:26  \_ /usr/bin/python3 /usr/bin/salt-minion
root     3420777  0.6  0.7 534696 59368 ?        S    01:15   0:00      \_ /usr/bin/python3 /usr/bin/salt-minion
root     3420785  2.2  0.6  62208 54680 ?        S    01:15   0:00          \_ /usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold -o DPkg::Options::=--force-confdef --only-upgrade install salt-common salt-minion
root     3420844  0.0  0.0   2480   448 pts/1    S+   01:15   0:00                  \_ /bin/sh /var/lib/dpkg/info/salt-minion.postinst configure 3005.1+ds-1
root     3420865  0.0  0.0   2480   452 pts/1    S+   01:15   0:00                      \_ /bin/sh /usr/sbin/invoke-rc.d salt-minion restart
root     3420872  0.0  0.0   7164  1132 pts/1    S+   01:15   0:00                          \_ systemctl restart salt-minion.service

As you can see, systemctl tried to restart salt-minion, but it failed:

$ sudo systemctl status salt-minion.service 
● salt-minion.service - The Salt Minion
     Loaded: loaded (/lib/systemd/system/salt-minion.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sat 2023-08-12 01:16:50 EEST; 39s ago
$ sudo journalctl -u salt-minion.service
Aug 12 01:15:14 graylog systemd[1]: Stopping The Salt Minion...
Aug 12 01:15:14 graylog salt-minion[382055]: [WARNING ] Minion received a SIGTERM. Exiting.
Aug 12 01:15:14 graylog salt-minion[382055]: The Salt Minion is shutdown. Minion received a SIGTERM. Exited.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: State 'stop-sigterm' timed out. Killing.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Killing process 382051 (salt-minion) with signal SIGKILL.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Main process exited, code=killed, status=9/KILL
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Failed with result 'timeout'.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Unit process 382055 (salt-minion) remains running after unit stopped.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Unit process 3420777 (salt-minion) remains running after unit stopped.
Aug 12 01:16:44 graylog systemd[1]: Stopped The Salt Minion.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Consumed 4h 31min 661ms CPU time.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Found left-over process 382055 (salt-minion) in control group while starting unit. Ignoring.
Aug 12 01:16:44 graylog systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 12 01:16:44 graylog systemd[1]: salt-minion.service: Found left-over process 3420777 (salt-minion) in control group while starting unit. Ignoring.
Aug 12 01:16:44 graylog systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Aug 12 01:16:44 graylog systemd[1]: Starting The Salt Minion...
Aug 12 01:16:44 graylog systemd[1]: Started The Salt Minion.
Aug 12 01:16:45 graylog salt-minion[3420907]: The Salt Minion is shutdown.
Aug 12 01:16:50 graylog systemd[1]: salt-minion.service: Main process exited, code=exited, status=1/FAILURE
Aug 12 01:16:50 graylog systemd[1]: salt-minion.service: Failed with result 'exit-code'.

This is from the minion log:

$ sudo cat /var/log/salt/minion
2023-08-12 01:15:14,376 [salt.utils.parsers:1060][WARNING ][382055] Minion received a SIGTERM. Exiting.


Steps to Reproduce the behavior

sudo salt --state-output=changes '...' pkg.install only_upgrade=True salt-minion

Expected behavior

The salt-minion remains running after the update.

Versions Report

Master:

salt --versions-report

```yaml
Salt Version:
          Salt: 3005.2

Dependency Versions:
          cffi: Not Installed
      cherrypy: Not Installed
      dateutil: 2.8.1
     docker-py: Not Installed
         gitdb: 4.0.5
     gitpython: 3.1.14
        Jinja2: 2.11.3
       libgit2: Not Installed
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.0
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     pycparser: Not Installed
      pycrypto: Not Installed
  pycryptodome: 3.9.7
        pygit2: Not Installed
        Python: 3.9.2 (default, Feb 28 2021, 17:03:44)
  python-gnupg: Not Installed
        PyYAML: 5.3.1
         PyZMQ: 20.0.0
         smmap: 4.0.0
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.4

System Versions:
          dist: debian 11 bullseye
        locale: utf-8
       machine: x86_64
       release: 5.10.0-10-amd64
        system: Linux
       version: Debian GNU/Linux 11 bullseye
```

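
The process tree above shows the package's postinst script calling invoke-rc.d salt-minion restart while the minion is still waiting on apt-get. One approach that may avoid this on Debian is to forbid service restarts for the duration of the upgrade with a temporary /usr/sbin/policy-rc.d, then restart the minion afterwards. The following is a minimal sketch under that assumption; the state IDs are hypothetical and it has not been tested against this setup:

```yaml
# Sketch only. An exit code of 101 from policy-rc.d tells invoke-rc.d that the
# requested action is forbidden, so the postinst restart is skipped.
block service restarts during upgrade:
  file.managed:
    - name: /usr/sbin/policy-rc.d
    - user: root
    - group: root
    - mode: '0755'
    - contents:
      - '#!/bin/sh'
      - 'exit 101'
    # Leave an already existing policy file untouched.
    - replace: False
    # Only create the file when the upgrade below is about to change something.
    - prereq:
      - pkg: upgrade salt-minion

upgrade salt-minion:
  pkg.latest:
    - only_upgrade: True
    - refresh: True
    - pkgs:
      - salt-common
      - salt-minion

# Caution: this also removes a policy-rc.d that existed before the run.
allow service restarts again:
  file.absent:
    - name: /usr/sbin/policy-rc.d
    - onchanges:
      - pkg: upgrade salt-minion

restart salt-minion:
  cmd.run:
    - name: 'salt-call --local service.restart salt-minion'
    # Run in the background so the state run does not wait on its own restart.
    - bg: True
    - onchanges:
      - pkg: upgrade salt-minion
    - require:
      - file: allow service restarts again
```

While the policy file is in place it blocks restarts for every package handled by apt, so this sketch upgrades only the Salt packages in that window.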
dmurphy18 commented 1 year ago

@MAH69IK Salt used to have a chicken-and-egg problem when upgrading itself: the running minion is the process installing the new package, so it has trouble restarting itself. This is supposed to have been fixed.

However, if you run pkg.upgrade and Salt is in the list, the master loses the connection to the minion; it comes back after the new version starts. Here is an SLS file for handling minion upgrades that you can try.

root@tdeb11:/srv/salt# cat chk_upgrade.sls 
pkg.refresh_db:
   module.run

at:
  pkg.latest:
    - only_upgrade: True
    - refresh: True
    - pkgs:
      - 'salt-minion'

at is running:
  service.running:
    - name: at
    - enable: True

salt_restart:
  at.present:
    - timespec: 'now +1 min'
    - job: 'salt-call --local pkg.latest_version salt-minion'
    - tag: saltrestart
root@tdeb11:/srv/salt#

It worked for upgrading from Salt 3005.1 to 3005.2, though I did get an error when running the state file

local:
----------
          ID: pkg.refresh_db
    Function: module.run
      Result: True
     Comment: Module function pkg.refresh_db executed
     Started: 12:43:46.875668
    Duration: 1414.245 ms
     Changes:   
              ----------
              ret:
                  ----------
                  http://deb.debian.org/debian bullseye InRelease:
                      None
                  http://deb.debian.org/debian bullseye-updates InRelease:
                      None
                  http://security.debian.org/debian-security bullseye-security InRelease:
                      None
                  https://repo.saltproject.io/py3/debian/11/amd64/3005 bullseye InRelease:
                      None
----------
          ID: at
    Function: pkg.latest
      Result: True
     Comment: The following packages were successfully installed/upgraded: salt-minion
     Started: 12:43:49.817916
    Duration: 14646.384 ms
     Changes:   
              ----------
              salt-common:
                  ----------
                  new:
                      3005.2+ds-1
                  old:
                      3005.1+ds-1
              salt-minion:
                  ----------
                  new:
                      3005.2+ds-1
                  old:
                      3005.1+ds-1
----------
          ID: at is running
    Function: service.running
        Name: at
      Result: False
     Comment: The named service at is not available
     Started: 12:44:04.478655
    Duration: 22.064 ms
     Changes:   
----------
          ID: salt_restart
    Function: at.present
      Result: True
     Comment: job salt-call --local pkg.latest_version salt-minion added and will run on now +1 min
     Started: 12:44:04.506038
    Duration: 25.319 ms
     Changes:   
              ----------
              date:
                  2023-08-18
              job:
                  7
              queue:
                  a
              tag:
                  saltrestart
              time:
                  12:45:00
              user:
                  root

Summary for local
------------
Succeeded: 3 (changed=3)
Failed:    1
------------
Total states run:     4
Total run time:  16.108 s

as shown by

root@tdeb11:/srv/salt# apt-cache policy salt-minion
salt-minion:
  Installed: 3005.1+ds-1
  Candidate: 3005.2+ds-1
  Version table:
     3005.2+ds-1 500
        500 https://repo.saltproject.io/py3/debian/11/amd64/3005 bullseye/main amd64 Packages
 *** 3005.1+ds-1 100
        100 /var/lib/dpkg/status
     3002.6+dfsg1-4+deb11u1 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages
        500 http://security.debian.org/debian-security bullseye-security/main amd64 Packages

resulting in

local:
    Salt Version:
              Salt: 3005.2

    Dependency Versions:
              cffi: 1.15.1
          cherrypy: Not Installed
          dateutil: 2.8.1
         docker-py: Not Installed
             gitdb: Not Installed
         gitpython: Not Installed
            Jinja2: 2.11.3
           libgit2: Not Installed
          M2Crypto: Not Installed
              Mako: 1.1.3
           msgpack: 1.0.0
      msgpack-pure: Not Installed
      mysql-python: Not Installed
         pycparser: 2.21
          pycrypto: Not Installed
      pycryptodome: 3.9.7
            pygit2: Not Installed
            Python: 3.9.2 (default, Feb 28 2021, 17:03:44)
      python-gnupg: Not Installed
            PyYAML: 5.3.1
             PyZMQ: 20.0.0
             smmap: Not Installed
           timelib: Not Installed
           Tornado: 4.5.3
               ZMQ: 4.3.4

    System Versions:
              dist: debian 11 bullseye
            locale: utf-8
           machine: x86_64
           release: 5.10.0-23-amd64
            system: Linux
           version: Debian GNU/Linux 11 bullseye

Note that the salt-master still showed Salt 3005.1. The original status of the salt-minion on Debian 11:

root@tdeb11:/srv/salt# apt-cache policy salt-minion
salt-minion:
  Installed: 3005.1+ds-1
  Candidate: 3005.2+ds-1
  Version table:
     3005.2+ds-1 500
        500 https://repo.saltproject.io/py3/debian/11/amd64/3005 bullseye/main amd64 Packages
 *** 3005.1+ds-1 100
        100 /var/lib/dpkg/status
     3002.6+dfsg1-4+deb11u1 500
        500 http://deb.debian.org/debian bullseye/main amd64 Packages
        500 http://security.debian.org/debian-security bullseye-security/main amd64 Packages

The salt-master did not show Salt 3005.2 for the minion

td11:
    Salt Version:
              Salt: 3005.2

    Dependency Versions:
              cffi: 1.15.1
          cherrypy: Not Installed
          dateutil: 2.8.1
         docker-py: Not Installed
             gitdb: Not Installed
         gitpython: Not Installed
            Jinja2: 2.11.3
           libgit2: Not Installed
          M2Crypto: Not Installed
              Mako: 1.1.3
           msgpack: 1.0.0
      msgpack-pure: Not Installed
      mysql-python: Not Installed
         pycparser: 2.21
          pycrypto: Not Installed
      pycryptodome: 3.9.7
            pygit2: Not Installed
            Python: 3.9.2 (default, Feb 28 2021, 17:03:44)
      python-gnupg: Not Installed
            PyYAML: 5.3.1
             PyZMQ: 20.0.0
             smmap: Not Installed
           timelib: Not Installed
           Tornado: 4.5.3
               ZMQ: 4.3.4

    System Versions:
              dist: debian 11 bullseye
            locale: utf-8
           machine: x86_64
           release: 5.10.0-23-amd64
            system: Linux
           version: Debian GNU/Linux 11 bullseye

until I did the following on the salt-minion

root@tdeb11:/srv/salt# systemctl restart salt-minion
Warning: The unit file, source configuration file or drop-ins of salt-minion.service changed on disk. Run 'systemctl daemon-reload' to reload units.
root@tdeb11:/srv/salt# systemctl daemon-reload
root@tdeb11:/srv/salt# systemctl restart salt-minion

Given the above information, and if it resolves your issue, please consider closing this issue.
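
In the run above, the at is running state failed with "The named service at is not available", and the queued job only queries pkg.latest_version rather than restarting anything. On Debian the at package usually ships its daemon as atd. A hedged variant of chk_upgrade.sls under those assumptions follows; the renamed state IDs and the restart job are illustrative and untested here:

```yaml
# Sketch only: assumes the at daemon's service is named atd (typical on Debian)
# and that restarting the minion through salt-call from an at job is acceptable.
upgrade salt-minion:
  pkg.latest:
    - only_upgrade: True
    - refresh: True
    - pkgs:
      - salt-minion

atd is running:
  service.running:
    - name: atd
    - enable: True

salt_restart:
  at.present:
    - timespec: 'now +1 min'
    # The job is started by atd, outside the minion's own process tree.
    - job: 'salt-call --local service.restart salt-minion'
    - tag: saltrestart
    - require:
      - service: atd is running
    # Queue the restart only when the package actually changed.
    - onchanges:
      - pkg: upgrade salt-minion
```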

dmurphy18 commented 1 year ago

Hmm, that could be cleaner in execution if the code handled the systemctl interaction itself, but that would not work on Devuan, that is, on non-systemd systems. I need to think about making that nicer than having the user do it. It could also be related to the fact that I didn't stop the salt-minion before the upgrade, which may have affected the running system differently.

Also, this didn't affect local salt-call on the salt-minion; it was the salt-master that had trouble detecting the minion's Salt version. Some timed service would probably have kicked in and updated the salt-master correctly if I had waited long enough.
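
The manual systemctl daemon-reload and restart from the earlier comment could be expressed as states that skip the reload on non-systemd systems such as Devuan. This is a minimal sketch, assuming /run/systemd/system exists only when systemd is the running init; the state IDs are hypothetical:

```yaml
# Sketch only, not tested against the setup in this issue.
reload systemd units:
  cmd.run:
    - name: systemctl daemon-reload
    # Skipped (and reported as successful) where systemd is not the init system.
    - onlyif: test -d /run/systemd/system

restart salt-minion:
  cmd.run:
    - name: salt-call --local service.restart salt-minion
    # Run in the background so the state run is not blocked by the restart.
    - bg: True
    - require:
      - cmd: reload systemd units
```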

dmurphy18 commented 1 year ago

@MAH69IK Please respond to my last comment; otherwise this issue will be closed due to unresponsiveness.

dmurphy18 commented 1 year ago

Closing due to unresponsiveness.