saltstack / salt

Software to automate the management and configuration of any infrastructure or application at scale. Get access to the Salt software package repository here:
https://repo.saltproject.io/
Apache License 2.0
13.98k stars 5.47k forks source link

[BUG] [3007.1] startup_states: highstate stop working #66592

Open momolemo opened 1 month ago

momolemo commented 1 month ago

Description After update from 3007 to 3007.1 the startup_state: highstate option stop working. We find in the service salt-minion log the following error message.

mai 24 12:29:18 server salt-minion[503062]: /opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py:2434: RuntimeWarning: coroutine 'Minion._handle_decoded_payload' was never awaited
mai 24 12:29:18 server salt-minion[503062]:   self._handle_decoded_payload(data)
mai 24 12:29:18 server salt-minion[503062]: RuntimeWarning: Enable tracemalloc to get the object allocation traceback

Setup (Please provide relevant configs and/or SLS files (be sure to remove sensitive info. There is no general set-up of Salt.)

Please be as specific as possible and give set-up details.

Steps to Reproduce the behavior in 3007.1 set the config minion to statup_states: highstate and restart minions

Expected behavior A salt highstate apply automatcaly

Screenshots N/A

Versions Report

salt --versions-report Same version on master and minion (test cluster) ```yaml Salt Version: Salt: 3007.1 Python Version: Python: 3.10.14 (main, Apr 3 2024, 21:30:09) [GCC 11.2.0] Dependency Versions: cffi: 1.16.0 cherrypy: unknown dateutil: 2.8.2 docker-py: Not Installed gitdb: Not Installed gitpython: Not Installed Jinja2: 3.1.4 libgit2: Not Installed looseversion: 1.3.0 M2Crypto: Not Installed Mako: Not Installed msgpack: 1.0.7 msgpack-pure: Not Installed mysql-python: Not Installed packaging: 23.1 pycparser: 2.21 pycrypto: Not Installed pycryptodome: 3.19.1 pygit2: Not Installed python-gnupg: 0.5.2 PyYAML: 6.0.1 PyZMQ: 25.1.2 relenv: 0.16.0 smmap: Not Installed timelib: 0.3.0 Tornado: 6.3.3 ZMQ: 4.3.4 Salt Extensions: saltext.vault: 1.0.0 Salt Package Information: Package Type: onedir System Versions: dist: debian 11.9 bullseye locale: utf-8 machine: x86_64 release: 5.10.0-27-cloud-amd64 system: Linux version: Debian GNU/Linux 11.9 bullseye ```

Additional context Add any other context about the problem here.

We search which PR can be responsible of this behavior and we find this one that seems to us possible cause but we cant be sure because of our dev level too low

https://github.com/saltstack/salt/commit/0ff43842cf0141c946c58ac9b3af18ca744c0ff9

transistortim commented 1 month ago

I can confirm this happening on all our minions (Debian and Windows) after having upgraded to 3007.1.

The log contains the same message as reported above, enabling tracemalloc reveals the corresponding code line:

root@minion ~ 
# export PYTHONTRACEMALLOC=1
root@minion ~ 
# salt-minion -l debug
[ ... some more debug messages ...]
[INFO    ] Minion is ready to receive requests!
/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py:2434: RuntimeWarning: coroutine 'Minion._handle_decoded_payload' was never awaited
  self._handle_decoded_payload(data)
Object allocated at (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py", lineno 2434
    self._handle_decoded_payload(data)

The corresponding code block https://github.com/saltstack/salt/blob/0ff43842cf0141c946c58ac9b3af18ca744c0ff9/salt/minion.py#L2434 parses the startup_states option and fails to do so because of the missing await. Adding await requires making the calling functions async. By manually fixing /opt/saltstack/salt/lib/python3.10/site-packages/salt/minion.py with the changes below highstating works as expected.

Changes:

diff --git a/minion_old.py b/minion.py
index 68733a5..a4ebbe3 100644
--- a/minion_old.py
+++ b/minion.py
@@ -1144,7 +1144,7 @@ class MinionManager(MinionBase):
                     minion.setup_scheduler(before_connect=True)
                 if minion.opts.get("master_type", "str") != "disable":
                     await minion.connect_master(failed=failed)
-                minion.tune_in(start=False)
+                await minion.tune_in(start=False)
                 self.minions.append(minion)
                 break
             except SaltClientError as exc:
@@ -2406,7 +2406,7 @@ class Minion(MinionBase):
         log.trace("ret_val = %s", ret_val)  # pylint: disable=no-member
         return ret_val

-    def _state_run(self):
+    async def _state_run(self):
         """
         Execute a state run based on information set in the minion config file
         """
@@ -2431,7 +2431,7 @@ class Minion(MinionBase):
                 else:
                     data["fun"] = "state.highstate"
                     data["arg"] = []
-                self._handle_decoded_payload(data)
+                await self._handle_decoded_payload(data)

     def _refresh_grains_watcher(self, refresh_interval_in_minutes):
         """
@@ -3110,7 +3110,7 @@ class Minion(MinionBase):
         return True

     # Main Minion Tune In
-    def tune_in(self, start=True):
+    async def tune_in(self, start=True):
         """
         Lock onto the publisher. This is the main event loop for the minion
         :rtype : None
@@ -3137,7 +3137,7 @@ class Minion(MinionBase):
             salt.utils.win_functions.enable_ctrl_logoff_handler()

         # On first startup execute a state run if configured to do so
-        self._state_run()
+        await self._state_run()

         self.setup_beacons()
         self.setup_scheduler()
iMikeG6 commented 1 month ago

Hi there,

we also have the same issue with the version 3007.1 on all of our minions. startup_states: highstate is not working anymore.

transistortim commented 4 weeks ago

If anyone is looking for a workaround: We enabled salt's reactor with the example from the docs adapted slightly.

root@master ~
# cat /etc/salt/master.d/reactor.conf 
reactor:
  - 'salt/minion/*/start':          # Match tag "salt/minion/*/start"
    - /srv/reactor/start.sls        # Things to do when a minion starts
root@master ~
# cat /srv/reactor/start.sls 
highstate_run:
  local.state.apply:
    - tgt: {{ data['id'] }}
iMikeG6 commented 4 weeks ago

Thanks @transistortim,

this also worked for us.

Nice workaround.

If anyone is looking for a workaround: We enabled salt's reactor with the example from the docs adapted slightly.

root@master ~
# cat /etc/salt/master.d/reactor.conf 
reactor:
  - 'salt/minion/*/start':          # Match tag "salt/minion/*/start"
    - /srv/reactor/start.sls        # Things to do when a minion starts
root@master ~
# cat /srv/reactor/start.sls 
highstate_run:
  local.state.apply:
    - tgt: {{ data['id'] }}
jfduque commented 1 week ago

Can confirm that this also happens when using startup_states: 'sls'. I'm using 3007.1

bgdnlp commented 1 week ago

Confirmed here too. Which, in our case, means that new machines will not be configured properly and that messes up our deployments.