PeterS242 opened this issue 9 months ago
Strangely enough, a direct target (using no options for targeting) seems to work just fine. No errors are produced.
```
master-3006.6 ~ # salt minion1 test.ping
minion1:
    True
master-3006.6 ~ # salt minion\* test.ping
minion1:
    True
minion2:
    True
minion3:
    True
master-3006.6 ~ # salt -E 'minion.*' test.ping
minion1:
    True
minion2:
    True
minion3:
    True
master-3006.6 ~ #
```
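ID-based targets like the ones above (names, globs, PCRE) are resolved against the minion IDs the master already knows from its accepted keys, whereas grain and compound targets are resolved from the grains data cached on the master. A quick way to see what the master has cached for a given minion (a sketch; the `cache` runner ships with Salt, but check `salt-run -d cache` on your version):

```
# Dump the grains the master has cached for one minion. Grain targets
# (-G / -C with G@...) are matched against this cache, not against live minions.
salt-run cache.grains 'minion1'
```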
Further examples:
```
master-3006.6 ~ # salt -C 'G@os:Gentoo' test.ping
gentoo1:
    True
gentoo2:
    True
redhat1:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240207191724958644
redhat2:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240207191724958644
ERROR: Minions returned with non-zero exit code
master-3006.6 ~ # salt -G os:RedHat test.ping
redhat1:
    True
redhat2:
    True
gentoo1:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240207192002199323
gentoo2:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240207192002199323
ERROR: Minions returned with non-zero exit code
master-3006.6 ~ #
```
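The pattern above (every non-matching minion reported as "did not return" rather than simply excluded) suggests the master's expected-return set is too broad. A hedged check: `--preview-target` prints the set of minions the master expects to answer, without publishing anything, so in the buggy case the non-matching minions should appear here too:

```
# Print the minions the master would target for this expression, without
# actually running test.ping on anyone.
salt --preview-target -C 'G@os:Gentoo' test.ping
```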
Thank you for looking into this.
Can you try the following? (A condensed sketch follows the list.)

- Stop the master(s).
- Clear the minion cache by running `rm -rf /var/cache/salt/master/minions` on the master(s).
- Bring the master(s) back up.
- Wait a bit for minions to reconnect.
- Re-try your targeted calls.
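As one sequence (a minimal sketch; the 120-second wait is an arbitrary placeholder for "a bit"):

```
systemctl stop salt-master
rm -rf /var/cache/salt/master/minions   # drop the master-side minion data cache
systemctl start salt-master
sleep 120                               # placeholder: let minions reconnect and re-cache grains
salt -G 'os:Gentoo' test.ping           # re-try a grain-targeted call
```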
Sure thing. I am not seeing any change. Here's the output (anonymized):
```
master-3006.6 ~ # systemctl stop salt-master
master-3006.6 ~ # systemctl status salt-master
○ salt-master.service - The Salt Master Server
     Loaded: loaded (/lib/systemd/system/salt-master.service; enabled; preset: disabled)
     Active: inactive (dead) since Thu 2024-02-08 09:25:41 CST; 3s ago
   Duration: 1d 20h 20min 14.716s
    Process: 960442 ExecStart=/usr/bin/salt-master (code=exited, status=0/SUCCESS)
   Main PID: 960442 (code=exited, status=0/SUCCESS)
        CPU: 16min 58.519s

Feb 07 14:46:31 master-3006.6 salt-master[960515]:     salt.utils.files.process_read_exception(exc, minions_path)
Feb 07 14:46:31 master-3006.6 salt-master[960515]:   File "/usr/lib/python3.10/site-packages/salt/utils/files.py", line 225, in process_read_exception
Feb 07 14:46:31 master-3006.6 salt-master[960515]:     raise CommandExecutionError("{} does not exist".format(path))
Feb 07 14:46:31 master-3006.6 salt-master[960515]: salt.exceptions.CommandExecutionError: /var/cache/salt/master/jobs/11/78e182ed90a51dc46bd3195960f268b274a46364cfb9cfcfbfb3878403f3de/.minions.p does not exist
Feb 08 09:25:41 master-3006.6 systemd[1]: Stopping salt-master.service...
Feb 08 09:25:41 master-3006.6 salt-master[960442]: [WARNING ] Master received a SIGTERM. Exiting.
Feb 08 09:25:41 master-3006.6 salt-master[960442]: The salt master is shutdown. Master received a SIGTERM. Exited.
Feb 08 09:25:41 master-3006.6 systemd[1]: salt-master.service: Deactivated successfully.
Feb 08 09:25:41 master-3006.6 systemd[1]: Stopped salt-master.service.
Feb 08 09:25:41 master-3006.6 systemd[1]: salt-master.service: Consumed 16min 58.519s CPU time.
master-3006.6 ~ #
```
```
master-3006.6 ~ # rm -rf /var/cache/salt/master/minions
master-3006.6 ~ # systemctl start salt-master
master-3006.6 ~ # date
Thu Feb 8 09:26:34 AM CST 2024
master-3006.6 ~ # systemctl status salt-master
● salt-master.service - The Salt Master Server
     Loaded: loaded (/lib/systemd/system/salt-master.service; enabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 09:26:31 CST; 4min 4s ago
   Main PID: 1355739 (salt-master)
      Tasks: 43 (limit: 4577)
     Memory: 314.2M
        CPU: 15.058s
     CGroup: /system.slice/salt-master.service
             ├─1355739 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355800 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355801 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355804 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355805 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355806 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355807 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355808 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355810 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355817 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355818 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             ├─1355819 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master
             └─1355820 /usr/bin/python3.10 /usr/lib/python-exec/python3.10/salt-master

Feb 08 09:26:31 master-3006.6 systemd[1]: Started salt-master.service.
master-3006.6 ~ #
master-3006.6 ~ # date ; time salt -G os_family:Gentoo test.ping
Thu Feb 8 09:32:22 AM CST 2024
gentoo1:
    True
gentoo2:
    True
redhat1:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240208153222830925
redhat2:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240208153222830925
ERROR: Minions returned with non-zero exit code

real    1m12.631s
user    0m3.107s
sys     0m0.344s
master-3006.6 ~ #
```
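For what it's worth, the 1m12s wall time is consistent with the CLI simply waiting out its timeouts on the two phantom minions. A sketch (the timeout values are arbitrary) to confirm the delay comes from the wait, not from the matching itself:

```
# Shorter timeouts should shrink the wall time while leaving the wrong
# target set unchanged.
time salt -t 5 --gather-job-timeout 5 -G 'os_family:Gentoo' test.ping
```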
OK, I thought this might be something with the minion data cache on the master, but if you cleared it and are still seeing this, it would appear to rule that out.
I tested a git install from the v3006.6 tag. Did you install your master and all your minions using onedir packages (i.e. installed to /opt/saltstack)?
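A quick way to check (a sketch, assuming the usual onedir layout, and that onedir builds report a bundled relenv; note the `relenv: Not Installed` in the versions report at the bottom of this issue):

```
# Onedir installs live under /opt/saltstack and bundle relenv.
ls -d /opt/saltstack/salt 2>/dev/null || echo "not a onedir install"
salt --versions-report | grep -i relenv
```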
I have looked into what would be required to install the onedir version of the master onto this Gentoo server (v3006.6), but I don't think that's available, which is OK.
The minions are a varied mix of 3005.x, onedir, and non-onedir (about 300 of them), and I get the same results on all of them, which leads me to believe this is a master-specific issue. Also, I am not experiencing this with my other masters: one is a locked, ancient v3000.5 master that is going away soon(tm), and the other is a fully updated v3005.5 master.
In case there is any question, here's the versioning on my minions (anonymized):
```
master-3005.5 ~ # salt \* cmd.run 'echo {{grains["saltversion"]}}' template=jinja --out=txt | sort-groups | awk '{printf("minion%d %s\n",NR,$2)}'
```
Output attached.
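For reference, the same listing can be pulled without Jinja templating; a sketch:

```
# grains.get returns the saltversion grain directly, one line per minion.
salt \* grains.get saltversion --out=txt
```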
Once we updated to 3006.6 (onedir, Ubuntu 22.04, multi-master setup with `master_type: str` and two servers in the list), we can observe similar traces:
```
Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/returners/local_cache.py", line 309, in get_load
    with salt.utils.files.fopen(minions_path, "rb") as rfh:
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/files.py", line 393, in fopen
    f_handle = open(*args, **kwargs)  # pylint: disable=resource-leakage
FileNotFoundError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/d0/a09acc1aa71f79737613f9f0fbdbe05cbaf1240b9daad17d65c96f121d5323/.minions.p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 1927, in run_func
    ret = getattr(self, func)(load)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/master.py", line 1718, in _return
    salt.utils.job.store_job(
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/job.py", line 129, in store_job
    if job_cache == "local_cache" and mminion.returners[getfstr](load.get("jid", "")):
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 159, in __call__
    ret = self.loader.run(run_func, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1245, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/loader/lazy.py", line 1260, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/returners/local_cache.py", line 312, in get_load
    salt.utils.files.process_read_exception(exc, minions_path)
  File "/opt/saltstack/salt/lib/python3.10/site-packages/salt/utils/files.py", line 225, in process_read_exception
    raise CommandExecutionError("{} does not exist".format(path))
salt.exceptions.CommandExecutionError: /var/cache/salt/master/jobs/d0/a09acc1aa71f79737613f9f0fbdbe05cbaf1240b9daad17d65c96f121d5323/.minions.p does not exist
```
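Reading the trace: `get_load` fails because the jid's `.minions.p` file is missing from the local job cache. That file is written when the publishing master stores the job, so a return arriving at the *other* master of a multi-master pair could plausibly hit this path (an interpretation, not confirmed). A sketch to compare what each master's job cache holds:

```
# Run on each master; $JID is a placeholder for the jid from the failing
# trace. The master that did not publish the job should lack the entry.
salt-run jobs.print_job $JID
ls /var/cache/salt/master/jobs/    # hashed jid directories on this master
```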
So, does the above validate this bug? Is there anything remaining on this, or any update?
The issue persists with the recently released 3006.7:
```
master-3006.7 ~ # salt -G os:Gentoo test.ping
gentoo1:
    True
gentoo2:
    True
gentoo3:
    True
gentoo4:
    True
...
redhat1:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240222194437852300
redhat2:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:
    salt-run jobs.lookup_jid 20240222194437852300
ERROR: Minions returned with non-zero exit code
master-3006.7 ~ # salt-run jobs.lookup_jid 20240222194437852300
gentoo1:
    True
gentoo2:
    True
gentoo3:
    True
gentoo4:
    True
master-3006.7 ~ # salt --version
salt 3006.7 (Sulfur)
master-3006.7 ~ #
```
The issue persists with 3007.
@PeterS242 Since we don't create Gentoo packages, you should be able to download a onedir archive, extract it, and test against it.
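Roughly like this (a sketch; the exact URL, archive name, and extracted layout are assumptions; verify against repo.saltproject.io before use):

```
# Fetch and unpack a onedir build, then run the bundled binaries in place.
curl -LO https://repo.saltproject.io/salt/py3/onedir/3006.7/salt-3006.7-onedir-linux-x86_64.tar.xz
tar -xJf salt-3006.7-onedir-linux-x86_64.tar.xz
./salt/salt-master --versions-report    # assumed path inside the archive
```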
**Description**
When targeting minions, I am also getting results from minions that do not match the target, with the error "Minion did not return. [No response]". This looks like a change from 3006.5 to 3006.6.
**Setup**
My 3006.6 master is on an updated Gentoo server; I did have to hack the 3006.5 ebuild to build it as 3006.6. The environment is multi-master.
**Steps to Reproduce the behavior**
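A condensed reproduction, based on the examples earlier in this thread:

```
# On a 3006.6 master with a mixed Gentoo/RedHat fleet, grain-target one OS;
# the other OS's minions are reported as "Minion did not return. [No response]".
salt -G 'os:Gentoo' test.ping
```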
**Expected behavior**
When targeting minions, I would expect output to include only the targeted minions, and nothing from minions that were not targeted.
**Screenshots**
(Note: the following is an approximation of what I am currently experiencing.)
**Versions Report**

```
master-3006.6 ~ # salt --versions-report
Salt Version:
          Salt: 3006.6

Python Version:
        Python: 3.10.13 (main, Dec 29 2023, 15:06:59) [GCC 13.2.1 20230826]

Dependency Versions:
          cffi: 1.16.0
      cherrypy: Not Installed
      dateutil: 2.8.2
     docker-py: Not Installed
         gitdb: Not Installed
     gitpython: Not Installed
        Jinja2: 3.1.3
       libgit2: Not Installed
  looseversion: 1.3.0
      M2Crypto: Not Installed
          Mako: Not Installed
       msgpack: 1.0.7
  msgpack-pure: Not Installed
  mysql-python: Not Installed
     packaging: 23.2
     pycparser: 2.21
      pycrypto: 3.20.0
  pycryptodome: 3.20.0
        pygit2: Not Installed
  python-gnupg: Not Installed
        PyYAML: 6.0.1
         PyZMQ: 25.1.2
        relenv: Not Installed
         smmap: Not Installed
       timelib: Not Installed
       Tornado: 4.5.3
           ZMQ: 4.3.5

System Versions:
          dist: gentoo 2.14 n/a
        locale: utf-8
       machine: x86_64
       release: 6.6.13-gentoo-dist
        system: Linux
       version: Gentoo 2.14 n/a
master-3006.6 ~ #
```

**Additional context**
I am not sure if some default setting has changed between versions or what; that is always possible.