Open onmeac opened 3 years ago
@onmeac thanks for the report! Trying to reproduce this issue but so far I'm not having any success. Or I'm having success? :joy:
Basically, I can't seem to make this fail - `salt-run jobs.lookup_jid` works just fine for me. Always. I'll look into this a bit more tomorrow to see if I can reproduce any sort of thing :+1:
This is what I used to try and reproduce the issue. Requires some manual intervention to accept keys & start syndic process, but... yeah, lookup_jid kept working.
@onmeac do you think that you can use this as a jumping off point to create a full MCVE?
@onmeac I'm going to go ahead and close this issue for now since I can't reproduce. If you're able to put together a MCVE, let me know and I'll be happy to re-open this! (may need to ping on slack or IRC)
https://github.com/saltstack/salt/blob/master/salt/returners/local_cache.py
Lines 273-276:
    if syndic_id is not None:
        minions_path = os.path.join(jid_dir, SYNDIC_MINIONS_P.format(syndic_id))
    else:
        minions_path = os.path.join(jid_dir, MINIONS_P)
If job data is received on a master of masters from a downstream syndic, `minions_path` is never `os.path.join(jid_dir, MINIONS_P)`.
Line 284:
    with salt.utils.files.fopen(minions_path, "w+b") as wfh:
obviously creates whatever `minions_path` is at the time; this would be `os.path.join(jid_dir, SYNDIC_MINIONS_P.format(syndic_id))` if job data was received from a downstream syndic.
Lines 320-321:
    minions_cache = [os.path.join(jid_dir, MINIONS_P)]
    minions_cache.extend(glob.glob(os.path.join(jid_dir, SYNDIC_MINIONS_P.format("*"))))
`minions_cache` is now a list whose very first element points to a file that does not exist; that file was never created. Yet `with salt.utils.files.fopen(minions_path, "rb") as rfh:` (line 326) will attempt to open that file, and the failed read ends in `process_read_exception` raising a `CommandExecutionError`.
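To make the write/read mismatch concrete, here is a minimal, self-contained sketch. It is not Salt's code: `save_minions`, `read_minions_strict` and `read_minions_tolerant` are hypothetical names, `pickle` and plain `open` stand in for Salt's serializer and `salt.utils.files.fopen`, and the two constants are assumed to match the file names seen in this thread. It only illustrates why the strict read fails when only a syndic file exists, and what skipping a missing file could look like.

```python
import glob
import os
import pickle
import tempfile

# Assumed values, based on the file names mentioned in this thread.
MINIONS_P = ".minions.p"
SYNDIC_MINIONS_P = ".minions.{0}.p"


def save_minions(jid_dir, minions, syndic_id=None):
    """Write the minion list using the same branch as the quoted snippet."""
    if syndic_id is not None:
        minions_path = os.path.join(jid_dir, SYNDIC_MINIONS_P.format(syndic_id))
    else:
        minions_path = os.path.join(jid_dir, MINIONS_P)
    # pickle is a stand-in serializer for this sketch only.
    with open(minions_path, "w+b") as wfh:
        pickle.dump(minions, wfh)


def _candidate_paths(jid_dir):
    """Build the list the way the quoted read path does."""
    minions_cache = [os.path.join(jid_dir, MINIONS_P)]
    minions_cache.extend(
        glob.glob(os.path.join(jid_dir, SYNDIC_MINIONS_P.format("*")))
    )
    return minions_cache


def read_minions_strict(jid_dir):
    """Open every candidate; raises FileNotFoundError if .minions.p is absent."""
    all_minions = set()
    for minions_path in _candidate_paths(jid_dir):
        with open(minions_path, "rb") as rfh:
            all_minions.update(pickle.load(rfh))
    return sorted(all_minions)


def read_minions_tolerant(jid_dir):
    """Same read, but a missing file is skipped instead of raising."""
    all_minions = set()
    for minions_path in _candidate_paths(jid_dir):
        try:
            with open(minions_path, "rb") as rfh:
                all_minions.update(pickle.load(rfh))
        except FileNotFoundError:
            continue
    return sorted(all_minions)


if __name__ == "__main__":
    jid_dir = tempfile.mkdtemp()
    # Job data arrives from a downstream syndic: only the syndic file is written.
    save_minions(jid_dir, ["minion1"], syndic_id="syndic1")
    print(read_minions_tolerant(jid_dir))
    try:
        read_minions_strict(jid_dir)
    except FileNotFoundError as exc:
        print("strict read failed:", exc)
```

Whether skipping the missing file is the right behaviour for Salt itself is a design question for the maintainers; the sketch only shows that the syndic-only case is readable once the absent `.minions.p` is tolerated.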
I could not get an environment up and running using your images so I used docker.io/saltstack/salt
This is what I did (using podman in root mode, with limited podman/container knowledge):
start mom container
podman-compose -p mom_pod -f mom.yml up --build -d
get ip of mom container
podman exec -ti mom ip a | grep -w inet
set mom ip as syndic_master in syndic/syndic file
podman-compose -p syndic_pod -f syndic.yml up --build -d
wait a bit for things to start
podman exec -ti syndic salt \* test.version --async
copy jid
podman exec -ti mom salt-run jobs.lookup_jid <jid>
exception:
Exception occurred in runner jobs.lookup_jid: Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/salt/returners/local_cache.py", line 308, in get_load
    with salt.utils.files.fopen(minions_path, "rb") as rfh:
  File "/usr/local/lib/python3.7/site-packages/salt/utils/files.py", line 385, in fopen
    f_handle = open(*args, **kwargs) # pylint: disable=resource-leakage
FileNotFoundError: [Errno 2] No such file or directory: '/var/cache/salt/master/jobs/98/50909d32c4af0440c92c9020e6e3503d86b5aa6cf7f22676ee34c4c08c1cc1/.minions.p'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/salt/client/mixins.py", line 390, in low
    data["return"] = func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/runners/jobs.py", line 140, in lookup_jid
    data = list_job(jid, ext_source=ext_source, display_progress=display_progress)
  File "/usr/local/lib/python3.7/site-packages/salt/runners/jobs.py", line 205, in list_job
    job = mminion.returners["{}.get_load".format(returner)](jid)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 149, in __call__
    return self.loader.run(run_func, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 1201, in run
    return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/loader/lazy.py", line 1216, in _run_as
    return _func_or_method(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/salt/returners/local_cache.py", line 311, in get_load
    salt.utils.files.process_read_exception(exc, minions_path)
  File "/usr/local/lib/python3.7/site-packages/salt/utils/files.py", line 224, in process_read_exception
    raise CommandExecutionError("{} does not exist".format(path))
salt.exceptions.CommandExecutionError: /var/cache/salt/master/jobs/98/50909d32c4af0440c92c9020e6e3503d86b5aa6cf7f22676ee34c4c08c1cc1/.minions.p does not exist
My MOM gets this same stack trace about every minute, spamming the master log file. Any update on this issue? I am running v3006.3
Getting this after upgrading to 3006.9 on the syndics and the main master.
Description of Issue
In a master of masters setup with one or more syndic servers, the `.minions.p` file inside the job results cache directory will be absent when salt commands are run/started from a syndic, causing `salt-run jobs.lookup_jid <jid>` to fail on the master of masters.
Setup
`local_cache` as returner).
`master_job_cache` other than `local_cache`, e.g. copy `local_cache.py` to `test_issue.py` and configured `master_job_cache: test_issue`
Steps to Reproduce Issue
salt \* test.ping --async
salt \* test.ping --async
[1]: `.minions.p` file created in `/var/cache/salt/master/jobs/<some random jid dir>`
[2]: `.minions.p` absent in `/var/cache/salt/master/jobs/<some random jid dir>`
Both random jid directories will have a `.minions.<name of syndic>.p` file.
When looking up a jid result, the `local_cache` returner will create a list containing the path to `.minions.p` (MINIONS_P) and then extend that list with `.minions.<name of syndic>.p` (SYNDIC_MINIONS_P).
Code from `returners/local_cache.py`:
Because the MINIONS_P file does not exist, the read fails and `process_read_exception` raises an exception. Example exception:
Possible solutions:
`process_read_exception` takes an optional argument to ignore certain error codes; that might be an option? Or perhaps an if statement to check if `os.path.join(jid_dir, MINIONS_P)` exists?
Versions Report