arizvisa opened 5 years ago
It looks like the reason why returners might get `req` as the job id instead of a real job id is the following code. But again, I'm really unfamiliar with this beastly code-base, so I'm not 100% sure:
https://github.com/saltstack/salt/blob/develop/salt/daemons/masterapi.py#L780
```python
def _return(self, load):
    '''
    Handle the return data sent from the minions
    '''
    ...
    if load['jid'] == 'req':
        # The minion is returning a standalone job, request a jobid
        prep_fstr = '{0}.prep_jid'.format(self.opts['master_job_cache'])
        load['jid'] = self.mminion.returners[prep_fstr](nocache=load.get('nocache', False))
        # save the load, since we don't have it
        saveload_fstr = '{0}.save_load'.format(self.opts['master_job_cache'])
        self.mminion.returners[saveload_fstr](load['jid'], load)
    ...
    if not self.opts['job_cache'] or self.opts.get('ext_job_cache'):  # XXX: is this right?
        return
    fstr = '{0}.update_endtime'.format(self.opts['master_job_cache'])
    if (self.opts.get('job_cache_store_endtime')
            and fstr in self.mminion.returners):
        self.mminion.returners[fstr](load['jid'], endtime)
    fstr = '{0}.returner'.format(self.opts['master_job_cache'])
    self.mminion.returners[fstr](load)
```
You can see the conditional at the `XXX` comment, where it's checking whether the caches don't exist. However, it looks like the test for both `job_cache` and `ext_job_cache` was intended to be wrapped in parentheses, because this line is essentially saying that if there's _not_ a job cache or there _is_ an external job cache, then skip the following code, which calls the `returner()` function and uses the `master_job_cache`.
Is this intentional, or was it really supposed to be the following, which would result in the returner being called when an external job cache is configured?

```python
if not (self.opts['job_cache'] or self.opts.get('ext_job_cache')):
```
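For illustration, here's a minimal standalone demonstration of why the parentheses matter; this is plain Python, not Salt code, and the values are made up:

```python
# `not` binds tighter than `or`, so the two expressions disagree whenever
# the job cache is enabled alongside an external job cache.
job_cache, ext_job_cache = True, 'redis'    # hypothetical: both caches configured

print(not job_cache or ext_job_cache)       # 'redis' (truthy) -> _return() bails out early
print(not (job_cache or ext_job_cache))     # False -> master_job_cache returner still runs
```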
It looks like adding `req` to the jid was on purpose. See this PR: https://github.com/saltstack/salt/pull/16392 and in particular this comment: https://github.com/saltstack/salt/pull/16392#issuecomment-58903622. It looks like we need to add the code that handles the `req` return to all the returners.
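For reference, a minimal sketch of what that handling might look like, paraphrased from the pattern in the mysql returner rather than copied verbatim, and assuming the module's own `prep_jid()`, which every returner defines:

```python
def returner(ret):
    '''
    Hedged sketch, not the exact upstream code: replace the magic 'req'
    jid with one generated by this returner's own prep_jid() before the
    jid is ever used as a storage key.
    '''
    if ret.get('jid') == 'req':
        # minion-submitted standalone job: generate a real, unique jid
        ret['jid'] = prep_jid(nocache=ret.get('nocache', False))
    # ... persist ret keyed by the (now unique) ret['jid'] ...
```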
Ok. Because yeah, if that explicit check doesn't exist, it results in the load data being overwritten, since all jids spawned from a minion are 'req'. I'll update my PR then.
Thanks!
Ftr, that unfortunately means that all the returners (other than `mysql` and `local_cache`) are busted. Maybe not totally busted, actually, since there's no error (just silence), but essentially any jobs spawned by minions will cease to exist.
Should I create a new issue pointing that out, or perhaps re-open this and change the title?
let's re-title this and track it here :)
Okay, cool. Re-titled it.
As another question, is there some way to see the 'req' jobs that are currently being run in a salt environment? Although returners should re-map 'req' to a new job id, I've been unable to figure out how to view that new job id without actually watching the event queue. If we go to the prior logic where 'req' is not re-mapped, then you can view the job... but if more than one minion makes a 'req', the previous results are overwritten.
@Ch3LL, do you mind re-tagging this since it isn't a question anymore?
Ftr, PR #51363 fixes this issue just for the etcd returner. So if/when that gets merged, there's one less returner to fix.
> if we go to the prior logic where 'req' is not re-mapped, then you can view the job
How were you previously viewing it when it was working?
By using 'req' as the job id. However anytime another client started another job, it would overwrite the previous job data for 'req'.
When you say overwrite, do you mean it's overwriting the `req` job in etcd where the job returns are stored?
Yes. It's overwriting because the job-id is 'req', so any returner that lets you use a non-integral job id will overwrite previous data if the job id is being used as a unique key. Offhand, I think cassandra does this, but there are probably others.
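A toy illustration of the failure mode, with a made-up dict-backed returner standing in for any key/value store that keys job data by jid:

```python
# Toy stand-in for a key/value returner: keying job data by jid means
# every minion-spawned job writes to the same 'req' slot.
store = {}

def toy_returner(load):
    store[load['jid']] = load      # jid used as the unique key

toy_returner({'jid': 'req', 'id': 'minion-a', 'return': 'first'})
toy_returner({'jid': 'req', 'id': 'minion-b', 'return': 'second'})
print(store['req']['return'])      # 'second' -- minion-a's return is gone
```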
What might work as a workaround would be to introduce minion-specific job ids. That way job ids can be unique, which matches the assumption that the sql-based returners make when creating an index on the job id. Something like `req-${MINION_ID}-${TIMESTAMP}`, or something similar, since it seems that once a job id hits the returner, salt doesn't care about it anymore (not sure about this). But this way the job id is still searchable via `returner.list_jobs` (and friends); see the sketch below.
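A minimal sketch of that naming scheme; the helper name and timestamp format are made up for illustration, and a real implementation would want sub-second resolution to avoid collisions:

```python
import time

def make_minion_jid(minion_id):
    # hypothetical helper: a unique, minion-specific replacement for the
    # shared 'req' jid, e.g. 'req-web01-20190215123456'
    return 'req-{0}-{1}'.format(minion_id, time.strftime('%Y%m%d%H%M%S'))

print(make_minion_jid('web01'))
```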
This'll definitely need some discussion though, as I'm really only familiar with the returner interfaces from my work on etcd. Pretty much the most complete returners are the sql-based ones, the local_cache one, the write-only ones (email-based and such), and the etcd one (afaict, and whenever it gets merged).
thanks for clarifying. wanted to make sure i was following correctly.
ping @saltstack/team-core any ideas on how we could approach this?
Apparently I need to dig into returner code, because it's a place I'm not familiar with at all right now 😛
@waynew, if you want, check out my PR for etcd support in the returners. I'd be willing to bet that it's the most complete and documented of the returner implementations.
https://github.com/saltstack/salt/pull/51363
Here's a link to the specific file in my submitted branch: https://github.com/arizvisa/saltstack-salt/blob/etcd-returner-is-busted/salt/returners/etcd_return.py
If you use the etcd returner PR, it's very easy to monitor what a client is doing, since etcd simplifies monitoring its schema via etcdclient. You can watch for "/$jid/$id/$num" (or similar, I haven't messed with this in a long time), where your jid is "req" or a result from "prep_jid", and your "num" is an incrementing number for the state that the client is processing.

Otherwise you'll need to use `state.event` to watch what's happening as jobs get returned, and keep an eye out for "$jid", which can be "req" or a real jid depending on the issue I mentioned. Or you can also just plant logs in the returner that you're using (non-local ones).
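As a rough sketch of the first approach with python-etcd, assuming a '/salt/jobs' key prefix (the actual root depends on how the returner is configured):

```python
import etcd  # python-etcd

client = etcd.Client(host='127.0.0.1', port=2379)

# Block until any key under the (assumed) jobs prefix changes, then print
# which jid the minion wrote to -- 'req' or a freshly generated one.
event = client.read('/salt/jobs', wait=True, recursive=True)
print(event.key, event.value)
```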
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
If this issue is closed prematurely, please leave a comment and we will gladly reopen the issue.
This is referenced by #52775 and seems to be the only mention of the special handling of the "req" jid by the returners. Each returner has to explicitly handle this job, so this should probably be left open.

Relevant documentation for how the etcd2 returner deals with this special case can be found in the documentation for the `salt.returners.etcd_return` module merged by PR #51363.
Thank you for updating this issue. It is no longer marked as stale.
### Description of Issue/Question
Hiya. I'm the author of PR #51363 (etcd returner), and I noticed that the mysql returner specially handles the case when the job id is set to "req". This job id seems to be submitted by a minion when it is activated independent of a command from the master. (Relevant code: https://github.com/saltstack/salt/blob/develop/salt/returners/mysql.py#L294)
The way this magic job-id seems to be handled by the returner is that, when it is seen, it is replaced with a new job id generated from `returner.prep_jid`. However, it appears that only the `mysql` and `local_cache` returners handle this job id explicitly; everything else does not tamper with it at all. Which returner is handling it correctly? Is this something that a returner is actually not supposed to receive, i.e. the result of a bug or misconfiguration? Most of the logic that messes with this magic job-id immediately replaces it with a call to `prep_jid`; however, there are a few cases where this logic is missed, which results in a returner receiving this 'req' job...

### Setup
To view the job-id in use, simply set `startup_states` in a minion config to execute some state that takes a while (so that you can race against the state being applied). Configure any of the non-local returners so that you can manually check the job being created. A sketch of such a config is below.
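A minimal minion config sketch for this setup; `slow_state` is a hypothetical SLS that just sleeps for a while, and `etcd` stands in for whatever non-local returner you're testing:

```yaml
# /etc/salt/minion (excerpt)
startup_states: sls     # run the listed SLS files when the minion starts
sls_list:
  - slow_state          # hypothetical: an SLS slow enough to race against
return: etcd            # any non-local returner you can inspect by hand
```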
### Steps to Reproduce Issue
As the state is being applied, view the active jobs with `salt-run jobs.active`; you'll see that the job that was started by the minion has the job-id of "req".

### Versions Report