Closed myronmarston closed 11 years ago
This is a simple oversight on my part.
I'm thinking about refactoring the history a little bit. In particular, in two ways: first, I'd like to put job history information in a separate key since it's essentially a list we append to. The second is that with things like that with things like retires the current structure breaks down. Currently there's just one JSON blob per time a job is put into a queue, with keys like 'put', 'popped', 'q', etc.
Instead, I'd like it to look a little more like the log messages we generate:
{'q': 'foo', 'what': 'put', 'at': ...},
{'q': 'foo', 'what': 'popped', 'at': ...},
{'q': 'foo', 'what': 'lost lock', 'at': ...},
{'q': 'foo', 'what': 'lost lock', 'at': ...},
{'q': 'foo', 'what': 'completed', 'at': ...},
I think it will help to shed more light on what actually happens with a job, and I like the idea of it being compatible with the logged messages. I'm not sure exactly the stored structure, but I figured it would be something like that. Thoughts?
I like this a lot.
Ok, the updated code now returns:
{"when"=>1368999174, "q"=>"testing", "what"=>"put"}
{"when"=>1368999174, "what"=>"popped", "worker"=>"worker-a"}
{"when"=>1368999174, "what"=>"popped", "worker"=>"worker-b"}
{"when"=>1368999174, "what"=>"done"}
This is an instance where a job timed out with worker-a
, and was subsequently given to worker-b
. Does that work for you guys? Or should I put an explicit 'timed out' even in the history?
This is an instance where a job timed out with worker-a, and was subsequently given to worker-b. Does that work for you guys? Or should I put an explicit 'timed out' even in the history?
I would love an explicit timed-out
event. It helps explain what happened. Also, with the new abort
button we've talked about, it seems like we're moving in a direction where timeouts can happen independent of a worker getting the job. In fact, the current design that causes the job to be immediately handed out when it times out seems a bit odd to me: when the job times out it seems like it should go in the waiting queue and be available for a worker to pop....but there may be other jobs in the waiting queue with a higher priority. Thus, I view the timeout and the popping of the job by a different worker as distinct events.
Fair enough -- consider it done.
Updated the UI, so it looks like this now:
Merged into master and qless
's server, so closing this one out.
I've noticed that in pop.lua, when a worker loses its lock on a job, it gives the job to another worker and does not update the history to reflect the original timeout. The history is meant to reflect what happened to the job, and it timing out is an important even that should be included, IMO.
@dlecocq -- is this by design or a simple oversight?