seomoz / qless-core

Core Lua Scripts for qless
MIT License
85 stars 34 forks source link

When jobs timeout the history does not reflect that #17

Closed myronmarston closed 11 years ago

myronmarston commented 11 years ago

I've noticed that in pop.lua, when a worker loses its lock on a job, it gives the job to another worker and does not update the history to reflect the original timeout. The history is meant to reflect what happened to the job, and it timing out is an important even that should be included, IMO.

@dlecocq -- is this by design or a simple oversight?

dlecocq commented 11 years ago

This is a simple oversight on my part.

I'm thinking about refactoring the history a little bit. In particular, in two ways: first, I'd like to put job history information in a separate key since it's essentially a list we append to. The second is that with things like that with things like retires the current structure breaks down. Currently there's just one JSON blob per time a job is put into a queue, with keys like 'put', 'popped', 'q', etc.

Instead, I'd like it to look a little more like the log messages we generate:

{'q': 'foo', 'what': 'put', 'at': ...},
{'q': 'foo', 'what': 'popped', 'at': ...},
{'q': 'foo', 'what': 'lost lock', 'at': ...},
{'q': 'foo', 'what': 'lost lock', 'at': ...},
{'q': 'foo', 'what': 'completed', 'at': ...},

I think it will help to shed more light on what actually happens with a job, and I like the idea of it being compatible with the logged messages. I'm not sure exactly the stored structure, but I figured it would be something like that. Thoughts?

myronmarston commented 11 years ago

I like this a lot.

dlecocq commented 11 years ago

Ok, the updated code now returns:

{"when"=>1368999174, "q"=>"testing", "what"=>"put"}
{"when"=>1368999174, "what"=>"popped", "worker"=>"worker-a"}
{"when"=>1368999174, "what"=>"popped", "worker"=>"worker-b"}
{"when"=>1368999174, "what"=>"done"}

This is an instance where a job timed out with worker-a, and was subsequently given to worker-b. Does that work for you guys? Or should I put an explicit 'timed out' even in the history?

myronmarston commented 11 years ago

This is an instance where a job timed out with worker-a, and was subsequently given to worker-b. Does that work for you guys? Or should I put an explicit 'timed out' even in the history?

I would love an explicit timed-out event. It helps explain what happened. Also, with the new abort button we've talked about, it seems like we're moving in a direction where timeouts can happen independent of a worker getting the job. In fact, the current design that causes the job to be immediately handed out when it times out seems a bit odd to me: when the job times out it seems like it should go in the waiting queue and be available for a worker to pop....but there may be other jobs in the waiting queue with a higher priority. Thus, I view the timeout and the popping of the job by a different worker as distinct events.

dlecocq commented 11 years ago

Fair enough -- consider it done.

dlecocq commented 11 years ago

Updated the UI, so it looks like this now:

screen shot 2013-05-20 at 9 33 11 am

dlecocq commented 11 years ago

Merged into master and qless's server, so closing this one out.