Closed moteus closed 5 years ago
Thanks for the question!
The philosophy of qless is generally that if a job fails, it may need attention, which is why successfully-completed jobs eventually expire out of the system but failed jobs stay around indefinitely.
If you truly don't want to hold on to failed jobs, then part of your strategy might be to try / catch everything in the job code and increment your failure counter in the catch. However, that won't help the retries-exhausted type failures.
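As a rough illustration of the try / catch idea, here is a minimal Python sketch. The names `failure_counts` and `run_swallowing_failures` are made up for this example and are not part of qless; the job body is assumed to be a plain callable, and the counter is an in-process stand-in for whatever metrics backend you actually use.

```python
from collections import Counter
from typing import Callable

# Hypothetical in-process failure counter, keyed by exception type name.
# In a real worker this would be replaced by your own metrics client.
failure_counts: Counter = Counter()


def run_swallowing_failures(job_fn: Callable[[], None]) -> bool:
    """Run job_fn and swallow any exception it raises.

    Returns True if the job body succeeded and False if it raised.
    Because the exception never escapes, the queue would see the job
    finish normally instead of recording it as failed.
    """
    try:
        job_fn()
        return True
    except Exception as exc:
        # Count the failure by exception type, then move on.
        failure_counts[type(exc).__name__] += 1
        return False
```

This only covers exceptions raised inside the job body. Jobs that run out of retries are still failed by qless itself and never pass through this wrapper.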
There are two APIs that can help - the failed API, which will indicate what failure groups exist and how many jobs are in each, returning something like:
{
'failure-type-1': 17,
'failure-type-2': 83,
...
}
Realistically, the groups are generally reflective of the uncaught exception, or <queue-name>-failed-retries.
That same API can also accept a type to get the actual jobs. Using the example above, we could call it with failure-type-1 to get a response something like this:
{
'total': 17,
'jobs': ['job-id-1', 'job-id-2', ...]
}
With a list of all the job IDs you want to cancel, you can use the cancel API. It accepts an arbitrary number of job IDs, so you can cancel jobs in large batches as well.
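Putting the two APIs together, a cleanup loop could look roughly like the Python sketch below. It is written against a hypothetical thin wrapper, not a real qless binding: `failed_groups`, `failed_jobs` and `cancel` are assumed method names that mirror the calls described above, and `InMemoryClient` is a stand-in included only so the sketch runs without Redis.

```python
from typing import Dict, List


class InMemoryClient:
    """Hypothetical stand-in for a qless client, used only for illustration."""

    def __init__(self, groups: Dict[str, List[str]]) -> None:
        self._groups = {name: list(jids) for name, jids in groups.items()}

    def failed_groups(self) -> Dict[str, int]:
        # Mirrors the failed API with no arguments: group name -> job count.
        return {name: len(jids) for name, jids in self._groups.items() if jids}

    def failed_jobs(self, group: str, start: int, limit: int) -> dict:
        # Mirrors the failed API with a type: total plus one page of job IDs.
        jids = self._groups.get(group, [])
        return {"total": len(jids), "jobs": jids[start:start + limit]}

    def cancel(self, *jids: str) -> None:
        # Mirrors the cancel API: accepts any number of job IDs.
        doomed = set(jids)
        for name in self._groups:
            self._groups[name] = [j for j in self._groups[name] if j not in doomed]


def purge_failed(client, batch_size: int = 100) -> Dict[str, int]:
    """Cancel every failed job, one batch at a time, and return counts per group."""
    removed: Dict[str, int] = {}
    for group in list(client.failed_groups()):
        count = 0
        while True:
            # Always read from offset 0: cancelled jobs leave the group,
            # so the next batch moves up to the front.
            page = client.failed_jobs(group, 0, batch_size)
            jids = page["jobs"]
            if not jids:
                break
            client.cancel(*jids)
            count += len(jids)
        removed[group] = count
    return removed
```

The per-group counts returned by `purge_failed` could be reported as failure metrics before the jobs disappear. The loop assumes that cancelling a job removes it from its failure group; if that does not hold for your setup, page with an increasing offset instead.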
Thank you for the answer. My plan is
Do you think it is worth extending the qless-core API to clean up failed tasks?
There are a couple wrinkles with the possibility of extending the core API to clean up failed jobs:
All that said, I wouldn't object to such an API - I think others would use it. I don't personally have the bandwidth for it, though.
What is the correct way to keep only the last N failed jobs? For now I am trying to figure out whether qless can be used in my use case. In my use case I can just throw away a job and forget about it. We have separate logging infrastructure and can check the logs there. I just need to write metrics about the number of failures to Graphite. Each worker will simply try to complete the job or call retry with some delay. If the number of retries is exhausted, qless marks the job as failed and never removes it. (I have over 10M messages per day, and around 40% will be marked as failed because they cannot be completed.) The only solution I see is to keep my own counter and mark all jobs as completed. But maybe there is some efficient way to remove all failed jobs from the queue?