Closed: uaalto closed this issue 7 years ago.
Hi there,
I'm afraid this report is going to need more details.
How can this situation be reached? Am I missing some configuration?
Not sure I understand the question. You're asking if it's normal for tasks to be dropped after a certain number of retries? Not on Carmine's end, but I'm not sure what your application/handler logic is.
If your handler continually returns {:status :retry} when given a task, it should keep retrying the task forever.
If your handler returns {:status :success} or {:status :error}, the task is considered completed and will be garbage collected.
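As a rough illustration of these rules, here is a minimal sketch of a handler written as a pure Clojure function, so it can be exercised without Redis. It assumes, as in the Carmine README example, that the handler receives a map containing :message; the do-work stub and its :outcome key are hypothetical placeholders for real application logic. Only success or a permanent failure ends the task; transient failures return {:status :retry} so the task stays queued.

```clojure
(ns example.mq-handler)

;; Hypothetical stand-in for real application work. It reads an
;; :outcome key from the message purely so the sketch is testable.
(defn do-work [message]
  (:outcome message))

(defn handler
  "Maps a work outcome to a queue status. Assumes the queue passes a map
  containing :message (as in the Carmine README example)."
  [{:keys [message]}]
  (case (do-work message)
    :ok        {:status :success} ; done: task completes and is cleaned up
    :fatal     {:status :error}   ; permanent failure: task also completes
    :transient {:status :retry}   ; temporary failure: keep the task
    ;; Unknown outcome: prefer keeping the task over silently dropping it.
    {:status :retry}))
```

In a real setup, a function like this would be passed as the :handler option of the message-queue worker (car-mq/worker in the Carmine README); check the README for your version's exact options. Defaulting unknown outcomes to :retry is a design choice that favours never losing a task over finishing it.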
Thanks for your reply @ptaoussanis. I've been busy trying to fix some issues. We are currently using this task system for very sensitive processes that need to be reliable.
We've also experienced another issue where tasks disappeared spontaneously. The bug seemed not to occur once we stopped setting the :threads parameter. Unfortunately, this is very hard to replicate, and the system is in production. I intended to set up a full test suite to stress the task system and try to trigger the bug, but I haven't found the time for that yet.
Answering your questions:
What version of Carmine? [com.taoensso/carmine "2.13.3-uaalto"]
What do you mean by a "couple" of days? Tasks had been accumulating over a couple of days, and that is the window in which they disappeared.
What do you mean by "many" times? The number of times a task is retried before disappearing is not consistent. It varies, somewhere between 10 and 50 retries IIRC.
Have you confirmed that the tasks didn't in fact eventually execute successfully after retrying? As far as I can tell, the tasks only ever returned the retry status. I can't demonstrate that with 100% certainty, but, as I mentioned, tasks have also disappeared under other circumstances.
Have you confirmed that your Redis instance hasn't been pruning keys because of memory limitations, etc.? I haven't confirmed that. We don't have memory limitations at the moment, but how would this even be triggered in Redis, and why would it select these keys?
Since these are two bugs I can hardly show evidence for, and they seem to be the same or closely related, I think the best way to find out is to stress-test the system. I planned to do that, but probably won't get to it any time soon since I have many important tasks at the moment.
Hi Ulysses, sorry for the delay replying.
If you're still having problems, I think the best way to proceed would be to produce some kind of reproducible example that I could look at and debug on my end.
I haven't been able to reproduce this for a long time. Closing. Thanks for your help @ptaoussanis!
No problem, thanks for the update :-)
After a couple of days without checking the task queues, I found that over 500 tasks had failed temporarily and returned {:status :retry}. However, none of these tasks remain in the queues, so I can't fix the issue and let them succeed. How can this situation be reached? Am I missing some configuration?