Closed tothda closed 3 weeks ago
The reason why Proletarian does not catch Throwable
is because then we would also catch errors that are subclasses of java.lang.Error
, which include serious and critical conditions in the JVM that we could have no reasonable hope of dealing with - StackOverflowError
and OutOfMemoryError
in particular comes to mind. But really, any Error
exception indicate a condition where the best course of action is to throw up our hands and say "all right, let's bail".
You could make a simple wrapper function for your handlers if you cannot prevent AssertionError
from being thrown:
(defn handle-job!
[job-type payload]
;; Do stuff that might throw AssertionError
)
(defn throwable-catching-handler-fn
[f]
(fn [job-type payload]
(try
(f job-type payload)
(catch AssertionError e
(throw (ex-info "AssertionError in handler" {} e))))))
(worker/create-queue-worker
data-source
(throwable-catching-handler-fn handle-job!))
Now, I would not recommend you do this if you have control of the code that throws the AssertionError
.
AssertionError
is something that either is thrown explicitly by the developer ((throw (AssertionError.))
), or implicitly by the Clojure assert
macro. Either way, in my mind an assertion is a very strong signal to (and from!) the programmer that a condition has occured that should not have been possible, and that the code must be fixed. There is no runtime fix possible. This is why Proletarian simply shuts down when encountering an uncaught Throwable
. There is something wrong with the code (in the case of AssertionError
) or the runtime enviroment (in other subclasses of Error
), and it must be fixed. Proceeding with incorrect code is not a reasonable thing to do.
Now, you may have other intuitions about all this, and there may be constraints out of your control that prohibits you from rewriting the code to not throw AssertionError
. Then I propose the above wrapper function as a solution to your predicament.
Although I generally would accurately disable assertions in production, I'd say that Proletarian's default behavior could be improved.
It's simple and easy enough to catch AssertionError in addition to Exception. It's not like that would open the doors to also handling disparate exceptions - you would have to worry about essentially two cases:
AssertionError has well-known semantics for Java and Clojure developers alike. It's been basically always there, it won't go anywhere either, so it seems best handled (which yes, it's a bit annoying, but it's part of making an exhaustive handler IME).
Most practically, this affects how Proletarian behaves in dev/test systems.
For your consideration 🙂
Thanks - V
I can see three approaches for how to handle AssertionError
:
Exception
.How do you suggest we handle AssertionError
? Am I missing an approach here?
Hi! I'd recommend 2
Let's take this defn as a case study:
(defn foo [x]
{:pre [x]}
(.toString x))
Let's say that a job invokes (foo nil)
(.toString nil)
) and the job would failIn both cases, we can say, there was an application bug, so the best course of action is to fix the bug and retry the job.
The only thing that changes is that the AssertionError was thrown a little before.
Normally, a good assertion simply fails faster (and more clearly) code paths that would fail sooner or later anyway. So, because they're the "same", they deserve the same treatment IMO.
Cheers - V
OK, so we agree that AssertionError
should make the job fail, and that this is caused by an application bug that must be fixed before the job can be retried.
Now, the retry semantics of approach 2 is to retry according to a retry strategy. This strategy is provided by the application, and is tailored to the use case of the job. It might be to retry a few times in quick succession and then back off exponentially, or a more hand-crafted approach that takes into account data from the Exception
, or some other strategy. The likelihood that the bug has been fixed by the time the job is retried is in all cases not very great. Every retry until the bug is fixed will fail, both for this job and other instances of the same job type, causing noise and duplication of errors in the log, but more importantly: The job might fail permanently before a fix is deployed, because the retry strategy has determined that there are no more retries, causing the job to go into the Failed state.
Contrast this with the current approach (3). The worker, upon catching a Throwable
, will stop the worker. The job will be left as it is in the queue, invoking no retry mechanism. If there are more worker instances working on the same queue, they will pick up the same job, and they too will fail and stop, leaving the job in the queue. The logs will report on the failed jobs, hopefully with a severity sufficient to catch the attention of the team. A fix must then be developed, tested and deployed, and the work can proceed.
The upside of this approach is twofold:
Exception
s are for errors that might be transient. An AssertionError
is by definition not transient.I can see the downside, too: Jobs that are in the same queue, but that are different job types, might not have the same problem as the failing message. But they would nevertheless be delayed by the worker downtime caused by the AssertionError
. The remedy would be to have separate queues and workers for these jobs. This is more of an operational and team maturity issue, though, than a technical one. One must consider the business process implication of running different types of jobs in the same queue.
As you can tell, I'm not convinced that changing the current approach is the right thing to do. There's a workaround (the wrapper function in my first reply above) available if you need it. There's also the :proletarian/on-polling-error
option, where you can handle Throwable
s however you like (but the retry strategy mechanism is still bypassed, so implementing approach 2 using this is not possible).
I can see the downside, too: Jobs that are in the same queue, but that are different job types, might not have the same problem as the failing message.
Yes, this nuance makes 3
too heavy-handed of an approach IMO.
It would make sense if the retry strategies did take into account the class/contents of the given exception. Approximate proposed logic:
:proletalian/retry? false
This fine-grained logic would seem an overall improvement to me. Even when disregarding the AssertionError topic entirely, it makes sense to me that retriability should be encoded in (/ inferred from) the exception class/instance.
Thanks - V
I'm going to leave the design as it is. My reasoning has been laid out in this thread. A simple workaround is available (wrapper function, see my first comment in this issue) for folks that don't agree. Thanks for raising the issue and providing valuable feedback and pushback. Hopefully the discussion here can leave others capable of making their own decisions when it comes to handling AssertionError
s.
I noticed that when an assert statement fails in one of my jobs the whole proletarian Queue Worker shuts down. A failing assertion generates an
AssertionError
which is not a subclass ofException
. So I guess the code should catchThrowable
here: https://github.com/msolli/proletarian/blob/main/src/proletarian/worker.clj#L61