Swallowed unhandled exceptions because of misusage of clojure.core/future

tamasjung commented 2 years ago

Future calls ExecutorService/submit only, therefore it hides every potential unhandled exceptions until deref of get is called, they remain hidden even from the default unhandled exception handler. Calling future in 'spin-off' way is an issue in general.

https://github.com/walmartlabs/lacinia/blob/e77ea57f95569b0f957f0e5ebf094be47d83a342/src/com/walmartlabs/lacinia/executor.clj#L421

One workaround is setting *callback-executor* but that is not the default setup.

mszajna commented 2 years ago

Let me elaborate this a bit:

The problem is that lacinia's use of future (per link above) prevents the application from leveraging Thread/setDefaultUncaughtExceptionHandler (see Stuart Sierra's blog post https://stuartsierra.com/2015/05/27/clojure-uncaught-exceptions).

Any exceptions bubbling from user registered callback via (on-deliver! result-promise callback) (which could be arbitrary code, like sending the response over HTTP), will get caught by future. The Future is then set to failed state, but given the object is not available outside of execute-query you can't recover the exception. It's a silent failure, no stack trace, no nothing.

If instead the exception was allowed to bubble all the way to the top of the stack you could use Thread/setDefaultUncaughtExceptionHandler to log it. To achieve that, lacinia needs to avoid future and schedule tasks using Executor/execute instead. As it happens, lacinia does exactly that if a *callback-executor* is provided.

So, the fix could be as easy as defaulting *callback-executor* to clojure.lang.Agent/soloExecutor (the executor used by future). I'm happy to provide a PR if maintainers are happy with this approach.

PS. Admittedly Thread/setDefaultUncaughtExceptionHandler is a band-aid. It could be argued that callbacks should just not be allowed to throw. In async code, with inverted call stack, there's just nothing useful an application can really do with exceptions that logically happen way outside of its scope. But in real world mistakes happen, and visibility is key for debugging.

hlship commented 2 years ago

I'll have to refresh my memory on future behavior; essentially I'd prefer to capture the exception in the future's thread, and re-throw the exception when derefing the future; that would more likely be on the Pedestal stack which is a better place to throw exceptions (the interceptor stack may include something to report the exception and return a 500 status response).

mszajna commented 2 years ago

essentially I'd prefer to capture the exception in the future's thread, and re-throw the exception when derefing the future

In this instance, future is never returned to the caller, so there is no opportunity to deref. The reference to that object is effectively lost. As used in the code linked above, future is merely a convenient way to kick off a background task.

The argument is that the pattern of using future for background tasks you're not awaiting the result value of is problematic. That's precisely because the exception will be captured in an unreachable Future object, with no opportunity to get discovered.

hlship commented 2 years ago

The f in (future (f)) has a (try ...) block in it that ensure that exceptions are caught and returned, and the higher levels check to see if the nominal response map is actually an exception.

The problem with allowing exceptions to escape to the thread's uncaught exception handler is that other code, that is waiting for that asynchronous block of code to execute, will wait forever unless they build there own explicit timeout. That means that, in lacinia-pedestal, you end up with a request thread that blocks, or (if async) a request that never gets serviced.

On the other hand, if an exception occurs before it gets to line 421, that will be executed in the caller's thread, where it can be caught normally. In a Pedestal request processing thread, that will be caught and passed through the interceptor stack, to a point where that 500 response can be sent to the client.

So I fail to see the real problem here, is this some theoretical objection, or is there a particular use case in your application that I'm unaware of?

tamasjung commented 2 years ago

If I remember well the reasoning was that this line inside the catch block can throw something https://github.com/walmartlabs/lacinia/blob/8b2ca6488e47a024c2cbef151fdcd6a25c30636b/src/com/walmartlabs/lacinia/executor.clj#L416 , because resolve/deliver! calls the :callback fn of the state and that opens the door to .... literally anything?

Regarding the theoretical part, even when you fix this by wrapping the inside of catch with another try-catch and you do something super safe in the second level catch you force the reader of code to check if the calling of f is safe of not. And that is a very practical concern. Not using future without deref makes the writers and readers life easier, as far as error handling goes, I think.

hlship commented 2 years ago

I'm thinking that the right approach here would be to add a schema/compile option for an executor used with resolver result (it would provide one by default). Then that can be expected to exist, and we can always submit to that executor and not use a future. That's a bit more elegant.

It would have been nice if resolve/resolve-promise took the context as an argument, so there wouldn't be a need to bind *callback-executor*.

tamasjung commented 2 years ago

That sounds quite right 👍

hlship commented 2 years ago

So, what do you think of #424 ?

tamasjung commented 2 years ago

👍 I think it solves the issue.

hlship commented 2 years ago

Fixed by #424.

walmartlabs / lacinia

Swallowed unhandled exceptions because of misusage of clojure.core/future #415