Error reporting difference between Sync{} and Async{}.wait

codekitchen commented 6 months ago

Hi, I recently tracked down a tricky silent failure related to switching from Async to Sync at the top level, as the best practices doc suggests. An exception in the top-level task was being silently swallowed and not reported, so it took a long time to track down why the service was sometimes not behaving. Here is a minimal reproduction:

require 'async'
Sync {
  Async { loop { sleep 1 } }
  raise "whoops"
}

This code just hangs indefinitely without logging the unhandled exception. This code is silly of course, but hopefully you can see how this could arise in practice -- in the actual application, the root task reads input and passes a queue to each long-lived async sub-task to communicate back to the root task. If the root task raises a StandardError due to a bug, it's dead, but the reactor is still waiting on its sub-tasks which are running in a loop so nothing is ever logged or otherwise reported.

But what's interesting is that if you change the top-level Sync{} to Async{}.wait, then the exception is logged in the normal Task may have ended with unhandled exception. way, though it still hangs indefinitely afterwards. But that logging would've been key here. It's not clear to me just looking at the code for the Async and Sync methods why this behavior might be different, I'd need to look at the Reactor impl more closely.

In hindsight, I likely should refactor to have a top-level supervisor task which is the parent of the current root task, since that one is doing a lot of actual I/O work, but this still seems worth probably addressing.

Marginally related, I really wish async didn't just log StandardError failures and continue on, I prefer hard failures on all unhandled exceptions. It'd be nice if the async library had a way to bubble up all unhandled exceptions or register a callback for unhandled errors or something. But that seems like a separate discussion that I should open at some point once I've thought it through further.

codekitchen commented 6 months ago

Looking a bit deeper, this is because Sync when not already in a task sets up a finished Condition and calls reactor.run().wait vs just reactor.run(), which makes total sense. The current error handling behavior logs at the debug level when the task is being waited on, so if I extend my previous example to call Console.logger.debug! then the error does get logged out.

This seems like an unfortunate difference to me between Sync and Async at the top level, but I'm not sure what a reasonable fix would be. Possibly modify the error handling behavior to still log at warning level if the task is the root task? But that feels maybe too special-casey.

codekitchen commented 6 months ago

I'm familiarizing myself with the implementation of async a bit more, and realizing that my earlier analysis is a little off. In top-level calls to both Async and Sync, the reactor.run call doesn't return to the caller until the reactor shuts down, so this isn't related to the subsequent call to #wait. The difference is caused by Sync passing an explicit finished: ::Async::Condition.new to the top-level task, which causes the internal task code to see somebody as waiting on the task result. I can trigger the same behavior with Async(finished: ::Async::Condition.new) {...}

I don't know the intent here in setting an explicit finished arg but it seems possibly related to letting the internal task code know that since this is a Sync call, conceptually there's somebody already waiting on the result.

Anyway, I'm glad I understand it more fully now. But I'm not sure that helped steer me closer to what should change here -- assuming the maintainers agree something should change. I do think it'd be beneficial to not have this logging gotcha, though.

socketry / async

Error reporting difference between Sync{} and Async{}.wait #299