add option to raise standard errors

wjwatkinson commented 2 years ago

Description

Give users the option to have their tasks operate like the rest of Ruby and have all errors, including StandardError, raised without explicitly having to call wait, or result.

Using this gem felt very natural in how it in interfaced with existing synchronous Ruby code. The one place it felt awkward and different was with error handling. Ruby raises exceptions unless you rescue them, but with async it is the opposite, you have to add code in order to raise StandardErrors.

Based on a comment in the code and the git history it seems like the original design was to raise all exceptions by default, but give users the ability to explicitly determine if there were cases in which they did not want to raise StandardErrors.

Based on my experience so far I agree with this line of reasoning and am curious why it was changed. Ideally I think it would be nice to move back to this, but have implemented this feature as opt in to avoid any breaking changes.

Types of Changes

New feature.

Testing

[x] I added tests for my changes.
[x] I tested my changes locally.
[ ] I tested my changes in staging.
[ ] I tested my changes in production.

ioquatix commented 2 years ago

What is the use case?

The problem is, after the task is resumed from the event loop, the exception will bubble up and kill the entire event loop. Is that what you want? Try putting task.sleep(1) before raising the exception.

wjwatkinson commented 2 years ago

Thanks for the quick reply and, yes, that is what I intended (also apologies for the brief description. I have updated it with more information).

The goal is to create an experience more akin to syncronous ruby, where exceptions are raised and kill the execution unless they are explicitly rescued. This comment seems to describe the behavior I am looking for, but it is no longer exposed as an option.

An example use case would be if you are aggregating data from multiple APIs and if any of the APIs fail you cannot proceed. Currently you would have to:

async = Async do |task|
  req1 = task.async do
     # request 1
  end

  req2 = task.async do
     # request 2
  end

  req1.wait
  req2.wait
end

async.result

This feels a bit awkward and verbose. It can also lead to the code taking longer to exit. If, for example req1 takes a long time to complete and req2 errors out immediately the main thread will not error until req1 has finished instead of right after req2 errors.

emiltin commented 2 years ago

Elixir (and Erlang) lets you choose how you start processes (taks). For example, you can choose to link them so that if one crashes, the other one is terminated, or you can keep them isolated. You also have supervisor trees, where a supervisor (parent task) can be used to restart processes (sub tasks) if they crash.

I don't know whether async should offer more options like that. But I think the current approach of isolating tasks to some degree is good. Concurrent code is, by nature, a bit different. One of the basic ideas of concurrency is that things should run side-by-side. To me that implies some degree of isolation. It make sense to me that a crash in one task place should not cause everything else to fail by default.

ioquatix commented 2 years ago

@wjwatkinson I don't think your proposed change works the way you expect. @emiltin gives a good overview of the general concept, but more specifically, I don't think it's reasonable to kill the entire event loop.

That being said, maybe we can achieve want you want in a better way using some helper code for a fan out + exception handling.

ioquatix commented 2 years ago

By the way, tasks that raise unhandled errors will log them so if no one calls wait it will not be missed entirely.

wjwatkinson commented 2 years ago

Thanks for the thoughtful responses. My solution is definitely a bit hacky and killing the entire event loop may be overkill. That said what I struggled with here was how easy and intuitive this package felt until I started testing error handling.

The current design is essentially:

rescue StandardError => e
  Console.logger.error(e)
end

Which is not something I would do in my code. We also use structured logging (JSON logs), so if I did want to log these errors I would want to do it in a different way.

I think putting the user back in control a bit here and having defaults that align more with synchronous ruby code, if possible , in terms of choosing to rescue instead of choosing to raise, would help. If killing the loop on error is problematic this could perhaps be some sort of option that automatically calls task.wait and async.result. Users could then rescue and log StandardErrors if they wanted to.

ioquatix commented 2 years ago

Which is not something I would do in my code. We also use structured logging (JSON logs), so if I did want to log these errors I would want to do it in a different way.

Console supports structured output just redirect stderr to a file and you get newline delimited JSON.

I think putting the user back in control a bit here and having defaults that align more with synchronous ruby code, if possible , in terms of choosing to rescue instead of choosing to raise, would help. If killing the loop on error is problematic this could perhaps be some sort of option that automatically calls task.wait and async.result.

You can use Sync for this behaviour but you won't get fanout.

emiltin commented 2 years ago

Is Async::Barrier similar to the kind of fan out operation you want, except that you want any one child failing to abort all children?

https://socketry.github.io/async/source/Async/Barrier/index.html#Async%3A%3ABarrier%23async

wjwatkinson commented 2 years ago

For my use case the ideal interface would be:

Async do |task|
  task.async do
    # request 1
  end

  task.async do
    # request 2
  end
end

# if either request errors the error is raised in the main thread to be handled.

What could be really cool as well is something like

Async do |task|
  task.async do
    rescue StandardError => e
    # handle error async
  end

  task.async do
    # request 2
  end
end

# if either request errors the error is raised in the main thread to be handled.

I realize rescuing errors in async blocks likely introduces a lot of complexity and this might not be the best interface for everyone's use case.

For my use case, of wanting to make multiple http requests in parallel, that all need to succeed to continue, it would be ideal to have the errors automatically raised in the main thread by default, so they can be handled appropriately.

I do not have a strong opinion on whether the event loop should be killed on error, or not. It seems like for my use case that is the most expedient, but it could be limiting and is not necessary.

ioquatix commented 2 years ago

For my use case, of wanting to make multiple http requests in parallel, that all need to succeed to continue, it would be ideal to have the errors automatically raised in the main thread by default, so they can be handled appropriately.

This is a typical fan out best handled by explicit tasks or a barrier.

You must at some point collect the results of these concurrent requests, no? That's the point where you'd handle exceptions. If you want to have a side channel to cancel the entire operation if one fails, a barrier makes perfect sense.

ioquatix commented 2 years ago

With the following example program:

#!/usr/bin/env ruby

require_relative '../lib/async'

RAISE_ERRORS = ENV['RAISE_ERRORS'] == 'true'

Async do |task|
    task.async do |task|
        while true
            task.sleep 1
            puts "Hello"
        end
    end

    task.async(raise_errors: RAISE_ERRORS) do |task|
        task.sleep 2
        raise "Boom"
    end
end

We can check the difference.

With the latter, the entire program dies.

ioquatix commented 2 years ago

I think a better mechanism is the Sync do...end block which is strictly sequential including exception handling. What are the valid use cases for this new feature?

wjwatkinson commented 2 years ago

Sorry for the slow reply here. My use case is getting data from multiple sources where all of the data is require to proceed, so if one of the requests fails an exception should be raised. I am going to close this PR out for now as I have not been actively working on this and don't want to clutter the repo.

ioquatix commented 2 years ago

For that use case you should definitely just use map-reduce and Async::Barrier to handle exceptions.

socketry / async