socketry / async

An awesome asynchronous event-driven reactor for Ruby.
MIT License
2.04k stars 85 forks source link

Understanding the difference of Sync usage within Falcon #300

Closed travisbell closed 5 months ago

travisbell commented 5 months ago

I came across a difference in Async behaviour in Falcon that I am trying to understand. I'm sure there's a reason, but I couldn't figure it out.

I've had Async usage like this in our codebase for about a year now, and everything has seemingly been fine. Historically, this would have been in the context of a Puma or Pitchfork environment:

results_1, results_2 = nil

Sync do |task|
  task.async do
    results_1 = db_call_1
  end
  task.async do
    results_2 = db_call_2
  end
end

My two tasks are executed asynchronously and the results of results_1 and results_2 are available when we exit the Sync block.

I've been playing around with Falcon, and suddenly the outputs of results_1 OR results_2 aren't available when the Sync block is done running. It's like we're not waiting for the results anymore. Which is what I thought Sync did by design.

I fixed this problem by migrating the above code to use a barrier:

results_1, results_2 = nil
barrier = Async::Barrier.new

Sync do
  barrier.async do
    results_1 = db_call_1
  end
  barrier.async do
    results_2 = db_call_2
  end
  barrier.wait
end

Everything is back to working properly in Falcon.

My question is, why is Sync behaving differently in a Falcon env?

ioquatix commented 5 months ago

Falcon runs every request inside a reactor, while Puma/Pitchfork don't.

Because of that, when your request enters the code Sync do ... end, on Falcon, it's already running in a reactor. Sync does provide guarantees about its own sequential execution, but does not provide any guarantee about the nested tasks.

In Puma/Pitchfork, the Sync do ... end is the top level of the reactor, and thus internally, it will wait for all tasks to complete before exiting. But in Falcon, it's not the top level reactor, and so it does not wait.

The barrier solution you proposed is the correct one. Alternatively, you can write it like this:

Sync do |task|
  results_1_task = task.async do
    db_call_1
  end
  results_2_task = task.async do
    db_call_2
  end

  results_1 = results_1_task.wait
  results_2 = results_2_task.wait

  # or even better:
  results_1 = results_1_task.wait
  results_1.each do |row|
    #...
  end

  results_2 = results_2_task.wait
  results_2.each do |row|
    #...
  end
end

By staggering the waits, you avoid introducing artificial contention.

travisbell commented 5 months ago

Oooooh, this makes perfect sense now. Thanks for closing that loop for me. Your docs state that Sync will only create a reactor if necessary, but I didn't understand that in this context, they were nested tasks. I mean I see it now, but I wasn't connecting it before.

Thanks for the tips about the staggered waits, I shall adopt them.

ioquatix commented 5 months ago

Even better if you can put more of the code in the separate tasks:

Sync do |task|
  results_1_task = task.async do
    results_1 = db_call_1
    results_1.sum do |row|
      #...
    end
  end
  results_2_task = task.async do
    results_2 = db_call_2
    results_2.each do |row|
      #...
    end
  end
end

You'll still need to insert appropriate logic to wait, but the more you can put in the task, the better (IO) concurrency you are likely to get.