socketry / async

An awesome asynchronous event-driven reactor for Ruby.
MIT License
2.04k stars 85 forks source link

Tasks signaling Conditions leave suspended Fibers behind #295

Closed rmosolgo closed 2 months ago

rmosolgo commented 6 months ago

I've learned that left-over, suspended fibers can retain ActiveRecord connections, causing the connection pool to run out of available connections (example: https://github.com/rmosolgo/graphql-ruby/issues/4739#issuecomment-1866671563).

So, I added a test to my project that makes sure that all Fibers created during execution end up not alive?. It currently doesn't pass when I use async as the "backend" for GraphQL::Dataloader. Here's the test: https://github.com/rmosolgo/graphql-ruby/blob/e60d22b138d91c0193e2b24eeb7b20b60648698f/spec/graphql/dataloader_spec.rb#L1020-L1026

In that test, I have to pause GC in order to catch the suspended fiber, because otherwise it will get GC'ed, since the task assigns @fiber = nil.

But, in Rails, that suspended Fiber can be the "owner" of an ActiveRecord connection, and since it's suspended, ActiveRecord thinks that it might still use the connection -- and therefore doesn't reuse the connection šŸ˜– .

Here's a small replication (the condition.signal seems to be the special sauce):

require "bundler/inline"

gemfile do
  gem "async", "~>2.6"
  gem "sqlite3", "~>1.7"
  gem "rails", "~>7.1"
end

require "active_record"

ActiveSupport::IsolatedExecutionState.isolation_level = :fiber
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: "__example.db")
ActiveRecord::Schema.define do
  self.verbose = false
  create_table :things, force: true do |t|
    t.column :name, :string
  end
end

class Thing < ActiveRecord::Base
end

Thing.create!(name: "Pogo Stick")

pp ActiveRecord::Base.connection_pool.stat

def run_process(number)
  puts "#{number} - begin process"
  Sync do |root_task|
    condition = Async::Condition.new
    t1 = root_task.async do |inner_task|
      condition.wait
    end

    root_task.async do |inner_task|
      Thing.all.to_a
      condition.signal
    end.wait

    t1.wait
  end
  puts "#{number} - end process"
end

run_process(1)
run_process(2)
run_process(3)

# For clarity's sake, remove connections which ActiveRecord knows aren't in use:
ActiveRecord::Base.connection_pool.reap

puts "\n\nSome AR connections are in use, even though the process exited:"
pp ActiveRecord::Base.connection_pool.connections.map { |c|
  { in_use: !!c.in_use?, owner_alive?: c.owner&.alive?, owner: c.owner,  }
}
# For exmaple:
# [{:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000107a1f1f0 (resumed)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b845060 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b8e4020 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
#  {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x000000010b80ded0 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>}]

run_process(4)
# This process will fail to acquire a connection:
run_process(5)
run_process(6)

It prints like this:

$ ruby async_issue.rb
/Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/io-event-1.3.3/lib/io/event/support.rb:24: warning: IO::Buffer is experimental and both the Ruby and C interface may change in the future!
{:size=>5, :connections=>1, :busy=>1, :dead=>0, :idle=>0, :waiting=>0, :checkout_timeout=>5.0}
1 - begin process
1 - end process
2 - begin process
2 - end process
3 - begin process
3 - end process

Some AR connections are in use, even though the process exited:
[{:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000101b2f2a8 (resumed)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105d11ce8 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105f18d98 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>},
 {:in_use=>true, :owner_alive?=>true, :owner=>#<Fiber:0x0000000105f13b40 /Users/rmosolgo/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.6.5/lib/async/task.rb:326 (suspended)>}]
4 - begin process
4 - end process
5 - begin process

(It never exits and for some reason it also doesn't raise with a timeout error šŸ¤· )

I think it would be better if this Fiber was terminated, so that ActiveRecord could reap the connection it was using. What do you think?

ioquatix commented 6 months ago

Thanks for the report, I agree we should do something, but exactly what I'm not sure yet.

rmosolgo commented 6 months ago

Thanks for taking a look. I also just saw your answer over at https://bugs.ruby-lang.org/issues/20081 (thanks for that, too šŸ˜… ). That ensure suggestion makes this script work, too:

+ ensure 
    condition.signal 

I'm going to see if I can integrate that into my GraphQL-Ruby code in the meantime šŸ» !

ioquatix commented 2 months ago

Do you mind trying this again on Rails head? (7.2)

rmosolgo commented 2 months ago

:+1: I confirmed the script above terminated successfully on rails master:

gem "rails", github: "rails/rails"

# ... 

# 5 - begin process
# 5 - end process
# 6 - begin process
# 6 - end process

Out of curiosity, do you know what fixed it?

trevorturk commented 2 months ago

I'm not sure, but I wonder if it's related to this recent work: https://github.com/rails/rails/pull/50793 -- it's a lot to follow, but I believe the tldr is that Rails 7.1 and older keep db connections open until the end of the request, but 7.2 and newer will release connections as soon as possible.

Of course I hope @ioquatix will chime in, but I figured this would be interesting in any case if you haven't seen it yet!

rmosolgo commented 2 months ago

I would beleive it -- I noticed something similar with GraphQL's fiber-based Dataloader. I received a bug report that the connection pool wasn't releasing connections automatically until the request was done, and that they had to be manually released in order for things to work right. I went about testing an improvement in https://github.com/rmosolgo/graphql-ruby/pull/4944, but I found that the bug didn't appear in Rails master -- only in < 7.1.

trevorturk commented 2 months ago

Hmm, note there's a workaround for older versions of Rails, just wrapping db work in with_connection blocks:

ActiveRecord::Base.connection_pool.with_connection do
  Thing.find() # etc...
end

...then it'll return the connection to the pool at the end of the block. (Just in case that's helpful for anyone currently!)

ioquatix commented 2 months ago

I think this is enough to show the difference:

#!/usr/bin/env ruby

require "bundler/inline"

gemfile do
  gem "async", "~>2.6"
  gem "sqlite3", "~>1.7"
  # gem "rails", "~>7.1"
  gem "rails", git: "https://github.com/rails/rails"
end

require "active_record"

ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: "__example.db")
ActiveRecord::Schema.define do
  self.verbose = false
  create_table :things, force: true do |t|
    t.column :name, :string
  end
end

class Thing < ActiveRecord::Base
end

ActiveRecord::Base.release_connection

Thing.create!(name: "Pogo Stick")

pp ActiveRecord::Base.connection_pool.stat

binding.irb

Schema operations still pin a connection, so we have to release that.

But, Thing.create! and most other operations won't pin a connection unless you explicitly ask for it - e.g. a transaction, with_connection and so on.

@byroot might have more insight.