Thread.scheduler, and how to use the same IO across fibers

HoneyryderChuck commented 3 years ago

I'm investigating how to have better fiber scheduler support in httpx, and am using async for my tests.

For that, I wrote the following script, which is based on the net-http example described in the ruby 3 announcements:

require 'async'
require 'httpx'

def httpx
  Async do
    ["ruby", "rails", "async"].each do |topic|
      Async do
        HTTPX.get("https://www.google.com/search?q=#{topic}")
      end
    end
  end
end

httpx

When running the script above with HTTPX_DEBUG=1 set, one can observe that a new TCP socket is opened for each request. That happens because httpx's connection pool is stored in a fiber-local variable (Thread.current[..), so each fiber instantiates its own connection pool.
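To illustrate the fiber-local behaviour described above, here is a minimal sketch using only core Ruby (no httpx or async involved). The `:pool` key and the string values are placeholders chosen for the example, not httpx internals:

```ruby
# Thread.current[] is fiber-local: a value set in one fiber is not
# visible from another fiber running on the same thread.
Thread.current[:pool] = "pool created in the main fiber"

# Thread variables, by contrast, are shared by every fiber of the thread.
Thread.current.thread_variable_set(:pool, "pool shared by the thread")

fiber_local, thread_local = Fiber.new do
  [Thread.current[:pool], Thread.current.thread_variable_get(:pool)]
end.resume

p fiber_local   # the new fiber does not see the main fiber's value
p thread_local  # the thread variable is visible from the new fiber
```

This is why each `Async do ... end` block in the script ends up with a separate pool: every task runs in its own fiber.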

Although the above works, I'd rather have the fibers share the connection pool and reuse the socket. For that, I added this commit, which broke the script above:

/home/dev/httpx/.bundle/ruby/3.0.0/gems/async-1.30.1/lib/async/reactor.rb:170:in `register': this IO is already registered with selector (ArgumentError)
        from /home/dev/httpx/.bundle/ruby/3.0.0/gems/async-1.30.1/lib/async/reactor.rb:170:in `register'
        from /home/dev/httpx/.bundle/ruby/3.0.0/gems/async-1.30.1/lib/async/wrapper.rb:224:in `wait_for'
        from /home/dev/httpx/.bundle/ruby/3.0.0/gems/async-1.30.1/lib/async/wrapper.rb:139:in `wait_writable'
        from /home/dev/httpx/.bundle/ruby/3.0.0/gems/async-1.30.1/lib/async/scheduler.rb:61:in `io_wait'
        from /home/dev/httpx/lib/httpx/selector.rb:105:in `wait_writable'
        from /home/dev/httpx/lib/httpx/selector.rb:105:in `select_one'
        from /home/dev/httpx/lib/httpx/selector.rb:119:in `select'
        from /home/dev/httpx/lib/httpx/pool.rb:37:in `block in next_tick'

This happens because wait_writable will be called on the same socket a second time after the first fiber autoswitches.

Am I right to assume that the async reactor design favours servers, where it's expected that there'll be a 1 socket - 1 fiber mapping? Or is this a limitation of nio4r?

bruno- commented 3 years ago

You can use the async-http gem to make multiple HTTP/2 requests over a single connection.

require "async"
require "async/http/internet"

Async do |task|
  internet = Async::HTTP::Internet.new

  ["ruby", "rails", "async"].each do |topic|
    task.async do
      internet.get("https://www.google.com/search?q=#{topic}")
    end
  end
end

Run the program with CONSOLE_DEBUG=Async::HTTP::Client ruby script.rb. The debug output should show that only a single connection was made.

HoneyryderChuck commented 3 years ago

I am aware that async has its own client/server implementations, but that's not the goal of this issue.

My question is: can I make my http client work with any Thread.scheduler and still keep persistent connection support? I picked up async to answer that question because it was the first fiber scheduler implementation around, but the "net-http" example doesn't make it clear whether that's possible. I understand that I could make a custom connection pool to make it work with async, but then I'd have to do that for every fiber scheduler, and the abstraction of the scheduler wouldn't be of value.

ioquatix commented 3 years ago

@HoneyryderChuck Thanks for your interest in this project.

@bruno- thanks for triaging this issue and offering some working examples.

@HoneyryderChuck your example code looks good and the fact it works out of the box (first example) is a great sign.

As an aside, you might like to consider async-rspec which puts a few extra assertions around your tests to ensure safe behaviour.

Can I make my http client work with any Thread.scheduler and still keep persistent connection support?

The answer should be yes, but there are some limitations in Async 1.x which made this difficult.

The main issue you are running into is the inability to support multiple wait_foo on the same IO instance. This was a limitation of nio4r. In Ruby 3.x and Async 2.x (or just the general fiber scheduler) this limitation no longer applies. However, we are still waiting for Ruby 3.1 / 3.0.3 to be released which includes some critical bug fixes.
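For readers unfamiliar with what "multiple wait_foo on the same IO instance" means, the sketch below shows the pattern with plain threads and a pipe. It deliberately does not use Async or nio4r, so it says nothing about how either behaves; it only shows two waiters blocked on one IO object at the same time, which is the situation that triggered the `register` error above.

```ruby
require "io/wait"

reader, writer = IO.pipe

# Two threads wait for readability on the very same IO object.
# wait_readable returns the IO when it becomes readable, nil on timeout.
waiters = 2.times.map do
  Thread.new { reader.wait_readable(2) }
end

# Writing one byte makes the pipe readable. Nobody consumes the byte,
# so the pipe stays readable for both waiters.
writer.write("x")

woken = waiters.map(&:value).count { |result| result }
puts "waiters woken: #{woken}"
```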

Regarding how to make a generic connection pool which supports all modes of execution without being async specific, Async 2 and Ruby 3.1 will bring complete event-driven concurrency support to thread primitives, e.g. mutex, semaphore, queue, condition variable, etc. You can use these primitives and be both thread and fiber safe.
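As a rough sketch of that idea, a pool can be built on `Queue` alone. `TinyPool` and the `"connection-N"` strings are hypothetical names for illustration and are not part of httpx or async. The example is exercised with threads so that it runs without any scheduler; the assumption, per the comment above, is that under Ruby 3.1 with a fiber scheduler the same `Queue#pop` would suspend only the calling fiber.

```ruby
# Hypothetical minimal pool built only on Queue, a thread primitive.
class TinyPool
  def initialize(size)
    @idle = Queue.new
    size.times { @idle << yield }   # build each resource with the given block
  end

  # Check a resource out, yield it, and always return it to the pool.
  def with
    resource = @idle.pop            # blocks until a resource is idle
    yield resource
  ensure
    @idle << resource if resource
  end
end

opened = 0
pool = TinyPool.new(1) { opened += 1; "connection-#{opened}" }

# Three concurrent users share the single pooled resource.
used = 3.times.map do
  Thread.new { pool.with { |conn| conn } }
end.map(&:value)

puts "resources opened: #{opened}, used: #{used.uniq.inspect}"
```

Because the pool never touches `Thread.current[]`, the resource is not tied to whichever fiber or thread created it.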

HoneyryderChuck commented 3 years ago

Hi @ioquatix

your example code looks good and the fact it works out of the box (first example) is a great sign.

Indeed. I've been investigating httpx's behaviour under ruby 3's new features, and I have to say that testing Thread.scheduler is a breeze compared to making it work under ractors, which I've given up on until ruby 3.1.

The main issue you are running into is the inability to support multiple wait_foo on the same IO instance. This was a limitation of nio4r.

Yup, that's what it felt like. Knowing what I know about it, I wonder how hard it would be to check for a subscribed monitor in the reactor for a given IO instance to avoid such errors, but I guess you probably tried this already.

In Ruby 3.x and Async 2.x (or just the general fiber scheduler) this limitation no longer applies.

Ok, that's good to hear. Will that be a new backend replacing nio4r? I guess I'll have to wait to test it then.

Thx for the feedback!

socketry / async

Thread.scheduler, and how to use the same IO across fibers #127