socketry / async

An awesome asynchronous event-driven reactor for Ruby.
MIT License
2.1k stars 86 forks source link

undefined method `push' for nil:NilClass with async 2.0 #139

Closed emiltin closed 1 year ago

emiltin commented 2 years ago

Hi, After upgrade to async 2.0 and ruby 3.1.0, I get this error:

NoMethodError:
  undefined method `push' for nil:NilClass

                @selector.push(fiber)
                          ^^^^^
emiltin commented 2 years ago

This happens in an rspec test, so it might be hidding some of the backtrace?

       @reactor.async do |task|
         @task = task
         @node = build_node task, options
         @node.start  # keep running inside the async task
       end

     NoMethodError:
       undefined method `push' for nil:NilClass

                    @selector.push(fiber)
                             ^^^^^
     # ./spec/support/testee.rb:137:in `start'
     # ./spec/support/testee.rb:59:in `isolated'
     # ./spec/support/test_site.rb:58:in `isolated'
     # ./spec/site/core/connect_spec.rb:12:in `get_connection_message'
     # ./spec/site/core/connect_spec.rb:53:in `check_sequence_3_1_1_to_3_1_3'
     # ./spec/site/core/connect_spec.rb:92:in `check_sequence'
     # ./spec/site/core/connect_spec.rb:114:in `block (3 levels) in <top (required)>'

The code for testee.rb:137 can be seen at https://github.com/rsmp-nordic/rsmp_validator/blob/master/spec/support/testee.rb#L137

In these rspec tests, the async reactor is interrupted at after each rspec test, and the run again at the start of the next test. Perhaps the error is related to that (unusual?) usage pattern, but it works with previous version of Async.

emiltin commented 2 years ago

Here's a reduction:

require 'async'
reactor = Async::Reactor.new
Async {}
reactor.async {}

=>

/Users/emiltin/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/async-2.0.0/lib/async/scheduler.rb:108:in `resume': undefined method `push' for nil:NilClass (NoMethodError)

                @selector.push(fiber)
                         ^^^^^
    from /Users/emiltin/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/async-2.0.0/lib/async/task.rb:273:in `schedule'
    from /Users/emiltin/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/async-2.0.0/lib/async/task.rb:120:in `run'
    from /Users/emiltin/.rbenv/versions/3.1.0/lib/ruby/gems/3.1.0/gems/async-2.0.0/lib/async/scheduler.rb:273:in `async'
    from test.rb:7:in `<main>'
ioquatix commented 2 years ago

https://github.com/socketry/async/blob/main/lib/async/reactor.rb#L23-L40

Basically, the Async::Reactor sets itself to be the current scheduler when it's allocated. In the past this was done before calling Async::Reactor#run but it creates some life cycle issues. Maybe we can warn on this usage, but basically combining it the way you have is invalid.

ioquatix commented 2 years ago

What we could do is make the Async{} block hook up to the current reactor using Fiber.schedule - this would at least allow the above code to work. I'll think about whether this is possible.

emiltin commented 2 years ago

Thank you for taking a look. My usage can be fixed by changing to:

require 'async'
reactor = Async::Reactor.new
reactor.async {}
reactor.async {}

But conceptually, is it wrong to create multiple reactors? I would think Async {} is similar to creating a reactor and running it, if called outside an existing reactor?

ioquatix commented 2 years ago

The scope of a reactor begins on the thread it is created on and ends when the thread exits or you call close. I should write some conceptual documentation as this scope has changed due to the way the fiber scheduler works. We are heading towards a world where "Async" as a bespoke interface is less important (by design).

As an example:

require 'async'
reactor = Async::Reactor.new

Fiber.schedule do
  $stderr.puts
end

is the non-Async-specific way of spinning up tasks and doing IO. If you want fine grained control over this, you can use Async::Scheduler which is directly compatible with Fiber.set_scheduler. Async::Reactor is just a sub-class which does this in #initialize.

emiltin commented 2 years ago

I admit I probably need to learn more about the fiber scheduler to really understand your latest reply. But If understand correctly, only one reactor can be set a the fiber scheduler, for each thread?

If a reactor can only set it self as the scheduler when it's created, then when you create a new reactor, previously created reactors cannot be used anymore.

I think the solution is either to make it impossible to create more than one scheduler per thread, or make it possible for reactors to set themselves as the sheduler when run() is called, not only then they're created.

ioquatix commented 2 years ago

But If understand correctly, only one reactor can be set a the fiber scheduler, for each thread?

Yes, correct.

If a reactor can only set it self as the scheduler when it's created, then when you create a new reactor, previously created reactors cannot be used anymore.

Not only that, but there is a shut down process, so any tasks will be terminated.

I think the solution is either to make it impossible to create more than one scheduler per thread, or make it possible for reactors to set themselves as the sheduler when run() is called, not only then they're created.

The reason why we don't do that is so the following works:

require 'async'
reactor = Async::Reactor.new # Set the fiber scheduler

Fiber.schedule do # Invokes fiber scheduler hook
  $stderr.puts
end

reactor.run # Run all scheduled fibers
ioquatix commented 1 year ago

In Async 2+, the implementation was split - into the Async::Scheduler which is the core of the fiber scheduler implementation, and Async::Reactor < Async::Scheduler which is basically a compatibility layer on the Async 1.x behaviour.

Because of that, Async::Reactor#initialize, will set the fiber scheduler, while Async::Scheduler doesn't.

The way to use it is like so:

scheduler = Async::Scheduler.new
Fiber.set_scheduler(scheduler)

Fiber.schedule {...}

Because Async{} will create an Async::Reactor, it will call Fiber.set_scheduler(self).

Fiber.set_scheduler(...) will invoke scheduler_close on any currently defined scheduler. Therefore, any previously defined scheduler itself is closed.

If you don't care about Fiber.scheduler working, you can probably use multiple instances of Async::Scheduler on the same thread in an interleaved fashion, in a limited sense. However, many code paths now expect Fiber.scheduler to return the currently active scheduler, so it's unlikely to work, generally.

The original bug report was regarding the exception and message being confusing. With the linked PR, the error message is less confusing, and I don't think there is anything else actionable from this report. If you think of anything, please feel free to open a new issue.

ioquatix commented 1 year ago

The new error message:

irb(main):002:0> reactor = Async::Reactor.new
=> #<Async::Reactor:0x00000000000046dc>
irb(main):003:0> Async {}
irb(main):004:0> reactor.async {}
/Users/samuel/Developer/socketry/async/lib/async/scheduler.rb:269:in `async': Scheduler is closed! (Async::Scheduler::ClosedError)

One other thought I had, which I don't think we should do but I'll mention for completeness, is to make Async{} behave more like Fiber.schedule when an existing scheduler is defined. However, this changes the semantics of the task tree.

Basically, the following idea:

reactor = Async::Reactor.new

# The following lines would be equivalent and would both attach to the current reactor:
Async{...} # Currently, this creates a new scheduler.
Fiber.schedule{...} # This attaches to the current scheduler.