socketry / async

An awesome asynchronous event-driven reactor for Ruby.
MIT License
2.04k stars 85 forks source link

Strange timeout bug. #265

Closed jsaak closed 10 months ago

jsaak commented 11 months ago

I have a class which tries to connect on TCP. When it fails it tries again. If i try again with TCPSocket.new() before it time outs, i get a strange error message, which does not make any sense to me:

require 'async'

Fiber.set_scheduler(Async::Scheduler.new)

class Mqtt3NormalExitException < Exception
end

class Mqtt3
  def initialize(host: 'localhost', port: 1883)
    @host = host
    @port = port
    @reconnect = true
  end

  def run
    Fiber.schedule do
      @fiber_main = Fiber.current
      puts "create new fiber #{@fiber_main.inspect}"
      counter = 0
      #counter = 1
      loop do
        begin
          puts "TCPSocket.new()"
          tcp_socket = TCPSocket.new(@host, @port, connect_timeout: 0.9)
          ret = tcp_socket
        rescue => e
          ret = e
        end

        if ret.is_a? (Exception)
          puts("tcp connect error: #{ret.inspect}")
        else
          puts("connected")
          counter = 0
        end

        if @reconnect
          if counter > 0
            begin
              puts "sleeping for #{counter} sec"
              sleep counter
            rescue Mqtt3NormalExitException => e
              @reconnect = false
              break
            end
          end

          if counter == 0
            counter = 1
          else
            counter *= 2
            counter = 300 if counter > 300
          end
        end
        break unless @reconnect
      end
    end
  end

  def stop
    if Fiber.current == @fiber_main
      raise Mqtt3NormalExitException
    else
      puts "sending Mqtt3NormalExitException to #{@fiber_main.inspect}"
      Fiber.scheduler.raise(@fiber_main,Mqtt3NormalExitException)
    end
  end
end

Fiber.schedule do
  loop do
    mqtt = nil

    Fiber.schedule do
      mqtt = Mqtt3.new(host: '1.1.1.1', port: 1111)
      mqtt.run
    end

    Fiber.schedule do
      sleep 3
      mqtt.stop
    end

    sleep 5
  end
end

produces:

$ ruby error.rb
create new fiber #<Fiber:0x00007f9b05aa55a8 /home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/task.rb:326 (resumed)>
TCPSocket.new()
tcp connect error: #<IO::TimeoutError: connect(2) for "1.1.1.1" port 1111>
TCPSocket.new()
tcp connect error: #<IO::TimeoutError: connect(2) for "1.1.1.1" port 1111>
sleeping for 1 sec
TCPSocket.new()
sending Mqtt3NormalExitException to #<Fiber:0x00007f9b05aa55a8 /home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/task.rb:326 (suspended)>
/home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/scheduler.rb:179:in `io_wait': Mqtt3NormalExitException (Mqtt3NormalExitException)
    from /home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/scheduler.rb:179:in `io_wait'
    from error.rb:23:in `initialize'
    from error.rb:23:in `new'
    from error.rb:23:in `block (2 levels) in run'
    from error.rb:20:in `loop'
    from error.rb:20:in `block in run'
    from /home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/task.rb:160:in `block in run'
    from /home/jsaak/.gem/ruby/3.2.2/gems/async-2.6.3/lib/async/task.rb:330:in `block in schedule'

$ ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]

If i set counter to 1 (sleeping at least 1 sec before retrying, then all is good)

ioquatix commented 11 months ago

Thanks, if possible, are you able to make a MVP failing test case as a PR? This will help me greatly.

jsaak commented 11 months ago

I am confused, here is a shorter version:

require 'async'

Fiber.set_scheduler(Async::Scheduler.new)

class MyException < Exception
end

@fiber = Fiber.schedule do
  begin
    sleep 1
  rescue => e
    puts e.inspect
  end
end

Fiber.schedule do
  sleep 0.1
  Fiber.scheduler.raise(@fiber, MyException)
end

Am I doing something wrong?

kaorihinata commented 11 months ago

The output of the above snippet is:

/Users/thomas/.sandbox/lib/ruby/gems/3.2.0/gems/async-2.6.3/lib/async/scheduler.rb:132:in `transfer': MyException (MyException)
    from /Users/thomas/.sandbox/lib/ruby/gems/3.2.0/gems/async-2.6.3/lib/async/scheduler.rb:132:in `block'
    from /Users/thomas/.sandbox/lib/ruby/gems/3.2.0/gems/async-2.6.3/lib/async/scheduler.rb:154:in `kernel_sleep'
    from -:10:in `sleep'
    from -:10:in `block in <main>'
    from /Users/thomas/.sandbox/lib/ruby/gems/3.2.0/gems/async-2.6.3/lib/async/task.rb:160:in `block in run'
    from /Users/thomas/.sandbox/lib/ruby/gems/3.2.0/gems/async-2.6.3/lib/async/task.rb:330:in `block in schedule'

I can create a PR for this and @ all involved if you'd like. It looks like Fiber.scheduler.raise(@fiber, MyException) passes down to Fiber#raise on @fiber, so you could also probably shorten that line to simply @fiber.raise(MyException) as I believe Fiber#raise is the intended way to raise a Fiber level exception. I've confirmed that the output is the same.

Regarding the issue itself, while the backtrace notes where the Fiber is currently blocking, I'm not sure if an externally raised exception is bound by that scope, so I'm not sure if you're supposed to be able to catch it this way.

jsaak commented 11 months ago

To me it seems like a bug. If there is an exception, i need to be able to catch it. But I do not mind using something different. Fiber.scheduler.raise is a hack anyway.

My goal is to be able to stop a Fiber from another Fiber. (If possible using the ruby built in methods, to be compatible with other Fiber scheduler implementations) And I still do not know how to do it. See #263

kaorihinata commented 11 months ago

It gets weirder, oddly enough. Your example works just fine the moment you remove the custom exception class. It turns out the fact that the caller is a Fiber might not even be relevant.

This cannot be caught:

require 'async'

Fiber.set_scheduler(Async::Scheduler.new)

class CustomException < Exception
end

fibera = Fiber.schedule do
  begin
    sleep 5
  rescue => e
    puts("exception caught")
  end
end

Fiber.scheduler.raise(fibera, CustomException)

This can, and outputs "exception caught":

require 'async'

Fiber.set_scheduler(Async::Scheduler.new)

fibera = Fiber.schedule do
  begin
    sleep 5
  rescue => e
    puts("exception caught")
  end
end

Fiber.scheduler.raise(fibera, RuntimeError)

I'm using Ruby 3.2. I can create the PR if you'd like. Asking because I'm not the OP.

noteflakes commented 11 months ago

Am I doing something wrong?

A rescue without an explicit exception class will only catch exceptions that are descendants of StandardError. In your example, MyException is a descendant of Exception, which is the lowest-level exception class. To be able to rescue your exception, you can do any of the following:

ioquatix commented 11 months ago

Sorry my family has been sick I have not had time to review this issue.

kaorihinata commented 11 months ago

@noteflakes

Yep, that was it. Not a bug at all then, just rescue behaving differently than I expected. Tyvm.

jsaak commented 10 months ago

Thanks for the solution! Usually i do not find such surprises using ruby. Do you know why ruby behaves this way?

noteflakes commented 10 months ago

Do you know why ruby behaves this way?

This design allows lower-level exceptions, such as Interrupt and SystemExit (which are raised on SIGINT and SIGTERM signals respectively) to propagate to the top-level and be handled centrally by either custom app code or Ruby's default handlers.