socketry / async

An awesome asynchronous event-driven reactor for Ruby.
MIT License
2.12k stars 86 forks source link

puts '' inside Async block causes crash #228

Closed emiltin closed 1 year ago

emiltin commented 1 year ago

Hi, I get a crash on Mac with this snippet:

require 'async'
Async { puts '' }

=>

config/private/async.rb:4: [BUG] rb_sys_fail_path_in(io_writev, <STDOUT>) - errno == 0
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:             
     * ~/Library/Logs/DiagnosticReports                                     
     * /Library/Logs/DiagnosticReports                                      
   for more details.                                                        
Don't forget to include the above Crash Report log file in bug reports.     

-- Control frame information -----------------------------------------------
c:0007 p:---- s:0026 e:000025 CFUNC  :write
c:0006 p:---- s:0023 e:000022 CFUNC  :puts
c:0005 p:---- s:0020 e:000019 CFUNC  :puts
c:0004 p:0005 s:0015 e:000014 BLOCK  config/private/async.rb:4
c:0003 p:0010 s:0012 e:000011 BLOCK  /Users/emiltin/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:158
c:0002 p:0008 s:0009 e:000007 BLOCK  /Users/emiltin/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:310 [FINISH]
c:0001 p:---- s:0003 e:000002 DUMMY  [FINISH]

-- Ruby level backtrace information ----------------------------------------
/Users/emiltin/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:310:in `block in schedule'
/Users/emiltin/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/async-2.5.0/lib/async/task.rb:158:in `block in run'
config/private/async.rb:4:in `block in <main>'
config/private/async.rb:4:in `puts'
config/private/async.rb:4:in `puts'
config/private/async.rb:4:in `write'

-- C level backtrace information -------------------------------------------
/Users/emiltin/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_vm_bugreport+0x9a0) [0x1036644d8]
/Users/emiltin/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_bug_without_die+0x15c) [0x1034a0384]
/Users/emiltin/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_bug+0x1c) [0x103670f40]
/Users/emiltin/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_syserr_new_path_in.cold.1+0x58) [0x10367125c]
/Users/emiltin/.rbenv/versions/3.2.2/lib/libruby.3.2.dylib(rb_syserr_new_path_in+0xcc) [0x1034a48c4]

Only the empty string '' causes a crash.

I guess this indicates a problem with CRuby?

Versions:

code/rsmp (supervisor_debug) % ruby -v
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [arm64-darwin22]

code/rsmp (supervisor_debug) % gem list async

*** LOCAL GEMS ***

async (2.5.0)
async-io (1.34.3)
ioquatix commented 1 year ago

Hmm, thanks for this report, I'll take a look. I may not have time until after RubyKaigi.

emiltin commented 1 year ago

Sure, I see that you have a talk 👍 I tried reintalling Ruby 3.2.2 (with rbenv), but still get the crash.

kaorihinata commented 1 year ago

It’s oddly specific. It only occurs when the target of $stdout.puts or $stderr.puts is a terminal, so using a file doesn’t crash, and using another character device like /dev/null doesn’t crash either. It also only occurs with Ruby 3.2.x, but I’ve been able to reproduce it using a Linux container as well, so it’s probably not just macOS.

These crash:

printf "require 'async';Async { puts '' }" | bundle exec ruby printf "require 'async';Async { \$stdout.puts '' }" | bundle exec ruby printf "require 'async';Async { \$stderr.puts '' }" | bundle exec ruby printf "require 'async';Async { puts '' }" | bundle exec ruby >/dev/stdout printf "require 'async';Async { puts '' }" | bundle exec ruby >/dev/stderr

These do not:

printf "require 'async';Async { puts '' }" | bundle exec ruby >output.txt printf "require 'async';Async { puts '' }" | bundle exec ruby >/dev/null

<< and #write work just fine for both.

ioquatix commented 1 year ago

Haha that's super weird lol, after RubyKaigi this week, I will take a look.

ioquatix commented 1 year ago

puts nil also cause crash.

kaorihinata commented 1 year ago

I believe that's because nil coerces to "". You can also crash it with:

require 'async'

class Foo
  def to_s
    ""
  end
end

tmp = Foo.new

Async { puts tmp }
ioquatix commented 1 year ago

I think I found the issue.

kaorihinata commented 1 year ago

I just looked at rb_writev_internal another time and went, "Oh..."

kaorihinata commented 1 year ago

Okay, yeah. That's an oops. I assume you're talking about the loop.

Edit:

Assuming my understanding of this section is correct:

--- work/ruby-heads-master/io.c 2023-05-13 02:59:15
+++ io.c    2023-05-13 11:27:31
@@ -1329,13 +1329,15 @@
 {
     VALUE scheduler = rb_fiber_scheduler_current();
     if (scheduler != Qnil) {
+        ssize_t retval = 0;
         for (int i = 0; i < iovcnt; i += 1) {
             VALUE result = rb_fiber_scheduler_io_write_memory(scheduler, fptr->self, iov[i].iov_base, iov[i].iov_len, 0);

             if (!UNDEF_P(result)) {
-                return rb_fiber_scheduler_io_result_apply(result);
+                retval += rb_fiber_scheduler_io_result_apply(result);
             }
         }
+        return retval;
     }

     struct io_internal_writev_struct iis = {

Upstream's responsibility though.

ioquatix commented 1 year ago

Yes, that looks like the root cause. Thanks for your proposed patch. Actually I wrote that code originally so it's my fault. I'll see if I can fix it by updating io-event but it might require a new Ruby version.

ioquatix commented 1 year ago

puts nil is generating an iov_len of zero, invoking write with length 0. Basically, it should be invalid, and it's partly io.c that's wrong. Maybe we can just ignore this case in io-event gem, i.e. a write of zero just returns 0 instead of error.

ioquatix commented 1 year ago

Okay the first patch fixes Ruby's IO. https://github.com/ruby/ruby/pull/7806 and prevents IO#puts from generating zero length iovs.

kaorihinata commented 1 year ago

Ahh, I see. Now that I look at the patch, and reread io_binwritev_internal as well, I understand why the loop wasn’t necessary, but good god does a lot go into a single call of rb_io_puts.

ioquatix commented 1 year ago

The semantics of writev are (unexpectedly) more complex than write.

I heard some implementations of writev just make a new buffer, copy everything together, and call write. So some developer may overestimate the value of writev. I asked @axboe about this and he mentioned, IIRC, that writev is worse than write when using io_uring. However, I'm sure it's difficult problem to generalise about.

In any case, I agree "good god does a lot go into a single call of rb_io_puts" and perhaps we should try to simplify the code - we could consider removing writev.

axboe commented 1 year ago

writev is always worse than write, regardless of whether you use io_uring or not. writev needs to copy an iovec, which write does not. Internally in the kernel, outside of the copy overhead, this means that (until recently) it would also use an ITER_IOVEC when passed down the stack, which are more expensive to advance and iterate.

For io_uring specifically, a vectored write also means that io_uring now has to maintain that state. If things complete inline (eg without needing poll deferral), the cost overhead is the same as writev(2) in that it has to do that extra copy. If io_uring does not need defer the request, then it now has to allocate memory to store this iovec state, and free it at the end. This all adds up to more overhead.

IOW - don't use vectored IO, unless you have to. Lots of applications seem to happily just use vectored IO by default even if they are just using a single segment, this is a bad idea. Obviously there are cases where you'd want to use single segment vectored IO, for things like sendmsg() for example. But if that's not the case, don't do it.

ioquatix commented 1 year ago

@axboe just to clarify, when you say copy, you mean copy the iov_data+iov_len buffer? Or do you mean copy the array of struct iov itself?

axboe commented 1 year ago

Just the iovec itself, not the data. io_uring has to ensure that any "meta data" related to the IO is stable beyond submit, so it has to copy in and store the vec as it cannot reimport it safely later on without violating that requirement.

axboe commented 1 year ago

Work went into the 6.4 kernel to improve this situation a bit, most notably this merge:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b9dff2195f8a5847fad801046b26955e05670d31

which converts single segment imports into ITER_UBUF, and Linus himself authored some commits improving the import speed of iovecs in general. Most for x86, but also this generic one:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=487c20b016dc48230367a7be017f40313e53e3bd

Doesn't change the advice I gave above, but it does reduce the overhead on the kernel side a bit in 6.4+ for that particular use case.

kaorihinata commented 1 year ago

Lots of applications seem to happily just use vectored IO by default even if they are just using a single segment, ...

That is deeply unfortunate.

ioquatix commented 1 year ago

I merged a fix in https://bugs.ruby-lang.org/issues/19640

I will remove the io_write hook in Async on versions of Ruby where it causes a crash.

ioquatix commented 1 year ago

I created https://bugs.ruby-lang.org/issues/19642 as a follow-up issue to remove vectored read/write from io.c.