Closed cretz closed 5 days ago
Nice find. I didn't know about this! haha.
I'll investigate the issue today.
I could not reproduce from the given example.
samuel@aiko ~/D/s/async (main)> ruby --version
ruby 3.3.4 (2024-07-09 revision be1089c8ec) [x86_64-linux]
samuel@aiko ~/D/s/async (main)> cat ./test.rb
#!/usr/bin/env ruby
require "async"
Async do
puts "Running on: #{RbConfig::CONFIG['host_os']}"
res = system 'echo', 'hello'
puts "Result 1: #{res.inspect}"
res = system 'does-not-exist', 'hello'
puts "Result 2: #{res.inspect}"
end
samuel@aiko ~/D/s/async (main)> bundle exec ./test.rb
Running on: linux
hello
Result 1: true
Result 2: nil
Can you confirm I've got the correct reproduction script?
I tried to reproduce the original error but couldn't:
#!/usr/bin/env ruby
require "minitest/autorun"
require "async"
class Test < Minitest::Test
BIG_TEXT_A = "The quick brown fox jumps over the lazy dog.\n" * 10
BIG_TEXT_B = BIG_TEXT_A.dup.sub("quick", "slow")
def test_diff
Async do |task|
assert_equal(BIG_TEXT_A, BIG_TEXT_B)
end
end
end
I added some instrumentation to the scheduler process_wait
hook:
# Wait for the specified process ID to exit.
# @parameter pid [Integer] The process ID to wait for.
# @parameter flags [Integer] A bit-mask of flags suitable for `Process::Status.wait`.
# @returns [Process::Status] A process status instance.
# @asynchronous May be non-blocking..
def process_wait(pid, flags)
return @selector.process_wait(Fiber.current, pid, flags).tap{|result|
Fiber.blocking{$stderr.puts "process_wait(#{pid}, #{flags}) -> #{result.inspect}"}
}
end
It appears to work okay:
> bundle exec ./test.rb
Run options: --seed 20496
# Running:
process_wait(927854, 0) -> #<Process::Status: pid 927854 exit 127>
process_wait(927855, 0) -> #<Process::Status: pid 927855 exit 0>
process_wait(927856, 0) -> #<Process::Status: pid 927856 exit 1>
F
Finished in 0.002793s, 358.0683 runs/s, 358.0683 assertions/s.
1) Failure:
Test#test_diff [test.rb:12]:
--- expected
+++ actual
@@ -1,4 +1,4 @@
-"The quick brown fox jumps over the lazy dog.
+"The slow brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
1 runs, 1 assertions, 1 failures, 0 errors, 0 skips
Is it possible something in your test suite is monkey patching Kernel::system
?
Interesting, this is breaking for me:
cretz@DESKTOP:~/work/temp$ cat ./test.rb
require 'async'
Async do
puts "Running on: #{RbConfig::CONFIG['host_os']}"
res = system 'echo', 'hello'
puts "Result 1: #{res}"
res = system 'does-not-exist', 'hello'
puts "Result 2: #{res}"
end
cretz@DESKTOP:~/work/temp$ bundle exec ruby --version
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [x86_64-linux]
cretz@DESKTOP:~/work/temp$ bundle exec ruby ./test.rb
Running on: linux
hello
Result 1: true
It hangs right there. I am not doing any monkey patching. This was also hanging for me inside of a GitHub action for Ubuntu and Ruby 3.3 (but not 3.1 I don't think). I will do some more investigation, possibly including a simple repository that replicates the issue with a simple GitHub action, and get back. (may be a bit, I am traveling)
I was also able to make it hang on that.
# issue-351.rb
require 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'async'
end
Async do
puts "Running on: #{RbConfig::CONFIG['host_os']}"
res = system 'echo', 'hello'
puts "Result 1: #{res}"
res = system 'does-not-exist', 'hello'
puts "Result 2: #{res}"
end
My output was: note where I hit Ctrl-C and it got output, and then the first word Result 2:
$ ruby issue-351.rb
Running on: linux
hello
Result 1: true
^CResult 2:
/home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:317:in `select': Interrupt
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:317:in `run_once!'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:353:in `run_once'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:408:in `block in run'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:386:in `block in run_loop'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:383:in `handle_interrupt'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:383:in `run_loop'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/async/scheduler.rb:407:in `run'
from /home/olle/projects/slask/tmp/devbox/gems/gems/async-2.18.0/lib/kernel/async.rb:34:in `Async'
from issue-351.rb:8:in `<main>'
See Bundler inline docs.
lol, I cannot reproduce, I tried IO_EVENT_SELECTOR=Select
too.
Okay, can you please tell me:
uname -a
ruby -v
For me it's:
samuel@aiko ~/D/tmp> uname -a
Linux aiko 6.11.6-arch1-1 #1 SMP PREEMPT_DYNAMIC Fri, 01 Nov 2024 03:30:41 +0000 x86_64 GNU/Linux
samuel@aiko ~/D/tmp> ruby -v
ruby 3.3.3 (2024-06-12 revision f1c7b6f435) [x86_64-linux]
samuel@aiko ~/D/tmp> $SHELL -v
fish, version 3.7.1
In Async v2.19, I've introduced more debug output: https://socketry.github.io/async/releases/index#async::scheduler-debugging
Can one of you run the hanging example with CONSOLE_LEVEL=debug
environment variable set, and show me the full output?
If it's not too verbose, it might also be useful to see the output generated by RUBYOPT=-d ...
.
Thanks.
This appears to be a problem in Ruby process.c
which can fork with an active scheduler without cleaning up correctly. I think at best we can fix this in Ruby 3.4 and maybe backport the fix. I'll need to investigate further.
See https://github.com/ruby/ruby/pull/12052 for a possible fix.
Okay, I believe I've found the root cause, a race condition in the process wait code.
I've released io-event
v1.7.4 which appears to fix this issue. Please let me know if you are still having issues.
If you run
system
insideAsync
with a non-existent executable on Linux with Ruby 3.3 it hangs. This behavior is confirmed in 2.17.0 and 2.18.0 (I didn't check any earlier than that).This synchronous code works:
And outputs:
But this code does not:
This hangs on the second
system
call. This was discovered after a deep dive of whyMinitest
assert_equal
was hanging. It turns out, Minitest usesdiff
on the system when reporting anassert_equal
failure, but first checks forgdiff
which we don't have, ref: https://github.com/minitest/minitest/blob/e4417c5b13ab917a60c218700041152136bc07c8/lib/minitest/assertions.rb#L33. So we have toMinitest::Assertions.diff = 'diff'
to getAsync
-based tests to not break here.(sorry if this is already reported, I could not find an issue)