Fix sporadic failures testing with JRuby

headius commented 11 months ago

A collection of fixes to make specs more robust or fix issues causing sporadic failures on JRuby.

headius commented 11 months ago

Existing commits up to this point fix the following:

  1) Concurrent::CyclicBarrier#number_waiting with waiting threads should be equal to the waiting threads count
     Failure/Error: expect(thread_join).not_to be_nil, thread.inspect
       #<Thread:0x40712ee9 /Users/headius/work/concurrent-ruby/spec/support/example_group_extensions.rb:32 aborting>
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-support-3.12.1/lib/rspec/support.rb:108:in `block in Support'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-support-3.12.1/lib/rspec/support.rb:117:in `notify_failure'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/fail_with.rb:35:in `fail_with'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:40:in `handle_failure'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:84:in `block in handle_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:27:in `with_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/handler.rb:76:in `handle_matcher'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/expectation_target.rb:78:in `not_to'
     # /Users/headius/work/jruby/lib/ruby/gems/shared/gems/rspec-expectations-3.12.3/lib/rspec/expectations/expectation_target.rb:106:in `not_to'
     # ./spec/spec_helper.rb:57:in `block in <main>'

I have at least one other intermittent failure I need to investigate:

  1) Concurrent ErlangActor on pool behaves like erlang actor asking timing out
     Failure/Error: raise NoActor.new(@Pid) if @Terminated.resolved?

     Concurrent::ErlangActor::NoActor:
       #<Concurrent::ErlangActor::Pid:0x7f58d87f terminated normally with true>
     Shared Example Group: "erlang actor" called from ./spec/concurrent/edge/erlang_actor_spec.rb:975
     # ./lib/concurrent-ruby-edge/concurrent/edge/erlang_actor.rb:706:in `ask'
     # ./lib/concurrent-ruby-edge/concurrent/edge/erlang_actor.rb:82:in `ask'
     # ./spec/concurrent/edge/erlang_actor_spec.rb:934:in `block in <main>'
     # org/jruby/RubyBasicObject.java:2614:in `instance_exec'
     # /home/runner/work/jruby/jruby/lib/ruby/gems/shared/gems/rspec-core-3.12.2/lib/rspec/core/example.rb:263:in `block in run'

headius commented 11 months ago

I discovered this PR and commits by @eregon omitting "flaky tests" on TruffleRuby some years ago. The ErlangActor failure above would be excluded by this.

https://github.com/ruby-concurrency/concurrent-ruby/pull/943

I can certainly add JRuby there, but it would be better to figure out why these tests are flaky and fix them.

headius commented 11 months ago

This error just showed up once on the Ruby 3.2 CI jobs for this PR:

  1) Concurrent::ReentrantReadWriteLock write lock wakes up waiting readers when the write lock is released
     Failure/Error: (1..3).each { |n| expect(threads[n].status).to eql "sleep" }

       expected: "sleep"
            got: "run"

       (compared using eql?)

headius commented 11 months ago

TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 does not seem related to my changes.

bensheldon commented 11 months ago

TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 does not seem related to my changes.

Agreed. Those look very racy and asserting that a thread is sleeping seems troublesome. I'd suggest rewriting those something like this:

  it "cannot be acquired when another thread holds a write lock" do
    latch   = CountDownLatch.new
    avoid_latch = CountDownLatch.new # expect this latch never to be reached
    threads = [
      in_thread { lock.acquire_write_lock; latch.count_down },
      in_thread { latch.wait; lock.acquire_write_lock; avoid_latch.count_down }
    ]
    expect { Timeout.timeout(1) { threads[0].join }}.not_to raise_error
    expect(threads[0]).to hold(lock).for_write
    expect(threads[1]).not_to hold(lock).for_write
    wait_up_to(0.2) { threads[1].status == 'sleep' }

    # more reliable expectations
    expect(avoid_latch.wait(0.2)).to be false
    expect(lock.try_write_lock).to be false
  end

But also could probably be in another PR---not that I am anyone to say one way or another.

headius commented 11 months ago

I'm not sure what would be causing the TR segfaults in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792582483 so I'l lleave that for TR folks to figure out. The other TR failure in https://github.com/ruby-concurrency/concurrent-ruby/actions/runs/6792654287/job/18466286918?pr=1012 might be another flaky test, as @bensheldon suggested above, but I'm not sure of the right fix and I have not seen it on JRuby yet.

If it's ok, I'd like to merge what we have here so JRuby can start to have some consistency in concurrent-ruby testing. If we see more sporadic failures, I'll open another issue.

ruby-concurrency / concurrent-ruby

Fix sporadic failures testing with JRuby #1012