Closed: ioquatix closed this issue 3 years ago.
On Ruby 3, here is an example of how to produce the issue:

```ruby
def mutex_failure
  m = Mutex.new

  f1 = Fiber.new do
    m.lock
  end

  f2 = Fiber.new do
    m.unlock
  end

  f1.resume
  f2.resume
end

mutex_failure
# ./test.rb:39:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
#         from ./test.rb:39:in `block in mutex_failure'
```
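For contrast, here is a minimal sketch (variable names are mine) of the Ruby 3.x semantics at play: Mutex ownership is per-fiber, so lock/unlock within the same fiber succeeds, and only the cross-fiber unlock raises:

```ruby
# Same fiber locks and unlocks: fine, because Mutex ownership is per-fiber on Ruby 3.x.
m = Mutex.new
same_fiber_result = Fiber.new do
  m.lock
  m.unlock
  :ok
end.resume
puts "same-fiber lock/unlock: #{same_fiber_result}"

# Cross-fiber unlock (as in the failing example above): one fiber locks,
# another fiber tries to unlock, which raises ThreadError on Ruby 3.
m2 = Mutex.new
Fiber.new { m2.lock }.resume
cross_fiber_error = begin
  Fiber.new { m2.unlock }.resume
  nil
rescue ThreadError => e
  e
end
puts "cross-fiber unlock raised: #{cross_fiber_error&.class}"
```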
Okay, with that basic semantic, I managed to figure out the sequence of events which cause this to fail:
```ruby
#!/usr/bin/env ruby

require 'rspec/support/reentrant_mutex'

def rspec_failure
  m = RSpec::Support::ReentrantMutex.new

  f1 = Fiber.new do
    m.synchronize do
      puts "f1 A"
      Fiber.yield
      puts "f1 B"
    end
  end

  f2 = Fiber.new do
    m.synchronize do
      puts "f2 A"
      # Fiber.yield
      f1.resume
      puts "f2 B"
    end
  end

  f1.resume
  f2.resume
  f2.resume
end

rspec_failure
```

gives:

```
f1 A
f2 A
f1 B
f2 B
/Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:40:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
        from /Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:40:in `exit'
        from /Users/samuel/.gem/ruby/3.0.0/gems/rspec-support-3.10.2/lib/rspec/support/reentrant_mutex.rb:25:in `synchronize'
        from ./test.rb:17:in `block in rspec_failure'
```
cc @eregon. I believe we discussed that there should be no use case where a mutex is locked on one fiber and released on another, but maybe this is a valid use case? Does this mean we should relax the condition, or can you see some other solution?
```ruby
#!/usr/bin/env ruby

require 'monitor'

def monitor_failure
  m = Monitor.new

  f1 = Fiber.new do
    m.synchronize do
      puts "f1 A"
      Fiber.yield
      puts "f1 B"
    end
  end

  f2 = Fiber.new do
    m.synchronize do
      puts "f2 A"
      # Fiber.yield
      f1.resume
      puts "f2 B"
    end
  end

  f1.resume
  f2.resume
  f2.resume
end

monitor_failure
```
This may not be a bug in RSpec itself; we should consider moving this to bugs.ruby-lang.org. The `m.synchronize` in f2 should probably block (forever, in this example) because f1 holds the monitor. It might be a bug of the monitor stdlib.
If you replace `Monitor.new` with `Mutex.new`, then CRuby detects a deadlock (correct behavior), and it would otherwise wait if it did not detect the deadlock.
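As a sanity check of that claim, a minimal sketch (variable names are mine) of CRuby's deadlock detection on a plain Mutex; this only demonstrates the recursive-locking case, since the cross-fiber case aborts with an unrescuable fatal error:

```ruby
# CRuby's Mutex detects recursive locking on the same execution context
# and raises ThreadError immediately rather than hanging forever.
m = Mutex.new
m.lock
detected = begin
  m.lock # already held here: recursive locking, detected immediately
  nil
rescue ThreadError => e
  e.message
end
m.unlock
puts "detected: #{detected}"
```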
The example code here does have an intrinsic problem: a mutex/monitor section is supposed to be atomic, but here the atomicity is broken by switching fibers (specifically, `f1 A` happens without `f1 B`; a data structure could become corrupted if those `puts` were replaced by code mutating state/ivars). `Fiber.yield` inside `synchronize` is probably best avoided in general, but I can imagine that's not always possible.
After more thoughts, this example does not acquire the lock in a reentrant way (=on the same execution stack, fibers have different stacks), so it should behave exactly the same as Mutex, and the difference is a bug of Monitor (and ReentrantMutex).
So it should raise a deadlock error, or if there is a Fiber scheduler active then schedule another Fiber.
@ioquatix Could you create an issue on bugs.ruby-lang.org?
The solution here is to change `Thread.current` to `Fiber.current`, right?
Okay, so here is the initial problem. I tried adding the following code as a hack:

```ruby
require 'rspec/support/reentrant_mutex'
require 'fiber'

module RSpec
  module Support
    class ReentrantMutex
      def enter
        @mutex.lock if @owner != Fiber.current
        @owner = Fiber.current
        @count += 1
      end
    end
  end
end
```
Now I get other issues I need to figure out.
Okay, I actually think using `Fiber.current` works correctly.
By the way, on Ruby 2.7.2 I'm getting `dead fiber called (FiberError)` instead of `Attempt to unlock a mutex which is locked by another thread/fiber`. This is exactly what I would expect from the second `f2.resume` call. Is the second call necessary? On Ruby 3.0 it fails without the second `f2.resume`.

And I don't see that this situation changes if we replace `Thread.current` with `Fiber.current` in `ReentrantMutex`.
To illustrate this:

```ruby
require 'fiber'

class ReentrantMutex
  def initialize
    @owner = nil
    @count = 0
    @mutex = Mutex.new
  end

  def synchronize
    enter
    yield
  ensure
    exit
  end

  private

  def enter
    @mutex.lock if @owner != Fiber.current
    @owner = Fiber.current
    @count += 1
  end

  def exit
    @count -= 1
    return unless @count == 0
    @owner = nil
    @mutex.unlock
  end
end

m = ReentrantMutex.new

f1 = Fiber.new do
  m.synchronize do
    puts "f1 A"
    Fiber.yield
    puts "f1 B"
  end
  puts "f1 finish"
end

f2 = Fiber.new do
  m.synchronize do
    puts "f2 A"
    # Fiber.yield
    f1.resume
    puts "f2 B"
  end
  puts "f2 finish"
end

f1.resume
f2.resume
```
`Fiber.current`:

```
$ ruby --version
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]
$ ruby 1.rb
f1 A
1.rb:29:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
        from 1.rb:29:in `exit'
        from 1.rb:14:in `ensure in synchronize'
        from 1.rb:14:in `synchronize'
        from 1.rb:45:in `block in <main>'
1.rb:20:in `lock': No live threads left. Deadlock? (fatal)
1 threads, 1 sleeps current:0x00007fec1b409480 main thread:0x00007fec1b409480
* #<Thread:0x00007fec1c07c0f0 sleep_forever>
   rb_thread_t:0x00007fec1b409480 native:0x000000010ca9f5c0 int:0
        from 1.rb:20:in `enter'
        from 1.rb:11:in `synchronize'
        from 1.rb:45:in `block in <main>'
```

```
$ ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
$ ruby 1.rb
f1 A
Traceback (most recent call last):
        3: from 1.rb:45:in `block in <main>'
        2: from 1.rb:11:in `synchronize'
        1: from 1.rb:20:in `enter'
1.rb:20:in `lock': deadlock; recursive locking (ThreadError)
```
`Thread.current`:

```
$ ruby --version
ruby 3.0.0p0 (2020-12-25 revision 95aff21468) [x86_64-darwin18]
$ ruby 1.rb
f1 A
f2 A
f1 B
f1 finish
f2 B
1.rb:29:in `unlock': Attempt to unlock a mutex which is locked by another thread/fiber (ThreadError)
        from 1.rb:29:in `exit'
        from 1.rb:14:in `synchronize'
        from 1.rb:45:in `block in <main>'
```

```
$ ruby --version
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-darwin18]
$ ruby 1.rb
f1 A
f2 A
f1 B
f1 finish
f2 B
f2 finish
```
So from my point of view, replacing `Thread.current` with `Fiber.current`:

Am I missing something?
The piece you are missing is that with a fiber scheduler, locking a mutex on one fiber will allow other fibers to be scheduled.
Good point. I struggle to understand from the rdoc what the default behaviour is for the default fiber scheduler: is it non-blocking or blocking?
There is no default fiber scheduler, trying to lock a mutex, transferring to a different fiber and then trying to take the lock again is an error without a scheduler in place.
Also, speaking of re-entrance, it's puzzling why we nullify `@owner`. In a scenario like:

```ruby
m.synchronize do
  m.synchronize { ... } # ok, re-entry of the same fiber
  Fiber.yield           # another fiber locks the mutex; if the scheduler and the fiber are non-blocking, we proceed to the next line
  m.synchronize { ... } # BOOM, we lock again, because `@owner` is the other fiber now
end
```
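To check the re-entry path itself, here is a self-contained sketch (the class name `FiberReentrantMutex` and the `count` accessor are mine, modeled on the `Fiber.current` variant shown earlier) demonstrating that nested `synchronize` on the same fiber re-enters via the counter, and that the owner is only cleared when the count returns to zero:

```ruby
require 'fiber'

# Hypothetical FiberReentrantMutex, modeled on the Fiber.current variant above.
class FiberReentrantMutex
  def initialize
    @owner = nil
    @count = 0
    @mutex = Mutex.new
  end

  def synchronize
    enter
    yield
  ensure
    exit
  end

  attr_reader :count # exposed for illustration only

  private

  def enter
    @mutex.lock if @owner != Fiber.current
    @owner = Fiber.current
    @count += 1
  end

  def exit
    @count -= 1
    return unless @count == 0
    @owner = nil # owner is only cleared here, when the count hits zero
    @mutex.unlock
  end
end

m = FiberReentrantMutex.new
depth_inside = nil
m.synchronize do
  m.synchronize do # same fiber: re-enters via the counter, no second @mutex.lock
    depth_inside = m.count
  end
end
puts "depth inside nested synchronize: #{depth_inside}"
puts "count after release: #{m.count}"
```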
> transferring to a different fiber and then trying to take the lock again is an error without a scheduler in place

We don't `set_scheduler` in the snippet above. Is `Attempt to unlock a mutex which is locked by another thread/fiber` an expected error in this case?
The sequence is:

- `m` …
- `m` …
Abstracting out from the `ReentrantMutex` implementation, I would expect that `f2.resume` would resume it, with no errors (`yield` inside `synchronize`).
The `owner` is only reset if `count == 0`, so it's not reset in your example:

https://github.com/rspec/rspec-support/blob/d63133f478408c1d965e673b96ad10ef5a5d183f/lib/rspec/support/reentrant_mutex.rb#L38-L39

Your example is probably just a variant of @ioquatix's. `Attempt to unlock a mutex which is locked by another thread/fiber` is a bug of `ReentrantMutex` if lock/unlock are paired (or `synchronize` is used).
https://github.com/rspec/rspec-support/pull/502#issuecomment-827075412 is the proper fix.
Just to follow up with everyone here: https://bugs.ruby-lang.org/issues/17827 was backported and should be in the next 3.0.x release.
@ioquatix See https://github.com/rspec/rspec-support/pull/553, maybe you could help to write a test, or describe how to write one, that represents the Fiber scheduler case + ReentrantLock? The current test doesn't use a Fiber scheduler and so doesn't really test anything interesting on Ruby 3.2+.
Thanks, I can take a look.
Subject of the issue
Your environment
Steps to reproduce
This is a tricky issue to reproduce. Essentially it looks like some of the assumptions made by `reentrant_mutex.rb` are no longer true in Ruby 3.0. I will review the code and try to give an update, but essentially it looks like trying to lock a `ReentrantMutex` from a different fiber on the same thread might be a problem.