rspec / rspec-support

Common code needed by the other RSpec gems. Not intended for direct use.
https://rspec.info
MIT License
99 stars 103 forks source link

ReentrantMutex hangs with fibers, even after 3.11.0 #552

Open rmosolgo opened 1 year ago

rmosolgo commented 1 year ago

Subject of the issue

Hi! I'm following up on a reported Rspec hangup with GraphQL-Ruby's "Dataloader," a fiber-based data batching library (Original bug report: https://github.com/rmosolgo/graphql-ruby/issues/4172). The original report also included a replication: https://github.com/khamusa/gql-dataloader-rspec-mocks-hanging/pull/1/commits/fd6c05b4394fe9be72960a5b7548f94caf8d1d71#diff-cd6746fe941432087bd4ad1ba284ce7e6d3f94ef62e3954e8c04fcc007ba3968R3

I see that there has been recent work on this in #503, etc, but even on 3.11.0, I see this stuck process.

I cloned the replication repo and triggered the reported bug, then inspected the stuck ruby process with rbspy snapshot, and here's what I saw:

Stack trace of stuck ruby process

``` block in spawn_fiber - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:304 block in run - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/dataloader.rb:183 block (3 levels) in coerce_arguments - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/member/has_arguments.rb:249 block in evaluate_selection - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:429 evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:530 after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:929 block in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:529 call_method_on_directives - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:830 block (2 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:518 trace - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/tracing.rb:67 block (3 levels) in evaluate_selection_with_args - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/execution/interpreter/runtime.rb:502 resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:747 after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033 block in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:727 with_extensions - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:836 block (2 levels) in resolve - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/field.rb:723 public_send [c function] - (unknown):0 resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/relay_classic_mutation.rb:70 resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:116 after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033 block in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:115 after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033 block (2 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:113 after_lazy - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema.rb:1033 block (3 levels) in resolve_with_support - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/graphql-2.0.13/lib/graphql/schema/resolver.rb:112 public_send [c function] - (unknown):0 resolve - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:24 block (2 levels) in define_proxy_method - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:65 proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/verifying_proxy.rb:162 proxy_method_invoked - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/method_double.rb:81 message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:367 message_received - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/proxy.rb:232 invoke - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:476 invoke_incrementing_actual_calls_by - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:630 call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:765 map [c function] - (unknown):0 block in call - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-mocks-3.11.1/lib/rspec/mocks/message_expectation.rb:764 block (5 levels) in - /Users/rmosolgo/code/gql-dataloader-rspec-mocks-hanging/spec/debugging_issue_spec.rb:78 block in let - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:343 fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182 fetch [c function] - (unknown):0 block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181 synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26 enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36 lock [c function] - (unknown):0 ```

It seems to me like GraphQL-Ruby is executing a query, which happens inside a newly-created fiber, and then that query calls a mocked method (specifically, a method with a block-style mock), and during that block, it calls a let(...)-ed method, and during that call, the process hangs. (The let(...)ed method never returns.)

I expect this test to continue running, but is there another way it should be written? (Or, maybe something is out-of-sorts with GraphQL-Ruby's use of Fibers?)

Your environment

Steps to reproduce

git clone git@github.com:khamusa/gql-dataloader-rspec-mocks-hanging.git
cd gql-dataloader-rspec-mocks-hanging
git checkout bug-demo
bundle
bundle exec rspec

Expected behavior

The let(...) value should return and the test should continue running

Actual behavior

The test hangs forever, with the rbspy snapshot pointing at:

fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:182
fetch [c function] - (unknown):0
block in fetch_or_store - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-core-3.11.0/lib/rspec/core/memoized_helpers.rb:181
synchronize - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:26
enter - /Users/rmosolgo/.rbenv/versions/3.0.3/lib/ruby/gems/3.0.0/gems/rspec-support-3.11.0/lib/rspec/support/reentrant_mutex.rb:36
lock [c function] - (unknown):0
JonRowe commented 1 year ago

:wave: Seeing as you likely have knowledge of whats going on with fibers in the GraphQL package, can you setup a reproduction without the GraphQL library but just an equivalent usage of Fibers? Is Thread.current disrupted within your usage?

rmosolgo commented 1 year ago

Hey @JonRowe, sure thing, I just worked up a minimal reproduction here: https://github.com/rmosolgo/rspec-support/commit/8a32cffcf155f775821edee6de5529a9c7193907

Interestingly, I found that subject(...) was also required to replicate the problem -- if I moved that call into the test body, it didn't hang!

I don't think there's any messing with Thread.current in the code there.

mttkay commented 1 year ago

I don't this is a problem with ReentrantMutex; it is likely a duplicate of this issue with rspec-core: https://github.com/rspec/rspec-core/issues/2064 due how locking semantics changed with Ruby 3.

mttkay commented 1 year ago

I actually verified that ReentrantMutex is not the cause in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/102625#note_1156203743 by patching rspec-support to use Ruby's Monitor type, which is also reentrant.

Speaking of which, why does rspect-support not use Monitor to provide reentrancy?

pirj commented 1 year ago

Speaking of which, why does rspect-support not use Monitor to provide reentrancy?

Off the top of my head, Monitor was introduced after Ruby 1.8.7 that we still support in RSPec 3.

It’s an option to use Monitor, and fall back to ReentrantMutex, or just switch to Monitor in RSpec 4. Mind sending a PR to the 4-0-dev branch with your patch?

mttkay commented 1 year ago

Ah I see - makes sense :+1:

I wonder if this work is necessary at this point since it is being discussed to remove locks around memoized blocks completely in https://github.com/rspec/rspec-core/issues/2064

I would say let's wait for a decision about what should happen to threadsafe since that is the crux of the matter, not the lock implementation.

JonRowe commented 1 year ago

If you research the original development of this you'll find Monitor was actually the source of the original ReentrantMutex definition, with the desired chunks being essentially vendored to remove the dependency on it (we consider anything that requires a require a dependency to be minimized to avoid environment poisoning).