rmosolgo / graphql-ruby

Ruby implementation of GraphQL
http://graphql-ruby.org
MIT License
5.38k stars 1.39k forks source link

Unable to find subscription with identifier: {"channel":"GraphqlChannel"} (RuntimeError) #5086

Closed steobrien closed 1 week ago

steobrien commented 2 months ago

Describe the bug

Breaking this out of a discussion on a similar issue, https://github.com/rmosolgo/graphql-ruby/issues/4702.

Very frequently – around a thousand times per day – we see this issue fire in Sentry in production. It's not clear if this aligns with user-facing defects, although we do hear occasional reports of real-time features unexpectedly stopping working until refresh.

Versions

graphql (1.12.19)
rails (7.1.4)
graphql-pro (1.26.5)
graphql-batch (0.4.3)
"react-relay": "^15.0.0"
"@rails/actioncable": "^6.1.4"
"graphql-ruby-client": "^1.14.0"

GraphQL schema

Our subscriptions are handled by ActionCable:

class TinesSchema < GraphQL::Schema
  use GraphQL::Subscriptions::ActionCableSubscriptions, broadcast: true
  # …
end

We have a straightforward implementation of GraphqlChannel, following https://graphql-ruby.org/api-doc/2.0.14/GraphQL/Subscriptions/ActionCableSubscriptions.html.

The frontend is powered by Relay, following the ActionCable guide in https://graphql-ruby.org/javascript_client/relay_subscriptions.

GraphQL query

We see these for various subscription queries, here's a random example:

subscription newLogSubscription($storyId:ID!) {
  newLog(storyId:$storyId) {
    log {
      action {
        hasLogs,
        id,
        working
      },
      createdAt,
      id,
      inboundEventId,
      level,
      message
    }
  }
}

Steps to reproduce

Unfortunately, I haven't found a way to reproduce the problem, even when I modify our client-side code to repeatedly dispose of a subscription, as discussed here: https://github.com/rmosolgo/graphql-ruby/issues/4702#issuecomment-2327277240.

Expected behavior

No exception.

Actual behavior

Here's the full stack trace we see, which appears to match https://github.com/rmosolgo/graphql-ruby/issues/4702.

Click to view exception backtrace ``` RuntimeError: Unable to find subscription with identifier: {"channel":"GraphqlChannel"} (RuntimeError) from actioncable (7.1.4) lib/action_cable/connection/subscriptions.rb:77:in `find' from actioncable (7.1.4) lib/action_cable/connection/subscriptions.rb:49:in `remove' from actioncable (7.1.4) lib/action_cable/connection/subscriptions.rb:20:in `execute_command' from actioncable (7.1.4) lib/action_cable/connection/base.rb:100:in `block in handle_channel_command' from activesupport (7.1.4) lib/active_support/callbacks.rb:101:in `run_callbacks' from actioncable (7.1.4) lib/action_cable/connection/base.rb:99:in `handle_channel_command' from actioncable (7.1.4) lib/action_cable/connection/base.rb:92:in `dispatch_websocket_message' from actioncable (7.1.4) lib/action_cable/server/worker.rb:58:in `block in invoke' from activesupport (7.1.4) lib/active_support/callbacks.rb:121:in `block in run_callbacks' from actioncable (7.1.4) lib/action_cable/connection/tagged_logger_proxy.rb:28:in `tag' from actioncable (7.1.4) lib/action_cable/server/worker/active_record_connection_management.rb:16:in `with_database_connections' from activesupport (7.1.4) lib/active_support/callbacks.rb:130:in `block in run_callbacks' from actioncable (7.1.4) lib/action_cable/engine.rb:79:in `block (4 levels) in ' from activesupport (7.1.4) lib/active_support/execution_wrapper.rb:92:in `wrap' from actioncable (7.1.4) lib/action_cable/engine.rb:74:in `block (3 levels) in ' from activesupport (7.1.4) lib/active_support/callbacks.rb:130:in `instance_exec' from activesupport (7.1.4) lib/active_support/callbacks.rb:130:in `block in run_callbacks' from activesupport (7.1.4) lib/active_support/callbacks.rb:141:in `run_callbacks' from actioncable (7.1.4) lib/action_cable/server/worker.rb:41:in `work' from actioncable (7.1.4) lib/action_cable/server/worker.rb:57:in `invoke' from actioncable (7.1.4) lib/action_cable/server/worker.rb:52:in `block in async_invoke' from concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:359:in `run_task' from concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:350:in `block (3 levels) in create_worker' from :187:in `loop' from concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:341:in `block (2 levels) in create_worker' from concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `catch' from concurrent-ruby (1.3.4) lib/concurrent-ruby/concurrent/executor/ruby_thread_pool_executor.rb:340:in `block in create_worker' ```
rmosolgo commented 2 months ago

Hey, thanks for sharing all these details. Here are some additional points I can glean from the backtrace:

So, I think it's some weirdness with unsubscribing. In the library, it could be this code which might make it send "unsubscribe" multiple times:

https://github.com/rmosolgo/graphql-ruby/blob/dda906bd9e7eadb54dba9a133b11f196043e5498/lib/graphql/subscriptions/action_cable_subscriptions.rb#L226-L227

For context, do any of your subscriptions terminate with #unsubscribe?

steobrien commented 2 months ago

For context, do any of your subscriptions terminate with #unsubscribe?

No. We don’t call that, or otherwise terminate subscriptions, anywhere on the backend.

Here’s how we manage subscriptions from the frontend, abridged and simplified to omit irrelevant details:

  1. Set up the relay environment with your builtin subscription handler
    
    import * as ActionCable from "@rails/actioncable";
    import createRelaySubscriptionHandler from "graphql-ruby-client/subscriptions/createRelaySubscriptionHandler";
    import { Environment, Network, RecordSource, Store } from "relay-runtime";

const cable = ActionCable.createConsumer("/cable"); const subscriptionHandler = createRelaySubscriptionHandler({ cable });

const environment = new Environment({ network: Network.create(fetchQuery, subscriptionHandler), store: new Store(new RecordSource()) });


2. Define GraphQL subscriptions
```ts
import { graphql, requestSubscription } from "react-relay";

const subscription = graphql`
  subscription newLogSubscription($storyId: ID!) {
    newLog(storyId: $storyId) {
      …
    }
  }
`;

export const newLogSubscription = (variables) => requestSubscription(environment, {subscription, variables})
  1. Set up subscriptions in relevant components with an in-house React hook

    useFocusedSubscriptions(
    () => [newLogSubscription({storyId: props.storyId})],
    [props.storyId]
    );
  2. Internally, the hook activates and deactivates based on component mount

    useEffect(() => {
    const subscriptions = getSubscriptions()
    
    return () => {
    subscriptions.forEach((subscription) => subscription.dispose())
    }
    })

I'm not 100% sure what code path the .dispose() calls go down – I guess that's ending up here? https://github.com/rmosolgo/graphql-ruby/blob/4836730486ea77714a8343ad5805cd4cdff0dc3e/javascript_client/src/subscriptions/createActionCableHandler.ts#L69

rmosolgo commented 2 months ago

Hey, thanks for following up on this. Everything there certainly looks good to me.

I've been working on improving GraphQL-Ruby's tests for ActionCableSubscriptions but got a bit stuck (https://stackoverflow.com/questions/78999661/why-doesnt-this-actioncable-broadcast-work-in-test). I'm hoping that once I get the suite running, I can replicate this issue in test and really fix it!

rmosolgo commented 1 month ago

Hey, I've continued to investigate this but working through it server-side didn't give me any leads :S

I've worked up a change which will disable double-unsubscribes: #5109.

Do you think you could copy that into your project and see if it makes this problem go away? (I expect it won't help, because I can't find any mechanism that would cause unsubscribe to be called twice. But obviously something is telling ActionCable to terminate a connection that has already been terminated ... so I think it is worth a try!)

steobrien commented 1 month ago

On it!

steobrien commented 1 month ago

Hey there, we tried this out in production for approximately 24 hours, but it didn't resolve the exception we're seeing.

I feel like you've gone above and beyond here already… I think we should try monkeypatching a few things on our end next to see if we can isolate the problem further. It feels to me like this could even be a problem within ActionCable itself.

rmosolgo commented 1 month ago

Bummer ... I'm sorry to hear it didn't make a difference.

within ActionCable itself

I pondered that too, but afaict, nobody has encountered this outside the GraphQL-Ruby world. Do you see any evidence of it out there?

Another possibility for debugging further would be to replicate it in a sample app. But then again, we don't know what actually causes it, do we? 😖