Closed: Nealsoni00 closed this issue 9 months ago
Sorry for the trouble!
My first guess is that `limiter_key:` is getting left out when context is serialized for storage, then parsed to re-evaluate the subscription. What subscription backend are you using? (Ably, Pusher, ActionCable, something else?)
If you're using Pusher or Ably, the thing to check is `dump_context` and `load_context`: https://graphql-ruby.org/subscriptions/ably_implementation#serializing-context. Do those methods include `limiter_key:`?
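For reference, here's a minimal sketch of the round trip those methods have to preserve. This is an illustration only, not your app's actual serializer -- in a real app these would be `dump_context`/`load_context` overrides on a `GraphQL::Pro::AblySubscriptions` (or `PusherSubscriptions`) subclass:

```ruby
require "json"

# Hypothetical stand-in for dump_context / load_context overrides.
# The key point: whatever goes into the dumped payload is all that comes
# back when the subscription is re-evaluated later, so :limiter_key must
# survive the round trip.
class ContextSerializer
  def dump_context(ctx)
    JSON.dump(ctx) # must include :limiter_key
  end

  def load_context(ctx_string)
    JSON.parse(ctx_string, symbolize_names: true)
  end
end
```

If `limiter_key` is dropped here, every subscription update will run with a context that's missing it.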
If that doesn't help, could you please share the full backtrace of the error? That would give me some more clarity about what GraphQL-Ruby is doing when it encounters this problem. Maybe there's a spot I overlooked!
Appreciate the help! I'm using ActionCable via AnyCable, with the RPC bridge to connect the two services. I am including the context value in both `graphql_channel` and `graphql_controller`, which are the only places we define context in the graph layer. How would I go about dumping context with the ActionCable implementation?
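As a sketch of the kind of shared context setup described above (the helper name and key scheme here are assumptions for illustration, not the actual app code):

```ruby
# Hypothetical shared helper so that both entry points (the channel and
# the controller) build the same context, including the limiter_key that
# the Enterprise limiters require. "user:#{id}" is an assumed key scheme.
def build_graphql_context(user_id)
  {
    current_user_id: user_id,
    limiter_key: "user:#{user_id}",
  }
end
```

Keeping this in one place avoids the two entry points drifting apart.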
The ActionCable implementation in GraphQL-Ruby doesn't require dumping context because it's stored in memory for the duration of the subscription -- it's never written to storage. But AnyCable writes subscription context to storage (see point 3: https://github.com/anycable/graphql-anycable/#data-model). It looks like it uses `query.context.to_h`, so I would certainly expect `limiter_key:` to be persisted if it was present in the query context (source).
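As a quick illustration of why that would be expected to work (a plain-Hash example, not AnyCable itself):

```ruby
# If the query context behaves like a hash, a to_h-based dump carries
# limiter_key through to storage along with everything else in context.
context = { current_user_id: 42, limiter_key: "user:42" }
stored = context.to_h
# stored still contains :limiter_key after the dump
```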
Could you please share the full backtrace of the error? That would help me understand what GraphQL-Ruby is doing when it runs into this error. Maybe some other part of the system needs an update!
It hasn't happened for a while, but it just happened again and here is the trace:
It is incredibly inconsistent.
Thanks for sharing the stack trace. I went digging into several lines, but everything looks in order to me.
You mentioned it hasn't happened in a while -- has it gotten less frequent over time?
One possibility is that there are some long-running subscriptions which started before the limiter key was added to context, so even when updates happen properly, there isn't that key in context. That might explain why it's happening less often (if it is), because those long-running subscriptions are finally terminating, and fewer and fewer are left over time.
That said, I wouldn't expect that to be a problem. With ActionCable, redeploying the Rails app causes all connections to be disconnected and reconnected, which also re-establishes all GraphQL subscriptions. I don't know if AnyCable works the same way or not, though.
In any case, here's a work-around you might try:
```ruby
class CustomRuntimeLimiter < GraphQL::Enterprise::RuntimeLimiter
  def limiter_key(query)
    if query.subscription_update? && query.context[:limiter_key].nil?
      # Somehow log this to keep an eye on frequency or to help debug
      "mysterious-no-limiter-key-subscription-update"
    else
      super
    end
  end

  def limit_for(key, query)
    if key == "mysterious-no-limiter-key-subscription-update"
      nil # don't apply a limit here -- for some reason this happens from time to time, but we should permit it
    else
      super
    end
  end
end
```
Then, update your schema class:
```ruby
use CustomRuntimeLimiter, ...
```

That would cause these updates to run without an error (doc: https://graphql-ruby.org/limiters/runtime#customization).
Would a work-around like that work in your case?
This has all been in local development and in the development environment (not released to production), and it only happens for some users. We have destroyed the socket connections and restarted all servers multiple times, and the issue persists randomly.
I'll implement your suggestion as a temporary workaround and create an error in Datadog when it happens. Appreciate it!
This has been the issue preventing me from rolling GraphQL Enterprise out to production.
AnyCable does not reconnect all websockets on release; rather, the RPC service (which resolves all subscriptions) gets a rolling release and the websockets are maintained. We've had `limiter_key` set in both controllers for weeks in all environments, so I'd be surprised if the cache were the issue.
Hey, just curious -- did that patch help shed any light on what's going on here?
I believe the issue was due to AnyCable GraphQL subscriptions not having a default expiry, so it was resolving really old subscriptions whose stored context values did not include the `limiter_key`. Thus, we had to clear those old "ghost" subscriptions manually.
I don't believe I need this fix now, as there should only be active subscriptions with the proper `limiter_key` set.
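For anyone hitting the same thing: depending on your graphql-anycable version, there may be a subscription-expiration setting that keeps these ghost subscriptions from accumulating in the first place. The setting name below is an assumption -- verify it against the gem's README for your version:

```ruby
# Assumed graphql-anycable configuration knob (check your version's
# README); expires stored subscriptions so stale contexts age out
# instead of lingering forever.
GraphQL::AnyCable.config.subscription_expiration_seconds = 7 * 24 * 60 * 60
```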
Though other users who implement this limiter in the future should verify that all contexts in memory have this `limiter_key` set before the server requires it with `use GraphQL::Enterprise::ActiveOperationLimiter`.
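One way to do that check, sketched as a hypothetical guard (the logging call is a placeholder for whatever error tracker you use):

```ruby
# Hypothetical guard: surface queries that are about to run without a
# limiter_key, so stale subscription contexts show up in monitoring
# before the Enterprise limiters start rejecting them.
def assert_limiter_key!(context)
  if context[:limiter_key].nil?
    # Replace with your error tracker (Datadog, Sentry, etc.)
    warn "GraphQL context missing :limiter_key -- stale subscription?"
  end
  context
end
```

Running this for a while before enabling the limiter gives a count of how many stale contexts are still live.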
Ah, interesting ... thanks for sharing what you found! I'll keep this in mind in case I get any future reports of it.
I have put a `limiter_key` in 2 places: `graphql_channel` and `graphql_controller`, as so:
I'm using both GraphQL Enterprise limiters:
However, when I run a mutation that triggers a subscription, I get the following error:
Code that causes the error:
Would love any help!