rmosolgo / graphql-ruby

Ruby implementation of GraphQL
http://graphql-ruby.org
MIT License

Subscriptions #613

Closed rmosolgo closed 7 years ago

rmosolgo commented 7 years ago

Right now we accept a subscription root type, but we don't provide any special support for registering, managing, or delivering subscriptions. With the RFC well on its way, what should be added to this library to support a third-party subscription backend?

I don't think this library should include a subscription backend, except maybe an in-memory one for prototyping. What could we support here that would actually be widely useful? ActionCable is cool, but I haven't heard of anyone actually running it in production yet (and not many folks are on Rails 5 to begin with). Using Redis Pub/Sub directly would be cool, but I think delivering those subscriptions would require a Websocket server outside the Rails server. Another option is Pusher, which I currently use to deliver live updates and which manages subscription status & delivery "for you".

So, certainly to start, I'd like to figure out what's missing in order to support subscriptions over those various backends. (Or, perhaps it's impossible to generalize about them!)

theorygeek commented 7 years ago

I spent some time the past few days rigging up a subscription system using graphql-ruby, so here are some of my preliminary thoughts.

Problems

The first thing that I ran into when setting up subscriptions was essentially, "Here's a GraphQL document with a subscription query; what is the user subscribing to?" In other words, this is the process of mapping the fields in the subscription query back to events in the system.

Given this GraphQL document for example:

subscription {
  somethingHappened(id: "...") { stuff }
  somethingElseHappened(id: "...") { moreStuff }
}

I need to somehow know that the user just subscribed to somethingHappened and also somethingElseHappened, and it's also important to know what they subscribed with. That probably involves looking at the arguments passed in, but maybe it involves the query context.

I looked at this problem from a couple of angles:

Two things kinda suck about the dry-run approach:

Here's another problem that I ran into pretty quickly: the arguments that the user supplied to the subscription might need to be transformed (quite a bit) before they give me information that I can use as a hook into my underlying event system. As an example, the user might be passing in global node IDs, but those global IDs are really only a concept inside of my GraphQL layer. My subscription layer is publishing messages using database IDs or other information, not node IDs.
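To make that transformation concrete, here's a minimal sketch. The `Node` module and its Base64 encoding scheme are hypothetical, invented for illustration; the point is just that a subscription backend needs to unwrap a GraphQL-layer global ID down to the database-level ID it actually publishes on.

```ruby
require "base64"

# Hypothetical global-ID helpers (not part of graphql-ruby): a Relay-style
# node ID encodes "TypeName:db_id", and the subscription layer only cares
# about the db_id portion.
module Node
  def self.encode(type_name, db_id)
    Base64.strict_encode64("#{type_name}:#{db_id}")
  end

  # Returns just the database-level ID embedded in a global node ID.
  def self.item_id(global_id)
    Base64.strict_decode64(global_id).split(":", 2).last
  end
end
```

So a `metadata_resolve`-style hook could call `Node.item_id(arguments[:channelId])` to translate the client-facing argument into something the event system understands.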

A third problem I ran into was: What if the user puts multiple subscriptions into the GraphQL document? If I'm only firing the somethingHappened subscription, how can I execute just that part of the query (and not the somethingElseHappened bit)? Maybe go back to doing some AST kung-fu to slice the query apart?

And the final problem I ran into: If the subscription is in response to an event, how do I get event-specific data into the field resolver?

Solution

Pulling all of this together, it seems that the subscriptions feature should have these characteristics:

So perhaps an API for creating a subscription could look like this:

# Much like mutations, subscriptions seem like they'll have a decent amount of boiler-plate. So
# that can all get pulled into its own define API.
NewUserJoinedChatSubscription = GraphQL::Subscription.define do
  # Name is used to derive the name of the auto-generated object type
  name "NewUserJoinedChat"

  # Specify the arguments for the subscription field. Unlike mutations, these are first-class
  # arguments, rather than being wrapped inside an input object type.
  argument :channelId, !types.ID
  argument :friendsOnly, types.Boolean, default_value: true

  # This is an optional, special resolver. It's executed on the initial execution of the subscription
  # query, so that you can return any data that helps you figure out how to tie into your underlying
  # event source.
  # Default behavior is to just return the arguments that were passed in
  metadata_resolve -> (root_object, arguments, context) {
    { 
      channel_id: Node.item_id(arguments[:channelId]),
      user_id: context[:access_token].user_id,
      friends_only: arguments[:friendsOnly]
    }
  }

  # Specify the fields that are on the auto-generated object type
  return_field :newUser, UserType
  return_field :channel, ChannelType

  # Finally, specify a resolver
  resolve -> (root_object, arguments, context) {
    # normal resolve that we all know and love
  }
end

Once you've defined a subscription using that API, you attach it to your root subscription type as a field, the same way as you do with a mutation.

The real magic happens when you execute a subscription query. Instead of getting your normal response, what you get is an array of objects that return metadata about the user's subscriptions:

subscription_query = <<~GRAPHQL
  subscription {
    somethingHappened(id: "...") { ...f1 }
  }

  fragment f1 on SomethingHappenedPayload {
    stuff
  otherStuff
  }
GRAPHQL

result = Schema.execute(subscription_query, variables: {}, context: {})
# => Array

# Here's how I imagine you could interact with result:
result.each do |item|
  # Get the name of the subscription
  item.subscription.name
  # => "SomethingHappened"

  # Get the metadata from the `metadata_resolve`. By default, it just returns the arguments:
  item.metadata
  # => { "id" => "..." }

  # Callable that you can use to execute just this particular part of the query
  # To inject event-specific data, you can override the root_value, or you can add more stuff
  # to the query context:
  item.execute(root_value: some_event, extra_context: { "event" => some_event })
  # => { "data" => { "somethingHappened" => { ... } }, "errors" => [] }
end

I think this would be enough to implement a subscription system for almost any transport. You can take the metadata you get from the initial run of the query and use it to figure out what events to listen to. And whenever the event happens, you can execute the corresponding query and send the results to the client through whatever transport mechanism.

The nice thing about the callable approach is that it's probably the easier thing to build, and I'm guessing it's suitable for all websocket-like use cases. The downside is that you might want your subscriptions to be durable (i.e., maybe you shovel them into a database), or maybe an entirely different process is going to execute the query and transmit the results.

So an alternative to the callable approach would be to produce an already-sliced GraphQL document and set of variables, which could be easily serialized. And then sometime later you could just execute that sliced query with those variables.
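The serializable alternative could be as simple as a value object holding the sliced document string plus its variables. A rough sketch (the class name and the idea of storing the raw query string are mine, just to show the shape; the actual slicing and later execution are out of scope here):

```ruby
require "json"

# Sketch: persist a sliced subscription query so that a different process
# can rehydrate it later and execute it against an event.
class StoredSubscription
  attr_reader :query_string, :variables

  def initialize(query_string, variables)
    @query_string = query_string
    @variables = variables
  end

  # Serialize to JSON so it can go into a database row, a Redis key, etc.
  def serialize
    JSON.generate("query" => @query_string, "variables" => @variables)
  end

  def self.deserialize(json)
    data = JSON.parse(json)
    new(data["query"], data["variables"])
  end
end
```

Whoever picks the record back up would parse it and hand the query string plus variables to `Schema#execute` (or equivalent) with event-specific context.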

Those are my preliminary thoughts. I'd be interested to hear what others think.

rmosolgo commented 7 years ago

Wow, thanks for the detailed post! Having spent some time on subscriptions myself, it's nice to hear that you hit a lot of the same challenges. Maybe that means we're on the right track šŸ˜†

First, here a few ways I've dealt with some of the questions posed above:

How do those sound to you?

Second, here are some questions that I haven't found a passable solution for:

Finally, I think it's very important to maintain this simple API for Schema#execute:

render json: MySchema.execute(query_str, ...)

Otherwise, must a user inspect incoming queries and determine whether they're subscriptions or not?

theorygeek commented 7 years ago

I took something similar to the dry run approach

The only sad thing about it is that you get "can't return null" errors if your subscription fields are non-null. ā˜¹ļø

Otherwise, must a user inspect incoming queries and determine whether they're subscriptions or not?

So... I realized that I was thinking about it very differently. Because for us, normal queries are POST'd to /queries, but subscriptions come through a websocket. So we always know if the query is normal or a subscription. In fact, it'd be weird if those got criss-crossed.

Finally, I think it's very important to maintain this simple API for Schema#execute:

Maybe the real feature here (that I'm after) is the ability to specify, during the call to Schema#execute, a different execution strategy. And then you could just pass one in that has this behavior. That'd probably allow the development of different ways of handling subscription queries, including the one where you get those callable objects I mentioned to execute just a single field of the original query.

But I could definitely see a different way of doing it if we built some tools for manipulating GraphQL documents (a la https://github.com/rmosolgo/graphql-ruby/issues/455).

Given a document that has an operation X, and that operation queries fields A, B, C, then you get three documents back (A', B', and C') which contain only the variable declarations, fragments, and field selections to query from each of those fields respectively.

Input:

query X($inputB: SomeInput!) {
  someAlias: fieldA { selections }
  fieldB(input: $inputB) { ...b }
}

fragment b on BType { moreSelections }

Slice it up:

GraphQL.slice_document(document, operation: "X")

Get two results:

query X_someAlias {
  someAlias: fieldA { selections }
}
query X_fieldB($inputB: SomeInput!) {
  fieldB(input: $inputB) { ...b }
}

fragment b on BType { moreSelections }

Then you could take a subscription query, slice it into the individual fields, and register each one as a subscription. And maybe this is enough information that you don't even have to execute it.

rmosolgo commented 7 years ago

"can't return null" errors

šŸ˜– maybe the same library improvement for skipping fields in the response can be applied in this case (instead of returning null)

subscriptions come through a websocket

:+1: That makes sense, that's how I sketched it out over ActionCable a while back too.

But now I'm looking at "what if you can't upgrade to Rails 5" (or what if ActionCable gives you the heebie jeebies) so you want to use a third-party pubsub provider (we use Pusher).

rmosolgo commented 7 years ago

A couple of thoughts as I work over at #672:

I'm still trying to figure which parts of ^^ above I should try to support from the framework vs leave to the application.

rmosolgo commented 7 years ago

I might be able to tease apart storage from transport ... hmmm

rmosolgo commented 7 years ago

I've posted the example I use for developing here:

https://gist.github.com/rmosolgo/ba31acf93f07f8007d99ba365a662d8f

I'll try to keep the gist up-to-date as I iterate šŸ˜¬

rmosolgo commented 7 years ago

Some other thoughts: if delivering a subscription means "unfreezing" query data & re-running the query, we can't currently support dynamic filtering.

I've advised people to pass a filter into Schema#execute (probably in their Rails controller action) but this isn't possible without custom user code.

I think the solution to this is to allow adding filters after initializing the query. This way, filters could be added during before_query instrumentation hooks. Besides supporting subscriptions, this seems like a better overall architecture because it's harder to accidentally omit a filter function.
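A plain-Ruby sketch of that architecture (the class and method names here are illustrative, not graphql-ruby's actual API): filters accumulate on the query object after it's built, e.g. from a before_query instrumentation hook, and a schema member is visible only if every filter approves it.

```ruby
# Sketch: a query that accepts additional filters after initialization,
# so instrumentation hooks can add them and none can be accidentally omitted.
class FilteredQuery
  def initialize(context)
    @context = context
    @filters = []
  end

  # Append a filter at any point before execution.
  def add_filter(&block)
    @filters << block
  end

  # A member is visible only if *all* registered filters return true.
  def visible?(member)
    @filters.all? { |f| f.call(member, @context) }
  end
end
```

For example, a before_query hook could call `query.add_filter { |member, ctx| ctx[:admin] || !member[:admin_only] }`, and because filters compose with AND, adding one can only narrow visibility, which is the safe default.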

brb šŸ˜¬

mull commented 7 years ago

Liking everything in here! @rmosolgo do you have a rough estimate of when you think you would release it (even if it's a beta)? We're building a new project right now and exploring the options we have in GraphQL. One of them is whether to do polling or subscriptions. Subscriptions are our long-term goal, but we thought we'd try something out relatively soon. If subscriptions are relatively close, we would hold off on trying anything right now :)

rmosolgo commented 7 years ago

Hi @mull, thanks for taking a look!

My guess is end-of-August at the latest. There are a few more # TODOs over at #672, which I'll finish on the upcoming graphql-ruby hack day at the latest (tentatively scheduled for Aug 25, more info coming soon).

Besides that, there are a few other folks implementing subscriptions on their own, so those might be released even sooner!

mull commented 7 years ago

@rmosolgo thanks for the answer! We love the library and will be purchasing the pro license in a few hours :heart:

mgwidmann commented 7 years ago

There are some considerable issues to deal with when it comes to even a small bit of scaling. Think of 1000 connected users from 5 different platforms each using a different subscription query. Will you render the document 5 times or 1000? What happens if 500 subscribe to server A and 500 subscribe to server B?

Absinthe (in Elixir) is just finishing getting subscriptions completed and will be out in their next release. It may help to learn how other implementations handled the issues you've come across (as well as bring to light new ones you haven't yet thought of). Sangria (in Scala) solved things in a similar way.

You can see their explanation here of the issues they ran into here: https://youtu.be/d2qNlXtpWXM?t=25m36s

Just some food for thought. Thanks for all your work on this gem!

rmosolgo commented 7 years ago

Thanks for the link @mgwidmann ! I love looking into other implementations, there are some really smart folks with good ideas there šŸ˜„

I'll share some of my thoughts on these specific topics:

Think of 1000 connected users from 5 different platforms each using a different subscription query. Will you render the document 5 times or 1000?

My priorities are:

  1. Safety by default: out of the box, we should prevent people from accidentally leaking unauthorized data into subscription responses
  2. Efficiency: efficient storage, execution and runtime, within the bounds of safety

So, the default implementation will be to handle each subscription separately: load the document, run the subscription, transport the result, x1000.

One thing that concerns me is that many Rails apps use Thread.current for "global" scope, such as the logged-in user or the logged-in organization. Then, other parts of the app use this to scope DB access or record metrics. If we ran queries concurrently, Thread.current would not be accurate.

For someone with this kind of volume, another scaling approach might be to use a cache for data fetching. This way, the database client (GraphQL) has complete flexibility, but the master database isn't swamped with queries.

What do you think of that approach? Are there other ways to be safe-by-default but still have good efficiency?

What happens if 500 subscribe to server A and 500 subscribe to server B?

Storage in memory is not sufficient! You need to manage state with a shared database. For this reason, the API will have two parts: a Subscriptions module with generic hooks for storage and delivery, and some implementations of those hooks with various tradeoffs.
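A minimal sketch of that two-part shape, assuming made-up names (this is not the released API): a coordinator with pluggable storage and transport, plus trivial single-process implementations of each hook.

```ruby
# Sketch: generic subscription coordinator with swappable storage and delivery.
class Subscriptions
  def initialize(store:, transport:)
    @store = store          # persists subscription state (a real one must be shared across servers)
    @transport = transport  # pushes results to connected clients
  end

  def subscribe(client_id, event_name, query)
    @store.write(event_name, client_id => query)
  end

  def trigger(event_name, payload)
    @store.read(event_name).each do |client_id, query|
      result = query.call(payload)   # re-run the stored query against the event
      @transport.deliver(client_id, result)
    end
  end
end

# In-memory store: fine for prototyping, useless across multiple servers.
class MemoryStore
  def initialize
    @data = Hash.new { |h, k| h[k] = {} }
  end

  def write(event, entries)
    @data[event].merge!(entries)
  end

  def read(event)
    @data[event]
  end
end

# Transport that just records deliveries, standing in for ActionCable/Pusher/etc.
class ArrayTransport
  attr_reader :sent

  def initialize
    @sent = []
  end

  def deliver(client_id, result)
    @sent << [client_id, result]
  end
end
```

Swapping `MemoryStore` for a Redis-backed store, or `ArrayTransport` for an ActionCable channel, wouldn't touch the coordinator, which is the point of keeping the hooks generic.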

mgwidmann commented 7 years ago

@rmosolgo I hear your concerns, especially with regard to Thread.current, it's a super bad practice. I don't know anything about your implementation, so I don't know if what you said will work.

I've talked several times with one of the team members on the Elixir GraphQL implementation and he said they had to implement functionality similar to how batching works inside a single document but across multiple documents as a subscription is fulfilled. For example, if you have a post w/ an author and 1000 subscriptions w/ half of them having an author and half without, the author and post will only be fetched once, not 500 or 1000 times. They do this once per server, so if you have 10 servers, you'll only fetch the post 10 times and the author 10 times.

Storage in memory is not sufficient! You need to manage state with a shared database. For this reason, the API will have two parts: a Subscriptions module with generic hooks for storage and delivery, and some implementations of those hooks with various tradeoffs.

There are problems with that too. Subscriptions are only relevant as long as a user is connected to some sort of live transport such as a websocket. When they're disconnected, those things need to be cleaned up or they'll leak. Regardless of where you put it, (database or redis or wherever) you cannot rely on the machine with the active connection to be the one to clean up. Servers frequently go down for numerous reasons and if they went down without running the "cleanup" code, those subscriptions would be leaked and clients would immediately reconnect to another up server, duplicating the subscription. Absinthe (and I suspect Sangria as well) keeps the subscriptions in memory since they're tied to the websocket being live and that way they're cleaned up if the websocket goes down even if you kill -9 or unplug the server.

Also, as a general rule, I'd warn against any direct coupling of GraphQL to any 3rd party service such as a database or redis, just because GraphQL is supposed to be storage agnostic. I'd suggest just relying on a pub/sub system w/ an adapter interface, then build out several projects to be able to use things like redis/memcache/etc. as the pub sub.

rmosolgo commented 7 years ago

across multiple documents

šŸ‘Œ I think the current implementation would support using Schema#multiplex for subscription delivery, I'll add a test to make double-sure before releasing.

against any direct coupling

Same, the current implementation leaves storage & transport to the application, so you could use whatever implementation you want (eg ActionCable, Pusher)

Jose4gg commented 7 years ago

@rmosolgo Do you have another date estimate for the release, or is it still August 25?

rmosolgo commented 7 years ago

still August 25

yep šŸ˜¬ working through an example implementation on ActionCable now :)

theorygeek commented 7 years ago

As I've had a few more subscriptions crop up in my app, I've noticed that a common use case is:

The operation usually has a unique ID, which is unknowable until the mutation is finished. However, it's possible that status updates happen between the time when the client receives the mutation response and when it creates the subscription.

So somehow, you want to get this as close to atomic as possible. One way that it could be achieved is to include a timestamp in the result of the mutation, showing when the last status update happened. Then at subscription creation, you provide both the ID and the timestamp; if new status updates have happened since that timestamp, the subscription is triggered immediately.
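That replay-on-subscribe idea can be sketched with a per-operation event log (the class, method names, and `since:` parameter are illustrative, not anything in graphql-ruby): every status update is recorded with a timestamp, and subscribing with the timestamp from the mutation response immediately returns whatever happened in the gap.

```ruby
# Sketch: close the gap between "mutation finished" and "subscription created"
# by replaying any updates recorded after the client's last-seen timestamp.
class OperationEvents
  Update = Struct.new(:at, :payload)

  def initialize
    @log = Hash.new { |h, k| h[k] = [] }
  end

  # Called whenever a status update happens for an operation.
  def record(operation_id, payload, at: Time.now)
    @log[operation_id] << Update.new(at, payload)
  end

  # Called at subscription creation with the timestamp from the
  # mutation response; returns the updates the client missed.
  def subscribe(operation_id, since:)
    @log[operation_id].select { |u| u.at > since }.map(&:payload)
  end
end
```

Anything returned by `subscribe` would be delivered to the client right away, before normal event-driven delivery takes over.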

Just thought that use-case might help when thinking through the feature.

rmosolgo commented 7 years ago

Great point, a similar question of atomicity was also discussed during the spec formalization: https://github.com/facebook/graphql/issues/283 (in the end, left vague).

Your idea of returning the timestamp alongside the ID is really cool! Then when the client subscribes, the server can replay updates between that timestamp and the current time. I'd love to try implementing it sometime šŸ˜„

blevine commented 7 years ago

@rmosolgo Is subscription support now available? Should I use the 1.7.x branch for this?

rmosolgo commented 7 years ago

Sorry, I ended up writing GraphQL::Tracing during the hack week instead! Keep an eye on #672

rmosolgo commented 7 years ago

I merged #672, should be released this week with 1.7.0

mgwidmann commented 7 years ago

I'd be a little wary about using timestamps between systems. Timestamps and distributed systems never end well, since the clocks are almost certainly not reporting the same value at any given time. Phoenix's pubsub/presence system uses CRDTs (which is some crazy math proof) instead of using timestamps like you described. This guy explains some of the problems and how CRDTs fix them: https://vimeo.com/171317273#t=1682s (rewind if you want to hear more about the problem they're solving). He specifically mentions timestamps as being a bad option (around 32 mins).