graphql / graphql-spec

GraphQL is a query language and execution engine tied to any backend service.
https://spec.graphql.org

Subscriptions RFC: Are Subscriptions and Live Queries the same thing? #284

Closed: robzhu closed this 7 years ago

robzhu commented 7 years ago

Re-define Live Queries? Can Live Queries make Subscriptions unnecessary? @smolinari @paralin @laneyk @dschafer @taion @Siyfion @jamesgorman2 @leebyron

Continuing the conversation from: https://github.com/facebook/graphql/pull/267#issuecomment-281576156

taion commented 7 years ago

I think in many cases, developers use subscriptions to approximate live queries, but subscriptions are more powerful and easier to implement.

For example, in my case, where I have many microservices on my backend, where some nested fields go to other services, it's not really straightforward to define how live queries would work, and I've chosen explicitly to model things as event streams.

Live queries would be a nice abstraction on top, but it's only that – it's not, in the general case, a great backend building block.

stubailo commented 7 years ago

I don't think this discussion should be judging whether live queries are better than subscriptions, just whether they are different enough to be considered independently.

I think "building block" is a great way to look at it though - subscriptions are a great well-specified unit of realtime data push that can be used to build a lot of other cool stuff. The fact that it's very easy to implement a spec-compliant subscription on the server side is pretty awesome, even if it's not always the thing you want as a client-side developer.

smolinari commented 7 years ago

I'd like to know first what people consider live queries to be. What is the definition? I ask because I think there are different perspectives or ideas in play here, and thus the discussion can run off on unnecessary tangents.

So, what is a live query? 😄

Scott

stubailo commented 7 years ago

Here's my impression of a live query in one sentence:

"A live query is a query where some or all of the parts can be marked as 'live', and the client expects to receive updates whenever any of those parts would have ended up with a different result if refetched again."

In short, they should be a drop-in replacement for polling.
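To make the "drop-in replacement for polling" definition concrete, here is a minimal TypeScript sketch. It assumes a synchronous `runQuery` function (a hypothetical stand-in for executing the GraphQL query) and a client that calls `tick()` on a timer. The client is only notified when the result differs from the last one delivered, which is the observable behavior a live query promises; a real live query would push instead of poll.

```typescript
type QueryResult = Record<string, unknown>;

// Emulates a live query by polling: re-run the query on each tick and
// notify the client only when the result actually changed.
class PollingLiveQuery {
  private last: string | undefined = undefined;
  private readonly runQuery: () => QueryResult; // hypothetical query executor
  private readonly onChange: (result: QueryResult) => void;

  constructor(runQuery: () => QueryResult, onChange: (result: QueryResult) => void) {
    this.runQuery = runQuery;
    this.onChange = onChange;
  }

  // One polling step; a real client would call this from setInterval.
  // Returns true if an update was delivered.
  tick(): boolean {
    const result = this.runQuery();
    // JSON.stringify is a cheap structural comparison for plain JSON results,
    // assuming the executor always emits fields in the same order.
    const serialized = JSON.stringify(result);
    if (serialized === this.last) return false;
    this.last = serialized;
    this.onChange(result);
    return true;
  }
}
```

The first tick always delivers the initial result; later ticks stay silent until the data changes.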

robzhu commented 7 years ago

I'll quote my original definition in the RFC:

Live Queries: the client issues a standard query. Whenever the answer to the query changes, the server pushes the new data to the client. The key difference between Live Queries and Event-based Subscriptions is that Live Queries do not depend on the notion of events. The data itself is live and includes mechanisms to communicate changes.

Another way @stubailo and I have described it is: "infinitely fast/cheap polling".

smolinari commented 7 years ago

Here is my definition.

A live query is a query that the client designates as "live". This designation is passed on to the GraphQL server (one could say it is a subscription). The server then watches for triggers or data input from the underlying data sources needed to fulfill any part of the query. Any updates from those data sources are then passed to the client automatically via bi-directional communication.

Scott
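A rough TypeScript sketch of this definition, assuming the server knows which data sources each live query reads from (the `sources` set) and that the data layer calls `sourceChanged` when one of them changes. `LiveQueryServer`, `sources`, and `sourceChanged` are hypothetical names, not part of any spec or library. Only queries that depend on the changed source are re-executed, and each result goes to the client that registered the query.

```typescript
type Push = (clientId: string, result: unknown) => void;

interface RegisteredLiveQuery {
  clientId: string;
  sources: Set<string>;   // hypothetical: underlying data sources this query reads
  execute: () => unknown; // re-runs the query against current data
}

class LiveQueryServer {
  private readonly bySource = new Map<string, Set<RegisteredLiveQuery>>();
  private readonly push: Push;

  constructor(push: Push) {
    this.push = push;
  }

  register(query: RegisteredLiveQuery): void {
    query.sources.forEach((source) => {
      let dependents = this.bySource.get(source);
      if (dependents === undefined) {
        dependents = new Set<RegisteredLiveQuery>();
        this.bySource.set(source, dependents);
      }
      dependents.add(query);
    });
    // The client gets a normal query result right away.
    this.push(query.clientId, query.execute());
  }

  unregister(query: RegisteredLiveQuery): void {
    query.sources.forEach((source) => this.bySource.get(source)?.delete(query));
  }

  // Called by the data layer (trigger, change feed, ...) when a source changes;
  // only queries that read that source are re-executed and pushed.
  sourceChanged(source: string): void {
    const dependents = this.bySource.get(source);
    if (dependents === undefined) return;
    dependents.forEach((query) => this.push(query.clientId, query.execute()));
  }
}
```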

paralin commented 7 years ago

If you want a demo of a live query, complete with a GraphiQL editor, see http://rgraphql.org

Live queries are infinitely more powerful than subscriptions because you can model live, reactive data in a way that efficiently encodes changes all the way from the data source to the browser, into something like React and Angular. And it's not true that they cannot be done at scale - it is definitely possible with a good enough scheduler / balancer.

stubailo commented 7 years ago

Live queries are infinitely more powerful than subscriptions

I agree with this.

And it's not true that they cannot be done at scale - it is definitely possible with a good enough scheduler / balancer.

I also agree with this.

However, I think they are significantly different from subscriptions nonetheless, and both have their place in the GraphQL ecosystem.

paralin commented 7 years ago

However, I think they are significantly different from subscriptions nonetheless, and both have their place in the GraphQL ecosystem.

Agreed. I don't think live queries are necessarily difficult to do, though, either. The argument I'm pushing back on is: "live queries at scale isn't a solved science, so we're going to ignore the concept entirely."

robzhu commented 7 years ago

"live queries at scale isn't a solved science, so we're going to ignore the concept entirely."

I don't think anyone is ignoring the concept. But "live queries at scale isn't a solved science" has some truth to it. We hope to share more details in the coming months as we continue to learn from our live query experiments at Facebook. However, assuming live queries work perfectly, we believe live queries and subscriptions are different tools in the real-time API toolbox.

stubailo commented 7 years ago

Yeah, I think it's important that the spec proposal doesn't say "this is the only thing we will ever do for realtime data" or even "this is the best way to do realtime data". It should just say "this is the way of doing realtime data that is understood clearly enough to specify".

robzhu commented 7 years ago

For example, in my case, where I have many microservices on my backend, where some nested fields go to other services, it's not really straightforward to define how live queries would work, and I've chosen explicitly to model things as event streams.

@taion I'm curious to know if you think of subscriptions and live queries as semantically different. Suppose you had both subscriptions and live queries at your disposal, when would you use one over the other?

smolinari commented 7 years ago

That is a great question, but it blows my definition of live queries out of the water. Doesn't it? Hehehe... LOL! 😄

If I may answer too: I think with Christian's (@paralin) rgraphql system, live queries are a server-side and domain-specific decision. From what I understand from the rgraphql docs, if you want a live query, the ability to observe for updates is "baked" into one or more resolvers for that query. And I believe this is where the concept raises a general concern (and something still missing in the spec too). It requires a front-end dev to have intimate knowledge of the back-end decisions, because the type of query (live or not) cannot be directly "seen" through introspection, although it should be. Sure, one could add some type of comment, but is that really a good solution for flagging queries as "potentially live" with introspection?

The other question that burns in my mind is: how does the server know whom to broadcast these updates to? The docs mention killing long-running processes. That only scratches the surface of the scaling issue.

I guess I am the stupid guy on the fence between these two solutions. I don't think GraphQL should be working with events internally. They aren't needed, as Christian's rgraphql system proves. Yet I don't think pure live queries, without some sort of subscription system, are the right solution either.

Oh. And just because a live query has a subscription system tagged to it, doesn't mean it can't be called a live query. 😉

Scott

taion commented 7 years ago

@robzhu

I do think of them as semantically different. It would be awkward to do a toast notification with a live query rather than an event-based subscription stream, for example.

That said, I am mostly using subscriptions as a poor man's live query system, with easier-to-understand semantics on the backend. If I had a reactive backend that supported live queries, I would mostly move to using live queries – but I don't, and I decided it wasn't worth the architectural trade-offs required to do so.

Additionally, I expect the majority of users of GraphQL subscriptions as-is to use them for something similar to my use case: emulating live queries in an easier-to-implement manner for complex backend systems.

paralin commented 7 years ago

@smolinari don't you mean my system? :)

smolinari commented 7 years ago

@paralin - Sorry about that. You are right. Me goes correcting

Scott

Siyfion commented 7 years ago

So are we saying that the difference between a "Live Query" and a "Subscription" is essentially how the updates are pushed? A LQ will automatically send you any update that affects the original query, whether it be an add, remove, or update, whereas a Subscription needs a "manual" push of new data, allowing the programmer to be selective about what updates are sent?

paralin commented 7 years ago

@siyfion in rgraphql at least the developer still has fine control over what gets sent to the client, the system just manages getting those changes to the client and applying them properly.

The only difference I can see really is that subscriptions are limited to the root level of the query only, and cannot be updated after they have begun. These properties are probably good for when you're subscribing to general streams of events. I wouldn't use it for live data though.

Imagine you're trying to build a news feed with comments. What happens if someone edits a comment? Do you just push an event saying it was edited via a subscription? But then all of the logic to apply the updates has to be hand built separately for each of the types of things you might want to update. That seems wrong to me.

Instead you can just subscribe to the same streams of data on the server, interpret them correctly, and then send back updates to the client tailored to the data they already have.
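A minimal TypeScript sketch of that server-side step, assuming the server tracks which comments each client has already loaded and which comment fields its query selected (`ClientView` and `tailorUpdate` are hypothetical). One upstream edit event becomes a small per-client patch, and clients that never loaded the comment, or did not select the changed fields, receive nothing.

```typescript
type Fields = Record<string, unknown>;

interface CommentEdited {
  commentId: string;
  changes: Fields; // e.g. { text: "...", editedAt: 123 }
}

// Hypothetical per-client bookkeeping kept by the server.
interface ClientView {
  clientId: string;
  loadedCommentIds: Set<string>; // comments already in this client's feed
  selectedFields: string[];      // comment fields this client's query selected
}

// Turn one upstream event into patches tailored to the data each client has.
function tailorUpdate(event: CommentEdited, clients: ClientView[]): Map<string, Fields> {
  const patches = new Map<string, Fields>();
  for (const client of clients) {
    if (!client.loadedCommentIds.has(event.commentId)) continue;
    const patch: Fields = {};
    for (const field of client.selectedFields) {
      if (field in event.changes) patch[field] = event.changes[field];
    }
    // Skip clients for whom nothing they can see has changed.
    if (Object.keys(patch).length > 0) {
      patches.set(client.clientId, { id: event.commentId, ...patch });
    }
  }
  return patches;
}
```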

smolinari commented 7 years ago

@paralin - How does your system know when to send the news feed updates or rather, to which clients?

Scott

paralin commented 7 years ago

@smolinari that's up to the developer to decide. In Go we have strong concurrency patterns around streams of data, and Magellan supports all of those patterns when resolving fields. When a user subscribes to some live query the server decides how it will fill that query, and the developer's code can return many different permutations of result representations, including ones that change over time.

smolinari commented 7 years ago

When a user subscribes to some live query

I missed how this can be done with rgraphql. Can you point me to the docs (or code), where this is explained (done)?

Scott

paralin commented 7 years ago

@smolinari http://github.com/rgraphql/soyuz

Not much in the way of docs yet, mostly focusing on optimizing and getting in mutations right now. But the interface is the same as in Apollo. Call query, returns an observable, subscribing to the observable triggers the query to actually be applied. The system merges together the entire tree of active queries into one query object and keeps that in sync with the server.

There is a lot of information on how it works in the protocol.md doc, under Magellan I think (I'm on my phone right now, apologies for the lack of a link).

laneyk commented 7 years ago

What happens if someone edits a comment? Do you just push an event saying it was edited via a subscription?

Yup, that's how we would do it.

But then all of the logic to apply the updates has to be hand built separately for each of the types of things you might want to update. That seems wrong to me.

For us, the subscription payload that gets pushed to the client is the same type as a comment_edit mutation payload, and the client already has logic for updating the comments UI in response to a comment_edit mutation response. In general, on our native clients and in Relay, we have client-side infra that is smart about taking GraphQL responses, sticking them into a GraphQL cache, and updating the UIs accordingly, so it's not actually as bad as you make it sound to add logic to handle a subscription response.
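A minimal TypeScript sketch of that idea, assuming a normalized cache keyed by object ID and a `CommentEditPayload` type shared by a hypothetical `comment_edit` mutation and subscription. Because both paths deliver the same payload type, one merge function serves both, and UI listeners do not care where the update came from. The type and field names are illustrative, not Facebook's or Relay's actual schema or API.

```typescript
interface FeedComment {
  id: string;
  text: string;
  likeCount: number;
}

// Payload type shared by the hypothetical comment_edit mutation and subscription.
interface CommentEditPayload {
  comment: FeedComment;
}

class NormalizedCache {
  private readonly records = new Map<string, FeedComment>();
  private readonly listeners: Array<(id: string) => void> = [];

  onChange(listener: (id: string) => void): void {
    this.listeners.push(listener);
  }

  // Single merge path for mutation responses and subscription pushes alike.
  applyCommentEdit(payload: CommentEditPayload): void {
    const incoming = payload.comment;
    const existing = this.records.get(incoming.id);
    this.records.set(incoming.id, { ...existing, ...incoming });
    // Notify UI code that the record with this ID changed.
    this.listeners.forEach((listener) => listener(incoming.id));
  }

  get(id: string): FeedComment | undefined {
    return this.records.get(id);
  }
}
```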

paralin commented 7 years ago

And yet you have to base every change on mutations.

I'm building an app right now that is extremely reliant on outside data: sensor data, position data, connectivity, etc. from a large number of sources. Issuing a mutation for every little change to this data would be impossible. This type of live data is well suited to GraphQL, because the client can subscribe to only what it needs. It's also something that cannot be done with subscriptions in any tractable way.

This example I believe reveals that there are actually two types of live data that a GraphQL user might want to have: streams of updates to individual fields, along with batch updates as a result of measurable transactions.

I believe this is the best argument yet for building two different live mechanisms into GraphQL.

laneyk commented 7 years ago

Just catching up on everything in this thread. I'm seeing two general questions being discussed here: (1) How hard is it to implement live queries? (2) Are GraphQL subscriptions useful in their own right, even in a world with working live queries?

Re: (1), we believe based on experience at Facebook and discussions with other folks that the general problem of implementing live queries at scale is not easy. This doesn't mean that it is always hard; with an efficient reactive backend, implementing live queries becomes fairly straightforward. As @taion mentioned, though, some folks might have "many microservices on [the] backend." Some might have tens or even hundreds of different DBs and services backing the data in their GraphQL schema. The general problem of moving all of the backing data for a GraphQL schema to a reactive backend is quite challenging.

However, I think we're getting off-topic by focusing on question (1). The more relevant question for this RFC is (2). Based on my experience working with a bunch of Facebook product teams building real-time features and rolling out GraphQL Subscriptions at scale over the past two years, I believe that the answer to question (2) is yes. We've seen cases where product folks explicitly design their real-time experience around events. They need control over things like which specific events get priority when the rate is too high to deliver all updates. @paralin said previously that "Live queries are infinitely more powerful than subscriptions." I'm not sure if I agree with this, and I'm also not sure that it's useful to debate the meaning of "powerful" (super relevant talk: https://www.youtube.com/watch?v=mVVNJKv9esE) but one thing I will say about subscriptions is that they put more control into the hands of the product developers over which updates they'll receive.

We have also seen examples that lend themselves nicely to live queries, and some people in this thread have mentioned examples of that sort. Internally, we are still experimenting and working with product teams to arrive at a general understanding of which use cases are better served by subscriptions and which are better served by live queries, but we are confident that the former is not an empty set.
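To illustrate the kind of control over "which specific events get priority when the rate is too high", here is a minimal TypeScript sketch of a bounded per-client buffer that drops the lowest-priority event first. The `priority` field and the drop policy are assumptions for illustration, not a description of Facebook's system.

```typescript
interface RealtimeEvent {
  kind: string;
  priority: number; // assumed: higher means more important to deliver
}

// Bounded buffer for one client: when events arrive faster than they can be
// delivered, the lowest-priority event is dropped first.
class PrioritizedBuffer {
  private readonly capacity: number;
  private events: RealtimeEvent[] = [];

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Queue an event; returns the event that was dropped, if any.
  offer(event: RealtimeEvent): RealtimeEvent | undefined {
    this.events.push(event);
    if (this.events.length <= this.capacity) return undefined;
    let lowest = 0;
    for (let i = 1; i < this.events.length; i++) {
      // Strict < picks the oldest event among equal lowest priorities.
      if (this.events[i].priority < this.events[lowest].priority) lowest = i;
    }
    return this.events.splice(lowest, 1)[0];
  }

  // Hand the surviving events to the transport, in arrival order.
  flush(): RealtimeEvent[] {
    const out = this.events;
    this.events = [];
    return out;
  }
}
```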

paralin commented 7 years ago

@laneyk Agreed in full. I don't dispute that I've been overstating the worth of live queries a bit, primarily because I'm passionate about seeing them considered due to their value in my particular niche applications. I don't believe that live queries are the only way to do it, just that they are an effective mechanism in a lot of small to mid scale applications.

It makes sense that live data and events would have very different mechanisms.

calebmer commented 7 years ago

One thing I will say about subscriptions is that they put more control into the hands of the product developers over which updates they'll receive.

This is something I see consistently in the design of GraphQL. Beyond the live queries vs. subscriptions debate, it may be worth thinking of this client-developer control as a key design point of GraphQL.

If you think about mutations, they require a lot of work on the client developer’s side to update the cache. This is a problem that Apollo Client, Relay, and any future GraphQL clients will struggle with. A lot of GraphQL beginners really want mutations to be “magical.” They want to send a mutation to submit a comment and have that comment be automagically inserted into their pre-existing list with zero boilerplate, but GraphQL wasn’t designed to be magical; it was designed to be practical.

In its practicality, GraphQL tries to give both server and client developers as much freedom and flexibility as possible to work in and around the query language without over-prescribing. The server developer may require a token in an HTTP header, or return a JSON blob as a scalar field. The client developer may implement highly custom updates to their data based on a mutation or subscription, taking into account variables only the client knows, like a local priority based on what screen the user is on. However, this practicality comes at the cost of some higher-level “magic” features that would make development much faster, such as live queries or zero-boilerplate mutations on the client.

I like that GraphQL has chosen to be practical. It’s the same choice React has made, whereas Angular has chosen the “magic” route. If you want magic in the data API space, I strongly encourage you to check out Falcor. Unlike GraphQL, Falcor’s design is optimized for some of the magic features people would like to see, such as live queries and simple mutations. (Admittedly, you probably won’t get any magic from Falcor in its current form, but I think the design is there. Also, forget about the fact that Falcor doesn’t have a schema! You could easily write a version of Falcor with static types and get the same GraphiQL experience.)

What do you think? Do you see the same consistent choice in design decisions? Do you agree that live queries are a “magical” feature?

My point isn’t so much to argue for-or-against live queries (or even for-or-against magic!), I just wanted to make an observation about the design of GraphQL that I’ve noticed from time to time 😊

(Since it was mentioned: this talk is amazing, https://youtu.be/mVVNJKv9esE, and its concepts apply to this observation as well.)

paralin commented 7 years ago

@calebmer You don't need to have a feature in the spec to build it. Projects like mine that add real-time to GraphQL operate with GraphQL in its current form, and declare their own rules as to how data is handled. Therefore they are derivative of GraphQL and perhaps compatible while not GraphQL in their own sense.

GraphQL definitely can support these types of things, and I believe it's productive to at least discuss inside the bounds of GraphQL without deferring to other products entirely.

Your point absolutely holds - GraphQL's spec doesn't really need to have real-time built in. It would be nice, but it would always be labeled as an optional feature anyway. Maybe it's best to leave these features to derivative projects to define, with loose guidelines in the spec? I believe subscriptions should be in the spec for sure, but real-time maybe not. That talk's really good and definitely applies here, thanks for the link!

jamesgorman2 commented 7 years ago

It seems like a lot of the discussion has been about things outside of graphql (how upstream implements events as per @laneyk, but also the semantics encoded within events, eg update vs new state). Is there any difference in the (external) behavior between what has been proposed and live queries beyond how the request is interpreted?

As best I can tell, subscriptions (with a bit of hand waving) represent a subset of one or more possible live query specs, in that a subscription is based solely on the arguments passed to the root, whereas a live query will use the whole query.[1] Everything else is either transport-level stuff (subscription, errors, etc.) and so common between the two, or event semantics (new state, update, etc.) and opaque to GraphQL.

[1] I'm cheating by ignoring any details of how to design a specific live query syntax, the complexity of implementation, etc., since they're moot to my point, as well as anything about the client updating their query.

taion commented 7 years ago

As best I can tell, subscriptions (with a bit of hand waving) represent a subset of one or more possible live query specs, in that a subscription is based solely on the arguments passed to the root, whereas a live query will use the whole query.[1] Everything else is either transport-level stuff (subscription, errors, etc.) and so common between the two, or event semantics (new state, update, etc.) and opaque to GraphQL.

This is incorrect. A subscription doesn't have to correspond to anything in your query proper. It can be a pure event-only stream.

jamesgorman2 commented 7 years ago

A subscription doesn't have to correspond to anything in your query proper. It can be a pure event-only stream.

Not sure I follow this. I am under the impression that to subscribe you have to make a query. The root of this may contain zero or more arguments that the subscription resolver would use to determine what to send and when. The query object below the root would then be used to filter the response.

paralin commented 7 years ago

@jamesgorman2 Yeah, but the data returned from the subscription doesn't necessarily conform to the selections in the query, is what he's saying I think. Although that wouldn't make much sense. Why have a query body if you're not going to respect it?

Edit: see @taion clarification below, makes sense.

taion commented 7 years ago

I'm saying that the data in the subscription doesn't have to be something that can be grabbed from a normal query. Imagine something corresponding purely to a transient event stream.

jamesgorman2 commented 7 years ago

Ah, looks like we're talking at cross purposes then. I'm using query in the sense of the message that is sent by the client to the GraphQL server during subscription, e.g.

subscription {
  foo(bar: "baz") {
    # stuff here
  }
}

There is a subscription mechanism and transport protocol from client to server over which this is sent. There is also an event generating mechanism between the server and upstream that is opaque to a graphql server framework. The graphql server framework's job is to render the subscription query readable to the subscription resolver, then convert the events returned from upstream and emitted by the resolver into the correct graphql response object.
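The division of labor described above can be sketched in TypeScript, loosely following the shape of the subscription algorithm in the GraphQL specification (a source event stream whose events are each mapped to a response). Everything here is a simplified assumption: the selection set is a plain nested object rather than a parsed document, events arrive as an array instead of a stream, and `matchesArgs` is a hypothetical subscription resolver for the `foo(bar: "baz")` example.

```typescript
type Row = Record<string, unknown>;
// A selection set as a nested object: `true` marks a leaf field.
type SelectionSet = { [field: string]: true | SelectionSet };

// Keep only the fields the client selected; missing fields become null.
function project(value: unknown, selection: SelectionSet): unknown {
  if (Array.isArray(value)) return value.map((item) => project(item, selection));
  if (value === null || typeof value !== "object") return value;
  const source = value as Row;
  const out: Row = {};
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    const v = source[field];
    out[field] = v === undefined || v === null ? null : sub === true ? v : project(v, sub);
  }
  return out;
}

// Hypothetical resolver for `foo(bar: ...)`: decides which upstream events
// belong to this subscription, based only on the root field's arguments.
function matchesArgs(args: Row): (event: Row) => boolean {
  return (event) => event.bar === args.bar;
}

// Map each matching upstream event to a GraphQL-shaped response payload.
function toResponses(events: Row[], rootField: string, args: Row, selection: SelectionSet): Array<{ data: Row }> {
  const matches = matchesArgs(args);
  return events
    .filter(matches)
    .map((event) => ({ data: { [rootField]: project(event, selection) } }));
}
```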

paralin commented 7 years ago

@jamesgorman2 I had considered wrapping my "live queries" in the subscription tag but I find it's more flexible to blend regular and live data together side by side. As someone browses through an app, the app will constantly be subscribing and unsubscribing from just the fields it needs. This is the powerful part of live data - my frontend can pull literally exactly what it needs at any given time, nothing more, nothing less.

As for limiting what is live from a developer's perspective, I will probably add a @live directive that "turns on" live updates for all subfields. In rgraphql at least, @defer and @stream are implicit: results are streamed back as they are resolved, so every field is deferred and every array is streamed. However, @live could be used to turn live updates on and off for a query tree and/or its sub-trees.
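A minimal TypeScript sketch of how such a directive might be interpreted, assuming a simplified selection tree where a hypothetical `@live` directive is represented as `live: true` (on for this field and its subfields) or `live: false` (off), with unmarked fields inheriting from their parent. This is one possible reading of the proposal, not rgraphql's actual implementation.

```typescript
interface FieldNode {
  name: string;
  // Hypothetical @live directive: true turns live updates on for this field
  // and its subfields, false turns them off, undefined inherits from the parent.
  live?: boolean;
  children?: FieldNode[];
}

// Collect dotted paths of every field that should receive live updates.
function liveFieldPaths(nodes: FieldNode[], inherited: boolean = false, prefix: string = ""): string[] {
  const paths: string[] = [];
  for (const node of nodes) {
    const path = prefix === "" ? node.name : `${prefix}.${node.name}`;
    const isLive = node.live ?? inherited;
    if (isLive) paths.push(path);
    paths.push(...liveFieldPaths(node.children ?? [], isLive, path));
  }
  return paths;
}
```

For a query shaped like `feed @live { story { text comments @live(off) { body } } } viewer { name }`, only `feed`, `feed.story`, and `feed.story.text` would be live.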

smolinari commented 7 years ago

@paralin - you have me sold now. Live queries with the @live directive to subscribe to them for the win! 😄

However, I am still not clear on how you'd turn off the query/ subscription. How would that work?

And how does the server know who to send the updated data from a live query to? There could be 100 clients waiting and listening, yet let's say only 2 subscribed to get the updated data, not all 100. How can that be controlled? I can't imagine we'd want to simply broadcast any and all updates.

I guess I am wondering if GraphQL should even be considered a gateway in this fashion (using live queries). It seems to me there needs to be a subscription system in front of GraphQL (no matter what). The subscription system would control what gets sent, when, and to whom, and GraphQL would only concern itself with feeding the subscription system the "what".

This also might be showing my lack of knowledge in terms of a websocket kind of communication and exchange of messages. So, go easy on me. 😄

Edit: also thanks for that link to the talk. Very interesting take on abstraction.

Scott

paralin commented 7 years ago

The client.query<T>(options) function returns an Observable. The system subscribes to the query fields when at least 1 subscriber is viewing them. It keeps a refcount, so if your app requests the same thing 10 places, it will only actually send a single field subscription.

Every client running a query gets its own resolver function call for each field. The Go server does these in parallel, so they're extremely fast in general. You could return an existing Go channel from these functions for example, to direct the system to subscribe to a stream of values coming from elsewhere, and send them to the client until the client unsubscribes. Or, you could manage the stream of data more tightly, using a "live function" that terminates the field when it returns. The system adapts to your code AST dynamically at runtime.
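The client-side reference counting described above might look roughly like this in TypeScript. The `subscribe`/`unsubscribe` wire messages and the `acquire` method are hypothetical, not Soyuz's actual API; the point is that ten components observing the same field produce one subscription on the wire, and the field is only dropped when the last observer goes away.

```typescript
// Hypothetical wire messages sent from the client to the server.
interface WireMessage {
  op: "subscribe" | "unsubscribe";
  field: string;
}

class FieldSubscriptionManager {
  private readonly refCounts = new Map<string, number>();
  private readonly send: (message: WireMessage) => void;

  constructor(send: (message: WireMessage) => void) {
    this.send = send;
  }

  // Called when an observer starts watching a field; returns its release function.
  acquire(field: string): () => void {
    const count = this.refCounts.get(field) ?? 0;
    this.refCounts.set(field, count + 1);
    if (count === 0) this.send({ op: "subscribe", field }); // first observer only

    let released = false;
    return () => {
      if (released) return; // releasing twice must not double-decrement
      released = true;
      const remaining = (this.refCounts.get(field) ?? 1) - 1;
      if (remaining === 0) {
        this.refCounts.delete(field);
        this.send({ op: "unsubscribe", field }); // last observer gone
      } else {
        this.refCounts.set(field, remaining);
      }
    };
  }
}
```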

paralin commented 7 years ago

I suppose the general answer to this thread is - no - the two are NOT the same thing.

A good follow-up would be, "Live Queries: do they belong in the spec?"

smolinari commented 7 years ago

Well, if you add subscriptions as a necessary part of live queries (which I think we can all agree they are), then we are still talking about a subscription system in general, which would be the same thing.

However, if we are talking about how to resolve those subscriptions, either with active pushing of updates in live queries or with an elaborate pub/sub event system, then we are talking about two different things. 😄

So, I think the better question is, what is the best solution for subscribing to changes in data and getting the results of those subscriptions? 😄

Scott

paralin commented 7 years ago

Subscriptions as they are defined in GraphQL are not necessary for live queries. Furthermore, resolving subscriptions is going to be part of the spec, and is designed around a very different use case from live queries.

My previous suggestion still stands.

smolinari commented 7 years ago

Subscriptions as they are defined in GraphQL are not necessary for live queries.

I thought subscriptions in general are necessary for live queries, in order for them to scale well. I realize the subscription system defined here doesn't match a live query subscription system. That is beside the point. (Edit: or maybe it is the point?)

Actually, my main concern is defining an event system outside of the domain layer to trigger the GraphQL subscription system. I just don't think that should be necessary; rather, an event system should be a domain responsibility and not part of the spec. Your live query system proves this is possible, Christian (@paralin).

I see GraphQL subscriptions as a live query system with subscriptions. How the data is force-fed into GraphQL "live" from the backend data sources is then a domain specific concern. Right now, as I understand the current RFC, we are going from a bi-directional data flow to the client to a request/ response data flow from the backend data sources, plus some sort of events. But, what if I can connect to a data-source bi-directionally (I am also thinking about bi-directional microservice communication like gRPC too)? With events, I'm locked into working with or around a prescribed triggering methodology. With live queries, I am not.

One might argue that most backend data sources are request/response systems anyway, which is why we need events. But that is due to past technologies. For truly reactive systems, live queries are the future, and that is why newer databases and data storage/persistence technologies are adding live query support to their systems. Live queries make the system a good bit more efficient.

I can understand why Facebook might want it this way. It probably fits their systems better. I just think that is a bit closed-minded. Sorry if I step on toes by saying that. I actually think this could produce big savings for Facebook in the end, if my assumptions are correct and they step out of their own box. I certainly appreciate the chance to try and make that happen. 😄

I'll bow out of the conversation now, unless someone mentions me. I've said what I wanted to.

Scott

paralin commented 7 years ago

You are correct in that the concept of subscriptions is important for live queries - for example, you would subscribe to the result of a query on the client to inform the system that you're interested in knowing about it. I agree with all of your other points as well - particularly around the "domain level" implementation of reactivity.

Recapping/summarizing now:

GraphQL subscriptions refer to the "subscription" operation as opposed to the "query" or "mutation" operations. They are tightly bound by the spec, and designed more for event streams than live updates.

It's become clear now that live queries are still a "query" operation with some parameters or directives applied. The actual handling of those queries is done at a framework level, like in Relay or Magellan or Soyuz or Apollo, etc. As of now, the spec has everything necessary to handle these queries - directives (which are server defined), and the "query" operation. In the context of the language, we have everything we need, as far as I can see.

robzhu commented 7 years ago

Thanks everyone for the enlightening discussion here. I think the consensus is clear: Live Queries are not simply "better Subscriptions". Subscriptions and Live Queries are mechanically similar but semantically distinct:

Executable

A standing query can be made "live" with the addition of a "live" directive, which means nothing is inherently live about the query itself. Live queries will always return a meaningful initial result. Subscriptions typically do not return an initial result because the data to "execute" the subscription's selection set is only present when an event has been triggered.
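The difference in initial results can be sketched with two tiny TypeScript primitives, both hypothetical: `LiveValue` stands in for the data a live query observes, so it can answer as soon as someone observes it, while `EventStream` stands in for a subscription's event source, which has nothing to report until an event fires.

```typescript
type Listener<T> = (value: T) => void;

// Live query side: observes a piece of state, so it always has an answer.
class LiveValue<T> {
  private value: T;
  private readonly listeners: Listener<T>[] = [];

  constructor(initial: T) {
    this.value = initial;
  }

  observe(listener: Listener<T>): void {
    this.listeners.push(listener);
    listener(this.value); // meaningful initial result, delivered immediately
  }

  set(next: T): void {
    this.value = next;
    this.listeners.forEach((listener) => listener(next));
  }
}

// Subscription side: forwards events; nothing is delivered until one fires.
class EventStream<E> {
  private readonly listeners: Listener<E>[] = [];

  subscribe(listener: Listener<E>): void {
    this.listeners.push(listener);
  }

  emit(event: E): void {
    this.listeners.forEach((listener) => listener(event));
  }
}
```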

Domain-specific reason for data change

When a subscription pushes data to the client, the reason why the data changed is implicit; for example, someone liked a comment, a new email has arrived, a friend has logged on, etc. Live Queries only observe data, and the reason why the data changed must be explicitly modeled in the schema and queried.

Note on Isomorphism: Subscriptions and Live Queries are isomorphic. For example, take an online friends list: we can create a subscription to two separate events: friendLoggedOn and friendLoggedOff or we can model a CQRS-like event table and execute a live query against it. However, just because it's possible to model Subscriptions as Live Queries and vice-versa does not mean it is desirable. Doing so may compel us to model our data in awkward ways and we continue to discover use cases that are more ergonomically served by one model or the other.
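The online-friends example can be sketched both ways in TypeScript. The event names come from the note above; the `seq` ordering field and the in-memory event table are assumptions. The subscription-style client folds each `friendLoggedOn`/`friendLoggedOff` event into its local set, while the live-query style re-derives the answer from a stored event table. Both arrive at the same list, but the second forces an event table into the data model, which is the kind of awkward modeling the note warns about.

```typescript
interface PresenceEvent {
  type: "friendLoggedOn" | "friendLoggedOff";
  friendId: string;
  seq: number; // assumed global ordering of events
}

// Subscription style: the client applies each pushed event to its local state.
function applyPresenceEvent(online: Set<string>, event: PresenceEvent): Set<string> {
  const next = new Set(online);
  if (event.type === "friendLoggedOn") next.add(event.friendId);
  else next.delete(event.friendId);
  return next;
}

// Live-query style: the "query" reads the whole event table and derives the
// current answer; a live query would re-run it whenever the table changes.
function onlineFriendsQuery(eventTable: PresenceEvent[]): string[] {
  const ordered = eventTable.slice().sort((a, b) => a.seq - b.seq);
  const online = ordered.reduce(applyPresenceEvent, new Set<string>());
  return Array.from(online).sort();
}
```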

smolinari commented 7 years ago

Although this is closed, I'd like to put in my last input to this daunting question. It's probably too late to make any difference, but here goes anyway.

I reread this part of the blog post about the subscription system in GraphQL and Relay, and it dawned on me what the difference is between a subscription system and a live query system. A subscription system is business logic centric. A live query system is data centric.

If you read that article (please do, if not again), its reasoning for not using live queries is that, with a live query, you can't make heads or tails of which business logic should be covered to trigger the update to the client.

So, I ask two questions.

  1. Who cares what the business logic reasons are to update the client?
  2. With a pub/sub event based solution, are you possibly actually forcing these business logic decisions to be made, in order to update the client properly?

Provocative questions for sure. And, I'd love to hear the answers of Dan or Lany or anyone else who thinks live queries offer the stumbling block of not knowing why an update should happen.

Until, and if, anyone answers, let me explain why I think the reasoning in that article is a bit flawed and that this stumbling block isn't one. Let's look at the example fragment used in the article to argue against live queries again.

```graphql
fragment StoryLikeData on Story {
  story {
    likers { count }
    likeSentence { text }
  }
}
```

Instead of worrying about why the client should be updated for changes in likes or the likeSentence, the client only wants to know about changes in the like count and the likeSentence text. Right? The query or request itself is very data-centric. In other words, the client doesn't really give a hoot about why these things change, from a business logic perspective, because the client only wants to update the "view" of that data, whenever it changes.

Oh, and by the way, haven't all those business logic questions presented in the blog post, about why a like count should change, already been answered in the business logic? We don't need to ask them again (answering the questions above). At some point, the like count or the text will change. All the client cares about are those changes.

Also, if the client is going to request that some data be updated automatically (with a subscription), the query / request should be very specific. It shouldn't cover a fragment with multiple unrelated data items. It should only cover one specific piece of information (or very much related information for the scenario, where there is an insertion of data).

Why?

A whole section of the RFC for subscriptions is dedicated to under- and over-pushing. If the like count above changes, why would I want the likeSentence text with it? In an update scenario, that would be over-pushing. Right? Of course, it depends on what is being presented, but I don't see any correlation.

The better solution for that example is two separate subscriptions: one on the like count and one on the likeSentence text.

So now, the only need to make this happen is to watch these two data sources. When they get changed (for whatever reasons), then an update is pushed to the client. This might be a watcher that follows the aggregation system used for aggregating likes, and also the "update channel" for the likeSentence text.

That is live querying.
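One way to read "watch these two data sources" is sketched below in Python, under the assumption that nothing in the backend is reactive: each narrow query is re-run on a polling round and pushed only when its result differs from the last one sent. `Watcher`, `poll`, and the store keys are hypothetical; this is one possible mechanism, not the one Facebook uses.

```python
from typing import Callable, Dict, List

# Hypothetical sketch: a live query implemented by re-running each narrow query
# and pushing only when its result differs from the last one sent.
Store = Dict[str, object]
_UNSET = object()  # sentinel: compares unequal to any real result


class Watcher:
    def __init__(self, query: Callable[[Store], object],
                 push: Callable[[object], None]) -> None:
        self.query = query
        self.push = push
        self.last: object = _UNSET

    def check(self, store: Store) -> None:
        current = self.query(store)
        if current != self.last:  # the first check always pushes the initial result
            self.push(current)
            self.last = current


store: Store = {"likeCount": 10, "likeSentence": "Ann and 9 others like this."}
count_pushes: List[object] = []
text_pushes: List[object] = []
watchers = [
    Watcher(lambda s: s["likeCount"], count_pushes.append),
    Watcher(lambda s: s["likeSentence"], text_pushes.append),
]


def poll() -> None:
    """One polling round; a reactive backend would call check() on writes instead."""
    for watcher in watchers:
        watcher.check(store)


poll()                                              # initial result for both queries
store["likeCount"] = 11
poll()                                              # only the count is pushed
store["likeSentence"] = "Ann and 10 others like this."
poll()                                              # only the sentence is pushed

print(count_pushes)  # [10, 11]
print(text_pushes)   # ['Ann and 9 others like this.', 'Ann and 10 others like this.']
```

The polling loop is the expensive part: without a reactive backend, something has to keep re-running every query just to notice that a value changed.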

I wonder if we weren't talking in circles here. That in the end, Facebook actually has built a live query system, but is using an event based pub/sub system to make it happen and thus feels this client real-time updating system can't be called "live queries". If so. Great! But, let's do call it a live query subscription system, based on a pub/sub event system. That way, everyone can understand that the events getting triggered should be data centric and not business logic centric. 😄

Scott

laneyk commented 7 years ago

Who cares what the business logic reasons are to update the client?

Have you watched https://www.youtube.com/watch?v=ViXL0YQnioU, particularly the part starting from 19:55? Because our backends at FB are not reactive (which is also the case for many other folks at other companies using GraphQL), we had the idea that we could instead implement live queries by using data dependencies to "fake" a reactive backend--which would require knowing what you call "the business logic reasons" that can cause a given field to change. The blog post and the talk both explain why this was ultimately not possible for us.
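A rough Python sketch of the "data dependencies" idea, under loudly stated assumptions: the field paths, the event names, and the `DependencyLiveQuery` class are all hypothetical and do not describe how Facebook's system worked internally. Each field lists the events believed to change it, and the live query re-runs only on those events. The difficulty described above shows up as soon as one cause is missing from the list.

```python
from typing import Dict, List, Set

# Hypothetical sketch of "faking" a reactive backend with data dependencies:
# each field lists the events believed to change it, and a live query re-runs
# only when one of those events is published. All names are illustrative.
FIELD_DEPENDENCIES: Dict[str, Set[str]] = {
    "likers.count": {"storyLiked", "storyUnliked"},
    "likeSentence.text": {"storyLiked", "storyUnliked", "friendshipChanged"},
}


class DependencyLiveQuery:
    def __init__(self, fields: List[str]) -> None:
        # Union of every event that any selected field depends on.
        self.triggers: Set[str] = set().union(*(FIELD_DEPENDENCIES[f] for f in fields))
        self.reruns = 0

    def on_event(self, event: str) -> None:
        if event in self.triggers:
            self.reruns += 1  # a real system would re-execute the query here


query = DependencyLiveQuery(["likers.count", "likeSentence.text"])
for event in ["storyLiked", "likerAccountDeleted", "storyUnliked"]:
    query.on_event(event)

# "likerAccountDeleted" also changes the like count, but nobody listed it as a
# dependency, so the live query silently misses that change.
print(query.reruns)  # 2
```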

With a pub/sub event based solution, are you possibly actually forcing these business logic decisions to be made, in order to update the client properly?

If I understand this question correctly, then yes--we're putting the control into the hands of the client developer to decide which events to subscribe to and what information to query for each subscription. One of the main principles of GraphQL is that the client should have control over decisions about precisely what data it wants (as opposed to something like REST where the client hits an endpoint on the server which decides what data to return). This principle also holds true for GraphQL Subscriptions: the client developer has full control over which events they care about and what query should be executed when those events happen. @calebmer's comment above has some good insight into this issue of developer control versus magic.

And, I'd love to hear the answers of Dan or Lany or anyone else who thinks live queries offer the stumbling block of not knowing why an update should happen.

It's Laney 🙂 To be clear, that "stumbling block" existed for us because we don't have a reactive backend. From my previous comment: "we believe based on experience at Facebook and discussions with other folks that the general problem of implementing live queries at scale is not easy. This doesn't mean that it is always hard; with an efficient reactive backend, implementing live queries becomes fairly straightforward." You previously commented that "I can understand why Facebook might want it this way. It probably fits their systems better. I just think that is a bit closed minded." Perhaps @stubailo can chime in here with examples of other non-Facebook users of GraphQL who reached the same conclusion about live queries and GraphQL Subscriptions.

So now, the only need to make this happen is to watch these two data sources.

Your whole chain of reasoning rests on the ability to "watch these two data sources" which is equivalent to having a reactive backend. I also noticed that you previously wrote that "I don't think GraphQL should be working with events internally. They aren't needed, as Christian's rgraphql system proves." Again, the confusion here seems to center around the assumption that a reactive backend is always available. A live query demo using a reactive backend certainly does not prove that everyone using GraphQL has an efficient reactive backend. You previously wrote that "For truly reactive systems, live queries are the future and it is why newer databases and data storage/ persistence technologies are adding live query technology in their systems. Live queries makes the system a good bit more efficient." If you have a reactive backend that's efficient for your particular application or use case, that's great. If you implement live queries using a directive, we'd love to hear what you learn. But it's incorrect to assume that everyone who uses GraphQL either started with such a backend or has the resources/desire to convert their existing backends to reactive ones.

in the end, Facebook actually has built a live query system, but is using an event based pub/sub system to make it happen and thus feels this client real-time updating system can't be called "live queries". If so. Great! But, let's do call it a live query subscription system, based on a pub/sub event system. That way, everyone can understand that the events getting triggered should be data centric and not business logic centric.

I don't fully follow this reasoning. What we've built (i.e. GraphQL Subscriptions) is described quite clearly in the spec from @robzhu (and in the blog post and talk I linked earlier). I certainly would not call this a "live query system." (To be more concrete, you commented on March 6 with your definition of live queries; what we built and what we describe in the spec is definitely not that.) Finally, regarding your last sentence, the events being triggered in GraphQL Subscriptions are not data centric. They are tied to conceptual events such as a post being liked or a comment being written.

wincent commented 7 years ago

@smolinari: Thanks a bunch for the thoughtful comment. Your post is quite long, so forgive me if I over-simplify it in trying to summarize it. I get the sense that you think:

  • Live queries and subscriptions may be isomorphic, but live queries are intrinsically superior to subscriptions because they more naturally capture the desired outcome of "up-to-date-ness".
  • Subscriptions may be a mere implementation detail — only really chosen because they are easier to implement and scale — that you could use to construct a reactive (or apparently reactive) system, so you may as well put the public focus on reactivity and keep the implementation details private.
  • Business logic is, in a way, an implementation detail, so why would clients care about it when what they actually want is having and keeping data up-to-date.
  • If live queries are strictly better and more powerful than subscriptions, it's better to have only them in the spec.

Again, please forgive me if I've misread you on any of those points. Based on that reading I have a couple of comments:

So the tl;dr is:

stubailo commented 7 years ago

I hope people don't feel like adding subscriptions is closing the door on live queries - in fact, I think once people get used to subscriptions, getting live queries started will be much easier because a lot of transport-related questions will already have been settled.

smolinari commented 7 years ago

Wow! This is sooo cool. I do not take it for granted that I am getting this attention and I appreciate it enormously. So, before I even begin, let me say thank you so much for your time and effort! 😄

Puh...where to begin.

Hmm.....Ok.

@laneyk Thanks for your very clear and understandable response and the link to the video. That sets a great basis for this continued discussion.

Your whole chain of reasoning rests on the ability to "watch these two data sources" which is equivalent to having a reactive backend. I also noticed that you previously wrote that "I don't think GraphQL should be working with events internally."

Let me clarify the events point first and thanks for bringing it up. I was mistaken in my wording or rather, I think I might have said something different at some point somewhere else (maybe not even in this discussion, but elsewhere). I am not against an events system. My thoughts on the event based / pub-sub solution are, as you even say in the video....it is an implementation detail. The whole time the FB team has fought against adding implementation details into the spec and I believe this is also one. I'll leave it at that, because the rest of my argumentation (and questions) will hopefully clarify the reasoning to why I say that.

Let's put it another way. Adding the event based pub/ sub solution makes the FB GraphQL system reactive. 😄

@wincent made a point, which might be the core of our somewhat tangential discussion.

Any time you care about why something changed, subscriptions are a natural fit.

My question to that is, how can client devs using GraphQL care about why something changes? I don't see how they can. They have pieces of data or certain activities, which they want "news" about. They are simply asking the questions:

Are there newly created data? Or are there new updates to changes of that data? Or are there activities going on, which I want to be aware of? If the answer is yes, let me know about them, without me asking again.

None of that has anything to do with "why" the requested data changed, but rather "did" something change. This is the big difference and to me, it is the definition of a live query.

@laneyk also wrote:

If I understand this question correctly, then yes--we're putting the control into the hands of the client developer to decide which events to subscribe to and what information to query for each subscription.

How does a client define the event? She is working with GraphQL and data. There are no event definitions available. Are there? This is the open question in my mind, which isn't making the "click". To me, there is only a flow of data. So, when you say, "it's in the hands of the client developer to decide what events he or she should be subscribing to", I don't see it. I see the client developer asking for updates on single or multiple pieces of data or certain activities which yes, might be events, but they aren't defined as events. At least I have yet to see a good example of this. The only thing that comes to mind is the "Someone is typing a reply" display in FB comments. That is an event based subscription. How is that set up? It is a rare example of something that isn't a live query, because nothing is persisted. Well, not unless FB is logging our "typing prowess". LOL! 😄

Btw, Laney, in the video you said there were "data dependencies" that give the like count a lot of reasons to change. I call them business decisions. All the reasons you noted as to why a like count might grow or drop in number are decisions or processes made by the business. And again, the client dev is not at all interested in these decisions (or the events that cause a like count to increase or decrease) and cannot be bothered by them. Can we agree on that? The client dev only cares about the change happening. Her reasons for subscribing to the like count changes are purely data centric.

So again, I feel Facebook has built a pub/sub/ event based live query system for the client developer, because it is 99.9% data centric. It has to be, because the client dev really can't be involved in all of the business decisions or events made available as to why a piece or multiple pieces of data change. They only want to display the change. All the logic as to why the data changed is hidden in the server and is business logic, which the client dev has no clue about, nor wants to know about. She or he just gets the "live" updates. Or maybe it could be called "Reactive Subscriptions"? 😄

In other words, just because there is a pub/sub/ event based system behind Facebook's GraphQL subscription system, it doesn't mean it is not a live query system. And, as @stubailo mentioned, how others get those updates through to the client can be pub/sub/event based, or could also be done with a purely reactive system (which, as was also noted, FB is working towards, right?). The one solution for getting reactivity in the data doesn't exclude the other. The way the updates are pushed to the GraphQL API is an implementation detail and doesn't matter to the client or even the person responsible for the API. The response is always the same. The client (and the API itself) gets a response to a data centric, and not business decision/event based, request for updates. They are live queries, at least from the client's and API perspective.

I hope that all makes sense. It's all just a matter of semantics I guess. I feel the query subscription that is being created by the client is a live query. I am certain the client dev has (should have) no clue as to why he or she gets the updates and in the end, really doesn't care. Any subscription request is purely data centric in nature, as is the response. It's just that now the query is reactive. It is live. How that "reactiveness" happens in the backend is unimportant for the API, or at least it should be.

Maybe that is the issue for this whole discussion from the beginning? I've been thinking in terms of the client and everyone else was thinking in terms of the implementation in the server? Hmmmm.......

At any rate. Thanks again for everyone's time and also efforts and making all this possible to begin with. I see huge potential with GraphQL and this discussion really is peanuts in comparison. I am honored to even have the chance to be the small thorn in everyone's side here. 😄 I certainly don't want to waste anyone's time, so I hope, at least, I could offer a different perspective and we could agree on it and maybe Facebook can now sell the subscription system as "Live Query Subscriptions". 😜

Scott

laneyk commented 7 years ago

Hey Scott,

They have pieces of data or certain activities, which they want "news" about.

None of that has anything to do with "why" the requested data changed, but rather "did" something change. This is the big difference and to me, it is the definition of a live query.

How does a client define the event? She is working with GraphQL and data. There are no event definitions available. Are there? This is the open question in my mind, which isn't making the "click".

Yes, the "event" corresponds to the GraphQL Subscription that the developer chooses to subscribe to. She may subscribe to comment creations or likes on a post or event RSVPs or someone starting to type, for example. These are all events that happen in the world of FB. She is not asking for any arbitrary updates about some piece of data; rather, by choosing a subscription to conceptual event X, she is saying that she wants to get updates pushed to her when and only when event X happens. In our system, we trigger publishes to each subscription event stream when the conceptual event happens (i.e. when someone starts typing or when someone creates a comment.)
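A minimal Python sketch of the model described here, assuming an in-memory bus: each conceptual event gets its own stream, business logic publishes to it when the event happens, and each subscriber's selection runs against the event payload. `EventStreams` and the stream names `commentCreated`, `userTyping`, and `storyLiked` are hypothetical; a real server would execute a GraphQL selection set rather than a Python function.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: one stream per conceptual event. Stream names are illustrative.
Selection = Callable[[dict], dict]
Deliver = Callable[[dict], None]


class EventStreams:
    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Tuple[Selection, Deliver]]] = {}

    def subscribe(self, stream: str, selection: Selection, deliver: Deliver) -> None:
        # The client chooses the event and what to fetch when it happens.
        self.subscribers.setdefault(stream, []).append((selection, deliver))

    def publish(self, stream: str, payload: dict) -> None:
        # Called by server-side business logic when the conceptual event happens.
        for selection, deliver in self.subscribers.get(stream, []):
            deliver(selection(payload))


streams = EventStreams()
inbox: List[dict] = []
streams.subscribe("commentCreated", lambda p: {"author": p["author"], "text": p["text"]}, inbox.append)
streams.subscribe("userTyping", lambda p: {"typing": p["user"]}, inbox.append)

streams.publish("userTyping", {"user": "ann"})  # ephemeral: nothing is persisted anywhere
streams.publish("commentCreated", {"id": 7, "author": "ann", "text": "Nice!"})
streams.publish("storyLiked", {"liker": "bob"})  # nobody subscribed, so nothing is delivered

print(inbox)  # [{'typing': 'ann'}, {'author': 'ann', 'text': 'Nice!'}]
```

The typing stream shows that an event need not correspond to stored data: there is nothing for a live query to observe, yet the subscription still fires.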

I don't see it. I see the client developer asking for updates on single or multiple pieces of data or certain activities which yes, might be events, but they aren't defined as events.

They are events and they are defined as events. We have about 100 subscriptions in Facebook which correspond to conceptual events. The typing subscription, which you mentioned, is one example. To give another example, when someone subscribes to comment creations, they are not asking for any updates to the relevant data (the list of comments). If the list of comments changes because someone who had previously written a comment un-blocks the subscriber (so the comment suddenly appears in the list), the subscriber will not get an update. The data has changed, but the subscribed event has not happened. The subscriber will only get an update when the subscribed event happens--that is, when someone comments on the post. I think that @wincent explained this really well. ("Note here that the event is driven by the business logic, but the logic is not an implementation detail: it actually has valuable semantic content.")
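To make the un-block case concrete, here is a Python sketch under simplifying assumptions: a single post, one viewer, and a set of authors who currently block that viewer. `Post`, `create_comment`, and `unblock` are hypothetical names. The subscription to comment creation fires only when a comment is created, while a live query over the visible comments (included only for contrast) also reacts to the un-block.

```python
from typing import List, Set, Tuple

# Hypothetical sketch of the un-block example; names are illustrative only.
Comment = Tuple[str, str]  # (author, text)


class Post:
    def __init__(self) -> None:
        self.comments: List[Comment] = []
        self.blocking_viewer: Set[str] = set()  # authors who currently block the viewer
        self.subscription_pushes: List[Comment] = []
        self.live_query_pushes: List[List[Comment]] = []
        self._rerun_live_query()  # the live query starts with an initial (empty) result

    def visible_comments(self) -> List[Comment]:
        return [c for c in self.comments if c[0] not in self.blocking_viewer]

    def _rerun_live_query(self) -> None:
        result = self.visible_comments()
        if not self.live_query_pushes or result != self.live_query_pushes[-1]:
            self.live_query_pushes.append(result)

    def create_comment(self, author: str, text: str) -> None:
        self.comments.append((author, text))
        if author not in self.blocking_viewer:
            # The subscribed event ("a comment was created") happened and is visible.
            self.subscription_pushes.append((author, text))
        self._rerun_live_query()

    def unblock(self, author: str) -> None:
        self.blocking_viewer.discard(author)
        # The visible data changed, but no comment was created: only the live query reacts.
        self._rerun_live_query()


post = Post()
post.blocking_viewer.add("eve")
post.create_comment("eve", "old comment")  # hidden from the viewer
post.create_comment("ann", "hello")
post.unblock("eve")                        # eve's old comment becomes visible

print(post.subscription_pushes)    # [('ann', 'hello')]
print(post.live_query_pushes[-1])  # [('eve', 'old comment'), ('ann', 'hello')]
```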

And again, the client dev is not at all interested in these decisions (or the events that cause a like count to increase or decrease) and cannot be bothered by them. Can we agree on that? The client dev only cares about the change happening. Her reasons for subscribing to the like count changes are purely data centric.

No, I don't agree on that, and I think this is the central point of misunderstanding. In the subscriptions system that we've built, the client developer thinks in terms of actions or events. There's no way in our subscription system to say "tell me when any of this data has changed for any reason." There's only a way to say "tell me when this conceptual event has occurred, and return the result of this subscription query to me when that happens."

All the logic as to why the data changed is hidden in the server and is business logic, which the client dev has no clue about, nor wants to know about. She or he just gets the "live" updates.

What you're describing is not what we've built and not what we are describing in this GraphQL Subscriptions spec. The system you've described is much more in line with what we call "live queries" which is in a significantly earlier stage of investigation at Facebook with a lot of unanswered questions. I'm not sure that I can put it any better/differently than @wincent did above, but my one-sentence summary is that live queries let you say "when any of the data in this query has changed for any reason, tell me the new result of the query" whereas GraphQL Subscriptions let you say "when this particular event has happened, tell me the result of my subscription query." I hope that makes the distinction more clear.

I recommend checking out the reference implementation when it's ready and playing around with that; it may help make some of these questions more concrete.

smolinari commented 7 years ago

Ok. I guess I'll have to.

But, I must ask these last questions.

From the video you linked to, what delivers the "pub" signal that triggers the updates on subscriptions to a like count? Where does the event come from and what is it called? Is there an aggregation system (business logic) which sends that pub event? If yes, is it the same logic that updates the persistence layer to store the new like value too?

Scott