Subscriptions RFC: Are Subscriptions and Live Queries the same thing?

robzhu commented 7 years ago

Re-define Live Queries? Can Live Queries make Subscriptions unnecessary? @smolinari @paralin @laneyk @dschafer @taion @Siyfion @jamesgorman2 @leebyron

Continuing the conversation from: https://github.com/facebook/graphql/pull/267#issuecomment-281576156

laneyk commented 7 years ago

First, I would recommend thinking about the "likes" example as a subscription to likes (events) rather than a subscription to the like count (data).

In our system, when someone adds a new GraphQL subscription, they also add logic to publish to that subscription. These publish events live in the same codebase as the business logic. They are manually added for each new subscription. The events are called whatever the developer named them, like "comment_create" or "story_like." (For a system that uses GraphQL mutations for its writes, you could imagine the subscription publish would happen in the mutation execution code.)

For the example of the like subscription, there's a common codepath that gets executed every time anyone likes or unlikes a post. This is where the subscription publish happens. This is also where we update our database to indicate that person X has liked (or no longer likes) post Y.

The reason that subscribing to likes is not equivalent to subscribing to like count data is that there are ways that the like count can change that will not trigger publishes to the like subscription. For instance, if Alice had previously liked the post but then she deletes her FB account, the like count will go down by one. This action does not go through the "like/unlike post" codepath so it does not trigger the like subscription, although the "like count" data has changed.
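To make the gap concrete, here is a minimal sketch. It assumes a hypothetical in-memory store and a stand-in `publish` function; none of these names come from the actual Facebook codebase.

```typescript
// Hypothetical in-memory model; all names are illustrative assumptions.
type LikeEvent = { name: "story_like"; postId: string; userId: string; liked: boolean };

const likesByPost = new Map<string, Set<string>>();
const published: LikeEvent[] = [];

// Stand-in for the subscription publish call.
function publish(event: LikeEvent): void {
  published.push(event);
}

function likeCount(postId: string): number {
  return likesByPost.get(postId)?.size ?? 0;
}

// The common like/unlike codepath: it writes the data AND publishes the event.
function setLike(postId: string, userId: string, liked: boolean): void {
  const users = likesByPost.get(postId) ?? new Set<string>();
  if (liked) users.add(userId);
  else users.delete(userId);
  likesByPost.set(postId, users);
  publish({ name: "story_like", postId, userId, liked });
}

// A different codepath: it changes the like count but never publishes.
function deleteAccount(userId: string): void {
  likesByPost.forEach((users) => users.delete(userId));
}

setLike("post1", "alice", true);
setLike("post1", "bob", true);
deleteAccount("alice");
console.log(likeCount("post1"), published.length); // prints: 1 2
```

A subscriber that counted events would still believe the count is 2, while anything observing the data itself would see 1.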

smolinari commented 7 years ago

Ahhhhh. Very interesting.

The logic to show a different value than what is persisted sort of eludes me. I guess I can't question FB's business decisions in the end. So, I won't. 😄

I also still say the event based solution is an implementation detail on how to make data change events reactive. But, I digress on that too.

Thank you so much for your patience. I'll be very much looking forward to the reference implementation and learning and hopefully also helping a whole lot more in the future.

Scott

taion commented 7 years ago

Maybe another way to think about it is that event-based subscriptions are one option for live query implementation, but in that context they're a transport-level concern. By contrast, for an event stream, the subscriptions actually do map to what's logically happening.

The "like count" thing is an interesting example, because visually it resembles live queries, so I'd argue that it's closer to a workaround over real reactivity there being really, really hard – but having tried to build conceptually similar things on our end with subscriptions, it's a very defensible one.

laneyk commented 7 years ago

@taion: I agree that likes is not the best example to talk about subscriptions in this particular discussion since it's something that is probably a better fit for a live query (assuming both options exist).

taion commented 7 years ago

One example where product requirements might specifically dictate events over live queries is something like Twitter's timeline, which shows a badge for new updates rather than immediately displaying new updates – if the user's about to interact with a timeline entry, you don't want to bump the timeline down in an unsolicited manner and make the poor user retweet the wrong thing, or something like that.
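A rough sketch of that badge behaviour on the client, assuming a hypothetical `Timeline` class fed by "new entry" subscription events (this is not Twitter's implementation, just an illustration):

```typescript
// Hypothetical client-side timeline; names are assumptions for illustration.
class Timeline<T> {
  private visible: T[] = [];
  private pending: T[] = [];

  // Called for each "new entry" event: buffer it, do not display it yet.
  onNewEntry(entry: T): void {
    this.pending.unshift(entry); // newest first
  }

  // What the "N new updates" badge shows.
  get badgeCount(): number {
    return this.pending.length;
  }

  // What is currently rendered.
  get entries(): readonly T[] {
    return this.visible;
  }

  // Called only when the user explicitly clicks the badge.
  showPending(): readonly T[] {
    this.visible = this.pending.concat(this.visible);
    this.pending = [];
    return this.visible;
  }
}
```

The rendered list only moves when `showPending` is called, so an incoming event can never shift the entry the user is about to click.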

paralin commented 7 years ago

@taion live queries still apply there, you would just restrict the query to never add new entries without an explicit argument change.

acjay commented 6 years ago

What's funny is that the same argument plays out over and over again.

For example, Redux, at its core, is event driven, although they call them actions. It gives you a structure for producing a live view of your state in the form of its reducers and selectors. MobX has you mutate your live model directly, and to the extent that events need to trigger processes, you need to handle that in your mutation logic.

There are strong reasons to build systems around the changing data itself. You don't have to worry about accounting for all the causes.

There are strong reasons to build systems around events. Sometimes, user experience does care about the causes.

Events can be depicted in a live query model by having a field that will be the most recent event or null. After all, the schema need not restrict itself to depicting only things that are literally persisted. Clients then would be responsible for queuing up any events that happen to appear. It would be awkward, but possible.

Likewise, subscriptions can support live queries by pushing the full state (or changes thereof) in every event. The event becomes "your data has changed". Also awkward to set up, use, and optimize.
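Both emulations can be sketched in a few lines. The type and field names below (`latestEvent`, `data_changed`, and so on) are assumptions made up for this illustration, not anything defined by GraphQL.

```typescript
// Illustrative types only; field and event names are assumptions.
type LikeEvent = { id: number; kind: "like" | "unlike"; userId: string };

// (1) Events inside a live query: the observed data carries the latest event, or null.
type LiveState = { likeCount: number; latestEvent: LikeEvent | null };

// The client queues events itself, skipping an event it has already seen.
function collectEvent(queue: LikeEvent[], state: LiveState): LikeEvent[] {
  const e = state.latestEvent;
  if (e === null) return queue;
  const last = queue[queue.length - 1];
  if (last !== undefined && last.id === e.id) return queue;
  return queue.concat([e]);
}

// (2) Live query over a subscription: every event means "your data has changed"
// and carries the full new state.
type Counts = { likeCount: number };
type DataChanged = { name: "data_changed"; state: Counts };

function applyDataChanged(_previous: Counts, event: DataChanged): Counts {
  return event.state; // the client simply replaces what it had
}
```

In this sketch, if two events occur between two pushes of the live state, the first one is never seen by the client, which is part of what makes the first emulation awkward.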

I think it's probably a good idea to have first-class approaches to both paradigms, even in the same application.

taion commented 6 years ago

@acjay

I think we're in general agreement there, and most of us are targeting live queries. The core issue is just that live queries, even at a schema level, require making more decisions – e.g. do you use something like JSON patch to communicate the updates? Or if not what do you use?

Right now a number of implementations mock live queries with polling, but I think a general solution requires the kind of general consensus on how to push live query updates that does not yet exist.

smolinari commented 6 years ago

What events can happen that aren't persisted? If they aren't persisted, can they be? If they can be, and I know they can be, then those events can be "triggered" over live queries. Right?

Can every live query be modeled into an event system? Sure they can. But, then you'd be building another separate system. I've seen this done for MongoDB in many ways for example. So it is clear the want for live querying is relatively large. Why is that?

Obviously too, only databases that can send live query messages can be used in a proper live query system. Otherwise, you are back again to needing a messaging/ queuing events system/ bus, etc.

I can understand why FB went with events. AFAIK, they don't have databases that support live queries. But, maybe they should? If they did, I bet this whole discussion and any solutions would get a whole lot easier. 😉

Scott

taion commented 6 years ago

Any sort of stream data – trades, clickstream, etc. – isn't nicely modeled by live queries and would have to be emulated there.

acjay commented 6 years ago

@smolinari If you have 15 minutes, check out the other issue I made above your comment. I'm increasingly convinced that all the pieces needed for a live query system more or less already exist in today's subscriptions. Although, since it's so far just been a big thought experiment, some details might be missing.

robzhu commented 6 years ago

There was a recent talk on Live Queries at GraphQL Summit by @rodmk, one of the engineers who works on the Live Query system at Facebook. I think it addresses several of the recurring questions in this thread. https://www.youtube.com/watch?v=BSw05rJaCpA.

smolinari commented 6 years ago

@acjay - Absolutely. I never doubted GraphQL's capabilities to accommodate Live Queries. My whole argumentation here was to say that the added event driven system to make subscriptions work is basically unnecessary for (proper) GraphQL, because it can and should support live queries and that is the better answer to subscriptions and state management. Maybe my thinking was a bit ahead of its time???? 😄

@robzhu - Hah! Wow! Excellent video! Rodrigo demonstrates everything I've been trying to get across here. I'm all giddy now. 😛 And no, I don't mean to say, "I told you so." 😄 I do still get FB's need to not go straight away with a live query solution, because of FB's legacy systems, which Rodrigo also mentions. (i.e. you can't rewrite all of the PHP code.) It demonstrates how FB's own internal issues drive directions in its open source projects and that is all fine and dandy, as a lot of dev shops out there will have those same kinds of issues. But there are also those who are starting anew and want the best they can get too, and Live Queries are the better/simpler solution, granted only with a true reactive data store.

I've enjoyed this whole discussion and I'd like to thank you all again for the opportunity.

Scott

acjay commented 6 years ago

@taion I just re-read https://github.com/facebook/graphql/issues/284#issuecomment-346492667, and now I think I get exactly what you mean. And from my side thread, my conclusion to the title question, "Are Subscriptions and Live Queries the same thing?" is now "yes", qualified only by the need to answer the question of how to send updates.

In the best case scenario, those semantics can be defined at the spec level, leaving very little to be decided by library and application developers.

But what if there's no natural one-size-fits-all solution to describing updates? Much as scalar leaves basically every aspect of implementation to the client and server, could something similar be done for the concept of updates? If so, I think there's one major advantage to implementing live queries within subscriptions: you can subscribe to both new events and the changing state.

@robzhu, since you closed this ticket with the opposite conclusion, namely that live queries should be something separate from subscriptions, I'm curious whether this would address your concerns.

taion commented 6 years ago

The spec thing sort of is the thing, though. We were more or less able to ship subscriptions as of v0.4.8 that added support at a parsing level. The v0.10.0 release that changed the API to add first-class support – that was very, very nice from an API perspective, but ultimately didn't amount to much more than a minor API refactor: https://github.com/edvinerikson/relay-subscriptions/pull/39/files

By contrast, contra @rodmk, I can't see how to nicely implement live queries in a way that lets me handle lists efficiently, without pushing down the entire list every time the query updates, without some additional spec-level support. A subscription is so similar to a mutation from the schema perspective. A live query isn't.

There is another distinction, too. Ultimately it's not that awkward to subscribe to add, delete, and change events. Doing something like Twitter's "new tweets" alert (instead of reactively showing new tweets) with live queries is... possible, but extremely annoying. And there are cases where you either want to or have to ship updates in that manner (e.g. we're doing HIPAA-related stuff, we may want to only indicate the availability of new data, rather than pushing down new private-ish data to the client... ).

acjay commented 6 years ago

> I can't see how to nicely implement live queries in a way that lets me handle lists efficiently, without pushing down the entire list every time the query updates, without some additional spec-level support.

I'm not sure if my point isn't clear, or if I'm missing something you're saying. I think we agree that lists would seem to be the trickiest data type for coming up with a globally acceptable scheme of representing updates.

But do you get my point in analogizing that with the scalar situation? The handling of custom scalars is one of the more interesting (and initially confusing) parts of GraphQL to me. The spec basically completely punts on anything having to do with how they're represented. They're just dumb leaf data. It's up to the client and server to determine the convention for their representation. This is great, because it avoids clogging up the spec with arbitrary choices for things like dates and times.

Can't the same approach be used for the representation of updates, since there are several reasonable approaches? On a really simplified level, the server needs to implement some function (lastState, newState) => changeRepresentation for each type, and the client needs a corresponding set of functions (lastState, changeRepresentation) => newState. For argument's sake, let's just say the reference server implementation provides a default for all types, which could be to just send newState directly, ignoring lastState. Presumably, the reference server implementation would allow this default to be overridden by something more optimized.
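As a sketch of what that pair of functions might look like, assuming a hypothetical `UpdateCodec` interface (nothing here exists in the GraphQL spec or any reference implementation):

```typescript
// Hypothetical "modular update representation"; not part of the GraphQL spec.
interface UpdateCodec<State, Change> {
  // Server side: (lastState, newState) => changeRepresentation
  diff(lastState: State, newState: State): Change;
  // Client side: (lastState, changeRepresentation) => newState
  apply(lastState: State, change: Change): State;
}

// The default: ignore lastState and send newState directly.
function fullStateCodec<State>(): UpdateCodec<State, State> {
  return {
    diff: (_lastState, newState) => newState,
    apply: (_lastState, change) => change,
  };
}

// One possible optimized override, valid only for append-only lists
// (it assumes lastState is a prefix of newState): send just the new tail.
type Append<T> = { appended: T[] };

function appendOnlyCodec<T>(): UpdateCodec<T[], Append<T>> {
  return {
    diff: (lastState, newState) => ({ appended: newState.slice(lastState.length) }),
    apply: (lastState, change) => lastState.concat(change.appended),
  };
}
```

The client and server only have to agree on which codec is used for which type, in the same way they agree on the representation of a custom scalar.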

> There is another distinction, too. Ultimately it's not that awkward to subscribe to add, delete, and change events.

Yeah, I get that, but for reasons I think everyone agrees with, the event approach just isn't a great fit for every application. I'm just trying to say, I don't think it's actually that much more complex to do live queries using the exact same mechanism as has been built for events, with really just one additional concept of what I might call "modular update representation".

I hope this makes my point clearer, and sorry if I've misunderstood what you're trying to say.

taion commented 6 years ago

@acjay

What you're saying makes sense. The distinction I was drawing was that, with subscriptions, there was an "obvious" choice of the semantic GraphQL payload to send back to the client that exactly matches what things look like with a mutation.

The issue with live queries (esp. lists) is exactly as you say – the specific implementation needs to define its own format to use for encoding deltas, which is a problem that didn't arise with event subscriptions. It's just more stuff to decide for the app developer.

smolinari commented 6 years ago

Just to throw in what I've been understanding as a live query, which seems to be different to the discussion here and even a bit to what Rodrigo explained too, but I believe live queries shouldn't return whole datasets or deltas of the changed data, but rather only send a trigger to the client to re-request its "affected" query again. That way, the back-end can stay fairly dumb, because the client is the one asking for the new data through the particular query and only the updated data gets "pulled" back into the client.
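Roughly, the idea could be sketched like this. The `Invalidation` message and `RefetchingClient` class are hypothetical names, and the fetch is synchronous only to keep the example short; a real client would go over the network.

```typescript
// Hypothetical invalidation protocol: the server sends only a query id,
// and the client re-requests that query itself.
type Invalidation = { queryId: string };

class RefetchingClient<Result> {
  private results = new Map<string, Result>();
  private fetchQuery: (queryId: string) => Result;

  constructor(fetchQuery: (queryId: string) => Result) {
    this.fetchQuery = fetchQuery;
  }

  // Run the query once and start watching it.
  watch(queryId: string): Result {
    const result = this.fetchQuery(queryId);
    this.results.set(queryId, result);
    return result;
  }

  // The trigger carries no data; it only says "this query is affected".
  onInvalidation(message: Invalidation): void {
    if (!this.results.has(message.queryId)) return; // not watching this query
    this.results.set(message.queryId, this.fetchQuery(message.queryId));
  }

  current(queryId: string): Result | undefined {
    return this.results.get(queryId);
  }
}
```

The server only has to know which query ids a change affects; all data still travels through ordinary query responses.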

Does that make any sense?

Scott

paralin commented 6 years ago

That sucks because the server has to look everything up again and can't keep any context in memory.

You're getting caught up in the implementation details. There are a lot of ways of accomplishing this. Two-way socket, pub-sub change notification channel, long polling, Merkle tree data hash comparison and state sync, server-side in-memory Merkle tree result caches....

robzhu commented 6 years ago

> @taion I just re-read #284 (comment), and now I think I get exactly what you mean. And from my side thread, my conclusion to the title question, "Are Subscriptions and Live Queries the same thing?" is now "yes", qualified only by the need to answer the question of how to send updates.

> since you closed this ticket with the opposite conclusion, namely that live queries should be something separate from subscriptions, I'm curious whether this would address your concerns.

Re-reading the thread now, I have not found compelling arguments for why the answer to this question is "yes". To quote from Rodrigo's presentation at GraphQL Summit, "Live Queries observe data, subscriptions observe events"

For example, suppose you had a server-side clock that tracks the current time. The current time has two interesting properties: the value itself, and when it "ticks".

If you want to observe the current time, use a Live Query. If you want to observe the "tick" event, use a Subscription.

These are (awkwardly) isomorphic because you can always record the set of events in a list and observe that list. For example, you can use a CQRS-style log, but it seems silly to have a CQRS log for seconds in the day.

Another angle: a Live Query is essentially a Query. You can poll any Query to simulate its behavior as a Live Query. By contrast, polling a Subscription (where the subscription does not have a stateful channel between polls) doesn't make sense.
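The clock example can be sketched as follows. `Clock` and `poll` are hypothetical names used only for illustration.

```typescript
// Hypothetical server-side clock; names are assumptions for illustration.
class Clock {
  private seconds = 0;
  private tickListeners: Array<(second: number) => void> = [];

  // Query (and therefore Live Query) view: what is the time right now?
  now(): number {
    return this.seconds;
  }

  // Subscription view: tell me each time a tick happens.
  onTick(listener: (second: number) => void): void {
    this.tickListeners.push(listener);
  }

  tick(): void {
    this.seconds += 1;
    this.tickListeners.forEach((listener) => listener(this.seconds));
  }
}

// Polling a Query simulates a Live Query: push the value only when it changed.
function poll<T>(query: () => T, last: T | undefined, push: (value: T) => void): T {
  const value = query();
  if (value !== last) push(value);
  return value;
}
```

If the clock ticks twice between two polls, a tick listener is called twice, while the poller pushes a single new value. Observing data and observing events give different results even for the same underlying clock.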

Hope that communicates my current thinking. I'm not seeing the recent arguments cover new ground, so I'm inclined to keep the issue closed, but please let me know if I'm missing some context.

paralin commented 6 years ago

@robzhu summarizes it nicely. It's easily possible to add an events (subscriptions) implementation on top of whatever live query system, and it's also probably possible to make a live query system using subscriptions as some kind of awkward transport.

At the end of the day data is data and the way you transfer it depends on what you want to do with it and how often it changes.

acjay commented 6 years ago

@paralin But the point I'm trying to make is that if we can "forget" for a minute that subscription was created with an event paradigm in mind, it's actually very close to being suitable for live queries, as well. What seems to be missing is simply a concept of a difference between the initial response and the stream of updates and a (modular?) scheme for representing those updates. Not to minimize those issues, but it feels like a manageable hump. Which is also why I'm thrilled the answer has been revised to "yes" :D

paralin commented 6 years ago

@acjay Those two things that you just described - including "modular scheme for representing those updates" - are a live queries system. There's no reason to use a subscription channel as your transport for a live queries system. It adds nothing over just a websocket transport. Therefore a subscription channel is not suitable for live queries, as well. It's suitable for the event-based paradigm, which was what it was designed for.

I built a prototype of an efficient live-queries system with magellan and it doesn't look anything similar to the subscriptions system - for performance I binary encode and batch changes to different parts of the result tree, which wouldn't be possible via a subscriptions channel anyway.

acjay commented 6 years ago

@paralin Maybe I'm missing something, but if the assumption is that a web socket server could simply choose to interpret a vanilla query as being a subscription for live query updates, why wouldn't the exact same thing work for events? It's just a single query that's responded to multiple times, when the server deems it appropriate.

taion commented 6 years ago

@acjay I think that's exactly right. A minimum (not especially efficient) implementation could just hold onto the full query and re-run the entire thing and push the results down to the client every time it gets an update. That's in fact how I read the "call to make a prototype" bit at the end of @rodmk's talk.
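A minimal sketch of that naive approach, with hypothetical names (`NaiveLiveQueryServer`, `watch`, `update`) that are not the API of any GraphQL library:

```typescript
// Naive live query server: hold each full query, re-run it on every update,
// and push the whole result. All names here are illustrative assumptions.
type Watcher<Data, Result> = {
  run: (data: Data) => Result;
  push: (result: Result) => void;
};

class NaiveLiveQueryServer<Data, Result> {
  private data: Data;
  private watchers: Array<Watcher<Data, Result>> = [];

  constructor(initial: Data) {
    this.data = initial;
  }

  // Register the full query and send the initial result right away.
  watch(run: (data: Data) => Result, push: (result: Result) => void): void {
    this.watchers.push({ run, push });
    push(run(this.data));
  }

  // Any write re-runs every held query and pushes the entire result again.
  update(next: Data): void {
    this.data = next;
    this.watchers.forEach((w) => w.push(w.run(next)));
  }
}
```

Every update costs one full query execution per watcher and sends the complete result, which is why this counts as a minimum rather than an efficient implementation.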

paralin commented 6 years ago

@taion @acjay I would struggle to call that a live query system at all. As we're discussing what a real implementation of something like that would look like, or in essence trying to figure out what the "best approach" would be, I'm not really considering hacks like sending the entire state over a subscription channel as a "live query system."

You can do the exact same thing with just a websocket and a server-side polling [run query, check if changes happened, wait 3 seconds] loop, and remove the entire graphql stack. In that way it's not useful to have the subscriptions stack in the mix at all for something like this. It is for this reason that I would say that the two things are entirely separate and should be treated as such.

I went and watched Rodrigo's talk and while I would argue that saying Subscriptions and Live Queries are interchangeable is misleading, he is right in that you can build almost any application with either approach. One approach will just be better for certain types of things than the other.

taion commented 6 years ago

@paralin Let's move this discussion to #386 instead of continuing to comment on a closed issue.

smolinari commented 6 years ago

> Live Queries observe data, subscriptions observe events

Live Queries observe "data store events", i.e. record creations, updates and deletes. Also, those data store events could be due to other events.

Scott
