Federated Subscription Performance

wundergraph / graphql-go-tools

GraphQL Router / API Gateway framework written in Golang, focussing on correctness, extensibility, and high-performance. Supports Federation v1 & v2, Subscriptions & more.

https://graphql-api-gateway.com

MIT License


Federated Subscription Performance #712

Open · steve-gray opened 10 months ago

steve-gray commented 10 months ago

When subscribing and resolving documents, we've noticed that messages appear to be resolved sequentially. Is there a way to buffer incoming messages and expand them in parallel, so that the sub-resolvers for upcoming messages can run before the previous messages have been transmitted? In principle this works because queries are intentionally idempotent, so we can "run ahead" of transmission and resolve the child nodes of the next few messages while the current one is being sent.

I'm just trying to find where this happens in the code, and I'm happy to contribute it as a PR.

jensneuse commented 10 months ago

Hey, I'm not sure I fully understand how your service works. Would you like "batched" reads on a subscription, similar to Kafka, and then resolve the batch in parallel to take advantage of single flight (deduplicating identical requests) during resolving? Could you explain your use case in more detail?

steve-gray commented 10 months ago

Essentially, we're subscribing to a NATS stream, but messages arrive from the stream faster than the per-item sub-resolvers can be resolved. This causes backpressure on the server side for the consuming channel, messages get discarded, and so on.
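As a minimal illustration of that failure mode, assume the stream consumer hands each message to the resolver over a bounded Go channel and, rather than blocking the stream reader, discards the message when the buffer is full. The names `deliver` and `buffer` and the capacity of 3 are hypothetical and are not part of graphql-go-tools.

```go
package main

import "fmt"

// deliver tries to hand msg to the resolver goroutine without blocking.
// If the bounded buffer is full (the resolver has fallen behind), the
// message is dropped and false is returned. Blocking instead would stall
// the stream reader, which is the other side of the backpressure problem.
func deliver(buffer chan<- int, msg int) bool {
	select {
	case buffer <- msg:
		return true
	default:
		return false // buffer full: message discarded
	}
}

func main() {
	buffer := make(chan int, 3) // hypothetical capacity of 3 messages
	dropped := 0
	// Nothing drains the buffer here, standing in for a resolver that is
	// slower than the producer.
	for msg := 0; msg < 5; msg++ {
		if !deliver(buffer, msg) {
			dropped++
		}
	}
	fmt.Println("buffered:", len(buffer), "dropped:", dropped) // prints "buffered: 3 dropped: 2"
}
```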

What we'd like is to be able to run the resolvers up to N items ahead on a subscription, so that when I query:

subscription {
  foo(input: $input) {
    bar {
      name
      fizz {
        buzz
      }
    }
  }
}

Essentially, we'd like the sub-resolvers for each item to be resolved ahead of that item's transmission, so that when 4 messages are dequeued, they are all processed in parallel (up to some configurable limit), allowing the federated supergraph to keep up with the subscription's data rate.

Otherwise, if messages arrive every 10ms but the queries take 11ms to resolve, you gradually fall further and further behind, until you either drop the subscription, lose messages, or run something in the chain out of memory (OOM).
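To make the arithmetic concrete: with one message every 10ms and 11ms of resolution per message, resolving strictly one at a time falls 1ms further behind with each message, so message k (counting from 0) finishes k + 11ms after it arrives, roughly a second behind after 1,000 messages (10s of traffic). The sketch below is a simple queue simulation assuming fixed arrival and resolution times; `lagMs` is a hypothetical helper. It checks this and shows that resolving two messages concurrently keeps the delay flat at 11ms.

```go
package main

import "fmt"

// lagMs simulates a queue and returns how long message k (counting from 0)
// waits from arrival until its resolution finishes. Messages arrive every
// intervalMs, each takes resolveMs to resolve, and up to workers (>= 1)
// messages resolve concurrently; each message goes to the worker that
// becomes idle first.
func lagMs(k, intervalMs, resolveMs, workers int) int {
	free := make([]int, workers) // time at which each worker becomes idle
	finish := 0
	for i := 0; i <= k; i++ {
		arrival := i * intervalMs
		w := 0
		for j := 1; j < workers; j++ {
			if free[j] < free[w] {
				w = j
			}
		}
		start := arrival
		if free[w] > start {
			start = free[w] // wait for the worker to finish its previous message
		}
		finish = start + resolveMs
		free[w] = finish
	}
	return finish - k*intervalMs
}

func main() {
	fmt.Println(lagMs(999, 10, 11, 1)) // one resolver at a time: 1010
	fmt.Println(lagMs(999, 10, 11, 2)) // two messages resolved concurrently: 11
}
```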

jensneuse commented 10 months ago

Hey, we're currently implementing something specifically for Federation and Event-Driven Architecture on top of NATS. It would be interesting to talk. Would you be interested in booking a call to discuss this further? You can book via this link: https://wundergraph.com/meet/jensneuse

steve-gray commented 10 months ago

Hey,

We're not interested in NATS at all: our delivery guarantees are iron-clad (cannot-drop, at-least-once with streaming retries) and are not a good fit for the JetStream architecture. Would this conversation be more general and not tied to NATS?

What we really want to know is where in the code to even start looking for the subgraph resolver calls, so we can add buffering there. We're happy to do the work and submit it back as a PR, but working out where this happens in the Engine/EngineV2/EngineNewNewV2, EngineV2Final.psd labyrinth has eluded us so far.

jensneuse commented 10 months ago

This is not tied to NATS; we're just using it to develop the feature because it lets us test everything in memory. The feature hasn't been committed yet, so it isn't visible in the repository. Once it's done, you'll be able to implement the interfaces yourself for other message-oriented middleware. What I'd like to understand is which access patterns you need and whether our event source interface is sufficient.

steve-gray commented 10 months ago

Could you email your calendar link to steve (the at-sign) zeroflucs (dot) io? We'd be keen to have the chat. Time-zone-wise I'm UTC+10, and I have an engineer on UK time; we'd both like to be on the call, so could we find a time that works for all three of us?