graphql / graphql-js

A reference implementation of GraphQL for JavaScript
http://graphql.org/graphql-js/
MIT License
20.06k stars 2.02k forks source link

Errors thrown when iterating over subscription source event streams (AsyncIterables) should be caught #4001

Open aseemk opened 10 months ago

aseemk commented 10 months ago

Context

Hi there. We're using graphql-js and serving subscriptions over WebSocket via graphql-ws (as recommended by Apollo for both server and client).

In our subscriptions' subscribe methods, we always return an AsyncIterable pretty much right away. We typically do this either by defining our methods via async generator functions (async function*), or by calling graphql-redis-subscriptions's asyncIterator method. Our subscribe methods effectively never throw an error just providing an AsyncIterable.

However, we occasionally hit errors actually streaming subscription events, when graphql-js calls our AsyncIterable's next() method. E.g. Redis could be momentarily down, or an upstream producer/generator could fail/throw. So we sometimes throw errors during iteration. And importantly, this can happen mid-stream.

Problem

graphql-js does not try/catch/handle errors when iterating over an AsyncIterable:

https://github.com/graphql/graphql-js/blob/2aedf25e157d1d1c8fdfeaa4c0d2f3d9d3457dba/src/execution/mapAsyncIterable.ts#L38-L40

There's even a test case today that explicitly expects these errors to be re-thrown:

https://github.com/graphql/graphql-js/blob/8a95335f545024c09abfa0f07cc326f73a0e466f/src/execution/__tests__/subscribe-test.ts#L1043-L1047

graphql-ws doesn't try/catch/handle errors thrown during iteration either:

https://github.com/enisdenjo/graphql-ws/blob/e4a75cc59012cad019fa3711287073a4aef9ed05/src/server.ts#L813-L815

As a result, when occasional errors happen like this, the entire underlying WebSocket connection is closed.

This is obviously not good! 😅 This interrupts every other subscription the client may be subscribed to at that moment, adds reconnection overhead, drops events, etc. And if we're experiencing some downtime on a specific subscription/source stream, this'll result in repeat disconnect-reconnect thrash, because the client also has no signal on which subscription has failed!!

Inconsistency

You could argue that graphql-ws should try/catch these errors and send back an error message itself. The author of graphql-ws believes this is the domain of graphql-js, though (https://github.com/enisdenjo/graphql-ws/discussions/333), and I agree.

That's because graphql-js already try/catches and handles errors both earlier in the execution of a subscription and later:

So it's only iterating over the AsyncIterable — the "middle" step of execution — where graphql-js doesn't catch errors and convert them to {data: null, errors: ...} objects.

This seems neither consistent nor desirable, right?

Alternatives

We can change our code to:

Doing this would obviously be pretty manual, though, and we'd have to do it for every subscription we have.

Relation to spec

Given the explicit test case, I wasn't sure at first if this was an intentional implementation/interpretation of the spec.

I'm not clear from reading the spec, and it looks like at least one other person wasn't either: https://github.com/graphql/graphql-spec/issues/995.

But I think my own interpretation is that the spec doesn't explicitly say to re-throw errors. It just doesn't say what to do.

And I believe that graphql-js is inconsistent in its handling of errors, as shown above. The spec also doesn't seem to clearly specify how to handle errors creating source event streams, yet graphql-js (nicely) handles them.

I hope you'll consider handling errors iterating over source event streams too! Thank you.

yaacovCR commented 7 months ago

I agree that the spec is agnostic, and that it would be useful for graphql-js to be consistent and provide explanatory errors. I think the spec should also be improved.

For context, it seems from https://github.com/graphql/graphql-js/pull/918 that prior to that PR, all subscribe errors threw, and that the argument was made there that explanatory errors would be helpful in some cases. The parts of the PR that I skimmed through doesn't seem to indicate why explanatory errors to the client would not be helpful with iteration errors; my suspicion is that the PR was attacking the low-hanging fruit, and the authors/reviewers there would not necessarily object to even more explanatory errors. :)

@robzhu @leebyron

yaacovCR commented 6 months ago

I think the next step would be to raise this topic at a working group meeting. @aseemk are you interested in championing this there? (I am potentially dangerously assuming that this hasn’t happened already…)

yaacovCR commented 5 days ago

https://github.com/graphql/graphql-spec/pull/1099 has editorial changes to the event stream that I am not sure are 100% clear on this point. The way forward I think still goes through a discussion at a WG meeting.