Open kkom opened 7 months ago
I think that's expected when fields aren't optional; if you make the failing field optional, it should work as expected. Here's a test without the exception: https://play.strawberry.rocks/?gist=05181d8366f2584d390ccdac21d5d000
Thanks!
I think that's expected when fields aren't optional
When you say this, do you mean according to the GraphQL spec or Strawberry's design decision?
@kkom from the spec: https://spec.graphql.org/October2021/#sec-Handling-Field-Errors
Here's an example that better shows the behaviour of failures in nested fields: https://play.strawberry.rocks/?gist=b1ca6f1c33325331a76e36fb0d80ebfa
Ah, I see it now – thanks! Yeah, the short-circuiting behaviour makes perfect sense :)
I tested it in anger in a lot of configurations and every behaviour agreed with the GraphQL spec.
PS: I only found some small non-determinism in the `errors` field. The speed with which coercible and incoercible errors occur affects the contents of `errors`. Say you have two fields that fail – one nullable and one not. If the nullable one fails first – you'll see two error entries. But if the non-nullable one fails first – we won't include the error for the nullable one. I don't think that's a huge deal though and doubt the spec worries about it.
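The ordering effect described above can be reproduced with a toy asyncio model. This is only a sketch, not Strawberry or graphql-core code: it assumes sibling fields are awaited together with `asyncio.gather`, that a nullable field records its error and resolves to `None`, and that a non-null field lets its error propagate. All function names here are hypothetical.

```python
import asyncio


async def failing(delay, message):
    # Hypothetical resolver body: fails after `delay` seconds.
    await asyncio.sleep(delay)
    raise RuntimeError(message)


async def nullable_field(errors, delay):
    # Nullable field: the error is recorded and the field becomes None.
    try:
        return await failing(delay, "nullable failed")
    except RuntimeError as exc:
        errors.append(str(exc))
        return None


async def non_null_field(delay):
    # Non-null field: the error propagates to the parent.
    return await failing(delay, "non-null failed")


async def execute(nullable_delay, non_null_delay):
    errors = []
    try:
        await asyncio.gather(
            nullable_field(errors, nullable_delay),
            non_null_field(non_null_delay),
        )
    except RuntimeError as exc:
        errors.append(str(exc))
    # The response is built now; errors recorded later are not included.
    return {"data": None, "errors": list(errors)}


if __name__ == "__main__":
    print(asyncio.run(execute(0.01, 0.05)))  # nullable field fails first
    print(asyncio.run(execute(0.05, 0.01)))  # non-null field fails first
```

In this model the first call reports both errors and the second reports only the non-null one, because the response is assembled before the nullable field has had a chance to fail.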
However, we still have the problem of `on_execute` and `on_operation` hooks completing before resolvers can finish. 😓
We use a custom Strawberry extension for context management. Basically same as this – it opens and closes a SQLAlchemy session which can be used in the resolvers.
If a response is prematurely concluded because of an incoercible field error, the unfinished resolvers will run in the background. However, it's very likely they'll raise more errors – which we do see in our Datadog logs.
The errors are varied, but all of them are noise – they don't matter anymore and happen because the context for the field resolvers is just messed up by the lifecycle hooks...
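To make the knock-on failures concrete, here is a minimal simulation using only asyncio. It is a sketch under assumptions: `FakeSession` and `session_hook` are hypothetical stand-ins for a DB session and for an `on_execute` / `on_operation` style hook, and the resolvers are assumed to be awaited with `asyncio.gather`. It is not the extension from this thread.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeSession:
    """Stand-in for a DB session; not a real SQLAlchemy object."""

    def __init__(self):
        self.closed = False

    def query(self, name):
        if self.closed:
            raise RuntimeError(f"{name}: session is closed")
        return f"{name}: ok"


@asynccontextmanager
async def session_hook(session, log):
    # Plays the role of a lifecycle hook wrapping execution.
    log.append("hook enter")
    try:
        yield
    finally:
        session.closed = True
        log.append("hook exit")


async def failing_resolver():
    raise ValueError("non-null field failed")


async def slow_resolver(session, log):
    await asyncio.sleep(0.05)
    try:
        log.append(session.query("slow"))
    except RuntimeError as exc:
        log.append(f"knock-on error: {exc}")


async def simulate():
    log = []
    session = FakeSession()
    try:
        async with session_hook(session, log):
            await asyncio.gather(failing_resolver(), slow_resolver(session, log))
    except ValueError:
        log.append("response returned")
    # A server's event loop keeps running after the response is sent.
    await asyncio.sleep(0.1)
    return log


if __name__ == "__main__":
    for line in asyncio.run(simulate()):
        print(line)
```

The slow resolver keeps running after the hook has closed the session, so its failure is logged only after the response has already been returned.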
`on_execute` and `on_operation` lifecycle hooks and return the response.
¯\_(ツ)_/¯
and the clear error logs benefits seem more important to us now. What do you think? Any ideas appreciated! 🙏
I guess there is theoretically a 4th option: the lifecycle hooks also waiting for the unneeded resolvers to finish.
I wonder though if the HTTP response concluding before the `on_execute` / `on_operation` lifecycle hooks finish would cause even more problems...
Oh, I think there is a decent variation on option 4!
What if it was possible to configure a schema extension to wait for all resolvers to finish before concluding? Then we could selectively apply it to the extension where it matters (e.g. the one that manages our DB session).
Not sure how difficult it would be to implement and how much complexity it would add to the code, but I think it would solve all our problems. 🤞
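One way to picture the "wait for all resolvers" idea is the asyncio sketch below. It is not an existing Strawberry option: `execute_and_drain` and its `on_exit` callback are hypothetical names, and the model assumes the hook has access to the resolver tasks.

```python
import asyncio


async def execute_and_drain(resolver_coros, on_exit):
    """Run resolvers; call on_exit only after every one of them has finished."""
    tasks = [asyncio.ensure_future(coro) for coro in resolver_coros]
    try:
        return await asyncio.gather(*tasks)
    finally:
        # Drain stragglers. return_exceptions=True keeps their errors
        # from masking the first failure.
        await asyncio.gather(*tasks, return_exceptions=True)
        on_exit()


async def demo():
    events = []

    async def fails_fast():
        raise ValueError("non-null field failed")

    async def slow():
        await asyncio.sleep(0.05)
        events.append("slow resolver finished")

    try:
        await execute_and_drain(
            [fails_fast(), slow()],
            lambda: events.append("hook exit"),
        )
    except ValueError:
        events.append("response returned")
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the response is also delayed until the slowest resolver finishes, which is the latency trade-off raised earlier in the thread.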
What if it was possible to configure a schema extension to wait for all resolvers to finish before concluding? Then we could selectively apply it to the extension where it matters (e.g. the one that manages our DB session).
Is there any particular reason why you're doing this on an extension and not using your web framework functionality?
In any case I think we could introduce a hook for when the request is done 😊 @nrbnlulu what do you think?
What an endeavour @kkom :upside_down_face:!
So what I understand from this issue:
When a `NonNull` field fails, `on_execute` | `on_operation` exits BEFORE all the other resolvers are gathered.
This can be problematic for life-cycle resources because they might have been cleaned up by the `exit` hook.
I think this issue relates not to extensions but to graphql-core: they don't cancel tasks if the execution is terminated, so I suggest opening an issue there.
BTW:
`@defer` / `@stream` support was added).
Is there any particular reason why you're doing this on an extension and not using your web framework functionality?
That's a very good question! The context we prepare is highly dependent on whether the GraphQL operation is a query or a mutation.
That's why we use the `on_execute` hook – to leverage `self.execution_context.operation_type` already prepared by Strawberry. It is quite elaborate, but the performance benefits of using different DB access patterns are huge.
But thanks for the suggestion – maybe there is a decent way to leverage the web framework for it! We could use a heuristic on the body of the request to distinguish queries from mutations before Strawberry kicks in. Or always construct both kinds of contexts (though this may be inefficient).
I think this issue relates not to extensions but to graphql-core: they don't cancel tasks if the execution is terminated, so I suggest opening an issue there.
Thanks so much for the pointer! I'll try to report it there – feels like fixing this should be valuable whichever way we go! :)
Ok, reported this to `graphql-core`: https://github.com/graphql-python/graphql-core/issues/217 - thanks for the suggestion @nrbnlulu!
@patrick91 – I'll play with using FastAPI for it, but I've realised that there may be a problem. If these tasks are somehow abandoned on the event loop, I'm not sure if even FastAPI would be able to wrap around them. Remember that the HTTP response is returned by that time...
A little update from graphql-core - it was acknowledged as something the maintainer did want to address as well. :) https://github.com/graphql-python/graphql-core/issues/217#issuecomment-2015919484
So the issue is definitely real - but not yet sure when it'll be addressed.
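For illustration, cancelling sibling tasks on the first failure could look roughly like the asyncio sketch below. This is an assumption about the general technique, not the graphql-core implementation or its eventual fix; `gather_or_cancel` is a hypothetical helper.

```python
import asyncio


async def gather_or_cancel(*coros):
    """Like asyncio.gather, but cancels the remaining tasks on first failure."""
    tasks = [asyncio.ensure_future(coro) for coro in coros]
    try:
        return await asyncio.gather(*tasks)
    except BaseException:
        for task in tasks:
            task.cancel()  # no-op for tasks that are already done
        # Wait for the cancellations to be delivered before re-raising.
        await asyncio.gather(*tasks, return_exceptions=True)
        raise


async def demo():
    events = []

    async def fails_fast():
        raise ValueError("non-null field failed")

    async def slow():
        try:
            await asyncio.sleep(0.05)
            events.append("slow resolver finished")
        except asyncio.CancelledError:
            events.append("slow resolver cancelled")
            raise

    try:
        await gather_or_cancel(fails_fast(), slow())
    except ValueError:
        events.append("response returned")
    await asyncio.sleep(0.1)  # nothing is left running in the background
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this approach no resolver outlives the hooks, at the cost of resolvers having to tolerate cancellation at any `await` point.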
TL;DR
Strawberry short-circuits the HTTP response whenever there is an uncaught exception. This reduces the latency, but leads to:
~(i) (a) incomplete and (b) nondeterministic responses~ (edit: established in the comments that it's expected)
(ii) hooks being completed before some resolvers, leading to apparent violation of a contract
I wonder if it would be possible to make Strawberry run all resolvers to the end, even if some of them raise uncaught exceptions?
Describe the Bug
`errors`, as soon as an (edit: incoercible) exception is raised.
`on_execute` and `on_operation`.
This last point can lead to issues – it violates the invariant that `on_execute` / `on_operation` lifecycle hooks wrap around all resolver executions.
This can be problematic when these hooks do state management, like in the example given in Strawberry's docs. As a result, in addition to seeing the original uncaught exception in our observability suite, we have additional noise from knock-on failures – caused by premature completion of hooks.
Is this natural behaviour given how various async tasks are orchestrated, or is it possible to tweak this a little? I'm thinking:
~In fact, 2 may have other benefits – making the responses more (a) complete and (b) predictable. Currently, the GraphQL responses (i.e. which fields will return data and which won't) are non-deterministic (albeit a little faster thanks to the uncaught exception short-circuit).~ (edit: established in the comments that the short-circuiting is expected)
Repro code
Schema:
Logging extension:
Example query:
Example response:
Logs demonstrating that the resolvers continue being executed after hooks complete:
System Information
0.220.0
Additional Context