strawberry-graphql / strawberry-django

Strawberry GraphQL Django extension
https://strawberry.rocks/docs/django
MIT License
421 stars 122 forks source link

SynchronousOnlyOperation error after upgrading to version 0.46.2 from 0.46.1 while trying to access root fields in resolver #588

Closed pfcodes closed 4 months ago

pfcodes commented 4 months ago

Describe the Bug

After upgrading to version 0.46.2, resolvers that previously worked started triggering a SynchronousOnlyOperation error. This error happens after attempting to access any field other than id on the root of the resolver. It looks like refresh_from_db is being called (see additional context section).

After downgrading to 0.46.1 everything works again.

async def creative(root: Model, info: strawberry.Info) -> Creative:
    creative = await info.context["creative_loader"].load(root.creative_id)
    return creative

class JobApplication(strawberry.relay.Node):
    creative: Creative = strawberry_django.field(resolver=creative)

System Information

Additional Context

File "/Users/phil/PycharmProjects/nova/jobs/strawberry/object_types/job_application.py", line 15, in creative
    creative = await info.context["creative_loader"].load(root.creative_id)
                                                          ^^^^^^^^^^^^^^^^
  File "/Users/phil/PycharmProjects/nova/venv/lib/python3.11/site-packages/django/db/models/query_utils.py", line 178, in __get__
    instance.refresh_from_db(fields=[field_name])
  File "/Users/phil/PycharmProjects/nova/venv/lib/python3.11/site-packages/django/db/models/base.py", line 724, in refresh_from_db
    db_instance = db_instance_qs.get()
                  ^^^^^^^^^^^^^^^^^^^^

Upvote & Fund

Fund with Polar

bellini666 commented 4 months ago

Hrm, I think the reason here is that creative_id didn't get included in the only because of this change

The main idea is that when the field has a custom resolver, the user should be defining the hints required for it. The optimizer was still optimizing for the field itself when the name matched an existing django model field, but while this is nice for some cases (e.g. yours), it can end up doing extra select_related/prefetch_related that can be problematic.

That change was mostly to fix that unexpected behaviour, but I wasn't expecting that someone would be actually relying on it (sorry about that)

For the current version you have 3 possibilities to overcome this:

1) If this is a ForeignKey (not sure if it is, but since the optimizer was adding the creative_id before I'm assuming it was) you could just do:

class JobApplication(strawberry.relay.Node):
    creative: Creative

And the optimizer would take care of adding it to select_related and adding only/etc to any of its nested relations.

2) If you really need it as a data loader, you can tell the field to add the id you need to access it to the only list, like:

async def creative(root: Model, info: strawberry.Info) -> Creative:
    creative = await info.context["creative_loader"].load(root.creative_id)
    return creative

class JobApplication(strawberry.relay.Node):
    creative: Creative = strawberry_django.field(resolver=creative, only=["creative_id"])

You can even convert this to a decorator if you prefer:

class JobApplication(strawberry.relay.Node):
    @strawberry_django.field(only=["creative_id"])
    async def creative(self, root: Model, info: strawberry.Info) -> Creative:
        return await info.context["creative_loader"].load(root.creative_id)

Any of those choices should fix the SynchronousOnlyOperation for you, and also make it more "explicit"


Having said that, I wonder if we should still rely on the old behavior somehow? I mean, I really prefer to have fields with resolvers having to explicitly give hints to the optimizer, but if this is something that you and other users might be relying on, it might be worth to think in an alternative

e.g. for a situation like this, we could add the field to only list at least? This is the option that is almost harmless as the performance implication of one extra only field is negligible in case it is not actually accessed. What do you think?

pfcodes commented 4 months ago

So the reason why I'm using the dataloader here is because the Creative model is a polymorphic model with two sub-types.

The child class needs to be resolved via a call to the django-model-utils InheritanceManager's select_subclasses queryset method. So in order to prevent an n+1 query in the async def creative() resolver, I call select_subclasses before the resolver is called and cache the results in the dataloader.

async def applications(root, info) -> List["JobApplication"]:
    # Load the creatives in the cache to prevent an n+1 query later when calling select_subclasses()
    creatives = (
        CreativeModel.objects.prefetch_related(
            "professions",
            Prefetch(
                "user",
                queryset=UserModel.objects.select_related("profile").prefetch_related(
                    "friends__from_user__profile",
                    "friends__from_user__creative",
                    "friends__to_user__profile",
                ),
            ),
            Prefetch(
                "creative_work_portfolio_projects",
                queryset=PortfolioProjectModel.objects.prefetch_related("files"),
                to_attr="prefetched_creative_work_portfolio_projects",
            ),
        )
        .filter(job_applications__job_posting=root)
        .select_subclasses()
    )
    info.context["creative_loader"].prime_many(
        {creative.pk: creative async for creative in creatives}
    )
    return root.job_applications.all()

So option 1 wouldn't work (unless theres another way to call select_subclasses beforehand which I might not be thinking of). Will try option 2.

pfcodes commented 4 months ago

Resolved this issue by correctly checking for polymorphism using hasattr in the resolver for the polymorphic type. No dataloader/prefetching needed.