strawberry-graphql / strawberry-django

Strawberry GraphQL Django extension
MIT License
391 stars 115 forks source link

N+1 in some resolvers when enabling `only` extension #521

Closed taobojlen closed 2 weeks ago

taobojlen commented 2 months ago

Describe the Bug

When the optimizer is enabled with the only extension, resolvers like the following lead to an N+1:

@strawberry_django.type(Milestone, filters=MilestoneFilter, order=MilestoneOrder)
class MilestoneType(relay.Node):
    @strawberry_django.connection(
            ListConnectionWithTotalCount["IssueType"]
    )
    def issues(self) -> List["IssueType"]:
        return self.issues.all()

Resolving this Relay connection leads to the following SQL queries:

SELECT "projects_milestone"."id" FROM "projects_milestone" WHERE "projects_milestone"."id" = 1 ORDER BY "projects_milestone"."id" ASC LIMIT 1
SELECT "projects_issue"."id" FROM "projects_issue" WHERE "projects_issue"."milestone_id" = 1 ORDER BY "projects_issue"."name" ASC LIMIT 101
# one query per issue:
SELECT "projects_issue"."id", "projects_issue"."milestone_id" FROM "projects_issue" WHERE "projects_issue"."id" = 1 LIMIT 21
SELECT "projects_issue"."id", "projects_issue"."milestone_id" FROM "projects_issue" WHERE "projects_issue"."id" = 2 LIMIT 21

Additional Context

You can reproduce this issue by adding the above resolver to MilestoneType in tests/projects/schema.py, then use the following test:

@pytest.mark.django_db(transaction=True)
def test_optimizes_queryset_in_resolver(db, gql_client: GraphQLTestClient):
    query = """
    query TestQuery ($id: GlobalID!) {
      milestone(id: $id) {
        orderedIssues {
          edges {
            node {
              id
            }
          }
        }
      }
    }
    """
    milestone = MilestoneFactory.create()
    issue_1 = IssueFactory.create(milestone=milestone, name="A")
    issue_2 = IssueFactory.create(milestone=milestone, name="B")

    # note: these query counts may not be the finally correct ones, but as long
    # as the assertion fails you'll see the N+1 queries
    with assert_num_queries(3 if DjangoOptimizerExtension.enabled.get() else 2):
        res = gql_client.query(query, {"id": to_base64("MilestoneType", milestone.pk)})

    assert isinstance(res.data, dict)
    result = res.data["milestone"]
    assert isinstance(result, dict)

    expected = [to_base64("IssueType", i.pk) for i in [issue_1, issue_2]]
    assert [edge["node"]["id"] for edge in result["orderedIssues"]["edges"]] == expected

I've done a bit of debugging, and I can see that the N+1 queries are executed in DjangoOptimizerExtension.resolve(). Specifically, this line:

    def resolve(
        self,
        _next: Callable,
        root: Any,
        info: GraphQLResolveInfo,
        *args,
        **kwargs,
    ) -> AwaitableOrValue[Any]:
        ret = _next(root, info, *args, **kwargs)
        # ...

This line executes three queries:

SELECT "projects_issue"."id" FROM "projects_issue" WHERE "projects_issue"."milestone_id" = 1 ORDER BY "projects_issue"."name" ASC LIMIT 101
# one query per issue:
SELECT "projects_issue"."id", "projects_issue"."milestone_id" FROM "projects_issue" WHERE "projects_issue"."id" = 1 LIMIT 21
SELECT "projects_issue"."id", "projects_issue"."milestone_id" FROM "projects_issue" WHERE "projects_issue"."id" = 2 LIMIT 21

I'm continuing to debug to isolate where these queries are coming from, and happy to open a PR once I fix it! But if anyone has a hunch or any pointers, let me know.

Upvote & Fund

Fund with Polar

taobojlen commented 2 months ago

Looks like the issue is in the optimizer, on this line:

qs = qs.only(*(only_set | select_related_only_set))

Before this line, evaluating the queryset executes 1 SELECT query. After this line, evaluating it executes 3 queries. I thought this may be a Django bug but I cannot reproduce it in my own Django project.

The extra queries could come from something calling __repr__ on each item in the queryset. I think so because of the LIMIT 21 clause in the SQL queries, and this code: https://github.com/django/django/blob/53719d6b5b745dd99b1ab9315afb242f706ebbf1/django/db/models/query.py#L376

Alternatively, it's because something is calling .get() on each item in the queryset: https://github.com/django/django/blob/53719d6b5b745dd99b1ab9315afb242f706ebbf1/django/db/models/query.py#L643

bellini666 commented 1 month ago

I did some tests here and to fix this issue, I would cause others :(

To properly resolve this I need to first resolve https://github.com/strawberry-graphql/strawberry-django/issues/337