Query Optimizer Annotate not working when used in nested query

jaydensmith commented 2 weeks ago

Describe the Bug

Consider a schema like the following.

@strawberry.django.type(models.Product)
class ProductType(relay.Node):
    id: auto
    name: auto
    price: auto
    tax_rate: auto
    group: 'GroupType'

@strawberry.django.type(models.Group)
class GroupType(relay.Node):
    id: auto
    name: auto
    product_count: int = strawberry.django.field(
        annotate=Count('products')
    )

Querying the list of groups including the productCount field works as expected, but when querying the group like the following, there is a database error Cannot resolve keyword 'products' into field. Choices are: name, price, tax_rate, group_id, id:

query GetItems {
    products {
        edges {
            node {
                id
                name
                price
                taxRate
                group {
                    id
                    name
                    productCount
                }
            }
        }
    }
}

System Information

Operating system: MacOS
Strawberry version (if applicable): 0.234.2

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

bellini666 commented 2 weeks ago

Thanks for the report and the example! Will take a look at that during the week :)

For curiosity, did that work before the nested optimizer feature from https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.44.0 or the issue predates that?

jaydensmith commented 2 weeks ago

Thanks for the report and the example! Will take a look at that during the week :)

For curiosity, did that work before the nested optimizer feature from https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.44.0 or the issue predates that?

Thanks @bellini666, I do not believe so. I updated to 0.44.0 to see if that fixed the issue. I first noticed the issue on 0.43.0.

bellini666 commented 2 weeks ago

@jaydensmith can you test with 0.41.1 to see if the issue still happens? Because https://github.com/strawberry-graphql/strawberry-django/releases/tag/v0.42.0 changed the way we collect sub-fields, so I want to make sure that it is not a regression from that release as well

jaydensmith commented 2 weeks ago

@bellini666 no worries, I just tested and the issue is present on that version too.

bellini666 commented 2 weeks ago

@bellini666 no worries, I just tested and the issue is present on that version too.

Ahh good to know that it is not a regression, it is probably a corner case that was not considered when annotate was implemented :)

Will take a look at that soon in the next couple of days.

bellini666 commented 1 week ago

Hey @jaydensmith ,

Was investigating this and I think what is going to...

The issue is the join in the group. What the optimizer is trying to do in your case is:

Product.objects.all().select_related(
    "group",
).annotate(
    products_count=Count("group__products"),
)

If you look at that queryset, there's no way currently for Django to annotate products_count inside product.group object, the annotation will end up inside product.

A way to workaround this would be to convert the select related to a prefetch related when it is nested and contains an annotate config. That way it would look like this:

Product.objects.all().prefetch_related(
    Prefetch(
        "group",
        Group.objects.all().annotate(
            products_count=Count("products"),
        )
    )
)

The downside of this is that instead of a simple join, now we need an extra query. Which is not a big deal, but something to be aware of

I'll need some more time on how to do that properly, as it will require some more complex changes to the optimizer.

In the meantime, your best bet is to use data loaders to solve situations like this.

Let me know if you have any doubts/issues regarding that

diesieben07 commented 1 week ago

It would be great if in the meantime the Optimizer could throw a descriptive error message when this happens instead of producing an invalid QuerySet.

bellini666 commented 1 week ago

It would be great if in the meantime the Optimizer could throw a descriptive error message when this happens instead of producing an invalid QuerySet.

Great callout!

jaydensmith commented 1 week ago

Hey @jaydensmith ,

Was investigating this and I think what is going to...

The issue is the join in the group. What the optimizer is trying to do in your case is:
Product.objects.all().select_related(
    "group",
).annotate(
    products_count=Count("group__products"),
)
If you look at that queryset, there's no way currently for Django to annotate products_count inside product.group object, the annotation will end up inside product.

A way to workaround this would be to convert the select related to a prefetch related when it is nested and contains an annotate config. That way it would look like this:
Product.objects.all().prefetch_related(
    Prefetch(
        "group",
        Group.objects.all().annotate(
            products_count=Count("products"),
        )
    )
)
The downside of this is that instead of a simple join, now we need an extra query. Which is not a big deal, but something to be aware of

I'll need some more time on how to do that properly, as it will require some more complex changes to the optimizer.

In the meantime, your best bet is to use data loaders to solve situations like this.

Let me know if you have any doubts/issues regarding that

I'm not sure if there's a way to resolve the main issue, but my problem with the prefetch related solution is that it makes the types a lot less versatile if I have to consider how the data is fetched whenever nesting it on another type. If there was a way to make the annotate function work, even if it resulted in another query, that would be preferable to me.

diesieben07 commented 1 week ago

I think what @bellini666 was saying is that the optimizer should automatically convert a select_related into a prefetch_related if it detects this situation. For you as the user it would be transparent, the optimizer would issue the best query possible given the circumstances.

bellini666 commented 1 week ago

I'm not sure if there's a way to resolve the main issue, but my problem with the prefetch related solution is that it makes the types a lot less versatile if I have to consider how the data is fetched whenever nesting it on another type. If there was a way to make the annotate function work, even if it resulted in another query, that would be preferable to me.

The idea is exactly what @diesieben07 said

In the meantime, I think data loaders would be the way to go for you. E.g.

@strawberry.django.type(models.Group)
class GroupType(relay.Node):
    id: auto
    name: auto

    @strawberry_django.field(only=["pk"])
    async def product_count(self, root: Group, info:Info):
        return await info.context.data_loaders.load_product_count(root.pk)

# and the dataloader definition
async def load_products_count(pks: list[int]) -> list[int]:
    count_dict = {}

    async for group in Group.objects.filter(
        pk__in=pks,
    ).only(
        "pk",
    ).annotate(
        product_count=Count('products'),
    ):
        count_dict[group.pk] = group.product_count

    return [count_dict.get(pk, 0) for pk in pks]

If you need help with that, feel free to ping me in our discord channel and I can help you with that.

strawberry-graphql / strawberry-django