strawberry-graphql / strawberry-django

Strawberry GraphQL Django extension
https://strawberry.rocks/docs/django
MIT License
411 stars 118 forks source link

optimizer over fetching rows #340

Closed tasiotas closed 3 months ago

tasiotas commented 1 year ago

Hi,

It looks like Query Optimizer is over fetching rows when working with Relay connections. I works fine with List but not with Connections.

So when I build schema using List, query below gets executed as single SQL But when using Connection, I get two SQLs. I guess thats how it works? Or could it be "optimized" ?

strawberry-graphql-django = "^0.16.0" repro https://github.com/tasiotas/strawberry_debug

thank you

#query
query MyQuery {
  user(pk: "1") {
    products {
      edges {
        node {
          name
        }
      }
    }
  }
}
# first query
SELECT "app_user"."id"
  FROM "app_user"
 WHERE "app_user"."id" = '1'
 LIMIT 21
# second query
SELECT "app_product"."id",
       "app_product"."name",
       "app_product"."size", # <- this shouldnt be here, didnt ask for it in the query
       "app_product"."user_id"
  FROM "app_product"
 WHERE "app_product"."user_id" IN ('1')

Upvote & Fund

Fund with Polar

bellini666 commented 1 year ago

I think this happened because the optimizer currently is skipping any nested connections: https://github.com/strawberry-graphql/strawberry-graphql-django/blob/main/strawberry_django/optimizer.py#L320C29-L320C29

Maybe we should allow this as long as the parent is a single result and not a list? This would make that specific case you are reporting work, but you would still have that same issue if you were retrieving a list/connection of users and then a connection of products inside those

tasiotas commented 1 year ago

I see. Seems like a complete solution might take a while to figure out. Sure, if you don't mind patching my case with that workaround, I will gladly take it. Nested connections are rare, but definitely can happen.

Thanks

bellini666 commented 1 year ago

@tasiotas ,

I was taking a look at this now and I'm a bit confused. The optimizer should actually be working for this because the products resolver would return a queryset, and that queryset would get optimized.

Can you show me how your python code is written for this type so I can try to reproduce it?

tasiotas commented 1 year ago

sure, take a look at this repro https://github.com/tasiotas/strawberry_debug

bellini666 commented 1 year ago

@tasiotas oh sorry, I did not see the link

The first thing I noticed was that you passed a prefetch_related in the products. Before I can test it, can you try it without the prefetch_related (i.e. type it as products: ListConnectionWithTotalCount[ProductType] = strawberry.django.connection() only)?

The reason I say this is because you don't want to prefetch_related a connection, as it will be sliced and the prefetched data will be thrown away (so it is even worse performance-wise). This can also be interfering in the optimizer since it will skip already prefetched querysets

tasiotas commented 1 year ago

sure, without prefetch_related I am getting 3 SQL queries

SELECT "app_user"."id"
  FROM "app_user"
 WHERE "app_user"."id" = '1'
 LIMIT 21
SELECT "app_product"."id",
       "app_product"."name"
  FROM "app_product"
 WHERE "app_product"."user_id" = '1'
 LIMIT 101
SELECT "app_product"."id",
       "app_product"."user_id"
  FROM "app_product"
 WHERE "app_product"."id" = '1'
 LIMIT 21

Is it possible to get just one query?

bellini666 commented 1 year ago

1 query is impossible, but 2 is what should be happening. One for the user and another one for the products list.

Now there seems to be an issue with the query optimizer here. The second query is the relay's connection one, and it should have selected the user_id as well as you can see in this condition. I think it did not do that because it actually doesn't know it is being retrieved using a relation through the user. I'll see what I can do about optimizing that

tasiotas commented 9 months ago

Hi @bellini666,

Do you think you will have time and energy to look into optimizer related issues this year? It's one of the last issues I need to address in my backend.

Currently I have few queries with 3 or even 4 level deep nested connections. Such query takes seconds to resolve without prefetching data.

Or is there a workaround for such cases? Can we disable optimizer and write query manually? I'm not sure it that's is even feasible solution.

Thank you! 🙏

bellini666 commented 9 months ago

Hey @tasiotas ,

I'm hoping to have the time and energy to look into it ASAP. If I were to give you an estimated ETA I would say probably in this years Q1 still (but more towards the end of it) :)

tasiotas commented 9 months ago

Glad to hear that, thank you.

rsomani95 commented 7 months ago

@tasiotas i'm running into this exact same scenario. Would be very helpful if you could share any workarounds you're using.

tasiotas commented 7 months ago

I don't have any. Luckily I can wait for solution from @bellini666