Closed TWeidi closed 1 month ago
Hi @TWeidi ,
This seems to be the same issue reported in https://github.com/strawberry-graphql/strawberry-graphql-django/issues/340 and https://github.com/strawberry-graphql/strawberry-graphql-django/issues/337
As I mentioned in both, can you try removing the prefetch_related
from the connection and tell me if it starts working properly?
Hey,
thank you for your response. I tried all possible combinations of field definitions on both Place and Restaurant types:
Variant1:
some_m2m_relation: ListConnectionWithTotalCount[SomeRelatedType] = connection(prefetch_related='some_m2m_relation'])
Variant2:
some_m2m_relation: ListConnectionWithTotalCount[SomeRelatedType] = connection()
Variant3: some_m2m_relation
field not defined
PQ: Place query:
places {
edges {
node {
some_m2m_relation {
edges {
node {
id
}
}
}
}
}
}
RQ: Restaurant query:
restaurants {
edges {
node {
some_m2m_relation {
edges {
node {
id
}
}
}
}
}
}
Place
Variant1 Variant2 Variant3
PQ RQ PQ RQ PQ RQ
Variant1 Yes Yes No Yes N/A Yes
Restaurant Variant2 Yes No No No N/A No
Variant3 Yes No* No No N/A N/A
* I would expect a 'Yes' here due to inheriance. All other cases behave as I would expect
Currently I need to use Variant1 on both Place and Restaurant types to enable optimization for both PQ and RQ
Hi @TWeidi ,
Ohhh, I'm so sorry, I got confused by the bug report because it was very similar to the other 2 I mentioned. You are mentioning n+1 here, not the lack of query optimization for those queries.
Regarding those issues:
The reason for that is because connections are used for pagination. If you prefetch all the results, you would either be fetching a lot more than you want to then slice a possibly very large list in memory, or your cache would be thrown away and you would have to make another query for the connection itself.
To avoid that the optimizer would need to prefetch
results for all items in the parent connection considering the applied pagination (together with any kind of filtering/ordering), and I haven't find a way to be able to do that correctly yet. Not sure if it is even possible TBH, consider this example:
For lists django can easily prefetch all 3 child connections for the 3 parents, but if it does that trying to apply pagination on those, how can it tell that query to limit it to like 10 results for each child connection? Maybe this would require the usage of RowNumber and filter the results which rownumber is < 10, but I'm not sure if performance-wise this would be better than letting the n+1 for very large results. e.g. say each childconnection has 1m+ results, rownumber as a window function would be applied after the results are materialized, meaning you could be replacing 3 ~1ms extra queries by 1 ~100ms.
prefetch_related
issue is that, if you ask the optimizer to do that, it will do the prefetch for all items I mentioned in the previous issue. Meaning that you would be fetching in memory all the rows for them to be sliced in a list, or even to be thrown away after pagination is applied to the queryset, making it even worse than not optimizing.For those kinds of nested relations, specially considering those limitations, I would say that you either want a simple list if the results from it are small enough, or let the n+1 issue happen and try to avoid letting too many queries happen if possible. e.g. avoid exposing a root query with a connection/list of a type that has nested connections
Hey @bellini666,
I guess i get your point regarding the nested m2m queries not being limited leading to a lot of overhead, but as far as i have tested your library handles that case already as I would expect. The query below is working perfectly fine:
places(first: 50) {
edges {
node {
some_m2m_relation(first: 10) {
edges {
node {
id
}
}
}
}
}
}
So, i can indeed limit the nested lists to some sensible length and get the expected result within acceptable time while it costs me only 2 database queries (no matter how many "some_m2m_relation"s are associated with one place) according to the django toolbar, but if i replace the places query with the restaurants query, the toolbar shows that this hits the database a lot more often, depending on the number of "some_m2m_relation"s associated with each place.
So, to rephrase my actual question: Why is this property:
some_m2m_relation: ListConnectionWithTotalCount[SomeRelatedType] = connection(prefetch_related=['some_m2m_relation'])
not inherited 'as is' when inheriting the type for the Restaurant model from that of the Place model but obviously replaced by:
some_m2m_relation: ListConnectionWithTotalCount[SomeRelatedType] = connection()
A quick comment in here. I'll soon try to workaround this with something like this
Describe the Bug
When inheriting from a strawberry_django.type the relay-connections defined on the base-type are not respected by the query-optimizer anymore when querying the relay-connection on the sub-typ
Example:
Querying the RestaurantType for the some_m2m_relation shows the n+1 problem unless i repeat the field definition of some_m2m_relation on the RestaurantType, preventing me from keeping my code DRY:
This works well:
Is this the intended behavior?
System Information
Additional Context
Upvote & Fund