Closed crpahl closed 2 years ago
Hey, thanks for the detailed report.
I bet `take` would work fine, thanks for the suggestion. (Honestly, I thought `limit` returned an Array already. But it looks like I was confused.) I'll take a look, thanks.
From the bigger picture, do you observe a performance or behavior difference between these two?

```diff
- AssociationsLoader.for(Client, :emails).load(object).then(&:emails)
+ object.emails
```
I ask because I've never seen effective batching for has-many relations. I would expect:

```ruby
AssociationsLoader.for(Client, :emails).load(object).then(&:emails)
```

would return an unbounded `AR::Relation`, just like `object.emails`.

That is to say, why use a batch loader here? (There might be a good reason -- I see this used all the time -- but I haven't figured it out yet!)
> The loader doesn't cache between instances
Ah I didn't even consider that. Thanks for pointing that out.
Sorry, I know I'll be explaining things you're already aware of, but just to give some more specifics for our case: we're using it to avoid N+1 queries. We have `default_max_page_size` set to 100. So in the worst case, without preloading, we'll have this query run 100 times:
But by using the `AssociationLoader`, all emails will be fetched for all clients in a single query instead:
The workflow for our product can be pretty complex: a number of our GraphQL fields are resolved through one-to-many associations, and many of those fields may need to be called to fully populate a page in our SPA or mobile app. So if we don't preload our queries, the accumulated load times for those N+1 queries can add up pretty quickly.
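The arithmetic behind that worst case can be sketched with a toy query count (a hypothetical illustration using the `default_max_page_size` of 100 mentioned above; not our real schema):

```ruby
# Toy count of the N+1 shape: one query to fetch the page of clients,
# then one emails query per client when nothing is preloaded.
clients = (1..100).to_a

unbatched_queries = 1 + clients.size  # SELECT clients, then SELECT emails x100
batched_queries   = 1 + 1             # SELECT clients, then one emails IN (...) query

puts unbatched_queries  # 101
puts batched_queries    # 2
```

With batching, the per-client email lookups collapse into a single `WHERE client_id IN (...)` query, which is where the savings come from.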
So yeah, basically we use it to avoid N+1 queries to:
Do any of those queries have `LIMIT`s? I think that's what I'm struggling with -- I wouldn't want an unbounded query, because it might select a bazillion records from the database, loading them all into Ruby memory (via ActiveRecord) even though only a subset of them are actually required for the query.
Yeah that's a good point. In our case, all of our queries are scoped to an account and we'll typically be loading a reasonable number of records into memory and would rarely be in the realm of thousands of records.
If that's the case -- that you want to load all records, then paginate over them in memory -- then I think the `.to_a` approach you mentioned previously is a good one. The whole point of the `ActiveRecordRelationConnection` is to run the minimal queries to satisfy the needs of the GraphQL query. But if you want to serve a connection based on an in-memory list, then using an Array (instead of a Relation) seems ok to me.
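Paginating an already-loaded list is just array slicing; a minimal sketch of what an array-backed connection does conceptually (cursor decoding is elided, and `after_index` is an assumed stand-in for a decoded cursor):

```ruby
# In-memory "connection" pagination over a preloaded list.
emails = (1..10).map { |i| "email-#{i}" }

first = 3        # the `first:` argument from the GraphQL query
after_index = 2  # in real code, decoded from an opaque cursor

page = emails[after_index, first]
# page == ["email-3", "email-4", "email-5"]

has_next_page = after_index + first < emails.size
# has_next_page == true
```

No further SQL runs here, which is exactly the trade-off being discussed: everything must already be in memory.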
Otherwise, I'm not sure how to modify the relation connection to know "This is a relation, but you should treat it like an array" vs. "This is a relation, and you should run the minimal set of queries to fetch the required data" :S
Sounds good, thanks! Do you still think trying to update `ActiveRecordRelationConnection` to use the `take` Active Record method instead of `limit` might be a good approach? I feel like that satisfies this:

> "This is a relation, but you should treat it like an array" vs. "This is a relation, and you should run the minimal set of queries to fetch the required data"
`take` is ultimately calling this:
```ruby
def find_take_with_limit(limit)
  if loaded?
    records.take(limit)
  else
    limit(limit).to_a
  end
end
```
Which will use records from preload/includes if there are any (which there would be in our case), or still call `limit` if there aren't. Feels like it satisfies both of our cases, but I might be missing something 🤔
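The difference can be modeled with a toy class (this is NOT Rails internals, just an illustration of why `take` can skip a query when the relation is loaded while `limit` always builds a new one):

```ruby
# Toy model of the loaded?/take/limit behavior quoted above.
class ToyRelation
  attr_reader :queries

  def initialize(records)
    @records = records
    @queries = 0
  end

  def loaded?
    true  # pretend the association was preloaded
  end

  def take(n)
    loaded? ? @records.take(n) : query_database(n)
  end

  def limit(n)
    query_database(n)  # always runs a new query, even when loaded
  end

  private

  def query_database(n)
    @queries += 1
    @records.take(n)
  end
end

rel = ToyRelation.new([1, 2, 3, 4])
rel.take(2)   # => [1, 2], served from memory
rel.queries   # => 0
rel.limit(2)  # => [1, 2], but hits the database
rel.queries   # => 1
```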
Yeah, I suppose it's still worth a look!
Thank you 🙏
Just going to add a bit more context with the workaround we're doing for now. We have a pretty large schema, and a large number of our fields are resolved through a `has_one` or `has_many` association method (which leads to a lot of potential N+1s). It would be pretty tedious to implement a resolver for every field that is essentially just something like this:
```ruby
def emails
  AssociationsLoader.for(Client, :emails).load(object).then(&:emails)
end
```
So we added a new keyword argument to the field definitions that takes the association(s) to be loaded as an argument. Here's an example of a `has_many` and a `has_one`:
```ruby
field :emails, Types::EmailType.connection_type, preload: :emails
field :client, Types::ClientType, preload: :client
```
And we add this field extension to the field when the `preload` argument has been set:
```ruby
class AssociationLoader < GraphQL::Schema::FieldExtension
  def resolve(object:, arguments:, context:)
    AssociationsLoader.for(object.object.class, field.preload).load(object.object).then do
      yield(object, arguments, context)
    end
  end
end
```
And to work around this for now I've updated it to:
```ruby
class AssociationLoader < GraphQL::Schema::FieldExtension
  def resolve(object:, arguments:, context:)
    AssociationsLoader.for(object.object.class, field.preload).load(object.object).then do
      resolved = yield(object, arguments, context)
      resolved.respond_to?(:to_a) && resolved.present? ? resolved.to_a : resolved
    end
  end
end
```
I'm checking if it responds to `to_a` because it might just be a single record from a `has_one` association. And I'm checking `resolved.present?` for the case where the `has_many` returns no records, so I don't resolve an empty `ActiveRecord_Associations_CollectionProxy` as an empty array (which will cause errors, because the object type on the field will try to resolve fields using the empty array).
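That guard can be restated in plain Ruby (a sketch only: `present?` is a Rails extension, so an explicit emptiness check stands in for it here, and `normalize` is a hypothetical name):

```ruby
# Only convert collection-like, non-empty results to an Array;
# leave single records, nil, and empty collections untouched.
def normalize(resolved)
  if resolved.respond_to?(:to_a) && !resolved.to_a.empty?
    resolved.to_a  # non-empty collection: hand the connection an Array
  else
    resolved       # has_one record, nil, or empty collection: as-is
  end
end

normalize([1, 2])   # => [1, 2]
normalize([])       # => [] (returned unchanged, not "converted")
normalize(:record)  # => :record
```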
But yeah, if `take` works we can go back to the simpler solution 😄
Describe the bug

We use GraphQL-Batch, but we're seeing N+1 queries in `connection_type` field resolvers. We currently use the GraphQL-Batch association loader in any resolvers returning data through Active Record relations. For example, we basically have something like the following in our emails resolver:

If emails are returned as an array and we run the following query:

this will work as expected: all clients will be batched together, the emails association will be preloaded onto all clients, and all emails will be retrieved in a single query, allowing us to avoid the N+1.

But if emails are returned as a `connection_type` and we run the following query:

the batch loader still works and loads all emails in a single query, but we noticed the following query getting run for every client:

After looking at the source, I found that was happening here. The issue I'm seeing is that `limit` won't use the preloaded results and will always fetch from the database. After experimenting with this a bit in the console, I found that `take` will use already-loaded results from `preload`. Would a possible solution be to update `GraphQL::Pagination::RelationConnection` to use `take` instead of `limit` to paginate the results? I know `limit` returns an `ActiveRecord_AssociationRelation` while `take` returns an `Array`, and the pagination implementation would have to be changed 🙈, but I can't think of any other way to prevent the N+1 for the preloaded association.

We can get around this for now by updating the resolver to return an array so it gets paginated through `GraphQL::Pagination::ArrayConnection` instead, but I'm not sure if this is the best approach:

Versions
`graphql` version: 1.13.0
`graphql-batch` version: 0.4.3