Open jkeen opened 5 years ago
Update: this was implemented in #168 along with making the default always_include_resource_ids: true
on belongs_to
relationships, and then that default was changed to always_include_resource_ids: false
in #185, due to a problem with N+1 queries.
From @richmolj in #185
@jkeen sorry for the delay, I was in the middle of a dozen things and wanted to push a fix ASAP. Try the sample app with graphiti
1.2.4
and hit/api/v1/positions
- you'll see that because we're rendering the resource identifier for the department, it's causing an N+1
I'm finally digging into this now, and have uncovered a couple of things.
First, I think the root cause of this might be an upstream issue in jsonapi-serializable
Second, there's a solid workaround we can do from graphiti
. The sample app N+1 on positions
is fully solved by changing the resource's base_scope
to eager load the relationships
def base_scope
Position.includes(:employee, :department)
end
I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary. I've run into other query-heavy situations that I've solved by modifying the base scope to eager load things—is graphiti supposed to do this automatically already, @richmolj? Currently digging through the graphiti code to figure out how all this is happening now that I've identified the problem
@jkeen sorry for the long delay - didn't see your latest comment until it came up in Slack yesterday. Thank you for the lovely issue description ❤️
I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary.
We actually already do this - if the request says it wants relationships (?include=positions
) then we eager load those relationships. In this case there's nothing in the request telling us we should eager load this data.
That's why the workaround here is:
?include=department&fields[departments]=id
In which case we will eager load the association, but only render the id.
A better solution would be to always render the resource identifiers if the relationship is a belongs_to because in that case no eager loading is necessary. I'd definitely love to see that in Graphiti.
A better solution would be to always render the resource identifiers if the relationship is a belongs_to because in that case no eager loading is necessary. I'd definitely love to see that in Graphiti.
I think this is what we originally did in #168 (only having them on belongs_to relationships by default), but it was reverted in #185 due to an N+1 issue? I've got always_include_resource_id: true
set by default (monkey patched) for all my belongs_tos in my app and do occasionally run into weird/inefficient queries. More often than not this ends up being being solved by changing my base_scope
to eager load some deeply nested thing. Recently I disabled links by default on my API to improve that situation/performance quite a bit as it seems like models required to render custom links are not anticipated.
I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary. We actually already do this - if the request says it wants relationships (?include=positions) then we eager load those relationships. In this case there's nothing in the request telling us we should eager load this data.
I've spotted trouble with queries when a model is needed for a belongs to a couple of levels deep, even if that model has been eager loaded on the top level. For instance, consider this contrived example:
post
belongs_to :author
belongs_to :site
has_many :comments
has_many :participants # through comments
author
has_many :comments
belongs_to :site
comment
belongs_to :author
belongs_to :site
And so in the case of posts having many comments having an author which has many comments, it seems like the deep relationships get missed with the eager load.
I started going down a path in a local graphiti copy where I try to traverse relationships and resources and auto include things. So in the above example if posts?include=comments
are requested, the query behind the scenes preloads authors
of the post and authors
of the comments
, to combat what I often see in a query like that — a handful of SELECT * FROM authors WHERE author ID = $1
, and… I think it even happens if I request authors at the top level. The only way I found to get around it was to preload the entire nest of relationships, i.e. Post.includes({:comments => :author}, :author, :comments)
Here's a snippet of the mess I started creating:
attr_reader :preload_map, :preload_map_keys, :requested_includes
def build_scope(base, query, opts = {})
@requested_includes = query.params[:include].dup
@preload_map_keys = fetch_preload_map_keys # builds the belongs to map and converts it dot notation
auto_includes = ([@preload_map_keys] + [@requested_includes]).flatten.compact.uniq
@preload_map = build_preload_map(auto_includes) #just so this can be to console in scope
Scope.new(base.preload(@preload_map), self, query, opts)
end
But it occurs to me again when writing all this that it seems silly that we need to include resource to load a the type and id of a belongs_to
when that information can be inferred from the table row that's already loaded by the parent item, doesn't it?
But it occurs to me again when writing all this that it seems silly that we need to include resource to load a the type and id of a belongs_to when that information can be inferred from the table row that's already loaded by the parent item, doesn't it?
I think this is the salient bit. I agree! Right now the code calls the equivalent of post.author.id
instead of post.author_id
. We just need a PR that does the latter instead of the former on an opt-in basis. This is likely somewhat complicated as it's not how our dependency jsonapi-serializable
works (there's also the issue that a FK might not necessary equal the canonical identifier, but this feature would be opt-in in any case).
Though I agree this would be a good feature, one reason it's not a priority for me personally is it's a client-side issue with an existing client-side fix. The reason to include the identifiers is for client-side caching, so it seems OK that the client should specifically request those identifiers with ?include
. Again, I support the feature it just doesn't scratch my personal itch.
The last thing of note I think is closer to an identity/map cache, where we can see authors/123
was already loaded by another part of the graph. This would address your Post.includes({:comments => :author}, :author, :comments)
scenario, and have general performance impact as well.
Summarizing the lengthy discussion on slack:
Currently graphiti does not include the resource identifier object of a relationship without explicitly requesting the resource be included. This makes sense in a
has_many
case where there might be thousands of related records, but perhaps not in the case of abelongs_to
where the foreign key is right there on the loaded resource.Not rendering the relationship data on the resource when it is easily available can cause problems, such as when using a client side jsonapi library with good caching like ember-data. Currently if I've loaded some parent records in one request, and later I load child records directly without explicitly specifying to include the parent, the payload will be rendered without any relationship data (only
{meta: {included: false}}
, perhaps links) and the client will have no way to relate the two together without a) re-requesting the relationship alone (through its link) or b) specifying the parent be sideloaded with the child request with?include=parent
. Both of these options require the server to re-render a resource that the client already has loaded, which feels inefficient.Futher, it just feels weird that in a pre-jsonapi world a resource with a
belongs_to
would have had aparent_id
attribute included in the payload, allowing the client to link things together that are already loaded, or even just to know which child records are grouped together without having to know anything about the parent.jsonapi-serializable
includes an option to specifylinkage: always: true
on abelongs_to
to achieve this behavior, and I think graphiti should set this by default.