graphiti-api / graphiti

Stylish Graph APIs
https://www.graphiti.dev/
MIT License
970 stars 139 forks source link

Always include id and type for belongs_to relationships #167

Open jkeen opened 5 years ago

jkeen commented 5 years ago

Summarizing the lengthy discussion on slack:

Currently graphiti does not include the resource identifier object of a relationship without explicitly requesting the resource be included. This makes sense in a has_many case where there might be thousands of related records, but perhaps not in the case of a belongs_to where the foreign key is right there on the loaded resource.

Not rendering the relationship data on the resource when it is easily available can cause problems, such as when using a client side jsonapi library with good caching like ember-data. Currently if I've loaded some parent records in one request, and later I load child records directly without explicitly specifying to include the parent, the payload will be rendered without any relationship data (only {meta: {included: false}}, perhaps links) and the client will have no way to relate the two together without a) re-requesting the relationship alone (through its link) or b) specifying the parent be sideloaded with the child request with ?include=parent. Both of these options require the server to re-render a resource that the client already has loaded, which feels inefficient.

Futher, it just feels weird that in a pre-jsonapi world a resource with a belongs_to would have had a parent_id attribute included in the payload, allowing the client to link things together that are already loaded, or even just to know which child records are grouped together without having to know anything about the parent.

jsonapi-serializable includes an option to specify linkage: always: true on a belongs_to to achieve this behavior, and I think graphiti should set this by default.

jkeen commented 4 years ago

Update: this was implemented in #168 along with making the default always_include_resource_ids: true on belongs_to relationships, and then that default was changed to always_include_resource_ids: false in #185, due to a problem with N+1 queries.

From @richmolj in #185

@jkeen sorry for the delay, I was in the middle of a dozen things and wanted to push a fix ASAP. Try the sample app with graphiti 1.2.4 and hit /api/v1/positions - you'll see that because we're rendering the resource identifier for the department, it's causing an N+1

I'm finally digging into this now, and have uncovered a couple of things.

First, I think the root cause of this might be an upstream issue in jsonapi-serializable

Second, there's a solid workaround we can do from graphiti. The sample app N+1 on positions is fully solved by changing the resource's base_scope to eager load the relationships

  def base_scope
    Position.includes(:employee, :department)
  end

I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary. I've run into other query-heavy situations that I've solved by modifying the base scope to eager load things—is graphiti supposed to do this automatically already, @richmolj? Currently digging through the graphiti code to figure out how all this is happening now that I've identified the problem

richmolj commented 4 years ago

@jkeen sorry for the long delay - didn't see your latest comment until it came up in Slack yesterday. Thank you for the lovely issue description ❤️

I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary.

We actually already do this - if the request says it wants relationships (?include=positions) then we eager load those relationships. In this case there's nothing in the request telling us we should eager load this data.

That's why the workaround here is:

?include=department&fields[departments]=id

In which case we will eager load the association, but only render the id.

A better solution would be to always render the resource identifiers if the relationship is a belongs_to because in that case no eager loading is necessary. I'd definitely love to see that in Graphiti.

jkeen commented 4 years ago

A better solution would be to always render the resource identifiers if the relationship is a belongs_to because in that case no eager loading is necessary. I'd definitely love to see that in Graphiti.

I think this is what we originally did in #168 (only having them on belongs_to relationships by default), but it was reverted in #185 due to an N+1 issue? I've got always_include_resource_id: true set by default (monkey patched) for all my belongs_tos in my app and do occasionally run into weird/inefficient queries. More often than not this ends up being being solved by changing my base_scope to eager load some deeply nested thing. Recently I disabled links by default on my API to improve that situation/performance quite a bit as it seems like models required to render custom links are not anticipated.

I think graphiti should anticipate the needs of the resource and automatically modify the scope to eager load things if necessary. We actually already do this - if the request says it wants relationships (?include=positions) then we eager load those relationships. In this case there's nothing in the request telling us we should eager load this data.

I've spotted trouble with queries when a model is needed for a belongs to a couple of levels deep, even if that model has been eager loaded on the top level. For instance, consider this contrived example:

  post
    belongs_to :author
    belongs_to :site
    has_many :comments
    has_many :participants # through comments

 author
   has_many :comments
   belongs_to :site

 comment
   belongs_to :author
   belongs_to :site

And so in the case of posts having many comments having an author which has many comments, it seems like the deep relationships get missed with the eager load.

I started going down a path in a local graphiti copy where I try to traverse relationships and resources and auto include things. So in the above example if posts?include=comments are requested, the query behind the scenes preloads authors of the post and authors of the comments, to combat what I often see in a query like that — a handful of SELECT * FROM authors WHERE author ID = $1, and… I think it even happens if I request authors at the top level. The only way I found to get around it was to preload the entire nest of relationships, i.e. Post.includes({:comments => :author}, :author, :comments)

Here's a snippet of the mess I started creating:

    attr_reader :preload_map, :preload_map_keys, :requested_includes
    def build_scope(base, query, opts = {})
      @requested_includes = query.params[:include].dup
      @preload_map_keys = fetch_preload_map_keys # builds the belongs to map and converts it dot notation
      auto_includes = ([@preload_map_keys] + [@requested_includes]).flatten.compact.uniq
      @preload_map = build_preload_map(auto_includes) #just so this can be to console in scope
      Scope.new(base.preload(@preload_map), self, query, opts)
    end

But it occurs to me again when writing all this that it seems silly that we need to include resource to load a the type and id of a belongs_to when that information can be inferred from the table row that's already loaded by the parent item, doesn't it?

richmolj commented 4 years ago

But it occurs to me again when writing all this that it seems silly that we need to include resource to load a the type and id of a belongs_to when that information can be inferred from the table row that's already loaded by the parent item, doesn't it?

I think this is the salient bit. I agree! Right now the code calls the equivalent of post.author.id instead of post.author_id. We just need a PR that does the latter instead of the former on an opt-in basis. This is likely somewhat complicated as it's not how our dependency jsonapi-serializable works (there's also the issue that a FK might not necessary equal the canonical identifier, but this feature would be opt-in in any case).

Though I agree this would be a good feature, one reason it's not a priority for me personally is it's a client-side issue with an existing client-side fix. The reason to include the identifiers is for client-side caching, so it seems OK that the client should specifically request those identifiers with ?include. Again, I support the feature it just doesn't scratch my personal itch.

The last thing of note I think is closer to an identity/map cache, where we can see authors/123 was already loaded by another part of the graph. This would address your Post.includes({:comments => :author}, :author, :comments) scenario, and have general performance impact as well.