venuu / jsonapi-authorization

Authorization for JSONAPI::Resource
http://www.rubydoc.info/github/venuu/jsonapi-authorization
MIT License
132 stars 59 forks source link

n+1 for side loaded records #83

Open adambedford opened 7 years ago

adambedford commented 7 years ago

I'm using JSON API include to side load two associations (two levels deep: company,company.company-master) and I'm seeing an N+1 issue when plugging in Pundit + JSONAPI Athorization into my JSONAPI Resources Processor stack.

I've tried explicitly including those two associations in the policy scope with no success, although this shouldn't be required since JSONAPI Resources should be handling this. It does when JSONAPI Authorization isn't involved.

The logs produced:

Started GET "/api/v1/job-applications?include=company%2Ccompany.company-master" for ::1 at 2017-09-20 18:11:06 -0700
[NAME COLLISION] `type` is a reserved key in EventResource.
[NAME COLLISION] `type` is a reserved key in AccountResource.
Processing by Api::V1::JobApplicationsController#index as JSON
  Parameters: {"include"=>"company,company.company-master"}
  User Load (0.7ms)  SELECT  "users".* FROM "users" WHERE "users"."id" = $1 ORDER BY "users"."id" ASC LIMIT $2  [["id", 6], ["LIMIT", 1]]
  JobApplication Load (1.4ms)  SELECT  "job_applications".* FROM "job_applications" WHERE "job_applications"."user_id" = 6 ORDER BY "job_applications"."id" ASC LIMIT $1 OFFSET $2  [["LIMIT", 1000], ["OFFSET", 0]]
  Events::Applied Load (1.6ms)  SELECT "events".* FROM "events" WHERE "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 AND "events"."subject_id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  Source Load (0.7ms)  SELECT "sources".* FROM "sources" WHERE "sources"."id" = 21
   (1.8ms)  SELECT "job_applications"."id", "companies"."id" FROM "job_applications" INNER JOIN "companies" ON "companies"."id" = "job_applications"."company_id" LEFT OUTER JOIN "events" ON "events"."subject_id" = "job_applications"."id" AND "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 LEFT OUTER JOIN "sources" ON "sources"."id" = "job_applications"."source_id" WHERE "job_applications"."user_id" = 6 AND "job_applications"."id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  Company Load (1.0ms)  SELECT "companies".* FROM "companies" WHERE "companies"."user_id" = 6 AND "companies"."id" IN (260, 285, 286, 287, 288)
  CompanyMaster Load (1.3ms)  SELECT "company_masters".* FROM "company_masters" WHERE "company_masters"."id" IN (338, 312, 79, 301, 359)
   (1.8ms)  SELECT "job_applications"."id", "companies"."id", "company_masters"."id" FROM "job_applications" INNER JOIN "companies" ON "companies"."id" = "job_applications"."company_id" INNER JOIN "company_masters" ON "company_masters"."id" = "companies"."company_master_id" LEFT OUTER JOIN "events" ON "events"."subject_id" = "job_applications"."id" AND "events"."type" IN ('Events::Applied') AND "events"."subject_type" = $1 LEFT OUTER JOIN "sources" ON "sources"."id" = "job_applications"."source_id" WHERE "job_applications"."user_id" = 6 AND "job_applications"."id" IN (465, 466, 467, 468, 479)  [["subject_type", "JobApplication"]]
  CompanyMaster Load (1.7ms)  SELECT "company_masters".* FROM "company_masters" WHERE "company_masters"."id" IN (338, 312, 79, 301, 359)
   (0.6ms)  SELECT COUNT(*) FROM "job_applications" WHERE "job_applications"."user_id" = 6
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 287], ["LIMIT", 1]]
  CompanyMaster Load (0.5ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 301], ["LIMIT", 1]]
  Company Load (0.9ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 288], ["LIMIT", 1]]
  CompanyMaster Load (3.7ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 359], ["LIMIT", 1]]
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 286], ["LIMIT", 1]]
  CompanyMaster Load (2.6ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 79], ["LIMIT", 1]]
  Company Load (0.6ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 285], ["LIMIT", 1]]
  CompanyMaster Load (0.7ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 312], ["LIMIT", 1]]
  Company Load (0.5ms)  SELECT  "companies".* FROM "companies" WHERE "companies"."id" = $1 LIMIT $2  [["id", 260], ["LIMIT", 1]]
  CompanyMaster Load (0.6ms)  SELECT  "company_masters".* FROM "company_masters" WHERE "company_masters"."id" = $1 LIMIT $2  [["id", 338], ["LIMIT", 1]]
  Rendering text template
  Rendered text template (0.4ms)
Completed 200 OK in 576ms (Views: 1.7ms | ActiveRecord: 45.0ms)

As you can see, the side loaded records are fetched as normal. However, after that, individual queries are performed, which shouldn't be happening.

This is my policy:

class JobApplicationPolicy < ApplicationPolicy
  def index?
    current_user.has_role?(:jobseeker)
  end
  alias_method :new?, :index?
  alias_method :edit?, :index?
  alias_method :create?, :index?
  alias_method :update?, :index?
  alias_method :destroy?, :index?

  class Scope < Scope
    def resolve
      # Applied Event is included because model delegates `date_applied` to `applied_event`
      new_scope = scope.includes(:applied_event, :source)

      if current_user.has_role?(:admin) || current_user.has_role?(:advisor)
        new_scope.joins(user: [:account]).where(accounts: { id: current_user.account.id })
      else
        new_scope.where(user: current_user)
      end
    end
  end
end

The scope returned should already include the includes(company: [:company_master]), and explicitly adding it doesn't resolve the issue.

valscion commented 7 years ago

I was worried we'd have something like this.

Most likely the culprit is in here, where we authorize that you have access to display every related record when using the ?include= to side-load resources.

https://github.com/venuu/jsonapi-authorization/blob/d258f01ddf86d2c718c1eda3c6e08e4293ce99a3/lib/jsonapi/authorization/authorizing_processor.rb#L340-L382

Would you be willing to try fixing this somehow?

It might very well be that the current approach requires re-thinking — and that is totally fine as well.

For context, this was the original issue that prompted authorization of side-loaded resources: https://github.com/venuu/jsonapi-authorization/issues/7 — and this PR implemented a first approach to that problem: https://github.com/venuu/jsonapi-authorization/pull/10

Of particular note is this comment in #10 by @lime:

There is most definitely still work that could be done here, Currently authorize_include_item is such a beast that it's bound to come back and bite us. :grimacing: Still, it might be best to get this merged and keep coding on top of it, so as to lessen the risk of conflicts for other PRs.

...so most likely your issue just highlights that the method got back and bit us 😅

adambedford commented 7 years ago

@valscion Thanks for your reply and thorough breakdown of the potential problem. I've taken a brief look at the source but honestly I'm not all that familiar with how the library is doing things, and much less familiar with JSONAPI Resources.

I'm not sure how much I can contribute to the solution, but I will certainly pitch in to test it extensively.

valscion commented 7 years ago

Would you happen to have time for me to walk you through the code, for example via a video chat? Would that help?

adambedford commented 7 years ago

I do need to make some time to figure this out since it's blocking currently.

Do you have a sense of how complicated the issue is?

valscion commented 7 years ago

I'm afraid it might be a bit complicated, and likely warrants rethinking include authorization.

Before writing any code, we should discuss design of potential solutions first, in order for us not to shoot ourselves in the foot. Try to bear with me as I write up some architectural decisions I've made in the past.


As the authorization layer sits on the JR processor layer, we unfortunately have way less information available on the resources and thus have to usually fetch records manually. This is likely the main reason why adding includes(company: [:company_master]) to your policy scope resolver did not help with the N+1 situation.

The current approach works by deducing associations from the ?include= directive, and then eventually calling SomePolicy#index? for has-many associations providing the ActiveRecord class to it, and SomePolicy#show? for has-one associations, providing the actual ActiveRecord model that might be included in the return payload included: [] JSON array.

The reason why we need to do this authorization in the first place, is that I have had a hunch and when looking at JR code, I seem to have found out that include directives haven't gone through the resource association hooks correctly and thus have bypassed our pundit policy scope hooks.

It would be amazing if we could:

  1. Verify this hypothesis is correct
  2. Have a test case that shows pundit policy scope resolver being bypassed

It would mean that without any hook to include directives, our applications could leak information.

On the other hand, if we can show that pundit policy scopes are being used on all levels of the include directive, the current solution might be a huge overkill and an incorrect solution in the first place.

It seems that in https://github.com/venuu/jsonapi-authorization/issues/7#issuecomment-179848209 I noticed that policy scopes are being bypassed for JSONAPI::Relationship::ToOne cases in PunditScopedResource#records_for method. If we have a good solution on how we could apply authorization for has-one resources, possibly by only figuring out what the code needs to look like here:

https://github.com/venuu/jsonapi-authorization/blob/da04d6cf5ca37c505e6503016141a4413e96c6b2/lib/jsonapi/authorization/pundit_scoped_resource.rb#L20-L21

...we might be safe again.

What we'd need would be tests that verify this functionality. Those tests should happen on the request layer for every operation allowing resource sideloading, like I did in #10 with these request specs:

  • GET /resources
  • GET /resources/1
  • PATCH /resources/1
  • POST /resources
  • GET /resources/other-resources
  • GET /resources/another-resource
adambedford commented 7 years ago

I think this is a bit beyond my scope of understanding for Pundit, JSONAPI Resources and this gem. I may not be able to take the lead on implementing a fix here.

valscion commented 7 years ago

Thanks for thinking about this and engaging in discussion @adambedford 💞 . I am grateful that you thought about this, and it's OK to admit that this might be too much right now :relaxed:

One way to get you unblocked at your project could be to subclass JSONAPI::Authorization::AuthorizingProcessor and create a jsonapi-resources Processor that skips authorize_include_item method completely to avoid the N+1 problem.

# Work around N+1 problem with `?include=company,company.company-master` queries
# by subclassing jsonapi-authorization processor to skip include authorization completely
#
# https://github.com/venuu/jsonapi-authorization/issues/83#issuecomment-337152268
class CustomAuthorizingProcessorSkippingInclude < JSONAPI::Authorization::AuthorizingProcessor
  def authorize_include_item
    # no-op
  end
end
# config/initializers/jsonapi-resources.rb
JSONAPI.configure do |config|
  config.default_processor_klass = CustomAuthorizingProcessorSkippingInclude
  # ...
end

I advise you to double-check that you are not leaking information from your database in your API responses.

ryanto commented 6 years ago

Hi @valscion

Thanks for the write up, it was helpful in understanding this issue!

Do you still think the best action is to write a test for the following?

Have a test case that shows pundit policy scope resolver being bypassed

valscion commented 6 years ago

Thanks for the write up, it was helpful in understanding this issue!

Great! I'm glad it was helpful :relaxed:

Do you still think the best action is to write a test for the following?

It would indeed be very helpful to verify my hypothesis is correct and we need to add manual authorization for the sideloaded resources case. I'd love to be proven wrong here. I have a feeling that this issue might have been resolved with upcoming jsonapi-resources versions, and having the spec in here to verify it would be ace.

Subtletree commented 6 years ago

Could potentially be fixed when https://github.com/cerebris/jsonapi-resources/pull/1164 lands

valscion commented 6 years ago

Wow, thanks for linking to that PR @Subtletree! It looks really promising indeed :relaxed: