Closed SomaticIT closed 3 years ago
Yeah, we have been through this in multiple cycles. In the first instance, we used another business logic / framework called ladon. We used regexp matching in SQL there but obviously SQL can't use indices then which makes queries quite slow when many policies are used.
The next iteration (current master) uses rego on top of a cache. Unfortunately, rego turned out to be way heavier and way slower than expected and we ended up with several performance-related issues because of that. Rego/opa requires like 35GB memory and 6 CPUs if you're really stress testing the system with complex policies ( #104 ).
The internal decision is to move away from OPA and back to Ladon (for AWS-style policies). However, we're not sure yet how to solve the querying part and/or caching part.
In general, policies are better suited for use cases where you have few of them. Using 100 policies per user sounds more like access control lists to me which we also want to support but don't do right now!
I think anything that runs in the database (e.g. pg funcs) will eventually run into the same problems as we do right now.
To be honest, I think that keto will evolve into something that works better with RBAC (+ Attributes) / ACL than into the AWS IAM Policy direction. While it was a pretty good idea to define a common language for defining access rights, AWS IAM Policies have caused a lot of headache around accidental data leaks for example. I think the move by Google and others away from policy DSL and towards RBAC + Attributes (e.g. Role "VM Owner" on GCP Project "my-project") illustrates that.
One major problem we discover though is data filtering is very hard. Let's take lists for example. Assuming you have a list of items but you want to show only those items that you created (so basically an ACL) - how do you enforce that with something like Keto? It's not very feasable to query all rows in you SQL Database, and then make REST calls to Keto to figure out if you can show the item or not. Is fetching that list from Keto first the answer? Well, if you want synchronization issues then probably yes :D
This is pretty well explained in this blog post were Torin (OpenPolicyAgent maintainer beast) explains the problem. I am unfortunately not sure if the answer provided there (given that it e.g. doesn't work with JOINs) is what we're looking for - it looks quite complex and prone to errors during operation, when for example running migrations. Here's an example project though.
In conclusion, there are two major problems haunting this project at the moment - performance and filtering. Unfortunately, resources are quite sparse now and other projects have precedence, which is why we haven't been able to work on this. But we'd really want to solve these issues (or explain e.g. data-filtering a non-goal).
Any input is welcomed!
Hi,
Thanks for all your work across the ORY stack. We're currently evaluating different components of it and I was currently testing out keto to see its limits before deciding if we could adopt it for our use case.
In my test I had 1 policy that hypothetically be used by many users, who have different roles. I tested with one to get an idea of how keto would perform with just handful of policies, a handful of roles, but lots of users
# 9,000 role-mappings, 1 policy -- 0.063 sec
# 90,000 role-mappings, 1 policy -- 0.272 sec
# 900,000 role-mappings, 1 policy -- 2.600 sec
It seems to be related to the above that you mention with loading all the data into opa before it could be evaluated (these lines https://github.com/ory/keto/blob/master/engine/ladon/handler.go#L487). This makes me wonder if it were better the role information was stored as some sort of many-to-many mapping so we can narrow down the data only the roles applicable to the user when querying by a user subject and then provide all the roles for a given user as opa ["inputs"] instead of needing it to load the entire corpus of user mappings just to evaluate 1 subject's access.
So for example:
k = num policies
n = number of user-roles mappings
m = number of roles for a user
# current solution
k * n
# proposed idea
k * m
I think it's safe to say for most systems m < n
by magnitudes.
I think as a temp work around we can keep the user-role mappings out of keto, and use the role as the subject with the initial request, and if the user has multiple roles, unfortunately we'd have to query multiple times for each user's role, or just not allow a user to have multiple roles and design policies, etc accordingly
Does this analysis make sense? Any suggestions?
I don't quite remember the exact internals but it is possible that small optimizations like that could be possible. As said above, there's still a plan to move away from OPA but we're currently lacking developers who would be able to take a lead on this as we are working on Kratos.
I totally understand. Is there anything I can do to help out that's not swap out the entire engine?
We're bound to build out these user -> role mapping tables like I mentioned above, so it would be great if it could be done in a way that contributes back and fits in with your architecture.
I totally understand. Is there anything I can do to help out that's not swap out the entire engine?
😅 I think to resolve the issue mentioned in the OP that is the only possibility right now. I feel uncomfortable with merging database-specifica (e.g. a solution for postgres only) because all databases should more-or-less work equally well.
I feel uncomfortable with merging database-specifica (e.g. a solution for postgres only) because all databases should more-or-less work equally well.
Agreed.
What I had in mind was introducing a few new tables instead of persisting them in the rego policy, which I guess could be considered on track towards what the setup may be without rego?
role_id | user_id |
---|---|
car.admin | user.0 |
payment.admin | user.1 |
car.viewer | user.1 |
policy_id | subject_id |
---|---|
car.admin.edit | car.admin |
car.view.products | car.viewer |
car.view.products | car.admin |
car.misc | user.1 |
WITH role_mappings as (
--- selects all roles for which a subject could be tied to (if subject is a user)
SELECT role_id
FROM role_mapping
WHERE user_id = <subject_id>
) ,
policies AS (
--- selects all policies that map to any of the roles tied to this subject
--- or this subject itself (in the case the policy is defined directly to a user subject or
--- the subject provided is actually a role already)
SELECT policy_id, subject_id
FROM subject_policy_mapping spm, role_mappings rm
WHERE
collection = <collection> AND (
spm.subject_id = rm.role_id OR spm.subject_id = <subject_id>
)
)
SELECT r.*, p.subject_id
FROM rego_data r, policies p
WHERE document->id = p.policy_id
Does that make sense?
Oh I see, so you want to filter the policies based on the role already in the database!
The problem with this approach is the matching strategy. So both glob and regex can not be executed in the database using an index. If we run a regex compare on e.g. the subject_ids then we will still end up with O(n) * O(m) where O(n) is the regex complexity and O(m) is the amount of policies. It's actually possible that Go will perform better here because Regex is guaranteed to be of O(n) complexity and we have all the data in memory as opposed to on disk.
So while this would work for the exact strategy, it would fail for regex and glob. This was the problem that haunted Ladon / ORY Keto before OPA and one of the reasons why we wanted to give OPA a try. Unfortunately, it performed even worse for use cases where you have many decisions in parallel.
Honestly, I think the best approach would be to figure out a good cache/in memory structure that would be able to filter data with regex/glob patterns. One example would be caching found policy IDs for a subject and invalidating those matches if any of the returned policy IDs change. I'm not sure if we would get a ton of advantage from using e.g. GroupCache (or etcd or any other distributed k/v store) but it could be an optimization on top
If we have that, we already solved the matching part and can actually rip out OPA.
I am closing this issue as our Google Zanzibar-based refactoring is scheduled to be released soon. Ory Keto up to version 0.5 will go in hibernation mode and receive only critical security patches.
Is your feature request related to a problem? Please describe.
Keto will encounter a lot of issue when it will be integrated in big applications. Indeed, to evaluate a
IsAllowed
request, it request the entire database in cache then applyrego
policies.Imagine you have an application used by 100k users which manages 100 policies and 5 roles on average. Each evaluate request will need to fetch 10M policies and 500k roles in cache and apply
rego
on 10.5M entities.I think that's not a good option.
Describe the solution you'd like
I don't have the optimal solution right now.
I image multiple solutions:
rego
flavor
(eg:policies:tenants:1:**
using solution proposed in #186) in theEvaluate
request for multi-tenant systemsCQRS
where arego_data_request
table is rebuilt whenever a new policy is insertedI think you may have other ideas. What do you think?
Describe alternatives you've considered
The only alternative I see is to have an instance of Keto per tenant, which is very hard to maintain.
Additional context
I saw that it is possible to run go code directly on postgres: https://github.com/dbudworth/gopgfuncs