kenfdev closed this issue 4 years ago.
It should be possible to do this today by persisting the String() representations of the queries and support modules. When you load the persisted values you would have to re-parse and compile them.
When modules are parsed, the rules have the module pointer set on them. The module is excluded from the serialized value to prevent cycles in the serialized objects. That's why the module is not set on the rules you are reading from disk.
Thank you for the advice. That makes sense! Let me check if it works and report back. If all goes as expected, I'll close this issue.
@tsandall
The code is ugly, but as you suggested, it looks like it's working now!
https://github.com/kenfdev/sample-serde-opa/commit/2f54c1e7573e4bd44cd65608dcd73e11abeec104
Huge thanks for all of your advice, @tsandall @patrick-east @srenatus
Closing this issue. (I'm going to look further into better serialization methods and benchmark the compile time, etc.)
@tsandall A bit off topic from the issue's title, but I found that re-parsing a Module of about 3000 lines takes about 3-4 secs on my machine. Of course, if I keep the Module in memory, the decision is made in about 10ms or so.
Is there any way I can speed up this re-parsing in the current state of OPA? If re-parsing is going to take this much time, I don't think it makes much sense to store the Module outside OPA, somewhere like Redis or a file.
@kenfdev you can call PrepareForEval here and then cache the result in memory. The result is a prepared query that you can call Eval() on repeatedly. The prepared query has already been parsed and compiled, so those steps don't have to be re-executed on each eval.
@tsandall Yes, I've had success using PartialResult to cache in memory as well. Apologies for not explaining enough. In my use case, roughly speaking, I'd like to cache a PartialResult per user. This is going to consume a huge amount of memory, so I was thinking about swapping the PartialResult out of OPA (e.g., to Redis or a file) to save memory. Since PartialResult doesn't seem to be serializable, I took the path of serializing the PartialQueries.
Despite succeeding in serializing the PartialQueries, I found that re-parsing the Module takes a lot of time and is not a viable solution. But I guess re-parsing is inevitable, and saving something like a PartialResult or the result of PrepareForEval outside of OPA (Redis, file, etc.) is not possible right now (is this a crazy idea in the first place?). Is my understanding correct?
BTW, thank you very much for your supportive replies!
Persisting PartialResult or PreparedEvalQuery is going to be problematic because they contain a compiler (which is not serializable). The compiler contains data structures, like the rule index, that are used during evaluation.
> In my use-case, roughly speaking, I'd like to cache a PartialResult per-user.
This surprised me. Are you going to run PartialEval with a different data set for each user? If you share more details about what you're trying to accomplish along with the requirements/constraints that you have (e.g., latency, memory usage, # of users, # of permissions per user, permission model, etc.), I can provide more guidance.
As a side note, we're working on a guide that explains options for implementing IAM functionality in an application using OPA (e.g., something like the Chef write-up but more general). /cc @timothyhinrichs
@kenfdev IIRC the primary issue was that some user-specific data was slow to change and very large (prohibitively large to actually pass as input for evaluation, or to store in the inmem store for all users). Would it be possible to just add a custom builtin (since you're already using the Golang API it should be super easy to add one) that makes calls out to pull in that data, or the parts of it required for evaluation?
> Despite succeeding in serializing the PartialQueries, I found that re-parsing the Module takes a lot of time and is not a viable solution. But I guess re-parsing is inevitable and saving something like PartialResult or PrepareForEval outside of OPA (Redis, file, etc.) is not possible now (is this a crazy idea in the first place?).
I feel like there's a bug described between the lines here. There are JSON struct tags, and the thing can be serialized/deserialized, but when it is, the result is not usable. I'd think that either PartialQueries should use custom JSON (un)marshaling methods, so that it's stored as text and restored by parsing the text, or it should not unmarshal into an unusable ast.Module, but rather into some ast.DehydratedModule (names are hard, hope you get the gist).
tl;dr: what @kenfdev has attempted seems valid to try, given the code, and the fact that it doesn't work is something to fix, either by adjusting expectations (commenting stuff?) or by actually making it work somehow.
> If you share more details about what you're trying to accomplish along with the requirements/constraints that you have (e.g., latency, memory usage, # of users, # of permissions per user, permission model, etc.), I can provide more guidance.
Thank you. The basic structure of the permission model is sort of like Chef's (action, resource, policies), and I'm experimenting with a huge number of policies to see the performance. Currently, "huge" means about 3000 to 4000 items (action, resource, statements) inside the policies that may have to be iterated over to make a decision.
I first thought of caching the PartialResult entirely in memory, but since the combination of policies can differ slightly per user, this was going to consume a huge amount of memory (I expect the number of users may grow massively).
The second thing I thought of was to dynamically fetch the policies per user at run time and cache the PartialResult per user in memory. At first this looked like it was working, but since a user can have 3000 to 4000 items (action, resource, statements) to iterate over, I found that the PartialResult consumed about 90MB of memory for a single user.
Side note: @patrick-east, I'm dynamically fetching the policies here and, as you suggested, using them as data.policies inside the Rego instead of input.policies ;)
There are 2 things I wanted to test from this point.
Case 1 is what I am testing and discussing in this issue. I've found that PartialResult cannot be serialized/deserialized directly, but PartialQueries can be serialized/deserialized and used to re-create a Rego object for evaluation. The problem (as I mentioned above) is that re-parsing (compiling) the Module takes about 3-4s on my machine for a Module string of about 3000 lines. That makes it of little use for swapping in and out of the service, since the cost is too high.
Hence, I'm coming to the conclusion that Case 1 isn't a viable solution. It would greatly help if you could point out anything I'm missing, in case there actually is a way to swap something roughly equivalent to an already compiled Rego object in and out.
As always, thank you very much for all your attention. It has all been super helpful.
P.S.
> As a side note, we're working on a guide that explains options for implementing IAM functionality in an application using OPA (e.g., something like the Chef write up but more general).
I heard your talk about this and am super excited to see what it will look like.
I'm re-opening this issue for further discussion. Thank you all for your attention.
Assuming we fix the panic in the module deserialization, you would still have to compile the deserialized partial query/module today. The reason is that the compile step creates data structures that are required by the evaluator. I.e., you can't run evaluation on the deserialized queries/modules alone. The compiler itself is not serializable today and I haven't looked into what that would take.
This is an interesting idea though. If we could serialize the state required for evaluation, you could imagine storing per-user evaluation state. To authorize requests you would lookup the evaluation state for the user and execute it. The evaluation state could be cached in something like redis or memcached or whatever. Perhaps the wasm support could help here (eventually, not today).
@kenfdev are you sure that you require ~4,000 distinct checks per user? How many users do you expect to have? Currently the maximum number of rules/checks I've seen loaded into an OPA is around ~300,000. If you really do require thousands of distinct/unique rules per user you'll probably have to consider sharding.
I'm going to close this issue because there isn't much to be done at this point. We could revisit the original scenario later on (e.g., persisting a larger number of distinct checks and then loading them on the fly). Perhaps Wasm would be an answer here (e.g., compile a Wasm binary for each user's policy, then load and execute it on the fly).
Wished Behavior
Eval)
Actual Behavior
I was able to deserialize the PartialQueries from a file, but when creating the compiler for the Rego object, the compile fails with panic: assertion failed.
https://github.com/open-policy-agent/opa/blob/master/ast/policy.go#L497
Steps to Reproduce the Problem
I built a simple project for this.
https://github.com/kenfdev/sample-serde-opa
The query and policy are simple and pretty much hard-coded.
First I create a PartialQueries here and write it to a file here. After that, I immediately read the file and unmarshal it into PartialQueries here. However, when I initialize the compiler here, it panics. I'm assuming this is because some mandatory information is lost during serialization/deserialization.
My question is: is it possible to serialize and deserialize PartialQueries? I'd like to cache them somewhere like Redis for later use and was thinking about how this could be achieved.
Thank you for reading this long issue.
Additional Information
Related conversations in slack are here:
https://openpolicyagent.slack.com/archives/C1H19LW4F/p1568892719072100