microsoft / regorus

Regorus - A fast, lightweight Rego (OPA policy language) interpreter written in Rust.

regorus vs regorust #113

Closed cosmincatalin closed 9 months ago

cosmincatalin commented 9 months ago

Not an issue, per se, but what is the difference between regorus and regorust? I'm interested in this especially in the context of both projects being under the microsoft organisation.

anakrish commented 9 months ago

@cosmincatalin Thanks for your interest in the project.

I can provide background about Regorus.

Regorus started off as a project to bring Rego into Confidential Containers. The following are the key goals/attributes:

We also want to use it as a playground for language exploration, and to expose it to other environments.

cosmincatalin commented 9 months ago

Hi @anakrish, thank you for the description, but I'm still in doubt regarding the differences from regorust. It sounds to me like you could have just used regorust for the same purpose and vice versa. However, I understand that currently one key advantage of regorus over regorust is performance due to it being native Rust.

anakrish commented 9 months ago

@cosmincatalin

It sounds to me like you could have just used regorust for the same purpose and vice versa.

Excellent question!

There is a strong push towards writing software in memory-safe languages like Rust for security, reliability, and performance. Here are a few public links:

Writing software in C/C++ and exposing it to Rust unfortunately does not reap the benefits of using Rust directly. There will still be memory access violations, buffer overruns, double-frees, etc. in the C/C++ code, despite programmers' best intentions. Such issues are to a large extent avoided by the use of Rust.

Thus, it is better to write code in Rust and expose it to C, not the other way around.
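
To make the direction concrete, here is a minimal sketch of exposing a Rust engine to C behind an opaque pointer. The function names are hypothetical, not Regorus's actual FFI surface; memory safety is enforced on the Rust side, and C callers only ever see the handle.

    use regorus::Engine;

    // Hypothetical C entry points; cbindgen or a hand-written header would
    // declare these for C callers.
    #[no_mangle]
    pub extern "C" fn regorus_engine_new() -> *mut Engine {
        Box::into_raw(Box::new(Engine::new()))
    }

    #[no_mangle]
    pub unsafe extern "C" fn regorus_engine_free(ptr: *mut Engine) {
        if !ptr.is_null() {
            // Reclaim ownership so Rust frees the engine safely.
            drop(Box::from_raw(ptr));
        }
    }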

one key advantage of regorus over regorust is performance due to it being native Rust.

Based on initial measurements, Regorus is

Benchmark script available here: https://github.com/anakrish/rego-compare.

In my experience, such large differences in performance are often not due to the code being in native Rust. They have more to do with software design, algorithms used, trade-offs evaluated, etc.

cosmincatalin commented 9 months ago

Thank you for this detailed response, @anakrish . Things are quite clear now.

anakrish commented 9 months ago

For future readers of this issue:

cosmincatalin commented 8 months ago

I have just switched to using regorus; your latest changes supporting Arc tipped the scales in your favor. Please continue doing what you're doing.

anakrish commented 8 months ago

@cosmincatalin

I'm trying to make improvements to the design.

Can you share why you need the arc feature? Do you share Value between threads or do you share Engine between threads?

cosmincatalin commented 8 months ago

I share an Engine instance with one policy that needs to be used over and over again.

anakrish commented 8 months ago

Thanks for sharing your usage.

The starting state of the Engine, which consists of the parsed policies and the data value, is immutable and Arc-counted. I'm thinking of

The above scheme may be used instead of a global Arc<Mutex<Engine>>. It may not be that useful if the Engine instance is part of an object that is already shared between threads using Arc+Mutex.
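
As a minimal sketch of the clone-per-thread alternative (assuming the `arc` feature so the engine can cross threads, and the anyhow crate for error plumbing; the policy is a placeholder):

    use regorus::Engine;
    use std::thread;

    fn main() -> anyhow::Result<()> {
        let mut engine = Engine::new();
        engine.add_policy(
            "example.rego".to_string(),
            "package example\nallow := true".to_string(),
        )?;

        // Instead of a global Arc<Mutex<Engine>>, each worker gets its own
        // clone; the immutable parsed policies inside are shared via Arc.
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let mut local = engine.clone();
                thread::spawn(move || {
                    local.eval_query("data.example.allow".to_string(), false)
                })
            })
            .collect();
        for h in handles {
            let _ = h.join();
        }
        Ok(())
    }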

cosmincatalin commented 8 months ago

I think that would be a nice addition, certainly. Although I've kind of gotten used to using Mutex, I'd much rather avoid it if I can, without losing performance, of course.

adams-shaun commented 4 months ago

@anakrish correct me if I'm wrong here, but the referenced benchmark of calling a binary with json input seems to be nothing like a production use case.

What we want to do is create an engine, load policy/data to precalculate the AST, then cache and re-use this object for an array of inputs. I was investigating the usage of the Go bindings yesterday, and performance-wise this approach was getting crushed by opa/rego, which can make use of a precomputed AST and has some hooks for further optimizations.

For the regorus approach, I cached an Engine with policies loaded and cloned it for each evaluation. Certainly some of the performance delta can be attributed to cgo overhead; however, I'm also wondering if there is performance data available for native Rust use cases more similar to what I just described?

Thanks!

anakrish commented 4 months ago

@adams-shaun

but the referenced benchmark of calling a binary with json input seems to be nothing like a production use case.

This comparison was intended to give a peek into the performance of the different components (parser, interpreter) of the 3 engines (regorus, opa, rego-cpp) when evaluating the same policy. Hence no caching is done.

For most production use cases, yes, the policy will be preloaded, and evaluation will be done reusing this parsed policy. We've observed Regorus to be much faster in this case as well.
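
That preload-and-reuse pattern looks roughly like this in Rust (a sketch; the policy file and rule name are placeholders):

    use regorus::{Engine, Value};

    fn main() -> anyhow::Result<()> {
        let mut engine = Engine::new();
        // Parse the policy once, up front.
        engine.add_policy_from_file("example.rego")?;

        // Reuse the same engine for many inputs; only the input changes.
        for json in [r#"{"user": "alice"}"#, r#"{"user": "bob"}"#] {
            engine.set_input(Value::from_json_str(json)?);
            let allowed = engine.eval_rule("data.example.allow".to_string())?;
            println!("{allowed:?}");
        }
        Ok(())
    }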

I was investigating the usage of the Go bindings yesterday, and performance-wise this approach was getting crushed by opa/rego, which can make use of a precomputed AST and has some hooks for further optimizations.

Would it be possible to share your benchmark? In your benchmark is Regorus parsing the policy for each evaluation whereas OPA isn't?

For the regorus approach, I cached an Engine with policies loaded and cloned it for each evaluation.

Typically, one doesn't need to clone the engine for each evaluation. The same engine can be reused. I guess you need to clone because you are using it from Go?

Certainly some of the performance delta can be attributed to cgo overhead

Likely. But I'm curious about the query evaluation numbers in your benchmark. I would like to fix any performance issues in the Go bindings nonetheless.

I'm also wondering if there is performance data available for native Rust use cases more similar to what I just described?

The last performance experiment I worked on was to run the benchmarks in https://arxiv.org/pdf/2403.04651 using Regorus in place of OPA. The paper finds that Cedar is 80x faster than OPA. We observed that with unmodified Rego policies, Cedar is only 5x faster than Regorus, and with a tweaked Rego policy, Regorus is faster than Cedar. Note that in this benchmark only the policy evaluation is measured, similar to the real-world use case you describe. My experiments are here: https://github.com/anakrish/cedar-examples. Let me know if you want instructions to try it out.

adams-shaun commented 4 months ago

Hi @anakrish -- Thanks for the response.

I'm still trying to grok the results of our testing, and I'm not sure how much of it I can break apart for public consumption, but I'll try to give you the gist of the test setup:

My knowledge of CGO and Rust performance profiling is limited. Admittedly, I'm probably not doing a fair apples-to-apples comparison here. I was trying to determine whether Regorus could serve as a drop-in replacement in the interim while we transition towards a native Rust policy evaluation service.

Questions that still come to mind:

Hope that all makes sense.

anakrish commented 4 months ago

@adams-shaun

I have pushed a compare-eval script to https://github.com/anakrish/rego-compare that benchmarks policy evaluation time using OPA and Regorus. For the chosen policies, you should see output similar to the following, indicating a speedup of 2x to 3x.

    ./compare-eval
    Testing example policy
    evaluating data.example.allow using OPA average eval time = 136.22 microseconds
    evaluating data.example.allow using regorus eval-rule average eval time = 67.03837 microseconds
    evaluating data.example using OPA average eval time = 160.16 microseconds
    evaluating data.example using regorus eval-query average eval time = 77.98542 microseconds
    evaluating data using OPA average eval time = 163.71 microseconds
    evaluating data using regorus eval-query average eval time = 75.61737 microseconds

    Testing ACI policy
    evaluating data.framework.mount_overlay using OPA average eval time = 194.37 microseconds
    evaluating data.framework.mount_overlay using regorus eval-rule average eval time = 70.82748 microseconds
    evaluating data.framework using OPA average eval time = 1545.58 microseconds
    evaluating data.framework using regorus eval-query average eval time = 548.64717 microseconds
    evaluating data using OPA average eval time = 2054.17 microseconds
    evaluating data using regorus eval-query average eval time = 650.39836 microseconds

The script uses two binaries, opa-profile and regorus-profile, that take policies, data, and input, evaluate a query or rule for a given number of iterations, and print the average eval time.

You could try out your policy, data, and input using opa-profile and regorus-profile to determine what the performance would look like in regorus and OPA without cgo overhead.

Note that regorus provides both eval-rule and eval-query. The former will always be faster than the latter, since eval-query has to parse the query each time whereas eval-rule doesn't.
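
In the Rust API the same trade-off applies between Engine::eval_rule and Engine::eval_query; a sketch, assuming an engine with policies already loaded:

    // eval_rule: no query parsing; returns the rule's Value directly.
    let value = engine.eval_rule("data.example.allow".to_string())?;

    // eval_query: parses the query string on every call and returns
    // OPA-style query results.
    let results = engine.eval_query("data.example.allow".to_string(), false)?;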

anakrish commented 4 months ago

@adams-shaun

Thanks for sharing aspects of your use case.

Input data is transformed into rego text

By this, do you mean that Rego code is generated for the input instead of a regorus::Value? Using Engine::add_data and Engine::set_input will be faster than emitting input/data values as Rego code within a policy. In the former case, values are directly available, whereas in the latter case the Rego expressions need to be evaluated on each query to compute the values.
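
A sketch of the suggested approach, with placeholder JSON values:

    use regorus::{Engine, Value};

    let mut engine = Engine::new();
    // Data and input are passed as values; no Rego text is generated, and
    // no extra expressions need to be evaluated per query.
    engine.add_data(Value::from_json_str(r#"{"roles": ["admin"]}"#)?)?;
    engine.set_input(Value::from_json_str(r#"{"user": "alice"}"#)?);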

I was using "clone" because re-use of an engine across goroutines would require some locking mechanism

Makes sense. Is the cost of cloning also included in the measurement? Currently, Engine::clone is not as optimized as it could be; deep copies are performed instead of just incrementing a ref count.

However, all things being equal, we still seem to spend a greater % of CPU cycles in the CGO policy evaluation path.

You could try eval_rule instead of eval_query. The former will be faster due to not having to parse the query, as well as returning a Value directly instead of an OPA-style Result.

Maybe the performance overhead could also be due to having to marshal the result back to json?

while we transition towards a native Rust policy evaluation service.

Would the policies be written in Rust, or would just the evaluator be in Rust? I'm curious what the motivation is for the move to a native Rust policy evaluation service.

On a related note, a major feature planned for Regorus is the introduction of a policy intermediate language (IL) that different policy languages, including Rego, will be compiled into. In addition to being optimally evaluated, the IL would support code generation to Rust, C#, etc.

Is CGO overhead a limiting factor here

I'm not yet sure. It could also be that OPA implements some optimizations that regorus doesn't. Trying out the policy using opa-profile and regorus-profile would help determine whether it is CGO overhead or policy evaluation overhead.

Could the exported FFI interface be optimized for further improvement (take input and evaluate a pre-parsed query, returning the result in a single call), i.e., limiting stack swaps and getting a fairer comparison?

Yes, we can do this if this is what is causing the performance difference.

anakrish commented 4 months ago

@adams-shaun

You might also want to try evaluating a rule immediately after all the policies and data have been loaded into the engine instance. This rule evaluation will cause the engine instance to be "prepared for evaluation". If this engine is then cloned, the clones will inherit the "preparation" without having to redo it. This will speed up evaluation using the clones, though I'm not sure whether it will make a difference in your case.
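
A sketch of that warm-up pattern (policy file and rule name are placeholders):

    let mut base = Engine::new();
    base.add_policy_from_file("example.rego")?;

    // Evaluate once so `base` gets "prepared for evaluation".
    let _ = base.eval_rule("data.example.allow".to_string())?;

    // Clones now inherit the prepared state instead of redoing the work.
    let mut worker = base.clone();
    let result = worker.eval_rule("data.example.allow".to_string())?;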

adams-shaun commented 3 months ago

Hey @anakrish,

Circling back to look at your data... thanks again.

I think for the rego-compare delta, most of the gap (for my use case) is closed by doing:

    // OPA Go API: util, ast, and rego are from github.com/open-policy-agent/opa.
    // The input is converted to an ast.Value once, outside the loop, so each
    // Eval call skips input parsing.
    rawPtr := util.Reference(input)
    parsedInput, _ := ast.InterfaceToValue(*rawPtr)

    start := time.Now()
    for i := 0; i < numIterations; i++ {
        rs, err = query.Eval(ctx, rego.EvalParsedInput(parsedInput))
        if err != nil {
            return err
        }
    }

I think this is analogous to parsing the input into a Value object in the regorus script. Can you provide any more details on the "tweaked" policies you referenced for the white paper demo repo? I'm just digging into the code now, but wondering if there are any guidelines for optimizing performance.

Our policies are quite small, and we iterate through multiple of them to arrive at a deny/allow. The input object is shared across multiple policies and is pre-filtered to include only relevant fields. The folks who worked on this before me have obviously put in a great deal of effort to squeeze as much performance as possible out of OPA.

I've made a few changes to my local regorus repo:

  1. Allowing pre-parsing of the input and storing it on the heap for future use.
  2. Creating a shared auto-scaling engine object pool to minimize cloning.
  3. Hacking on a wrapper to eval_rule that avoids passing JSON responses back through cgo -- now just a struct with a couple of string fields.
  4. Starting work on some built-ins for feature parity (the main one we use is net.cidr_contains; see the sketch below).
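
For reference, missing built-ins can also be supplied as extensions without patching the interpreter, assuming Engine::add_extension is available in your regorus version. A sketch, where the CIDR check is a toy prefix match rather than real CIDR math:

    use regorus::{Engine, Value};

    let mut engine = Engine::new();
    engine.add_extension(
        "net.cidr_contains".to_string(),
        2, // number of arguments the built-in takes
        Box::new(|args: Vec<Value>| {
            let cidr = args[0].as_string()?.to_string();
            let ip = args[1].as_string()?.to_string();
            // Toy check against the network part only; a real implementation
            // would parse the mask and compare address bits.
            let network = cidr.split('/').next().unwrap_or("");
            Ok(Value::from(ip.starts_with(network)))
        }),
    )?;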

Taking all of the above, minus some CGO overhead, we are now beating the optimized opa/rego policy engine in local benchmarks (and far fewer heap allocs)! As OPA has not been performant enough for us, there has been an effort to hand-write native Golang rule evaluation code. I was hoping to show regorus as an alternative to this maintenance-heavy approach, but I'm not sure I can close the gap. My policies in regorus are evaluating in the 5-20us range, but these hand-written functions are <1us. I should note that even 5us is quite small as a percentage of the CPU time spent processing the gRPC request -- probably with a high-performance Rust gRPC server, that gap disappears.

I'm fairly new (as of yesterday) to Rust, so I'm unsure how much further I can push this. I'm hoping to continue this effort, learn the language, and someday push some contributions back upstream.

I can share some examples with you if you are curious -- are you on Slack somewhere?

Thanks again!

anakrish commented 3 months ago

@adams-shaun Thanks for all the investigation. Great work! Let me address the various points in your post via multiple comments.

I think for the rego-compare delta, most of the gap (for my use case) is closed by doing:

Thanks for pointing out rego.EvalParsedInput. It seems to improve OPA performance for the example policy, but not so much for the ACI policy.

An interesting thing I noticed was that, even with all the parsing and preparation, if a new PreparedEvalQuery is created for each evaluation, there is a noticeable performance slowdown when specific rules are evaluated.

    Testing example policy
    evaluating data.example.allow using OPA average eval time = 88.96 microseconds
    evaluating data.example.allow using OPA (fresh instance each time) average eval time = 101.58 microseconds
    evaluating data.example.allow using regorus eval-rule average eval time = 69.33359 microseconds
    evaluating data.example using OPA average eval time = 108.55 microseconds
    evaluating data.example using OPA (fresh instance each time) average eval time = 116.01 microseconds
    evaluating data.example using regorus eval-query average eval time = 77.91995 microseconds
    evaluating data using OPA average eval time = 111.65 microseconds
    evaluating data using OPA (fresh instance each time) average eval time = 117.74 microseconds
    evaluating data using regorus eval-query average eval time = 75.10071 microseconds

    Testing ACI policy
    evaluating data.framework.mount_overlay using OPA average eval time = 185.59 microseconds
    evaluating data.framework.mount_overlay using OPA (fresh instance each time) average eval time = 498.73 microseconds
    evaluating data.framework.mount_overlay using regorus eval-rule average eval time = 70.60009 microseconds
    evaluating data.framework using OPA average eval time = 1568.31 microseconds
    evaluating data.framework using OPA (fresh instance each time) average eval time = 1717.96 microseconds
    evaluating data.framework using regorus eval-query average eval time = 550.49225 microseconds
    evaluating data using OPA average eval time = 2082.92 microseconds
    evaluating data using OPA (fresh instance each time) average eval time = 2202.94 microseconds
    evaluating data using regorus eval-query average eval time = 650.8659 microseconds

This leads me to believe that some kind of caching or initialization happens the first time a PreparedEvalQuery is used. Do you know what could explain this? We could add similar caching to Regorus to improve performance further.

anakrish commented 3 months ago

@adams-shaun

Can you provide any more details on the "tweaked" policies you referenced for the white paper demo repo? I'm just digging into the code now, but wondering if there are any guidelines for optimizing performance.

We have an email thread with the authors of the OOPSLA paper (some of the MSR researchers I work with are well connected with the creators of the Cedar language, who authored the paper) showing our findings that Regorus is much faster than OPA for the Rego policies in the paper, and that when the policies are tweaked, Regorus even outperforms Cedar. This is remarkable given that the benchmarked policies are tailored for the authorization scenarios in which Cedar specializes. The understanding is that the Cedar team will redo the benchmarks and update the paper as appropriate to reflect the new findings.

The tweaks to the policies to get better performance are:

anakrish commented 3 months ago

Taking all of the above, minus some CGO overhead, we are now beating the optimized opa/rego policy engine in local benchmarks (and far fewer heap allocs)! ... I'm hoping to continue this effort, learn the language, and someday push some contributions back upstream. I can share some examples with you if you are curious -- are you on Slack somewhere?

It is quite impressive that you are able to beat OPA in local benchmarks with policies/input that have been heavily optimized for OPA. I think your work will greatly benefit other users of Regorus from Go. Let me know if you need any help contributing upstream. I seem to have a Slack account (anakrish@microsoft.com) on the confidential-containers workspace, but am not too active there.

anakrish commented 3 months ago

As OPA has not been performant enough for us, there has been an effort to hand-write native Golang rule evaluation code. I was hoping to show regorus as an alternative to this maintenance-heavy approach, but I'm not sure I can close the gap. My policies in regorus are evaluating in the 5-20us range, but these hand-written functions are <1us.

We will continue to improve the current Regorus interpreter, which is an AST-walking interpreter. That, coupled with the policy tweaks outlined earlier, should reduce evaluation time. However, it still might not be close in performance to hand-written, optimized policy evaluation code in, say, Rust or Go.

Our long-term plan is to introduce a policy intermediate language that many policy languages, including Rego, will be compiled into. The IL will be optimized and will be faster to evaluate than the current Regorus AST-walking approach. The IL will also support generating policy evaluation code in Rust, C#, Go, etc. This generated code will likely be much faster than interpreted evaluation and should be very competitive with hand-written policy evaluation code.