open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

OPA high latency - potential cause: bad memory allocations #6795

Open itayhac opened 3 months ago

itayhac commented 3 months ago

NOTE: This issue is related to https://github.com/open-policy-agent/opa/issues/6753. There we discussed the OOM that occurs when loading big data objects, due to AST loading; this issue focuses on the latency aspect.

We use OPA as our policy agent and deploy multiple OPA instances as Docker containers on Kubernetes. Each instance loads a bundle with a data.json file of about 15 MB.

Recently we noticed that OPA's response time is 200-600 ms per request.

After further investigation, we found that the latency appears when we pass big data objects as function arguments. We first saw it when calling functions in other packages, but the same issue occurs when the function is in the same package.

  1. run this command: curl http://localhost:8181/v1/data/test_policy/evaluator/access

  2. run this command to get metrics: curl http://localhost:8182/metrics

In this case we get 165 ms latency. Note that if we simply remove the rules object from the function arguments, latency drops below 1 ms.
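For anyone who wants to compare the two variants from the client side, here is a minimal Python sketch that times repeated requests to the endpoint from step 1. The helper names (`time_calls`, `summarize`, `query_opa`) and the run count are my own assumptions for illustration and are not part of the attached test code. Client-side timings also include HTTP overhead, so they will not match the server metrics exactly.

```python
import statistics
import time
import urllib.request

# URL taken from step 1 above; adjust host and port for your setup.
OPA_URL = "http://localhost:8181/v1/data/test_policy/evaluator/access"


def time_calls(fn, runs=20):
    """Call fn() `runs` times and return each wall-clock duration in ms."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(durations_ms):
    """Return the min, median, and max of a list of durations in ms."""
    return {
        "min_ms": min(durations_ms),
        "median_ms": statistics.median(durations_ms),
        "max_ms": max(durations_ms),
    }


def query_opa(url=OPA_URL):
    """Issue one GET against the OPA Data API and read the full body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical usage: run once with the rules object passed as a
    # function argument and once without it, then compare the medians.
    print(summarize(time_calls(query_opa)))
```

Running it once per policy variant and comparing the medians should show whether the gap described above reproduces in a given environment.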


Files: test_main_code.tar.gz test_rego_code.tar.gz

If further information about our production setup is required, I'll be happy to provide it.

ashutosh-narkar commented 3 months ago

We'll continue the discussion in https://github.com/open-policy-agent/opa/issues/6753. We can keep this open for tracking purposes.

stale[bot] commented 2 months ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.