Open itayhac opened 4 months ago
Thanks for the detailed issue @itayhac. I tried to reproduce this by running OPA on Docker with a 4GB memory limit. I increased the number of goroutines in your script to send more concurrent requests to OPA. The maximum amount of memory consumed by OPA did not exceed 200 MB. Is there something different in your actual setup vs the mock bundle you've provided here? I would expect the CPU usage to spike while OPA handles these requests, but it's still unclear why OPA runs out of memory.
Hi @ashutosh-narkar, thank you so much for your fast and detailed reply. I changed the files to reproduce the issue with 4GB of memory (I increased the size and changed the structure of the data.json file).
Please retry and it should reproduce.
One thing I noticed in the policy is you're using the object.get builtin on the data set instead of just accessing under data.rules for example. You can probably avoid using the builtin. Another thing I noticed when I run the stress test with the openpolicyagent/opa:0.64.1-static image variant there is no significant increase in memory. Have you seen that as well?
Any further thoughts? @ashutosh-narkar can we label it as a bug and prioritize it?
@itayhac can you please confirm if you're able to repro this issue with the upstream OPA images including any differences with the static variant. You mentioned (in a separate thread) that y'all are building your own images. Also this could be a relevant issue.
The problem reproduces with our own OPA image (we compile the latest version) and with both of the latest public images (static and non-static).
This could be related to https://github.com/open-policy-agent/opa/issues/5946. In your policy you're referring to a large object and this can be replicated if you modify the policy to refer to the object w/o using the object.get builtin. @johanfylling did you encounter something like this while working on https://github.com/open-policy-agent/opa/pull/6040 ?
@ashutosh-narkar, the work in #6040 focused solely on the CPU time aspect, and did not look at how memory usage was affected.
The data has some objects and arrays and I wonder if when referenced inside of the policy the interface-AST conversions are impacting performance in terms of CPU and memory.
We're looking to implement something like discussed in https://github.com/open-policy-agent/opa/issues/4147. This should probably help with performance as we'll avoid the interface to AST conversion during eval.
This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.
@itayhac are you able to repro this with OPA v0.67.0? I was unable to repro this, so it would be good to verify in case I missed something.
We use OPA as our policy agent. We deploy multiple instances of OPA as Docker containers on Kubernetes. Each OPA instance has a k8s memory limit of 4GB, and each loads a bundle with a data.json file of about 15 MB.
Recently we noticed that some of our OPA instances were restarted due to OOM. After further investigation we found that this happens when OPA receives frequent requests and memory is not freed fast enough, which in turn results in OOM very quickly (within 3 seconds).
Disclaimer: the bundle I share here contains mock data that best mimics our use case. I will share the heap dump that we got for the mock data and for the actual production data (both with the same Rego code).
Please note that these functions account for almost 90 percent of the memory, and the service is OOM-killed within seconds.
This is also true for our production memory profile.
Steps To Reproduce
1. Run the following command to start OPA: opa run --bundle itay_kenv_files/test_15mb.tar.gz --server --pprof --log-level=info
2. Run the code to trigger OPA requests.
Expected behavior
Memory should remain low, or at least be freed shortly after the requests are made.
Code that sends 100 requests to OPA
test_15mb.tar.gz memory profile.zip
If further information regarding our production setup is required, I'll be happy to provide it.