open-policy-agent / opa

Open Policy Agent (OPA) is an open source, general-purpose policy engine.
https://www.openpolicyagent.org
Apache License 2.0

Delta Bundles on large Datasets/Snapshot-Bundles - Memory usage #6079

Open smilowski opened 1 year ago

smilowski commented 1 year ago

Short description

Hi All, we are seeing some rather strange memory spikes when OPA loads delta bundles against larger datasets held in in-memory storage.

Let us assume we have the following structure in OPA (loaded previously with a snapshot bundle): `{ "list_of_persons": { "person1": { "relatedPerson1": { "relationType": [ [ "validFrom", "validTo" ] ] } } } }`. The dataset is quite large (OPA runtime -> 1.8 GB).

Once patches/delta bundles of the following form hit OPA, there is a 2x-3x memory spike: `{ "data": [ { "op": "upsert", "path": "/list_of_persons/nonExistingPerson1/relatedPerson1", "value": { "relationType": [ [ "validFrom", "validTo" ] ] } } ] }`

I hope I am not mistaken in my analysis, but while debugging locally I was able to see that an 'upsert' operation on a non-existing 'leaf' triggers a DeepCopy of the whole in-memory storage:

https://github.com/open-policy-agent/opa/blob/main/storage/inmem/txn.go#L210

OPA calls the Read method with `/list_of_persons/nonExistingPerson1`, which returns NotFoundErr. OPA then tries to create `/list_of_persons/nonExistingPerson1`, but first needs to check that `/list_of_persons` exists. The Read method is therefore called with path = `/list_of_persons`, which in turn makes a DeepCopy of `list_of_persons`.
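
To make the sequence easier to follow, here is a minimal sketch of that access pattern against the public storage API. It only illustrates the order of calls with the toy data from above, not the bundle plugin itself:

```go
package main

import (
	"context"
	"fmt"

	"github.com/open-policy-agent/opa/storage"
	"github.com/open-policy-agent/opa/storage/inmem"
)

func main() {
	ctx := context.Background()

	// Data previously activated via a snapshot bundle (tiny stand-in for the
	// real 1.8 GB dataset).
	store := inmem.NewFromObject(map[string]interface{}{
		"list_of_persons": map[string]interface{}{
			"person1": map[string]interface{}{
				"relatedPerson1": map[string]interface{}{
					"relationType": []interface{}{[]interface{}{"validFrom", "validTo"}},
				},
			},
		},
	})

	txn, err := store.NewTransaction(ctx, storage.WriteParams)
	if err != nil {
		panic(err)
	}

	target, _ := storage.ParsePath("/list_of_persons/nonExistingPerson1")

	// 1. Read the target of the upsert -> NotFound, because the person does
	//    not exist yet.
	if _, err := store.Read(ctx, txn, target); !storage.IsNotFound(err) {
		panic("expected NotFound")
	}

	// 2. Before creating the new branch, the parent "/list_of_persons" is
	//    read to verify it exists. In our setup this is the read where the
	//    expensive copy of the big subtree shows up.
	if _, err := store.Read(ctx, txn, target[:1]); err != nil {
		panic(err)
	}

	// 3. Create the missing branch with the upsert value.
	value := map[string]interface{}{
		"relatedPerson1": map[string]interface{}{
			"relationType": []interface{}{[]interface{}{"validFrom", "validTo"}},
		},
	}
	if err := store.Write(ctx, txn, storage.AddOp, target, value); err != nil {
		panic(err)
	}

	if err := store.Commit(ctx, txn); err != nil {
		panic(err)
	}

	fmt.Println("upsert applied")
}
```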

Some additional info: opa_version: latest (master checked-out)

Expected behavior

My expectation would be that OPA reconciles/merges only the updates in the transaction/patch itself and not the whole InMem storage. In other words, that it does not produce a DeepCopy of the whole internal storage on every non-existing leaf.

I hope I've made my problem clear and would appreciate some feedback. Have a great day.

ashutosh-narkar commented 1 year ago

The logic behind making a deep copy of the data during a read seems to be about handling the merging of updates without modifying the original. We'll need to try this out, but in your example were there any updates in the list here? Also, if you have a sample snapshot and delta bundle handy and could share those, that would be helpful.

ashutosh-narkar commented 1 year ago

Also, given your data set size, I'm curious whether you've experimented with disk storage?
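
Roughly, disk storage is enabled via the OPA configuration, something like the following (the directory and partition paths are placeholders; please check the docs for the exact schema):

```yaml
storage:
  disk:
    directory: /var/opa   # persistent directory OPA can write to
    auto_create: true     # create the directory if it does not exist
    partitions:
      # partitioning along the person keys keeps individual reads/writes small
      - /list_of_persons/*
```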

smilowski commented 1 year ago

Hi @ashutosh-narkar, exactly: the delta contains multiple updates to the same branch, so this loop is triggered and the merges are applied within the transaction.

I've generated example snapshot and delta bundles, which show this behavior.

The snapshot contains 1 million relations (OPA runtime ~1 GB after activation): snapshot.tar.gz

The delta contains only 20 'upsert' operations, where some branches are new: delta.tar.gz
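
For reference, a rough sketch of how a delta bundle of that shape can be put together (the layout follows the documented delta bundle format with a `patch.json` at the root; the paths and values here are synthetic, and this is not the exact generator used for the attached bundles):

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"os"
)

// Builds a small delta bundle (delta.tar.gz) with 20 upsert operations that
// all target branches under /list_of_persons.
func main() {
	type op struct {
		Op    string      `json:"op"`
		Path  string      `json:"path"`
		Value interface{} `json:"value"`
	}

	var ops []op
	for i := 0; i < 20; i++ {
		ops = append(ops, op{
			Op:   "upsert",
			Path: fmt.Sprintf("/list_of_persons/newPerson%d/relatedPerson1", i),
			Value: map[string]interface{}{
				"relationType": [][]string{{"validFrom", "validTo"}},
			},
		})
	}

	patch, err := json.Marshal(map[string]interface{}{"data": ops})
	if err != nil {
		panic(err)
	}
	manifest := []byte(`{"revision": "delta-1"}`)

	f, err := os.Create("delta.tar.gz")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	gw := gzip.NewWriter(f)
	tw := tar.NewWriter(gw)
	for _, file := range []struct {
		name string
		body []byte
	}{
		{"patch.json", patch},
		{".manifest", manifest},
	} {
		if err := tw.WriteHeader(&tar.Header{Name: file.name, Mode: 0o644, Size: int64(len(file.body))}); err != nil {
			panic(err)
		}
		if _, err := tw.Write(file.body); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	if err := gw.Close(); err != nil {
		panic(err)
	}
}
```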

I've not tried disk storage; we are purely on in-memory storage.

stale[bot] commented 1 year ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.

smilowski commented 1 year ago

hi @ashutosh-narkar Was wondering if you had some time to investigate this issue.

It seems like a tricky situation, due to the algorithm design. The algorithm copies the internal storage and updates it inside the transaction scope, hence the need for the DeepCopy operation. It then applies the new structures via a write operation once all patches are processed. I assume this design makes an all-or-nothing update easier to implement.
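
To illustrate what I mean, here is a deliberately simplified, made-up sketch of that pattern (not OPA's actual code):

```go
package main

import "fmt"

// Simplified model of the pattern described above: the transaction works on
// a deep copy of the affected top-level branch, applies all patches to that
// copy, and only swaps it back into the store on commit. That makes
// all-or-nothing updates easy, but the copy is as large as the branch being
// touched (in our case essentially the whole dataset).

type store struct {
	data map[string]interface{}
}

type txn struct {
	s       *store
	updates map[string]interface{} // patched copies, keyed by top-level branch
}

func (s *store) newTxn() *txn {
	return &txn{s: s, updates: map[string]interface{}{}}
}

// upsert stages a patch inside the transaction. The first write to a branch
// deep-copies it, so the original store stays untouched until commit.
func (t *txn) upsert(branch, key string, value interface{}) {
	patched, ok := t.updates[branch]
	if !ok {
		patched = deepCopy(t.s.data[branch]) // the expensive step
		t.updates[branch] = patched
	}
	patched.(map[string]interface{})[key] = value
}

// commit swaps the patched copies into the store in one step.
func (t *txn) commit() {
	for branch, patched := range t.updates {
		t.s.data[branch] = patched
	}
}

// deepCopy recursively copies nested maps; non-map values are shared.
func deepCopy(v interface{}) interface{} {
	m, ok := v.(map[string]interface{})
	if !ok {
		return v
	}
	out := make(map[string]interface{}, len(m))
	for k, val := range m {
		out[k] = deepCopy(val)
	}
	return out
}

func main() {
	s := &store{data: map[string]interface{}{
		"list_of_persons": map[string]interface{}{
			"person1": map[string]interface{}{"relatedPerson1": map[string]interface{}{}},
		},
	}}

	t := s.newTxn()
	t.upsert("list_of_persons", "nonExistingPerson1",
		map[string]interface{}{"relatedPerson1": map[string]interface{}{}})
	t.commit()

	fmt.Println(s.data)
}
```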

I was wondering whether introducing an 'Exists' method on the Store interface would alleviate the problem. It seems the value returned by the Read operation at those two locations is not actually needed.

The Read method is, however, also responsible for reconciling the updates, so a new Exists method would need to somehow do the same job.
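
Something along these lines, purely as a sketch (the interface name and semantics below are hypothetical and not part of the current API):

```go
// Package existscheck sketches the hypothetical Exists extension discussed
// above; none of this exists in OPA today.
package existscheck

import (
	"context"

	"github.com/open-policy-agent/opa/storage"
)

// ExistsChecker is a hypothetical optional interface a store could implement
// so that callers which only need to know whether a path is present (like
// the two pre-checks in the patch logic) do not force the value to be
// materialised or deep-copied.
type ExistsChecker interface {
	// Exists reports whether path is present as seen by txn. It would still
	// have to honour updates already staged in the transaction, which is the
	// part Read currently handles via the merge/DeepCopy.
	Exists(ctx context.Context, txn storage.Transaction, path storage.Path) (bool, error)
}
```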

Do you have any thoughts?

ashutosh-narkar commented 1 year ago

Hey @smilowski, I haven't had a chance to look into this. From looking at the code, I'm not sure an Exists method would have any benefit here. As you point out, those Read calls in the two places could be improved, and that may help with performance. In terms of improvements to the Read method itself, that would be a more involved change and would need further investigation.

Also, it would be worth trying out disk storage to see if there is any performance benefit from that.

stale[bot] commented 1 year ago

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days. Although currently inactive, the issue could still be considered and actively worked on in the future. More details about the use-case this issue attempts to address, the value provided by completing it or possible solutions to resolve it would help to prioritize the issue.