kubewarden / policy-server

Webhook server that evaluates WebAssembly policies to validate Kubernetes requests
https://kubewarden.io
Apache License 2.0

OOM error when lots of information is pulled by context aware policies #716

Closed · flavio closed this issue 4 months ago

flavio commented 7 months ago

Is there an existing issue for this?

Current Behavior

Currently we create one kube-rs reflector object for each type of context-aware resource requested by a policy. The reflector keeps a copy of the matching Kubernetes resources in memory and keeps it in sync with the cluster's state.
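
For readers less familiar with kube-rs, here is a minimal sketch of the reflector pattern described above; the resource type and overall wiring are illustrative and are not the actual policy-server code:

```rust
use futures::{StreamExt, TryStreamExt};
use k8s_openapi::api::rbac::v1::RoleBinding;
use kube::{
    runtime::{reflector, watcher, WatchStreamExt},
    Api, Client,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;

    // One reflector per watched resource type: the Store keeps a full copy of
    // every object in memory and is kept in sync by driving the stream below.
    let api: Api<RoleBinding> = Api::all(client);
    let (store, writer) = reflector::store::<RoleBinding>();
    let mut stream = reflector(writer, watcher(api, watcher::Config::default()))
        .applied_objects()
        .boxed();

    // Filling the store for the first time is where the memory spike shows up
    // when the cluster holds a large number of objects.
    while let Some(obj) = stream.try_next().await? {
        println!("cached {:?} ({} objects)", obj.metadata.name, store.state().len());
    }
    Ok(())
}
```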

The more resources are pulled, the more memory is consumed by the Policy Server process. We had reports of Policy Server being killed by the kernel OOM killer because it was consuming too much memory.

That happened with the following information being pulled from Kubernetes:

Expected Behavior

The Policy Server should not be killed by the OOM killer. There should be no need to tune the memory limits of the Pod.

Steps To Reproduce

No response

Environment

* Kubewarden 1.11

Anything else?

No response

flavio commented 5 months ago

Kubewarden 1.13.0-RC1 is out, and it ships with the fix for this bug.

We have reduced the memory spike that happens when resources are fetched from the cluster for the first time. Memory usage at rest has also been drastically reduced.

The key point, however, is that there is no silver bullet. When operating inside a big cluster with lots of resources (like the numbers in the issue's description), there is not much we can do to reduce the initial spike: it happens when we fetch all these resources for the first time and load them into our reflectors.

When deploying a Policy Server inside such a cluster, the administrator must take this spike into account when calculating the resource limits (memory in particular) of the Pods.

According to our load tests, after the initial spike, the memory usage goes down and remains stable. This will prevent the policy-server processes from being flagged as offenders by the kernel's OOM killer.

flavio commented 5 months ago

@fabriziosestito can you share some numbers here? They will be useful when writing the blog post.

fabriziosestito commented 5 months ago

Benchmark data

10000 RoleBindings

k6 load testing (load_all policy), using the [policy-server branch](https://github.com/fabriziosestito/policy-server/tree/feat/sqlite-cache)

| Configuration | HTTP request duration (avg) | Max RSS under load | Idle RSS |
|---|---|---|---|
| No fixes | 436.15 ms | 1.4 GB | 1.2 GB |
| kube-rs buffer fix + jemalloc | 233.663 ms | 1.2 GB | 264 MB |
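
For context on the jemalloc part of that row: switching the global allocator is a one-line change in a Rust binary. A minimal sketch, assuming the commonly used tikv-jemallocator crate (the thread does not state which crate or version policy-server actually uses):

```rust
// Cargo.toml (illustrative): tikv-jemallocator = "0.5"
use tikv_jemallocator::Jemalloc;

// Route every allocation in the process through jemalloc instead of the
// system allocator; jemalloc tends to return freed memory to the OS more
// readily, which lowers the resident set size (RSS) at rest.
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;

fn main() {
    // Allocate and drop a large buffer; with jemalloc the freed pages are
    // more likely to be released back to the operating system.
    let buffer: Vec<u8> = vec![0; 64 * 1024 * 1024];
    drop(buffer);
}
```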

Kube-rs memory burst reduction

Watching 10000 RoleBindings:

ListWatch strategy:

| kube-rs | With Store | Without Store |
|---|---|---|
| Patched | ~58 MB | ~13 MB |
| Upstream | ~92 MB | ~76 MB |

fabriziosestito commented 5 months ago

Also see https://github.com/kube-rs/kube/pull/1494#issuecomment-2126694967

clux commented 5 months ago

The merged PR into kube should help reduce the spike by a good factor (thank you!), as should Kubernetes InitialStreamingLists once it stabilises (probably a few years before you can use that in public distribution stuff tho).
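
As a hedged illustration of the streaming-lists path mentioned above: recent kube-rs versions expose an opt-in for the Kubernetes streaming-list behaviour on the watcher configuration. The sketch below assumes a kube release where `Config::streaming_lists()` is available (it has been gated behind the `unstable-runtime` cargo feature in some releases) and a cluster with the WatchList feature gate enabled:

```rust
use futures::{StreamExt, TryStreamExt};
use k8s_openapi::api::rbac::v1::RoleBinding;
use kube::{
    runtime::{watcher, WatchStreamExt},
    Api, Client,
};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let api: Api<RoleBinding> = Api::all(client);

    // Ask the watcher to populate its initial state via a streaming list
    // instead of a paginated LIST, avoiding buffering the whole list response
    // at once. Requires server-side support (WatchList feature gate) and a
    // kube-rs build where this option is exposed.
    let config = watcher::Config::default().streaming_lists();

    let mut stream = watcher(api, config).applied_objects().boxed();
    while let Some(rb) = stream.try_next().await? {
        println!("saw {:?}", rb.metadata.name);
    }
    Ok(())
}
```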

In the meantime, here's a drive-by comment: you might have another optimisation path available to you now, depending on how you structure things.

If you use metadata watchers to fill your reflector stores, you can use the stores for basic bookkeeping but call an api.get(x) on the object before sending it to a user's validator. This can work if Kubernetes updates the objects more frequently than you send them to the validators. It's explained, a bit poorly, in kube.rs/optimization#watching-metadata-only.
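
A rough sketch of that suggestion, keeping a metadata-only store with kube-rs's metadata_watcher and fetching the full object on demand; the resource type, helper name, and wiring are illustrative assumptions, not policy-server's actual code:

```rust
use futures::StreamExt;
use k8s_openapi::api::rbac::v1::RoleBinding;
use kube::{
    core::PartialObjectMeta,
    runtime::{metadata_watcher, reflector, watcher, WatchStreamExt},
    Api, Client,
};

// Hypothetical helper: check the metadata-only store for bookkeeping, then
// GET the full object right before handing it to a policy's validator.
async fn fresh_object_for_validation(
    api: &Api<RoleBinding>,
    store: &reflector::Store<PartialObjectMeta<RoleBinding>>,
    name: &str,
) -> anyhow::Result<Option<RoleBinding>> {
    let known = store
        .state()
        .iter()
        .any(|meta| meta.metadata.name.as_deref() == Some(name));
    if known {
        // The store only holds metadata, so the full (and fresh) object is
        // fetched on demand instead of being cached in memory.
        Ok(Some(api.get(name).await?))
    } else {
        Ok(None)
    }
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let client = Client::try_default().await?;
    let api: Api<RoleBinding> = Api::default_namespaced(client);

    // Metadata-only reflector: same bookkeeping as a full reflector, but only
    // object metadata is kept in memory.
    let (store, writer) = reflector::store::<PartialObjectMeta<RoleBinding>>();
    let stream = reflector(
        writer,
        metadata_watcher(api.clone(), watcher::Config::default()),
    )
    .applied_objects()
    .boxed();
    tokio::spawn(stream.for_each(|_| async {}));

    let _ = store.wait_until_ready().await;
    let obj = fresh_object_for_validation(&api, &store, "example-rolebinding").await?;
    println!("validating against: {:?}", obj.map(|o| o.metadata.name));
    Ok(())
}
```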

flavio commented 4 months ago

Closing as fixed; we've seen positive results both from our own tests and from users who tried 1.13.0-RC1.