istio-ecosystem / authservice

Move OIDC token acquisition out of your app code and into the Istio mesh
Apache License 2.0

Authservice pods are using high memory resources even when utilizing Redis caching #156

Closed atoy3731 closed 3 years ago

atoy3731 commented 3 years ago

In one of our more heavily trafficked environments, we're noticing our authservice pods using all 4Gi of memory that we assign them in our K8s resource limits, which causes the pods to restart.

We're running in HA and using Redis for caching, so I wouldn't expect the authservice pods themselves to use that much memory. Perhaps it is a leak?

incfly commented 3 years ago

4GB sounds suspicious.

But without data, just using 4GB doesn't by itself mean there is a memory leak, right? If the traffic goes down but authservice memory usage does not, that would be an issue. The same goes if traffic stays at the same level but authservice memory usage grows without bound over time (say, raise the limit but keep the traffic level the same).

Other than that, what does the high traffic look like? Is it mostly new, not-yet-logged-in session requests (the entire OIDC flow), requests that are already logged in, or a mix? I want to see how to reproduce this if it does turn out to be a problem.
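The two conditions above can be checked against memory samples collected from the pods. The following is only a minimal sketch: the function names, the 10% tolerance, and the idea of feeding it lists of RSS samples in MiB are all assumptions made for illustration, not anything authservice provides.

```python
from statistics import mean


def grows_under_steady_load(rss_mib, tolerance=0.10):
    """Return True if memory keeps climbing while traffic is held constant.

    rss_mib: memory samples (MiB) taken at regular intervals under steady load.
    Compares the average of the first third with the average of the last third.
    """
    if len(rss_mib) < 6:
        raise ValueError("need at least 6 samples")
    third = len(rss_mib) // 3
    early = mean(rss_mib[:third])
    late = mean(rss_mib[-third:])
    return late > early * (1 + tolerance)


def stays_high_after_drop(rss_under_load_mib, rss_after_drop_mib, tolerance=0.10):
    """Return True if memory barely shrinks once traffic has gone down.

    Compares the average usage after the drop with the peak seen under load.
    """
    peak = max(rss_under_load_mib)
    settled = mean(rss_after_drop_mib)
    return settled > peak * (1 - tolerance)
```

Neither check proves a leak on its own. Warm-up can inflate early growth, and an allocator may hold on to freed memory instead of returning it to the OS, so a positive result is a reason to take a heap profile rather than a diagnosis.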

incfly commented 3 years ago

I have indeed reproduced the issue. https://github.com/incfly/authservice/tree/oom/test/perf contains some instructions on how to set this up. Once we obtained a session cookie for repeatable requests, I noticed that the authservice's own memory does not shrink even after the requests go down.

incfly commented 3 years ago

@Shikugawa BTW, I also spent a bit of time obtaining a memory profile of the authservice under load test. This is the memory allocation profile for 15s of execution: profile001.pdf

Getting the pprof memory profile rendering takes some setup; see https://gperftools.github.io/gperftools/heapprofile.html. We need to add an LD_PRELOAD env var pointing to the lib.so in the authservice docker image, have the process dump its profile into some folder /tmp/xxx, then copy the profiler dump to a local machine and render it into a PDF via pprof.

I don't have the exact steps/manifest for that right now, but let me know if you need this set up for the investigation.
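Going by the gperftools page linked above, the general shape of those steps could be sketched roughly as follows. This is not the exact setup used for profile001.pdf. The helper names and every path passed to them are placeholders. `HEAPPROFILE` (the dump prefix) and `pprof --pdf` come from the gperftools documentation rather than from this thread.

```python
import os
import shlex


def heap_profile_env(preload_lib, dump_prefix, base_env=None):
    """Build the environment for running a process under the gperftools heap profiler.

    preload_lib: path to the library to preload (placeholder; depends on the image).
    dump_prefix: prefix for the heap dump files, e.g. somewhere under /tmp.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = preload_lib   # load the profiling allocator into the process
    env["HEAPPROFILE"] = dump_prefix  # gperftools writes <prefix>.NNNN.heap files
    return env


def pprof_pdf_command(binary, heap_dump, output_pdf):
    """Return the shell command that renders a copied heap dump into a PDF."""
    return "pprof --pdf {} {} > {}".format(
        shlex.quote(binary), shlex.quote(heap_dump), shlex.quote(output_pdf)
    )
```

In a Kubernetes deployment the two variables would more likely be set in the container's `env` section. Run the rendering command on the local machine, with a copy of the same binary, after copying the dump out of the pod.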

Thanks!

Shikugawa commented 3 years ago

@atoy3731 @incfly I fixed the OOM in https://github.com/istio-ecosystem/authservice/pull/167. The fix is not related to Redis, though: the OOM also occurs with the in-memory cache, because of an oversight in my implementation 🤦‍♂️.

Shikugawa commented 3 years ago

resolved by #167