Closed — atoy3731 closed this issue 3 years ago
4GB sounds suspicious.
But without more data, simply using 4GB isn't by itself evidence of a memory leak, right? If traffic goes down but authservice memory usage does not go down, that would be an issue. Likewise if we keep the traffic at the same level but see unbounded memory growth in authservice as time passes (say, raise the limit but keep the same level of traffic).
Other than that, what does the high traffic look like? Is it mostly un-logged-in new-session requests (the entire OIDC flow), requests from already-logged-in sessions, or a mix? I want to see how to reproduce this if it turns out to be a real problem.
I have indeed reproduced the issue. https://github.com/incfly/authservice/tree/oom/test/perf contains instructions for setting this up. Once we obtained a session cookie for repeatable requests, I noticed that authservice memory does not shrink even after requests go back down.
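The repeatable-request part of that setup can be sketched roughly as follows. This is an illustrative ops sketch, not the exact commands from the linked instructions: the cookie name, target URL, and pod label are all assumptions, and it requires a live cluster to run.

```shell
# Sketch only: assumes a running cluster with an authservice-protected app.
# 1. Complete one OIDC login in a browser and copy the session cookie value.
COOKIE='oidc-session=<value copied from browser>'   # hypothetical cookie name

# 2. Replay authenticated requests to generate sustained traffic.
for i in $(seq 1 1000); do
  curl -s -o /dev/null -H "Cookie: $COOKIE" https://app.example.com/
done

# 3. After traffic stops, check whether authservice memory shrinks.
kubectl top pod -l app=authservice --containers
```

If memory stays flat at its peak after step 3 even though requests have stopped, that matches the behavior described above.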
@Shikugawa BTW, I also spent a bit of time obtaining a memory profile for authservice under the load test. This is a memory allocation profile over a 15s execution: profile001.pdf
Some setup steps are needed to obtain the pprof memory profile rendering: https://gperftools.github.io/gperftools/heapprofile.html We need to add an LD_PRELOAD
env var pointing to the tcmalloc shared library inside the authservice docker image, then have the process dump heap profiles into some folder like /tmp/xxx. Then copy the profiler dump to a local machine and render it into a PDF via pprof.
I don't have exact steps/manifest to get that for now. But let me know if you need to get this setup for investigation.
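Per the gperftools heap-profiler doc linked above, the env-var setup could look roughly like this in the authservice container spec. The library path and dump prefix below are assumptions (check the actual image layout); this is a sketch, not a tested manifest:

```yaml
# Sketch: enable the gperftools heap profiler in the authservice container.
containers:
  - name: authservice
    env:
      - name: LD_PRELOAD
        # Path to the tcmalloc shared library inside the image (assumed).
        value: /usr/lib/libtcmalloc.so
      - name: HEAPPROFILE
        # Prefix for heap dumps; gperftools writes <prefix>.0001.heap, etc.
        value: /tmp/authservice.hprof
```

Afterwards you can copy a dump out with `kubectl cp` and render it locally, e.g. `pprof --pdf <authservice-binary> authservice.hprof.0001.heap > profile.pdf`.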
Thanks!
@atoy3731 @incfly I fixed the OOM here: https://github.com/istio-ecosystem/authservice/pull/167 But this fix is not related to Redis (it occurs even with the in-memory cache, due to an oversight in my implementation 🤦♂️).
resolved by #167
In one of our more heavily trafficked environments, we're noticing our authservice pods using all 4Gi of memory we're assigning in our K8s resource limits, which is causing the pods to restart.
We're implementing HA and utilizing Redis for caching, so I wouldn't expect the authservice pods themselves to be using that much memory. Perhaps it is a leak?