Note that the one that looks the most stable is our least-used server, so it doesn't get any traffic, hence it doesn't really fluctuate.
Hi @klauern , can you describe the workload a little bit?
We also faced this issue. I did some profiling and found that the memory leak happens because grafeas client connections are not closed after a request is handled, leaving many zombie open connections.
I built a debug Docker image with netstat installed and observed that new gRPC (TCP) connections are created whenever a new pod is admitted. These connections are never closed and are left open.
root@kritis-validation-hook-f76fd4c75-z2cvg:/go# netstat -W | wc -l
524
Sample connections:
tcp 0 0 kritis-validation-hook-f76fd4c75-z2cvg:55436 nrt12s23-in-f10.1e100.net:https ESTABLISHED
tcp 0 0 kritis-validation-hook-f76fd4c75-z2cvg:44904 nrt12s13-in-f10.1e100.net:https ESTABLISHED
tcp 0 0 kritis-validation-hook-f76fd4c75-z2cvg:54140 nrt12s15-in-f74.1e100.net:https ESTABLISHED
NOTE: 1e100.net is a Google-owned domain; these connections are to the Container Analysis API.
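For illustration, the leak pattern looks roughly like the hypothetical sketch below (this is not Kritis's actual code; the handler name, route, and endpoint options are made up): a fresh upstream gRPC connection is dialed inside the admission handler on every request and never closed, so each handled request leaves another ESTABLISHED connection behind.

```go
package main

import (
	"net/http"

	"google.golang.org/grpc"
)

// handleAdmission is a hypothetical stand-in for the webhook handler.
// It dials a new upstream gRPC connection on every admission request.
func handleAdmission(w http.ResponseWriter, r *http.Request) {
	conn, err := grpc.Dial("containeranalysis.googleapis.com:443",
		grpc.WithInsecure()) // a real deployment would use TLS credentials
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	_ = conn // a grafeas/containeranalysis client would be built from conn here

	// BUG: conn.Close() is never called, so every handled request leaves
	// another open TCP connection, as seen in the netstat output above.
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/admission", handleAdmission)
	http.ListenAndServe(":8080", nil)
}
```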
We use a Kritis fork internally and are temporarily using this fix: https://github.com/mercari/kritis/pull/18
Ideally, Kritis should not create the grafeas client inside the handler; the client should be initialized before the request handler is called, so that the same connection can be reused across goroutines (request handlers).
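A minimal sketch of that refactoring, assuming a plain *grpc.ClientConn stands in for the shared upstream client (the type, route, and handler names are hypothetical, not Kritis's actual identifiers): dial once at startup, share the connection with every request-handling goroutine, and close it only when the server stops.

```go
package main

import (
	"log"
	"net/http"

	"google.golang.org/grpc"
)

// validationHook holds long-lived dependencies shared by all request
// handlers; a *grpc.ClientConn is safe for concurrent use by many goroutines.
type validationHook struct {
	conn *grpc.ClientConn
}

func (h *validationHook) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// Build the grafeas/containeranalysis client from h.conn here instead
	// of dialing a new connection per request.
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Dial exactly once, before the server starts accepting requests.
	conn, err := grpc.Dial("containeranalysis.googleapis.com:443",
		grpc.WithInsecure()) // a real deployment would use TLS credentials
	if err != nil {
		log.Fatalf("dialing upstream API: %v", err)
	}
	defer conn.Close() // closed exactly once, when the server stops

	http.Handle("/admission", &validationHook{conn: conn})
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Printf("server stopped: %v", err)
	}
}
```

With this shape the webhook keeps a small, constant number of upstream connections regardless of request volume, which is easy to confirm with the same netstat check shown earlier.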
Thank you @klauern for reporting the issue and @vbanthia for contributing a potential fix! I'd be happy to review a PR with the fix, merge it, and release it as 0.1.1.
Wrote a PR to fix this issue. https://github.com/grafeas/kritis/pull/394
Thanks for the fix, @vbanthia! Let's keep this issue open until we've refactored Kritis code per your comment in #394:
Ideally, Kritis should not create the upstream client inside the handler. It should be created outside and passed as an argument to the request handlers; this way the same client can be used by all goroutines. This will require a bit of refactoring; until then, we can use this solution to fix the memory issues.
Happy to accept contributions that address this with refactoring.
We will be testing this out in the next week on our internal tools K8s cluster. We should be able to tell fairly quickly if this is fixed.
Great, thanks for the update! Looking forward to hearing from you.
@klauern have you been able to confirm this by any chance?
Marking this resolved due to inactivity. Please re-open if this is still an issue.
Expected Behavior
Kritis runs with a stable memory footprint.
Actual Behavior
It doesn't; memory usage keeps growing over time.
Steps to Reproduce the Problem
Run Kritis for a while and watch its memory usage.
Additional info
I don't know how to better debug the issue, but here's a graph:
Each of these is an individual container on a separate Kubernetes cluster, so they are similar in nature but will have their own footprint.
I'd appreciate any guidance on how I can provide better information about what is going on. This graph is the only view I've been able to find, and I'm scratching my head for other ways to get this information from a running cluster.