Closed: FrancoisPoinsot closed this issue 9 months ago
Thanks for reporting. Could you please check whether you also see the leak when you use only the Prometheus scaler on GCP, so we can narrow down the possible problem? Thanks!
With only Prometheus scalers, the goroutine count is stable at 178 goroutines.
Here are the goroutines from pprof.
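For anyone trying to see the symptom outside the cluster, here is a minimal, illustrative Go sketch (not KEDA code) of how unclosed GCP Monitoring query clients show up as goroutine growth, which is what the pprof dump and the go_goroutines metric surface. It assumes Application Default Credentials are available; the loop count and sleep are arbitrary.

```go
// Illustrative sketch only (not KEDA code): creating GCP Monitoring query
// clients and dropping them without Close() leaves their gRPC goroutines
// running, which shows up as goroutine growth in pprof and go_goroutines.
package main

import (
	"context"
	"fmt"
	"log"
	"runtime"
	"time"

	monitoring "cloud.google.com/go/monitoring/apiv3/v2"
)

func main() {
	ctx := context.Background()
	fmt.Println("goroutines at start:", runtime.NumGoroutine())

	// Simulate several scaler cache refreshes that each build a new client
	// and abandon the previous one without closing it.
	for i := 0; i < 5; i++ {
		client, err := monitoring.NewQueryClient(ctx)
		if err != nil {
			log.Fatalf("creating query client: %v", err)
		}
		_ = client // leaked on purpose: no client.Close()
	}

	time.Sleep(time.Second) // give the transports time to settle
	fmt.Println("goroutines after 5 unclosed clients:", runtime.NumGoroutine())
}
```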
Cool, thanks for the confirmation. And to clarify, this doesn't happen with versions < 2.13.0? If it is a regression, then we should be able to track down the changes in the GCP Pub/Sub scaler.
I confirm this does not happen in v2.12.1
Maybe it's something related to the changes in the GCP client?
Do you see any errors in the KEDA operator logs? Maybe we are not closing the client properly on failures? Could this be related to https://github.com/kedacore/keda/issues/5429 (as the scaler cache is refreshed on each error)?
Yeah, the new queryClient isn't closed, so if the scaler is being refreshed due to #5429, the connections aren't properly closed. I guess that could be the root cause (I'll update my PR).
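For illustration, here is a minimal sketch of the fix direction discussed above, i.e. closing the query client in the scaler's Close(). It is not KEDA's actual implementation; the pubsubScaler type and queryClient field are illustrative stand-ins.

```go
// Minimal sketch of the fix direction, not KEDA's actual code: the scaler
// must close its GCP query client in Close(), otherwise every cache refresh
// triggered by errors (see #5429) leaks the client's connections and
// goroutines. Type and field names are illustrative.
package main

import (
	"context"
	"log"

	monitoring "cloud.google.com/go/monitoring/apiv3/v2"
)

type pubsubScaler struct {
	queryClient *monitoring.QueryClient
}

// Close releases the underlying client so its goroutines do not accumulate
// across scaler refreshes.
func (s *pubsubScaler) Close(ctx context.Context) error {
	if s.queryClient == nil {
		return nil
	}
	err := s.queryClient.Close()
	s.queryClient = nil
	return err
}

func main() {
	ctx := context.Background()
	qc, err := monitoring.NewQueryClient(ctx) // needs GCP credentials
	if err != nil {
		log.Fatalf("creating query client: %v", err)
	}
	s := &pubsubScaler{queryClient: qc}
	defer s.Close(ctx) // without this, each refresh leaks the client
	// ... run the metric query and compute the scaling decision here ...
}
```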
Report
After upgrading KEDA to 2.13.0 there seems to be a memory leak. Looking at the go_goroutines metric, I see the number growing indefinitely, confirmed at least above 30k. Here is a graph for go_goroutines.
I have deployed KEDA with different cloud vendors and I only see this issue on GCP. It might be related to the Pub/Sub scalers, which I use only in GCP clusters.
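For context, go_goroutines is the standard Go-collector gauge from prometheus/client_golang (effectively runtime.NumGoroutine() of the process), which controller-runtime based operators such as KEDA expose by default. A minimal sketch of how such a metric endpoint is served; the port is arbitrary.

```go
// Sketch only: the client_golang default registry already includes the Go
// collector, so /metrics reports go_goroutines for this process.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```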
Expected Behavior
Memory and goroutine count remain roughly constant. You can see very clearly in the graph above when the upgrade to v2.13.0 happened.
Actual Behavior
Memory and goroutine count increase indefinitely.
Steps to Reproduce the Problem
Logs from KEDA operator
KEDA Version
2.13.0
Kubernetes Version
1.26
Platform
Google Cloud
Scaler Details
prometheus, gcp-pubsub
Anything else?
No response