Closed: mustaFAB53 closed this issue 1 month ago
I've not checked it yet, but it looks like an issue with the internal cache. WDYT @zroubalik?
@mustaFAB53 thanks for reporting. Could you please also share the ScaledObject that causes this?
Hi @zroubalik,
Attaching the ScaledObject Kubernetes manifest being applied: scaledobject.zip
Hi, a polling interval of 1s is too aggressive. Your Prometheus server instance is not able to respond in time. I would definitely recommend extending the polling interval to at least 30s and then trying to find a lower value that is reasonable for you and no longer produces the following problems in the output (a sketch of the relevant ScaledObject fields follows after the log excerpt):
{"type": "ScaledObject", "namespace": "app1", "name": "myapp", "error": "Get \"http://prometheus_frontend:9090/api/v1/query?query=truncated_query&time=2024-02-28T09:59:41Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/workspace/pkg/scalers/prometheus_scaler.go:391
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/workspace/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalerState
/workspace/pkg/scaling/scale_handler.go:743
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState.func1
/workspace/pkg/scaling/scale_handler.go:628
2024-02-28T10:00:48Z ERROR prometheus_scaler error executing prometheus query {"type": "ScaledObject", "namespace": "app1", "name": "myapp", "error": "Get \"http://prometheus_frontend:9090/api/v1/query?query=truncated_query&time=2024-02-28T10:00:45Z\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"}
github.com/kedacore/keda/v2/pkg/scalers.(*prometheusScaler).GetMetricsAndActivity
/workspace/pkg/scalers/prometheus_scaler.go:391
github.com/kedacore/keda/v2/pkg/scaling/cache.(*ScalersCache).GetMetricsAndActivityForScaler
/workspace/pkg/scaling/cache/scalers_cache.go:130
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScalerState
/workspace/pkg/scaling/scale_handler.go:743
github.com/kedacore/keda/v2/pkg/scaling.(*scaleHandler).getScaledObjectState.func1
/workspace/pkg/scaling/scale_handler.go:628
2024-02-28T10:02:53Z ERROR prometheus_scaler error executing prometheus query
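For illustration, a minimal sketch of how the polling interval could be relaxed on the ScaledObject. The target name, query, threshold, and replica counts below are placeholders, not taken from the attached manifest; only the namespace, name, and server address are lifted from the log lines above:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: myapp
  namespace: app1
spec:
  scaleTargetRef:
    name: myapp                # placeholder: the Deployment being scaled
  pollingInterval: 30          # was 1s; start at 30s and only tune down if Prometheus keeps up
  minReplicaCount: 1           # placeholder
  maxReplicaCount: 10          # placeholder
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus_frontend:9090
        query: sum(rate(http_requests_total[2m]))   # placeholder query
        threshold: "100"                            # placeholder threshold
```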
You can also try to tweak the HTTP-related settings: https://keda.sh/docs/2.13/operate/cluster/#http-timeouts
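As a sketch of that option (assuming a plain YAML or Helm-based install where you can edit the operator Deployment), the scaler HTTP timeout is controlled by the KEDA_HTTP_DEFAULT_TIMEOUT environment variable on the keda-operator container, expressed in milliseconds:

```yaml
# Excerpt of the keda-operator Deployment spec; value is in milliseconds
spec:
  template:
    spec:
      containers:
        - name: keda-operator
          env:
            - name: KEDA_HTTP_DEFAULT_TIMEOUT
              value: "20000"   # e.g. 20s instead of the default
```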
Hi @zroubalik,
We have kept the polling interval this aggressive because we want scale-up to happen immediately when traffic spikes. I will try increasing it to check whether the KEDA pod stops crashing.
Regarding the timeout settings, I had already tried setting it to 20000 (20s) but did not see any improvement.
@zroubalik I am also facing this issue in KEDA version 2.11.0.
@mustaFAB53 I understand, but in this case you should also scale up your Prometheus, as it is the origin of the problem: it is not able to respond in time.
+1, panic: runtime error: invalid memory address or nil pointer dereference
Is anyone working on a fix?
Is there something we can do to avoid this?
KEDA 2.11, K8s 1.27
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity.
The keda-operator pod crashes once daily with exit code 2, even when kept idle (whether or not autoscaling was triggered). Previous logs showed the following different errors:
panic: reflect: slice index out of range
panic: runtime error: invalid memory address or nil pointer dereference [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19c9182]
Expected Behavior
keda-operator should not crash
Actual Behavior
The keda-operator pod crashes once daily with exit code 2.
Steps to Reproduce the Problem
Specifications
Keda Operator Pod Status:
Attaching complete keda-operator stack traces from previous container runs:
slice index out of range issue: keda-operator-stacktrace.log
invalid memory address or nil pointer dereference issue: keda-stacktrace-SIGSEGV.log
PS: Autoscaling is not affected significantly (even though we get the Prometheus query timeout issue at random intervals, the metric is fetched on retries), but we would like to find the root cause of the KEDA pod crashing.