vrok opened this issue 4 months ago
Hello, what KEDA version are you using? This error shouldn't happen because KEDA tries to reconcile the ScaledObjects automatically. Do you see any errors in the KEDA operator logs?
@JorTurFer I'm on 2.14.0 (but I tested the main branch yesterday and the problem occurred too).
This is the ScaledObject definition:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
    scaledobject.keda.sh/name: scaledobject-workers
  name: scaledobject-workers
  namespace: default
spec:
  scaleTargetRef:
    kind: Deployment
    name: scheduler
  triggers:
  - metadata:
      scalerAddress: scheduler-scaler.default.svc.cluster.local:8080
    type: external-push
And this is the HPA that gets created. Notice that its list of metrics contains only a CPU-based metric (the default one Kubernetes inserts when no metrics are specified):
apiVersion: v1
items:
- apiVersion: autoscaling/v2
  kind: HorizontalPodAutoscaler
  metadata:
    annotations:
      meta.helm.sh/release-name: scheduler
      meta.helm.sh/release-namespace: default
    creationTimestamp: "2024-05-08T14:49:33Z"
    labels:
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/name: keda-hpa-scaledobject-workers
      app.kubernetes.io/part-of: scaledobject-workers
      app.kubernetes.io/version: 2.14.0
      scaledobject.keda.sh/name: scaledobject-workers
    name: keda-hpa-scaledobject-workers
    namespace: default
    ownerReferences:
    - apiVersion: keda.sh/v1alpha1
      blockOwnerDeletion: true
      controller: true
      kind: ScaledObject
      name: scaledobject-workers
      uid: 1c21176d-71bc-4de2-9740-9fe03f5f66d7
    resourceVersion: "2777064"
    uid: a272a347-f011-499f-92e5-fa08d650f985
  spec:
    maxReplicas: 100
    metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 80
          type: Utilization
      type: Resource
    minReplicas: 1
    scaleTargetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: scheduler
  status:
    conditions:
    - lastTransitionTime: "2024-05-08T14:49:48Z"
      message: the HPA controller was able to get the target's current scale
      reason: SucceededGetScale
      status: "True"
      type: AbleToScale
    - lastTransitionTime: "2024-05-08T14:49:48Z"
      message: 'the HPA was unable to compute the replica count: failed to get cpu
        utilization: unable to get metrics for resource cpu: unable to fetch metrics
        from resource metrics API: the server could not find the requested resource
        (get pods.metrics.k8s.io)'
      reason: FailedGetResourceMetric
      status: "False"
      type: ScalingActive
    currentMetrics: null
    currentReplicas: 1
    desiredReplicas: 0
kind: List
metadata:
  resourceVersion: ""
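To spot this symptom quickly, one could check whether the HPA carries any `External` metric, which is the metric type KEDA normally uses for scaler-provided metrics. The sketch below is a hypothetical diagnostic helper, not part of KEDA: it assumes you pipe in the output of `kubectl get hpa keda-hpa-scaledobject-workers -o json` and reports whether any entry in `spec.metrics` has `type: External`. The names `hpaMetrics` and `hasExternalMetric` are made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// hpaMetrics mirrors only the autoscaling/v2 HPA fields this check needs.
// Other fields in the kubectl JSON output are ignored by json.Unmarshal.
type hpaMetrics struct {
	Spec struct {
		Metrics []struct {
			Type string `json:"type"`
		} `json:"metrics"`
	} `json:"spec"`
}

// hasExternalMetric (hypothetical helper) reports whether the HPA JSON
// contains at least one metric of type "External".
func hasExternalMetric(data []byte) (bool, error) {
	var hpa hpaMetrics
	if err := json.Unmarshal(data, &hpa); err != nil {
		return false, err
	}
	for _, m := range hpa.Spec.Metrics {
		if m.Type == "External" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Usage: kubectl get hpa keda-hpa-scaledobject-workers -o json | go run .
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	ok, err := hasExternalMetric(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode error:", err)
		os.Exit(1)
	}
	if !ok {
		fmt.Println("HPA has no External metric; the scaler metric is missing")
		return
	}
	fmt.Println("HPA has an External metric")
}
```

Against the HPA shown above, which lists only the default `Resource` (CPU) metric, the helper would report that the scaler metric is missing.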
I'm also attaching logs from the operator pod:
Now, if I edit the ScaledObject (for example, with kubectl edit scaledobject ...), KEDA's Reconcile() method in scaledobject_controller.go is re-run and updates the HPA resource with the expected changes. The HPA seems to stay incomplete because the gRPC connection error is ignored when the gRPC service isn't available yet, and once the service becomes available, KEDA doesn't retry the gRPC call.
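The failure mode described above can be modeled without a cluster. In this minimal sketch, a hypothetical `fetchMetricSpecs` stands in for the gRPC call to the external scaler and fails until the scaler is "ready", and a small loop stands in for the controller's work queue, re-running the reconcile only when it returns an error (controller-runtime requeues a request whose `Reconcile` returns a non-nil error). None of these names come from KEDA's code, and the metric name is made up.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable stands in for a gRPC connection error (hypothetical).
var errUnavailable = errors.New("external scaler unavailable")

// scaler models an external scaler that becomes reachable on call readyAt.
type scaler struct {
	calls, readyAt int
}

// fetchMetricSpecs is a hypothetical stand-in for the gRPC metric-spec call.
func (s *scaler) fetchMetricSpecs() ([]string, error) {
	s.calls++
	if s.calls < s.readyAt {
		return nil, errUnavailable
	}
	return []string{"s0-external-push"}, nil // hypothetical metric name
}

// reconcile builds the HPA metric list. With swallow set, a connection
// error is ignored and an empty list is used (the behavior reported here).
func reconcile(s *scaler, swallow bool) ([]string, error) {
	specs, err := s.fetchMetricSpecs()
	if err != nil {
		if swallow {
			return nil, nil // error hidden: nothing asks for a retry
		}
		return nil, err // error returned: the request gets requeued
	}
	return specs, nil
}

// run re-invokes reconcile while it returns an error, up to maxAttempts,
// mimicking a work queue that requeues failed requests.
func run(s *scaler, swallow bool, maxAttempts int) []string {
	for i := 0; i < maxAttempts; i++ {
		specs, err := reconcile(s, swallow)
		if err == nil {
			return specs
		}
	}
	return nil
}

func main() {
	// Swallowing the error: the first (failed) attempt is treated as final.
	fmt.Println(run(&scaler{readyAt: 2}, true, 5))
	// Propagating the error: the retry picks up the scaler's metric.
	fmt.Println(run(&scaler{readyAt: 2}, false, 5))
}
```

The point of the model is only that, once the error is hidden, nothing schedules another reconcile, so the HPA keeps whatever metric list the first attempt produced.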
I'm going to try to reproduce this. From your example, I understand that I can deploy the ScaledObject and then, a few seconds later, the external gRPC server, and that would be close to your use case, right? I want to find where we are hiding the connection error.
@JorTurFer That's correct: the external scaler's gRPC server should be down for some time after the ScaledObject is installed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
Report
When I install a Helm chart containing both an external scaler gRPC service and a ScaledObject, the resulting HPA has an empty list of metrics (Kubernetes inserts the default 80% CPU utilization metric in that case). It then remains in that state even after the external scaler's gRPC service has been initialized (I can force a re-reconciliation manually by editing the ScaledObject).
This happens because Helm installs the external scaler service and the ScaledObject at the same time. The external scaler's gRPC server isn't available immediately (its pod takes about 1 second to start), and KEDA reconciles the ScaledObject before the external scaler is available, ignoring the gRPC connection error.
Expected Behavior
In my opinion, it would be better if KEDA re-queued the reconciliation request in these situations. For example, Reconcile() in scaledobject_controller.go could return ctrl.Result{RequeueAfter: time.Minute} if a gRPC connection error was observed.
Actual Behavior
KEDA doesn't update the HPA even after the external scaler is available.
Steps to Reproduce the Problem
1. Create a ScaledObject resource using an external scaler.
2. Observe that the HorizontalPodAutoscaler created by KEDA is missing the metric specified in the ScaledObject.
Logs from KEDA operator
No response
KEDA Version
None
Kubernetes Version
None
Platform
Any
Scaler Details
No response
Anything else?
No response