parjun8840 opened 2 years ago
Any update on this? We had seen something similar: the metrics server was reporting the metric as 1. It looked like the sarama client wasn't querying all partitions under the hood, just the first one.
Yes, I'm also getting this issue in my production stack. Super strange. It seems that the lag threshold I set is just multiplied by the max number of replicas, and then my pods scale up and down constantly. Very odd behaviour and a bit difficult to debug.
When I query the metrics server, the value I get is equal to the threshold multiplied by the number of replicas.
My trigger config is:
What's strange is that when testing this in a local k8s cluster with kind, the metric values are reported correctly. So the only difference right now is that my production stack is using AWS MSK, whilst kind is using a local deployment of Kafka.
Just mentioning here: when I set `allowIdleConsumers` to true in the kafka trigger, the reported metric matches the lag. I'm still not quite sure whether the other behaviour is expected. Maybe there is a logic error happening somewhere, or I'm misunderstanding something in the implementation.
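The `allowIdleConsumers` observation above is consistent with KEDA's Kafka scaler capping the reported lag when that flag is false. The sketch below is a simplified illustration of that capping behaviour, not the actual KEDA source: `scalerMetric` is a hypothetical name, and the real scaler has more branches.

```go
package main

import "fmt"

// scalerMetric is a simplified sketch (assumption, not KEDA's real code) of
// the lag cap the Kafka scaler applies: when allowIdleConsumers is false,
// total lag is capped at partitionCount * lagThreshold so the HPA never asks
// for more replicas than there are partitions.
func scalerMetric(totalLag, partitionCount, lagThreshold int64, allowIdleConsumers bool) int64 {
	if !allowIdleConsumers {
		upperBound := partitionCount * lagThreshold
		if totalLag > upperBound {
			return upperBound
		}
	}
	return totalLag
}

func main() {
	// One partition, lagThreshold 10, real lag 33: the metric is capped at
	// 10, matching the value the metrics server reports in this issue.
	fmt.Println(scalerMetric(33, 1, 10, false)) // 10
	// With allowIdleConsumers true the cap is skipped and the lag passes through.
	fmt.Println(scalerMetric(33, 1, 10, true)) // 33
}
```

Under this reading, a single-partition topic can never report a metric above the threshold, which would also explain the "threshold × replicas" values others see on multi-partition topics.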
Expected Behavior
I have defined `lagThreshold: "10"`.
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaledobject
  namespace: kafka
  labels:
    deploymentName: kafka-consumer-deployment # Required. Name of the deployment we want to scale.
spec:
  scaleTargetRef:
    name: kafka-ap
  pollingInterval: 5
  minReplicaCount: 1 # Optional. Default: 0
  maxReplicaCount: 3 # Optional. Default: 100
  triggers:
```
I have pushed around 33 messages, which caused a lag of 33 on the Kafka consumer group. With the lag at 33, or any value above 10, it should scale up the Pods.
```
sh-4.4$ ./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group order-shipper

GROUP          TOPIC     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                              HOST             CLIENT-ID
order-shipper  preorder  0          55              88              33   kafka-python-2.0.2-cc83bb42-afe7-4f34-b37b-d944859356b6  /192.168.171.85  kafka-python-2.0.2
sh-4.4$
```
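For reference, the LAG column in that output is just the log-end offset minus the committed offset, summed over partitions. A minimal sketch (hypothetical helper, not tool source):

```go
package main

import "fmt"

// consumerLag sums (log-end offset - committed offset) across partitions,
// which is how the LAG column of kafka-consumer-groups.sh is derived.
func consumerLag(logEndOffsets, committedOffsets map[int32]int64) int64 {
	var total int64
	for p, end := range logEndOffsets {
		total += end - committedOffsets[p]
	}
	return total
}

func main() {
	// Partition 0 from the output above: 88 - 55 = 33.
	fmt.Println(consumerLag(map[int32]int64{0: 88}, map[int32]int64{0: 55})) // 33
}
```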
An API request to `/apis/external.metrics.k8s.io/v1beta1/namespaces/kafka/s0-kafka-preorder` never reports a value of more than 10.
It should instead report the real lag:

```json
{"kind":"ExternalMetricValueList","apiVersion":"external.metrics.k8s.io/v1beta1","metadata":{},"items":[{"metricName":"s0-kafka-preorder","metricLabels":null,"timestamp":"2022-08-25T22:59:38Z","value":"33"}]}
```
HPA:

```
arjunpandey$ kubectl describe hpa -nkafka
Name:               keda-hpa-kafka-scaledobject
Namespace:          kafka
Labels:             app.kubernetes.io/managed-by=keda-operator
                    app.kubernetes.io/name=keda-hpa-kafka-scaledobject
                    app.kubernetes.io/part-of=kafka-scaledobject
                    app.kubernetes.io/version=2.7.1
                    deploymentName=kafka-consumer-deployment
                    scaledobject.keda.sh/name=kafka-scaledobject
Annotations:
CreationTimestamp:  Fri, 26 Aug 2022 07:32:51 +0900
Reference:          Deployment/kafka-ap
Metrics:            ( current / target )
  "s0-kafka-preorder" (target average value):  10 / 10
Min replicas:       1
Max replicas:       3
Deployment pods:    1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from external metric s0-kafka-preorder(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: kafka-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},})
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type    Reason             Age  From                       Message
  ----    ------             ---  ----                       -------
  Normal  SuccessfulRescale  20m  horizontal-pod-autoscaler  New size: 2; reason: external metric s0-kafka-preorder(&LabelSelector{MatchLabels:map[string]string{scaledobject.keda.sh/name: kafka-scaledobject,},MatchExpressions:[]LabelSelectorRequirement{},}) above target
  Normal  SuccessfulRescale  14m  horizontal-pod-autoscaler  New size: 1; reason: All metrics below target
arjunpandey$
```
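The `10 / 10` line in the HPA output explains why nothing scales: for an external metric with a target average value, the HPA (per the Kubernetes autoscaling docs, simplified here; this is not KEDA code) computes desiredReplicas = ceil(currentReplicas × currentMetric / targetAverage). A sketch with the numbers from this issue:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is the core HPA formula for a target average value,
// simplified from the Kubernetes docs (tolerance and stabilization omitted).
func desiredReplicas(currentReplicas int64, currentMetric, targetAverage float64) int64 {
	return int64(math.Ceil(float64(currentReplicas) * currentMetric / targetAverage))
}

func main() {
	// Metric capped at 10 with target 10: ratio is 1, so the HPA keeps 1 replica.
	fmt.Println(desiredReplicas(1, 10, 10)) // 1
	// If the uncapped lag of 33 were reported instead: ceil(33/10) = 4,
	// which maxReplicaCount: 3 would then clamp to 3.
	fmt.Println(desiredReplicas(1, 33, 10)) // 4
}
```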
Actual Behavior
The Pod count should have scaled up from 1 to 2, but the reported metric never exceeds the threshold of 10, so it stays at 1.
Steps to Reproduce the Problem
Mentioned in the comments above. Any help is highly appreciated :-)
It works for the scenarios:
But not for `lagThreshold: "10"` with an actual lag > 19 (in my case I tested with 23, 33, and 47; it didn't work). I have also tested the latest version (2.8.1) of KEDA, but the same problem occurs.
Specifications
Big thanks for developing such a wonderful, long-awaited product :-)