Closed APuertaSales closed 4 years ago
@APuertaSales Could you please format the ScaledObject and HPA posted above, so they are more readable?
eg. https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#fenced-code-blocks
Of course, sorry!
@ppatierno might have an idea?
How many partitions does your topic have?
Keda log only shows 1 partition in the example given: "Group int.absolutegrounds.helper.processor.datapipeline has a lag of 7931 for topic INT-AG_TASK_SOURCE_DP and partition 2\n"
Kafka Tool shows 5 partitions:
And so does Kafka Manager:
Well, that log is not clear, because it sits in a for loop which breaks as soon as it finds a lag higher than the lagThreshold. So it only tells you that there is a lag of 7931 on partition 2; there could be lag on the other partitions as well (which is not your case, judging from the Kafka Tool output). I think we should change this log somehow.
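To illustrate the logging behaviour described above, here is a minimal sketch. It is not KEDA's actual code: the function names `logFirstOverThreshold` and `logAllPartitions`, the message format, and the per-partition lag values (all zero except partition 2) are assumptions made for the example.

```go
package main

import "fmt"

// logFirstOverThreshold is a hypothetical stand-in for the described loop:
// it stops at the first partition whose lag exceeds lagThreshold, so at
// most one partition ever shows up in the log.
func logFirstOverThreshold(lags []int64, lagThreshold int64) []string {
	var logs []string
	for partition, lag := range lags {
		if lag > lagThreshold {
			logs = append(logs, fmt.Sprintf("partition %d has a lag of %d", partition, lag))
			break
		}
	}
	return logs
}

// logAllPartitions is one possible alternative (also hypothetical): it
// reports every partition and returns the summed lag as well.
func logAllPartitions(lags []int64) ([]string, int64) {
	var logs []string
	var total int64
	for partition, lag := range lags {
		logs = append(logs, fmt.Sprintf("partition %d has a lag of %d", partition, lag))
		total += lag
	}
	return logs, total
}

func main() {
	// Illustrative values only: 5 partitions, lag on partition 2.
	lags := []int64{0, 0, 7931, 0, 0}

	for _, line := range logFirstOverThreshold(lags, 500) {
		fmt.Println(line)
	}

	all, total := logAllPartitions(lags)
	for _, line := range all {
		fmt.Println(line)
	}
	fmt.Println("total lag:", total)
}
```

With the first function, a topic with five partitions still produces a single log line, which matches what the debug log in this issue shows.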
Regarding the 500/500 value shown by the HPA, I would rather expect 2500, because of this code snippet:
// don't scale out beyond the number of partitions
if (totalLag / s.metadata.lagThreshold) > int64(len(partitions)) {
    totalLag = int64(len(partitions)) * s.metadata.lagThreshold
}
metric := external_metrics.ExternalMetricValue{
    MetricName: metricName,
    Value:      *resource.NewQuantity(int64(totalLag), resource.DecimalSI),
    Timestamp:  metav1.Now(),
}
It caps totalLag at the number of partitions multiplied by the lagThreshold (otherwise the extra consumers would sit idle). So I would expect totalLag = 5 * 500 = 2500, and this value is passed as the external metric value for the HPA.
Strange it reports 500/500 ...
Anyway, you have the correct number of consumer instances, which is 5 (because of your maxReplicaCount, but also because of the 5 partitions).
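One possible reading of the 500/500 value, offered as a sketch rather than a confirmed diagnosis: the HPA output is marked "(avg)", and for an average-value target Kubernetes divides the external metric by the current number of replicas before comparing it with the target. The helper names `capLag` and `averagePerReplica` below are made up for this example, and the integer division is a simplification of what the HPA actually does.

```go
package main

import "fmt"

// capLag mirrors the snippet quoted above: the reported lag is limited to
// partitions * lagThreshold so the HPA never asks for more replicas than
// there are partitions.
func capLag(totalLag, lagThreshold int64, partitions int) int64 {
	if totalLag/lagThreshold > int64(partitions) {
		return int64(partitions) * lagThreshold
	}
	return totalLag
}

// averagePerReplica is a simplified model (an assumption, not HPA source
// code) of how an average-value target is evaluated: metric / replicas.
func averagePerReplica(metric, replicas int64) int64 {
	return metric / replicas
}

func main() {
	const lagThreshold = 500
	const partitions = 5
	const replicas = 5

	capped := capLag(7931, lagThreshold, partitions)
	fmt.Println("capped lag:", capped)
	fmt.Println("average per replica:", averagePerReplica(capped, replicas))
}
```

Under these assumptions the capped metric is 2500 and the per-replica average is 500, which would line up with both the expected 5 * 500 and the 500/500 (avg) displayed by the HPA.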
Thanks @ppatierno, this explains everything. Sorry, I was expecting exact values, but as you say it makes no sense to run idle consumers. Do you have an explanation for why the currentReplicas and desiredReplicas info does not match the real number of replicas of the managed deployment? This was the info about the HPA:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-cancellation-helper-processors Deployment/cancellation-helper-processors 0/500 (avg) 1 5 4 3h42m
And this was the list of pods, with only 1 deployed, not 4:
NAME READY STATUS RESTARTS AGE
cancellation-helper-processors-7f56f97c84-b6h2h 1/1 Running 0 7d
The HPA was true, the target was correct, and the number of replicas had dropped to the MINPODS value, but the number of replicas shown was still the one reached while the lag was over the HPA target. It was not updated to match the real number of replicas for quite a long time. Thanks for your help!
@APuertaSales tbh KEDA should not be involved in updating the HPA values; that is all handled by Kubernetes. I have no clue right now.
Thanks @ppatierno, you are right, but it is strange, because the other HPAs we have configured do not show this misbehavior. We will continue using your application while monitoring the results. It is far easier to work with your solution than with our previous one. Regards, Alberto.
What happened: I configured a scaledobject for kafka and it is not updating the HPA info. This is the scaledobject configuration:
Adding debug log level shows that the consumer has a certain lag:
{"level":"debug","ts":1581350124.0215647,"logger":"kafka_scaler","msg":"Group int.absolutegrounds.helper.processor.datapipeline has a lag of 7931 for topic INT-AG_TASK_SOURCE_DP and partition 2\n"}
But the HPA created shows this information:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
keda-hpa-absolutegrounds-helper-processors Deployment/absolutegrounds-helper-processors 500/500 (avg) 1 5 5 3h35m
With this info:
What you expected to happen: Something like 7931/500 (avg) in the HPA. Instead, it says that currentValue is 0 but currentAverageValue is 500, and it stays like this for a long time.
Anything else we need to know?: Noticed that the currentReplicas and desiredReplicas info is not updated:
Environment:
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:00:57Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
Keda version 1.2.0