When I set the excludePersistentLag flag to true, it does not exclude persistent lag for partitions with an invalid offset (that is, -1).
Expected Behavior
If the lag for these partitions with an invalid offset is persistent, it should be ignored in the custom metric.
Actual Behavior
In my setup, this resulted in the number of Kafka consumer pods being scaled to the maximum.
Steps to Reproduce the Problem
In the setup, have some partitions that have never been consumed (invalid offset).
The old messages are no longer retained in Kafka because they have already expired.
Use the Kafka scaler to scale the consumer deployment (a sample ScaledObject sketch is shown below).
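For reference, the trigger configuration looks roughly like the sketch below. The deployment name, broker address, threshold, and replica count are placeholders; group1 and topic1 match the consumer-group output further down. excludePersistentLag is the documented Kafka scaler parameter this report is about.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: consumer-scaledobject          # placeholder name
spec:
  scaleTargetRef:
    name: consumer-deployment          # placeholder deployment name
  minReplicaCount: 1
  maxReplicaCount: 10                  # placeholder; pods were scaled to this max
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092   # placeholder broker address
        consumerGroup: group1
        topic: topic1
        lagThreshold: "10"             # placeholder threshold
        excludePersistentLag: "true"
```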
Logs from KEDA operator
example
KEDA Version
2.14.0
Kubernetes Version
None
Platform
Amazon Web Services
Scaler Details
Kafka
Anything else?
Hi, refer to #5274.
The issue got reproduced in my system again. I did some analysis by adding debug logging in the Kafka scaler and found a topic in the consumer group where the lag looks like the following (output from kafka-consumer-groups.sh):
GROUP   TOPIC   PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                  HOST            CLIENT-ID
group1  topic1  0          80              80              0    rdkafka-xxx-xxx-xxx-xxx-xxx  /xx.xx.xx.xx    rdkafka
group1  topic1  1          -               60              -    rdkafka-xxx-xxx-xxx-xxx-xxy  /100.100.17.69  rdkafka
group1  topic1  2          -               96              -    rdkafka-xxx-xxx-xxx-xxx-xxz  /100.100.14.23  rdkafka
Because of this, the lag comes out as 156 (60 + 96).
In our case, topic1 gets messages once every 2-3 months or more. So if we create a new consumer (new group), the lag is '-' for all partitions for some time; then suddenly we get one message, the lag becomes 0 on one partition while staying '-' on the others, and only later does the lag for the other partitions start being counted.
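To make the arithmetic concrete, here is a minimal Go sketch of the behaviour I am assuming (a simplified model, not KEDA's actual scaler code): an invalid committed offset makes the full log-end offset count as lag, and the persistent-lag exclusion is only applied to partitions with a valid offset, so partitions 1 and 2 contribute 60 + 96 = 156.

```go
package main

import "fmt"

const invalidOffset = -1 // committed offset reported when the group never consumed the partition

// partitionState mirrors the numbers from the kafka-consumer-groups.sh output above.
type partitionState struct {
	partition      int32
	consumerOffset int64 // committed offset; -1 means no commit exists for this partition
	logEndOffset   int64
}

// previousConsumerOffset simulates the per-partition snapshot that a
// persistent-lag check would compare against on the next polling interval.
var previousConsumerOffset = map[int32]int64{}

// lagForPartition is a simplified model of the reported behaviour: an invalid
// committed offset makes the whole log-end offset count as lag, and the
// persistent-lag exclusion never applies to such a partition.
func lagForPartition(p partitionState, excludePersistentLag bool) int64 {
	if p.consumerOffset == invalidOffset {
		// Fallback path: no committed offset, so the full log-end offset is reported as lag.
		return p.logEndOffset
	}
	if excludePersistentLag {
		if prev, ok := previousConsumerOffset[p.partition]; ok && prev == p.consumerOffset {
			// Offset has not moved since the last check: treat the lag as persistent and ignore it.
			return 0
		}
		previousConsumerOffset[p.partition] = p.consumerOffset
	}
	return p.logEndOffset - p.consumerOffset
}

func main() {
	partitions := []partitionState{
		{partition: 0, consumerOffset: 80, logEndOffset: 80},
		{partition: 1, consumerOffset: invalidOffset, logEndOffset: 60},
		{partition: 2, consumerOffset: invalidOffset, logEndOffset: 96},
	}
	var total int64
	for _, p := range partitions {
		total += lagForPartition(p, true)
	}
	fmt.Println("total lag:", total) // prints 156, matching the value reported above
}
```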
Can you please check?