ghost opened 1 year ago
I found something interesting! I had this config in the broker:
replica.selector.class: org.apache.kafka.common.replica.RackAwareReplicaSelector
but the clients were not specifying a rackId. Deleting this config from the broker immediately fixed the issue I described above.
Additional info for future readers: About config / How to implement it with kafkajs
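For future readers, here is a minimal sketch of a kafkajs consumer that does pass a rackId, so that a broker configured with RackAwareReplicaSelector can match the client to a nearby replica (KIP-392 fetch-from-follower). Broker addresses, credentials, topic, group, and rack names below are hypothetical placeholders, and the rackId consumer option requires a kafkajs version that supports it:

```javascript
// Sketch only: brokers, SASL credentials, topic/group/rack names are placeholders.
const { Kafka } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['broker-1:9093'],
  ssl: true,
  sasl: {
    mechanism: 'scram-sha-512',
    username: 'my-user',
    password: 'my-password',
  },
})

// rackId tells the broker which "rack" this client is in; it should match
// a broker.rack value configured on the cluster. Without it, a broker using
// RackAwareReplicaSelector has no client rack to select against.
const consumer = kafka.consumer({
  groupId: 'my-group',
  rackId: 'eu-west-1a',
})

const run = async () => {
  await consumer.connect()
  // kafkajs v2 subscribe syntax; v1 used { topic: 'my-topic' } instead.
  await consumer.subscribe({ topics: ['my-topic'], fromBeginning: false })
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log({ topic, partition, value: message.value.toString() })
    },
  })
}

run().catch(console.error)
```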
Describe the bug I have a simple consumer with default configs. I am using SASL (SCRAM-SHA-512) with SSL. The consumer consumes as expected, but every once in a while it stops consuming, and the logs show the following error:
Along with the above error, I also sometimes see the following log:
Each time this happens, I see a timeout error related to the TLS connection. The consumer usually recovers automatically after a couple of minutes, but during that time no messages are processed. This can be seen in the graph below as well:
I checked the Kafka broker during the same period, and it looks like the consumer fails to send a heartbeat.
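To confirm the missed heartbeats from the client side, one option (assuming kafkajs, as used elsewhere in this thread) is to subscribe to the consumer's instrumentation events and raise the log level; the broker address, group id, and timeout values below are illustrative placeholders, not recommendations:

```javascript
const { Kafka, logLevel } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'my-app',
  brokers: ['broker-1:9093'],        // placeholder
  logLevel: logLevel.DEBUG,          // surfaces connection/TLS timeouts in the logs
})

// sessionTimeout and heartbeatInterval are standard kafkajs consumer options;
// heartbeatInterval must stay well below sessionTimeout or the broker will
// evict the member from the group.
const consumer = kafka.consumer({
  groupId: 'my-group',               // placeholder
  sessionTimeout: 45000,             // default 30000 ms
  heartbeatInterval: 3000,           // default 3000 ms
})

// Log every successful heartbeat so gaps line up with the broker-side view.
consumer.on(consumer.events.HEARTBEAT, e =>
  console.log(`heartbeat at ${e.timestamp}`))

// CRASH fires when the consumer gives up after exhausting retries.
consumer.on(consumer.events.CRASH, e =>
  console.error('consumer crashed:', e.payload.error))
```

If the HEARTBEAT events stop at the same moments the broker reports missed heartbeats, the problem is on the client/network side rather than in group coordination.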
To Reproduce
Expected behavior Smooth consumption without drops.
Environment:
Additional context An interesting thing is that I carried out the experiment on 2 clusters, and the issue only happens on one of them. The problematic cluster is deployed on Kubernetes via Strimzi, and the cluster that works fine is deployed via Docker on AWS EC2 hosts. Both clusters are provisioned with the same amount of resources, and the broker configs, topic configs, etc. are identical.