Closed by yuvital-render 2 years ago
This is my post. Posted it by mistake from another account.
Hi @yaronlevi, connections are centrally managed, and there is a lock around them since authentication can require multiple requests. From your picture, it seems that connecting to a particular broker took a while, which caused the timeout. You can try to make the connection timeout a bit more forgiving, but this is an error that you ultimately have to handle. The lock timeout follows this formula:
const lockTimeout = 2 * this.connection.connectionTimeout + this.authenticationTimeout
The default values are optimized for quick connections and smaller payloads, so you might have to adjust them for your setup. From what I've seen, you need higher connection and auth timeouts when using Kafka clusters outside of your cloud provider or network.
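For example, assuming the kafkajs 1.x defaults of connectionTimeout: 1000 and authenticationTimeout: 1000 (the defaults at the time of this thread), the lock timeout works out to 2 * 1000 + 1000 = 3000 ms, which a slow broker handshake can easily exceed. A minimal sketch of raising both values when constructing the client (client and broker names are placeholders):

const { Kafka } = require('kafkajs')

// Raising both options also raises the derived lock timeout:
// lockTimeout = 2 * connectionTimeout + authenticationTimeout
//             = 2 * 5000 + 5000 = 15000 ms
const kafka = new Kafka({
  clientId: 'my-app',          // placeholder
  brokers: ['my-broker:9092'], // placeholder
  connectionTimeout: 5000,     // ms to wait for a successful connection
  authenticationTimeout: 5000, // ms to wait for SASL authentication
})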
@tulios very interesting!
So I've increased the timeout on both parameters to 5000:
I'll keep you updated if it works. Thanks
Did this work?
@femi-paystack Yes! It seems to work. The last error was 2 weeks ago, which is exactly when I made the change:
Closing since this seems to have just been an issue with connection timeouts. The next release will increase the default authentication timeout.
It works well! Thanks a lot, @yaronlevi.
In kafkajs 2.2.3, try adding the connectionTimeout: 5000 and authenticationTimeout: 5000 settings.
The Kafka client pod's error message for the timeout:
KafkaJSLockTimeout: Timeout while acquiring lock (1865 waiting locks): "connect to broker <REDACTED>:9094"
The error KafkaJSLockTimeout: Timeout while acquiring lock (1865 waiting locks) indicates that the KafkaJS client took too long to acquire a lock needed to perform a specific operation, resulting in a timeout.
The lockTimeout calculation formula in kafkajs 2.2.3 (the same formula quoted above):
const lockTimeout = 2 * this.connection.connectionTimeout + this.authenticationTimeout
Solution:
constructor() {
  const kafka = new Kafka({
    clientId: kafkaClientId,
    brokers: kafkaBrokers,
    ssl: kafkaSSL,
    logLevel: 2,
    connectionTimeout: 5000, // newly added
    authenticationTimeout: 5000, // newly added
  });
  // ... redacted ...
}
Increase the timeout settings for both the Kafka client and broker to allow more time.
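Independently of the timeout values, the error can also be detected and handled in application code. A sketch, assuming a producer variable is in scope (KafkaJS error instances expose their class name on err.name):

try {
  await producer.send({ topic, messages })
} catch (err) {
  if (err.name === 'KafkaJSLockTimeout') {
    // too many callers queued up behind the shared connection lock;
    // here one could back off and retry, or surface the failure
    console.error('lock timeout while producing:', err.message)
  }
  throw err
}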
Hi (-:
kafkajs@1.15.0
We are using KafkaJS; it's solid and works great, but sometimes it doesn't. It seems like, during small traffic spikes (but not always), we get these errors thrown when trying to send (produce) messages:
"KafkaJSLockTimeout Timeout while acquiring lock (15 waiting locks)"
When this happens it lasts about 30 seconds, during which many messages are dropped and never persisted:
The whole error looks like this:
We use Heroku, with 3 Large Dynos (in a Private Space). We use Confluent fully managed Kafka on a type:standard production cluster. We are using Node with Express.
This is the code that connects to the cluster. It runs once at the start of each Dyno, and only after connect() resolves do we start listening to requests:
This is the configuration of the producer:
This is the code that sends the messages:
The error at the top is caught in this catch block.
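The code itself was shared as screenshots; roughly, the setup has this shape (a sketch with placeholder names, assuming a single shared producer created at startup):

const express = require('express')
const { Kafka } = require('kafkajs')

const kafka = new Kafka({
  clientId: 'web',                         // placeholder
  brokers: ['<confluent-bootstrap>:9092'], // placeholder
  ssl: true,
  sasl: { mechanism: 'plain', username: '<api-key>', password: '<api-secret>' },
})

const app = express()
const producer = kafka.producer()

// Runs once at the start of each Dyno; we only start
// listening to requests after connect() resolves
async function start() {
  await producer.connect()
  app.listen(process.env.PORT)
}
start()

// Sends a message; the KafkaJSLockTimeout above surfaces
// in this catch block during traffic spikes
async function sendEvent(value) {
  try {
    await producer.send({ topic: 'events', messages: [{ value }] })
  } catch (err) {
    console.error(err)
  }
}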
These errors happen on this topic in Confluent:
This topic has 2 partitions; I'm raising this because maybe it's related? Confluent's default is 6 partitions, but we chose 2.
So are these errors related to a high burst of traffic, and would increasing the number of partitions help? If not, could it be the code that connects to the cluster? Maybe we should add some additional retry or other configuration we are missing here?
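For what it's worth, KafkaJS does expose a retry option on the client; a sketch of tuning it (option names are from the KafkaJS docs, values are illustrative, not a recommendation):

const kafka = new Kafka({
  clientId: 'web',            // placeholder
  brokers: ['<broker>:9092'], // placeholder
  retry: {
    initialRetryTime: 300, // ms before the first retry
    retries: 8,            // max retries per call (the default is 5)
    maxRetryTime: 30000,   // upper bound on backoff (ms)
  },
})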
Any help on the issue and directions for further investigation would be much appreciated! 🙏