Based on the logs, it seems like kafkajs is sending messages with API versions only supported by the upgraded brokers to brokers that have not been upgraded.
The cause seems to be that the seed broker, which is created using a random broker from the list of brokers specified, is used to set the API versions. This is used to set the API versions for all connections to the other brokers in the Kafka cluster. Essentially, this means that brokers are presumed to support the same API versions as the seed broker even though there are situations where this might not be the case. The end result is that kafkajs runs into kafka interaction issues with clusters that running brokers with different versions such as when Kafka is being upgraded on a multi node cluster.
The fix seems to be as simple as removing the setting of the version when creating the broker connections which should allow the code to discover the correct versions using a given broker's API. However, we have limited experience with the internals of the library so there might be a better way of doing this.
To Reproduce
If none of the above are possible to provide, please write down the exact steps to reproduce the behavior:
Run a Kafka cluster where the brokers are running different versions of Kafka
Attempt to send() messages from a producer to the Kafka cluster
Notice that a percentage of messages fail to send with the Closed connection error above
Expected behavior
When sending messages to a Kafka cluster where the brokers are running different versions (eg. in the middle of an upgrade) the protocol version used should be the protocol version of the broker that the message is being sent to.
Observed behavior
The message is sent to a broker using the protocol version expected by a random broker which may differ from the one where the message is being sent
Describe the bug
While in the middle of upgrading Kafka on our multi node Kafka cluster we noticed the following logs in our Kafka brokers that had yet to be upgraded:
Correspondingly, we saw an error on our
kafkajs
application that showed a produce had failed after 4 retry attempts:Based on the logs, it seems like
kafkajs
is sending messages with API versions only supported by the upgraded brokers to brokers that have not been upgraded.The cause seems to be that the seed broker, which is created using a random broker from the list of brokers specified, is used to set the API versions. This is used to set the API versions for all connections to the other brokers in the Kafka cluster. Essentially, this means that brokers are presumed to support the same API versions as the seed broker even though there are situations where this might not be the case. The end result is that
kafkajs
runs into kafka interaction issues with clusters that running brokers with different versions such as when Kafka is being upgraded on a multi node cluster.The fix seems to be as simple as removing the setting of the version when creating the broker connections which should allow the code to discover the correct versions using a given broker's API. However, we have limited experience with the internals of the library so there might be a better way of doing this.
To Reproduce
If none of the above are possible to provide, please write down the exact steps to reproduce the behavior:
send()
messages from a producer to the Kafka clusterClosed connection
error aboveExpected behavior When sending messages to a Kafka cluster where the brokers are running different versions (eg. in the middle of an upgrade) the protocol version used should be the protocol version of the broker that the message is being sent to.
Observed behavior The message is sent to a broker using the protocol version expected by a random broker which may differ from the one where the message is being sent
Environment: