If this is a bug report, please fill out the following:
Version of Ruby: 2.6.x, 2.7.x, possibly 2.x
Version of Kafka: Confluent Cloud Kafka, based on this we are on 3.2.
Version of ruby-kafka: 1.4.0
Please verify that the problem you're seeing hasn't been fixed by the current master of ruby-kafka.
It doesn't appear that 1.5 introduces any fixes related to this, afaict.
Steps to reproduce
Broker restart seemed to cause this issue.
We didn't see this problem with our previous vendor, where we were also not using SASL credentials.
We now see this issue while using SASL PLAIN with TLS enabled.
We have been unable to reproduce it locally and do not want to attempt to reproduce it in the live environment at this time.
Expected outcome
Client shuts down cleanly (basically, letting the pod restart the process on its own).
Alternatively, the client recovers cleanly and reconnects to the broker.
Actual outcome
The brokers were rolling restarted (per their upgrade policy), which caused this sequence of events in all of our ruby-kafka clients:
Error committing offsets: Kafka::NotCoordinatorForGroup
Error sending heartbeat: Kafka::RebalanceInProgress.
Failed to fetch from events/1: Kafka::NotLeaderForPartition
Error committing offsets: Kafka::NotCoordinatorForGroup
Error committing offsets: Connection error EOFError: end of file reached
ruby-kafka-1.4.0/lib/kafka/ssl_socket_with_timeout.rb:69:in `connect_nonblock': SSL_connect SYSCALL returned=5 errno=0 state=SSLv3/TLS write client hello (OpenSSL::SSL::SSLError)
(the Ruby consumer must have attempted to shut down after some retries)
undefined method `join' for nil:NilClass ...
We think a simple fix could be this: https://github.com/zendesk/ruby-kafka/pull/959
We are not entirely familiar with the ramifications of this change, though, or even with how @thread could have been nil in the fetcher to begin with. We think it has something to do with using SASL with TLS (SSL cert from system).
In any event, perhaps skipping the thread join when the thread doesn't exist would let the process finish shutting down.
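To illustrate the idea of skipping the join when no thread exists, here is a minimal sketch. `SketchFetcher` and its methods are hypothetical stand-ins written for this report, not ruby-kafka's actual fetcher code, and we are assuming the failure comes from `stop` being reached before a thread was ever assigned.

```ruby
# Hypothetical stand-in for a background fetcher; NOT ruby-kafka's real class.
class SketchFetcher
  def initialize
    @queue = Queue.new
    @thread = nil     # stays nil until #start is called
    @running = false
  end

  def start
    return if @running
    @running = true
    @thread = Thread.new do
      # Block until a :stop command arrives on the queue.
      loop { break if @queue.pop == :stop }
    end
  end

  def stop
    @queue << :stop if @running
    # Safe navigation: join only when a thread was actually created.
    @thread&.join
    @thread = nil
    @running = false
    :stopped
  end
end

# Stopping a fetcher that never started no longer raises NoMethodError.
never_started = SketchFetcher.new
never_started.stop

# Stopping a started fetcher still waits for the thread to finish.
started = SketchFetcher.new
started.start
started.stop
```

The guard only makes shutdown tolerant of a missing thread; it does not explain why the thread was never created, which is the part we are still unsure about.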