zendesk / ruby-kafka

A Ruby client library for Apache Kafka
http://www.rubydoc.info/gems/ruby-kafka
Apache License 2.0

Infinite loop in Cluster.get_coordinator if CoordinatorNotAvailable occurs #957

Closed: rammpeter closed this issue 1 year ago

rammpeter commented 2 years ago

###### Steps to reproduce

This is not easy to reproduce, because CoordinatorNotAvailable has to be raised first. In this case it was possibly caused by a crashed cluster node, although failover within the cluster seemed to work correctly. The crashed node was the first entry in the configured list of nodes.

The problem currently occurs when calling Producer.init_transactions. If CoordinatorNotAvailable is raised within Cluster.get_coordinator, the lookup is retried without any limit. As a result the application hangs. A global reconnect to Kafka could have fixed the problem if init_transactions had returned with an error or exception instead.

cluster.rb, line 504:

```ruby
        rescue CoordinatorNotAvailable
          @logger.debug "Coordinator not available; retrying in 1s"
          sleep 1
          retry # no attempt counter, so this repeats until a coordinator is found
```

###### Expected outcome
The number of retries should be limited, and the exception should be raised to the caller if the problem persists after several retries.
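
The bounded retry described above can be sketched in plain Ruby, without a Kafka connection. This is a minimal sketch under stated assumptions, not ruby-kafka code. `CoordinatorNotAvailable` here is a local stand-in for the exception rescued in cluster.rb. `with_coordinator_retries`, `max_retries` and `backoff` are hypothetical names and are not part of the ruby-kafka API. As in the excerpt, the helper sleeps a fixed delay between attempts. Once the retry budget is used up, it re-raises the original exception to the caller.

```ruby
# Local stand-in for the CoordinatorNotAvailable error rescued in cluster.rb,
# so this sketch runs without ruby-kafka or a Kafka cluster (assumption).
class CoordinatorNotAvailable < StandardError; end

# Hypothetical helper: runs the block, retrying on CoordinatorNotAvailable at
# most max_retries times with a fixed sleep of backoff seconds in between.
# When the budget is exhausted, the original exception is re-raised so the
# caller can react (for example by reconnecting) instead of hanging forever.
def with_coordinator_retries(max_retries: 3, backoff: 1)
  attempts = 0
  begin
    yield
  rescue CoordinatorNotAvailable
    attempts += 1
    raise if attempts > max_retries # give up: propagate to the caller

    sleep backoff
    retry # re-runs the begin block, but only while budget remains
  end
end

# Usage: the coordinator never becomes available, so after the initial
# attempt and 2 retries the exception reaches the caller.
begin
  with_coordinator_retries(max_retries: 2, backoff: 0) do
    raise CoordinatorNotAvailable, "node down"
  end
rescue CoordinatorNotAvailable => e
  puts "gave up: #{e.message}" # prints "gave up: node down"
end
```

The `attempts` counter lives outside the `begin` block, so it survives each `retry`. That counter is what is missing from the unbounded loop in the excerpt. A caller such as the code around init_transactions could then rescue the exception and perform the global reconnect mentioned in the report.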

###### Actual outcome
The call to Producer.init_transactions hangs indefinitely as long as CoordinatorNotAvailable persists.

github-actions[bot] commented 1 year ago

Issue has been marked as stale due to a lack of activity.