python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0

Client fails to re-connect to Zookeeper server when connection failure was originally due to address resolution failure #622

Closed hrishikeshk closed 3 years ago

hrishikeshk commented 4 years ago

I am using the DataWatch recipe to watch a single node. Both the client and the Zookeeper server run in docker containers. If the Zookeeper server is unavailable for some time, the client logs the following network errors:

WARNING:kazoo.client:Connection dropped: socket connection error: None
Connection dropped: socket connection error: None
WARNING:kazoo.client:Cannot resolve : [Errno -5] No address associated with hostname
INFO:kazoo.client:Zookeeper session closed, state: CLOSED

After this point, the client does not reconnect, even once the Zookeeper server is available again and client.restart() is attempted.

Expected Behavior

Once the Zookeeper server becomes available again, the client should reconnect, particularly when using the DataWatch recipe.

Actual Behavior

Even when using the DataWatch recipe, the client does not reconnect, and the only option is to restart the application.

Snippet to Reproduce the Problem

import logging

from kazoo.client import KazooClient, KazooState
from kazoo.protocol.states import EventType


def addWatch(zk):
    @zk.DataWatch("/log/level")
    def watch_node(data, stat, event):
        # Skip the initial call (event is None), any event other than
        # CREATED or CHANGED, and empty nodes.
        if (event is None
                or event.type not in (EventType.CREATED, EventType.CHANGED)
                or stat.data_length <= 0):
            return True
        data.decode("utf-8")
        # Returning True keeps the watch registered.
        return True


def conn_listener(state):
    logger = logging.getLogger()
    if state == KazooState.LOST:
        logger.fatal('Failed connecting to Zookeeper: Connection state LOST')
        zk.restart()
    elif state == KazooState.SUSPENDED:
        logger.fatal('Connection suspended to Zookeeper...')
    else:
        logger.fatal('Re-connected to Zookeeper...')


def connectHelper(connStr):
    global zk
    zk = KazooClient(hosts=connStr)
    zk.start()
    addWatch(zk)
    zk.add_listener(conn_listener)


def connectZk():
    zkHost = 'zk'
    zkPort = 2181
    # The port is an int, so convert it before building the host string.
    connectHelper(zkHost + ':' + str(zkPort))

Logs with logging in DEBUG mode

WARNING:kazoo.client:Connection dropped: socket connection error: None
Connection dropped: socket connection error: None
WARNING:kazoo.client:Cannot resolve : [Errno -5] No address associated with hostname
INFO:kazoo.client:Zookeeper session closed, state: CLOSED

Specifications

hrishikeshk commented 4 years ago

Another piece of information that might help: this problem is closely tied to running in docker containers, where name resolution raises an exception when a remote container is not alive. If I connect to the remote service using an IP address instead of a hostname, the Kazoo client recovers and reconnects successfully, and registers the watches again.
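Since the failure depends on whether the hostname resolves, a small check like the one below may help confirm whether the name is resolvable at the moment a restart is attempted. This is only a diagnostic sketch and not part of Kazoo: the function name wait_until_resolvable and its parameters are hypothetical, and it relies only on the standard library's socket.getaddrinfo. It does not by itself make client.restart() succeed on affected versions.

```python
import socket
import time


def wait_until_resolvable(host, port, timeout=30.0, interval=1.0,
                          resolver=socket.getaddrinfo, sleep=time.sleep,
                          clock=time.monotonic):
    """Poll DNS until `host` resolves or `timeout` seconds have passed.

    Hypothetical helper for diagnosis only. `resolver`, `sleep` and
    `clock` are injectable so the function can be exercised without a
    real network.
    """
    deadline = clock() + timeout
    while True:
        try:
            resolver(host, port)
            return True  # the name resolves right now
        except socket.gaierror:
            # Same error family as "[Errno -5] No address associated
            # with hostname" seen in the logs.
            if clock() >= deadline:
                return False
            sleep(interval)
```

For example, calling wait_until_resolvable('zk', 2181) before restarting the client would show whether the 'zk' hostname is resolvable again from inside the client container.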

StephenSorriaux commented 4 years ago

Hello,

This seems to be caused by the STOP_CONNECTING state we return when none of the given hostnames can be resolved: https://github.com/python-zk/kazoo/blob/cbdc4749edb5879099c1f9b832c055d9eeb52dea/kazoo/protocol/connection.py#L545-L546 Raising the ForceRetryError() exception instead should solve the problem (or at least trigger the expected behavior). I don't see any downside to this change, so a PR is welcome if you have some time on your side.
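To illustrate the difference between the two behaviors, here is a toy sketch. It is not Kazoo's actual code: ForceRetry, STOP, connect_loop and the other names are hypothetical stand-ins, and the loop is a simplified model of a retrying connection routine. It only shows why returning a "stop" sentinel ends the attempts after the first resolution failure, while raising a retry exception lets the loop try again.

```python
import socket


class ForceRetry(Exception):
    """Hypothetical stand-in for a 'please try again' signal."""


STOP = object()  # hypothetical stand-in for a 'stop connecting' sentinel


def make_flaky_resolver(failures, address):
    """Return a resolver that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def resolver(host):
        if state["left"] > 0:
            state["left"] -= 1
            raise socket.gaierror(-5, "No address associated with hostname")
        return address

    return resolver


def resolve_or_stop(resolver, host):
    """Give up for good when the name cannot be resolved."""
    try:
        return resolver(host)
    except socket.gaierror:
        return STOP


def resolve_or_retry(resolver, host):
    """Ask the caller to retry when the name cannot be resolved."""
    try:
        return resolver(host)
    except socket.gaierror as exc:
        raise ForceRetry(str(exc)) from exc


def connect_loop(attempt, max_tries=5):
    """Call `attempt` until it yields an address, says STOP, or tries run out.

    Returns (address_or_None, number_of_attempts_made).
    """
    for n in range(1, max_tries + 1):
        try:
            result = attempt()
        except ForceRetry:
            continue  # transient failure: try again
        if result is STOP:
            return None, n  # terminal: no further attempts
        return result, n
    return None, max_tries
```

With a resolver that fails twice before succeeding, the "stop" variant gives up on the first attempt, whereas the "retry" variant reaches the address on the third attempt.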

qmorek commented 3 years ago

The fix has already been merged in https://github.com/python-zk/kazoo/pull/631, so I think this issue can be closed.

StephenSorriaux commented 3 years ago

@qmorek Good point, thank you.