Hi there!
First of all, thank you for TreeCache, it is a great recipe. We recently started using it, but ran into issues on reconnection with large trees: after reconnecting, every request fails with a ConnectionLoss error. I managed to capture a stack trace:
File "/venv/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
result = self._run(*self.args, **self.kwargs)
File "/venv/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 473, in zk_loop
if retry(self._connect_loop, retry) is STOP_CONNECTING:
File "/venv/lib/python2.7/site-packages/kazoo/retry.py", line 123, in __call__
return func(*args, **kwargs)
File "/venv/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 512, in _connect_loop
status = self._connect_attempt(host, port, retry)
File "/venv/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 539, in _connect_attempt
read_timeout, connect_timeout = self._connect(host, port)
File "/venv/lib/python2.7/site-packages/kazoo/protocol/connection.py", line 646, in _connect
client._session_callback(KeeperState.CONNECTED)
File "/venv/lib/python2.7/site-packages/kazoo/client.py", line 467, in _session_callback
self._make_state_change(KazooState.CONNECTED)
File "/venv/lib/python2.7/site-packages/kazoo/client.py", line 440, in _make_state_change
remove = listener(state)
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 179, in _session_watcher
self._root.on_reconnected()
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 217, in on_reconnected
child.on_reconnected()
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 215, in on_reconnected
self._refresh()
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 247, in _refresh
self._refresh_data()
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 251, in _refresh_data
self._call_client('get', self._path)
File "/venv/lib/python2.7/site-packages/kazoo/recipe/cache.py", line 264, in _call_client
method(path, *args, **kwargs).rawlink(callback)
File "/venv/lib/python2.7/site-packages/kazoo/client.py", line 1065, in get_async
async_result)
File "/venv/lib/python2.7/site-packages/kazoo/client.py", line 547, in _call
write_sock.send(b'\0')
File "/venv/lib/python2.7/site-packages/gevent/socket.py", line 443, in send
self._wait(self._write_event)
File "/venv/lib/python2.7/site-packages/gevent/socket.py", line 300, in _wait
self.hub.wait(watcher)
File "/venv/lib/python2.7/site-packages/gevent/hub.py", line 348, in wait
result = waiter.get()
File "/venv/lib/python2.7/site-packages/gevent/hub.py", line 575, in get
return self.hub.switch()
File "/venv/lib/python2.7/site-packages/gevent/hub.py", line 338, in switch
return greenlet.switch(self)
Here is what I think happens:
kazoo calls the session callback
the tree calls self._root.on_reconnected()
which in turn issues get and get_children calls in kazoo
each of these calls is put in the queue and '\0' is written to the "wake up" socket
the socket is blocking (at least it behaves as blocking under gevent), so after several requests the whole thread (greenlet) blocks in send
nobody is reading from the socket, because reading is normally done in the connection thread (this same stack) right after the session callback returns
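The blocking step can be reproduced without kazoo or gevent. Below is a minimal sketch, assuming Python 3 and a plain socket.socketpair() standing in for the wake-up socket; count_wakeups_before_block is a hypothetical helper name, not part of kazoo. It writes single '\0' bytes to a pair that nobody reads and counts how many are accepted before send would block. The write end is switched to non-blocking only so the demo returns instead of hanging the way the trace above does.

```python
import socket


def count_wakeups_before_block(limit=10 ** 7):
    """Write wake-up bytes to a socketpair that nobody drains.

    Returns how many single-byte sends were accepted before the
    next send would have blocked (or `limit` if that never happened).
    """
    read_sock, write_sock = socket.socketpair()
    # Non-blocking only for the demo: a blocking socket would hang here,
    # which is the situation described in the steps above.
    write_sock.setblocking(False)
    sent = 0
    try:
        while sent < limit:
            try:
                write_sock.send(b'\0')
            except BlockingIOError:
                # The kernel buffer is full and no reader is draining it.
                break
            sent += 1
    finally:
        read_sock.close()
        write_sock.close()
    return sent


if __name__ == "__main__":
    print("sends accepted before blocking:", count_wakeups_before_block())
```

The exact count depends on the platform's socket buffer sizes, but it is finite, so a large enough tree issuing get and get_children for every node from inside the callback will eventually hit it.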
I see two issues here:
performing heavy lifting in the session callback stack (recipe issue)
This is strongly discouraged. I patched it locally by replacing self._root.on_reconnected() with self._in_background(self._root.on_reconnected).
using a blocking socket for wake up
Usually a non-blocking socket is used, and the thread, once woken up, drains both the queue and the socket. There could be issues with long send batches and other things. I am not sure it is worth fixing right now.
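The idea behind the first fix, sketched outside of kazoo: the listener only enqueues the work and returns, and a separate worker runs it, so the connection thread is free to go back to draining the wake-up socket. This is an illustration under assumptions, not the TreeCache code: DeferredListener, wait_idle and the "CONNECTED" string are hypothetical stand-ins, and a plain thread plus queue.Queue stands in for whatever _in_background uses in the recipe.

```python
import logging
import queue
import threading


class DeferredListener:
    """Session listener that returns immediately and runs work elsewhere."""

    def __init__(self, on_reconnected):
        self._on_reconnected = on_reconnected
        self._tasks = queue.Queue()
        worker = threading.Thread(target=self._worker, daemon=True)
        worker.start()

    def __call__(self, state):
        # Runs on the connection thread: enqueue only, no client calls here.
        if state == "CONNECTED":  # stand-in for KazooState.CONNECTED
            self._tasks.put(self._on_reconnected)

    def _worker(self):
        # Runs on the background thread: safe place for the heavy refresh.
        while True:
            task = self._tasks.get()
            try:
                task()
            except Exception:
                logging.exception("background task failed")
            finally:
                self._tasks.task_done()

    def wait_idle(self):
        """Block until every queued task has finished (useful in tests)."""
        self._tasks.join()


if __name__ == "__main__":
    ran_on = []
    listener = DeferredListener(
        lambda: ran_on.append(threading.current_thread().name))
    listener("CONNECTED")  # returns right away
    listener.wait_idle()
    print("refresh ran on:", ran_on[0])
```

With this shape the callback stack stays short regardless of tree size, which is what the local patch achieves.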