python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0
1.3k stars 386 forks source link

Unhandled exception in connection loop when administrator sets system time in past #722

Closed serhiinikitenko closed 7 months ago

serhiinikitenko commented 1 year ago

This issue is easy to reproduce, follow steps:

  1. Start async zookeeper connection as shown below in the snippet
  2. Change system time few years back

Expected Behavior

Connection restarted

Actual Behavior

Connection stopped

Snippet to Reproduce the Problem

Python 3.8.16 (default, Dec 23 2022, 09:54:58)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-17)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> from kazoo.client import KazooState, KazooClient
>>> from kazoo.retry import KazooRetry
>>>
>>> host = '127.0.0.1:2181'
>>>
>>> zk = KazooClient(hosts=host,
...                               connection_retry=KazooRetry(max_tries=-1, delay=0.1, backoff=1.5, max_jitter=0.5, max_delay=5),
...                               command_retry=KazooRetry(max_tries=1, delay=0.1, backoff=2, max_jitter=0.1, max_delay=1))
>>>
>>> zk.start_async()
<threading.Event object at 0x7f1c86a4cd00>
>>>
>>> zk.get('/')
(b'', ZnodeStat(czxid=0, mzxid=0, ctime=0, mtime=0, version=0, cversion=41, aversion=0, ephemeralOwner=0, dataLength=0, numChildren=7, pzxid=115964117194))
>>>
>>> Unhandled exception in connection loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 592, in _connect_attempt
    s = self.handler.select([self._socket, self._read_sock],
  File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/threading.py", line 172, in select
    return selector_select(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 380, in selector_select
    ready = selector.select(timeout)
  File "/usr/lib64/python3.8/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
OverflowError: timeout is too large
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib64/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 512, in zk_loop
    if retry(self._connect_loop, retry) is STOP_CONNECTING:
  File "/usr/local/lib/python3.8/site-packages/kazoo/retry.py", line 126, in __call__
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 552, in _connect_loop
    status = self._connect_attempt(host, hostip, port, retry)
  File "/usr/local/lib/python3.8/site-packages/kazoo/protocol/connection.py", line 592, in _connect_attempt
    s = self.handler.select([self._socket, self._read_sock],
  File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/threading.py", line 172, in select
    return selector_select(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 380, in selector_select
    ready = selector.select(timeout)
  File "/usr/lib64/python3.8/selectors.py", line 468, in select
    fd_event_list = self._selector.poll(timeout, max_ev)
OverflowError: timeout is too large

>>> zk.get('/')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/kazoo/client.py", line 1165, in get
    return self.get_async(path, watch=watch).get()
  File "/usr/local/lib/python3.8/site-packages/kazoo/handlers/utils.py", line 80, in get
    raise self._exception
kazoo.exceptions.ConnectionClosedError: Connection has been closed
>>>
[vagrant@localhost ~]$ date
Tue May 30 08:24:15 UTC 2023
[vagrant@localhost ~]$ sudo timedatectl set-time '2020-06-01'

Logs with logging in DEBUG mode

COPY/PASTE the result of the snippet here (omit any sensitive information)

Specifications

StephenSorriaux commented 1 year ago

Hello,

Thank you for the issue.

Looks like a very valid bug as we are using time.now() to detect how long we have been waiting in the retry logic. When possible we should use time.monotonic instead to fix this issue.