python-zk / kazoo

Kazoo is a high-level Python library that makes it easier to use Apache Zookeeper.
https://kazoo.readthedocs.io
Apache License 2.0

ConnectionLoss when reading large data over SSL #618

Closed: jeblair closed this issue 4 years ago

jeblair commented 4 years ago

Using kazoo 2.7.0 or git master with TLS, calling get_children on a node with 8280 children fails with the following error:

Traceback (most recent call last):
  File "../zktest.py", line 17, in <module>
    zk.get_children('/nodepool/requests-lock')
  File "/home/corvus/kazoo/kazoo/client.py", line 1219, in get_children
    include_data=include_data).get()
  File "/home/corvus/kazoo/kazoo/handlers/utils.py", line 75, in get
    raise self._exception
kazoo.exceptions.ConnectionLoss

By removing the exception masking in _socket_error_handling I'm able to see the underlying exception:

Traceback (most recent call last):
  File "/home/corvus/kazoo/kazoo/protocol/connection.py", line 605, in _connect_attempt
    response = self._read_socket(read_timeout)
  File "/home/corvus/kazoo/kazoo/protocol/connection.py", line 438, in _read_socket
    header, buffer, offset = self._read_header(read_timeout)
  File "/home/corvus/kazoo/kazoo/protocol/connection.py", line 226, in _read_header
    b = self._read(4, timeout)
  File "/home/corvus/kazoo/kazoo/protocol/connection.py", line 254, in _read
    chunk = self._socket.recv(remaining)
  File "/usr/lib/python3.5/ssl.py", line 914, in recv
    return self.read(buflen)
  File "/usr/lib/python3.5/ssl.py", line 791, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib/python3.5/ssl.py", line 577, in read
    v = self._sslobj.read(len)
ssl.SSLWantReadError: The operation did not complete (read) (_ssl.c:1981)
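
To illustrate why the first traceback shows only ConnectionLoss, here is a minimal sketch of an error-masking wrapper that keeps the original exception attached as the cause. This is not kazoo's actual `_socket_error_handling`; the names `socket_error_handling` and `ConnectionLossSketch` are hypothetical stand-ins, and the assumption is simply that socket-level errors (which are `OSError` subclasses, including `ssl.SSLWantReadError`) get translated into a single connection-loss exception.

```python
import contextlib
import logging

log = logging.getLogger(__name__)


class ConnectionLossSketch(Exception):
    """Hypothetical stand-in for kazoo.exceptions.ConnectionLoss."""


@contextlib.contextmanager
def socket_error_handling():
    """Translate socket-level errors into one exception type.

    Using `raise ... from exc` keeps the underlying error reachable via
    __cause__, so it appears in the traceback instead of being hidden.
    """
    try:
        yield
    except OSError as exc:  # ssl.SSLError subclasses OSError
        log.debug("socket error: %r", exc)
        raise ConnectionLossSketch("socket error") from exc
```

With chaining like this, the traceback would include the SSLWantReadError without having to edit the library.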

The OpenSSL documentation indicates that when we receive this error, we should call the same function again once the underlying socket is readable. The same holds for writes: retry once the socket is writable.
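
A minimal sketch of that retry behaviour, using only the standard library. It is not the fix that went into kazoo; `recv_exact` and `_wait_for` are hypothetical helper names, and the sketch assumes a non-blocking socket (plain or TLS-wrapped) that exposes `recv()` and `fileno()`.

```python
import select
import ssl
import time


def _wait_for(sock, deadline, readable):
    """Block until sock is readable (or writable) or the deadline passes."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise TimeoutError("timed out waiting for socket")
    rlist, wlist = ([sock], []) if readable else ([], [sock])
    ready_r, ready_w, _ = select.select(rlist, wlist, [], left)
    if not (ready_r or ready_w):
        raise TimeoutError("timed out waiting for socket")


def recv_exact(sock, length, timeout):
    """Read exactly `length` bytes, retrying when TLS asks us to wait."""
    deadline = time.monotonic() + timeout
    chunks = []
    remaining = length
    while remaining > 0:
        try:
            chunk = sock.recv(remaining)
        except ssl.SSLWantReadError:
            # TLS needs more bytes from the network: wait, then retry.
            _wait_for(sock, deadline, readable=True)
            continue
        except ssl.SSLWantWriteError:
            # TLS needs to send data first (e.g. renegotiation): wait, retry.
            _wait_for(sock, deadline, readable=False)
            continue
        if not chunk:
            raise ConnectionError("socket closed by peer")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The key point is that SSLWantReadError is treated as "not ready yet" rather than as a fatal socket error, so a large response split across many TLS records can still be read in full.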

This bug may also be related to issues #587 and #580.