mjs / imapclient

An easy-to-use, Pythonic and complete IMAP client library
https://imapclient.readthedocs.io/
Other
520 stars 86 forks source link

idle_check can see incomplete lines #519

Open mjs opened 1 year ago

mjs commented 1 year ago

From #464:

I've found something else that may be related: I connected with tls=FALSE and ran a network capture to see what was actually on the wire. I'm seeing these errors now:

Traceback (most recent call last):
  File "./debug-protocol-error.py", line 23, in <module>
    responses = server.idle_check(timeout=30)
  File "/home/david/.local/lib/python3.8/site-packages/imapclient/imapclient.py", line 175, in wrapper
    return func(client, *args, **kwargs)
  File "/home/david/.local/lib/python3.8/site-packages/imapclient/imapclient.py", line 940, in idle_check
    line = self._imap._get_line()
  File "/usr/lib/python3.8/imaplib.py", line 1164, in _get_line
    raise self.abort('socket error: unterminated line: %r' % line)
imaplib.IMAP4.abort: socket error: unterminated line: b'* 91 FETCH (FLAGS (\\Deleted \\Seen \\Recent)'

In the network capture, there are two packets which are re-assembled to form the message seen in the error. The only thing in the second packet is \r\n. I have not looked at the code here but is it possible that it's attempting to process a packet that hasn't been fully assembled, which presumably would be in imaplib judging from the trace?

mjs commented 1 year ago

The issue is likely that IMAPClient is using imaplib.IMAP4's _get_line while the socket is in non-blocking mode which it isn't really designed for. IMAPClient will like need to call IMAP4.readline() and do end of line handling itself instead

axoroll7 commented 1 year ago

readline should return a complete line. I am afraid there may be others unknown side-effects. imaplib2 uses its own line reader for this.

Since there is no concurrency, why using non-blocking sockets instead of blocking sockets with timeout ? I don't know why it was originally written this way. Maybe it was necessary in older python versions. Maybe concurrency was the next step. But can this be reconsidered today ?

EDIT 1: linguistic errors EDIT 2: Sorry for my ignorance

mjs commented 1 year ago

My memory is fuzzy but I'm pretty sure it was done like this because timeouts weren't available with older Python versions. I'm definitely open to rethinking the whole approach to IDLE now that we are only concerned with more modern Python versions.

axoroll7 commented 1 year ago

A draft :

_idle_raw_buffer:bytearray
_idle_lines_buffer:List[bytes]

def idle_check(self, timeout=None):

    sock = self.socket()

    end = 0.0
    if timeout is not None:
        end = time.monotonic() + float(timeout)

    raw_buffer = self._idle_raw_buffer
    lines_buffer = self._idle_lines_buffer

    selector = selectors.DefaultSelector()

    # make the socket non-blocking so the timeout can be
    # implemented for this call
    sock.settimeout(None)
    sock.setblocking(0)
    selector.register(sock, selectors.EVENT_READ)

    try:
        socket_error = None
        resps = []
        while not lines_buffer:

            events = selector.select(timeout)
            if not events:
                break

            i = 0
            while not lines_buffer:
                try:
                    buf = sock.recv(8096)
                except BlockingIOError:
                    if i == 0:
                        # the socket should be readable, but it is not
                        # always the case. To prevent a tight loop,
                        # we sleep for one second.
                        time.sleep(1)
                    # the socket was fully drained
                    break
                except socket.error as ex:
                    socket_error = ex
                    break
                if not buf:
                    # EOF
                    break
                while True:
                    newline_pos = buf.find(b"\n")
                    if newline_pos > -1:
                        l_buf = buf[:newline_pos+1] # +1 to get \n
                        r_buf = buf[newline_pos+1:] # +1 to remove \n
                        buf = r_buf
                        raw_buffer += l_buf
                        line = bytes(raw_buffer)
                        lines_buffer.append(line)
                        raw_buffer.clear()
                    else:
                        break
                raw_buffer += buf

                i += 1
                # imaplib._MAXLINE / 8k ~= 125
                if i > 125:
                    i = 0
                    # if the socket is not yet drained after 125*8k bytes,
                    # the elapsed time is checked
                    if timeout is not None and end - time.monotonic() < 0.0:
                        break

            if socket_error:
                break

            if timeout is not None:
                timeout = end - time.monotonic()
                if timeout < 0.0:
                    break

        while lines_buffer:
            line = lines_buffer.pop(0)
            if len(line)>imaplib._MAXLINE:
                raise imaplib.IMAP4.error(
                    "got more than %d bytes" % imaplib._MAXLINE)
            if not line.endswith(b"\r\n"):
                raise imaplib.IMAP4.abort(
                    "socket error: unterminated line: %r" % line)
            line = line[:-2]
            resps.append(_parse_untagged_response(line))
        return resps
    finally:
        selector.unregister(sock)
        sock.setblocking(1)
        self._set_read_timeout()

def idle_done(self):
    logger.debug("< DONE")
    self._imap.send(b"DONE\r\n")

    sock = self.socket()
    raw_buffer = self._idle_raw_buffer

    remaining_line = None
    if raw_buffer and raw_buffer[-1:] == b"\r":
        # if the last character is a carriage return, the line end may
        # be truncated
        next_char = sock.read(1)
        if next_char == b"\n":
            remaining_line = bytes(raw_buffer[:-1])
            raw_buffer.clear()
        else:
            raw_buffer += next_char
    # if ``raw_buffer`` is not empty, the next line is truncated.
    # By concatenating the ``raw_buffer`` and the next line, the full line is
    # recovered.
    if raw_buffer:
        raw_buffer += sock._imap._get_line()
        remaining_line = bytes(raw_buffer)
        raw_buffer.clear()

    returned_value = self._consume_until_tagged_response(self._idle_tag, "IDLE")
    if remaining_line is not None:
        returned_value[1].insert(0,_parse_untagged_response(remaining_line))
    return returned_value

Not tested.

Non-blocking socket utilized, because I was afraid to use blocking socket and being confronted with something new.

Custom line reader, seems ok for me.

Removed poll and select to use selectors, easier to maintain.

Will edit this draft often.

A socket error should be raised ?

axoroll7 commented 1 year ago

@mjs

Should I continue in this direction ? Should I do idle_done too, or not, because blocking sockets are better, because they are easier ? Have you already started on your side, and should I stop ? I am open to suggestions.

axoroll7 commented 1 year ago

https://github.com/python/cpython/issues/51571

I don't have time to look it up, but it seems a readline can discard data. The issue is not closed. A non-blocking socket seems safer.

mjs commented 1 year ago

Thanks for digging into this @axoroll7 . I don't have anything significant in progress for this so if you want to continue with solving this please go ahead.

I do wonder if we can get away from using non-blocking sockets and use a custom line reading function with a socket timeout. We could side step the issues with readline by implementing it ourselves.

mjs commented 1 year ago

Heads up: I've got a rough prototype working which simply uses timeouts with custom line reading and buffering. I'll try to get it tidied up for review over the weekend.