jborean93 / smbprotocol

Python SMBv2 and v3 Client
MIT License
322 stars 73 forks source link

smbprotocol.transport.Tcp._recv can end up in a dead lock #263

Open adiroiban opened 10 months ago

adiroiban commented 10 months ago

Hi. Thanks again for working on smbprotocol.

I have recently start seeing on my Windows tests the overlap of smbprotocol.tranport.Tcp.close() with smbprotocol.tranport.Tcp.recv() ...the issue is that recv() will only free the Tcp._close_lock after a read timeout.

I am trying to see how this can be fixed.

One idea is to lock only after the select() call.

def _recv(self, length, timeout):
    buffer = bytearray(length)
    offset = 0
    while offset < length:
        read_len = length - offset
        log.debug(f"Socket recv({read_len}) (total {length})")

        start_time = timeit.default_timer()

-        with self._close_lock:
-            if not self.connected:
-                # The socket was closed - need the no cover to avoid CI failing on race condition differences
-                return None, timeout  # pragma: no cover
-
-           read = select.select([self._sock], [], [], max(timeout, 1))[0]
-            timeout = timeout - (timeit.default_timer() - start_time)
-            if not read:
-                log.debug("Socket recv(%s) timed out")
-                raise TimeoutError()

+       read = select.select([self._sock], [], [], max(timeout, 1))[0]
+       timeout = timeout - (timeit.default_timer() - start_time)
+       if not read:
+            log.debug("Socket recv(%s) timed out")
+            raise TimeoutError()
+
+        with self._close_lock:
+            if not self.connected:
+                # The socket was closed - need the no cover to avoid CI failing on race condition differences
+                return None, timeout  # pragma: no cover
            try:
                b_data = self._sock.recv(read_len)
            except OSError as e:
                # Windows will raise this error if the socket has been shutdown, Linux return returns an empty byte
                # string so we just replicate that.
                if e.errno not in [errno.ESHUTDOWN, errno.ECONNRESET]:
                    # Avoid collecting coverage here to avoid CI failing due to race condition differences
                    raise  # pragma: no cover
                b_data = b""

Originally posted by @adiroiban in https://github.com/jborean93/smbprotocol/issues/80#issuecomment-1824970873

adiroiban commented 10 months ago

I have created the issue as a reminder. I will see how I can write an automated tests for this.

jborean93 commented 9 months ago

I've been meaning to revamp the whole transport mechanism to work in a more reliable and plugin like way. This would enable transports like Quic and use epoll on platforms that support this. It's been a while since I last played around with it but the WIP code is in the socket-refactor branch.

adiroiban commented 9 months ago

Removing the need for threads would be nice. I would love to see an async API, as I am using this code from Twisted with some wrappers to work around the threads.

I think that going with asyncio is an excelent first step...and later it should be easy for people to refactor the code for uvloop or twisted


At the same time, I think that it is worth trying to write an automated test for this case. It might help with the future refactoring.

I have observed this issue with my automated tests.

The issue is that we have this close,

with self._close_lock:
    b_data = self._sock.recv(read_len)

Which blocks on self._sock.recv(read_len)

It also holds the close lock. If you try to close, you need to wait for the recv timeout.


I have this patch applied to my smbprotocol version, so I would be happy send a PR to fix it , so that I can use vanilla smbprotocol.

I will try to find some time to work on this

jborean93 commented 9 months ago

Removing the need for threads would be nice. I would love to see an async API, as I am using this code from Twisted with some wrappers to work around the threads.

That's another use case, unfortunately async is a lot more difficult as you would essentially need to duplicate the rest of the APIs to expose an async one :(

I have this patch applied to my smbprotocol version, so I would be happy send a PR to fix it , so that I can use vanilla smbprotocol.

If you are coming across a deadlock and have a fix then yes please a PR would be great.