msoulier / tftpy

Pure Python TFTP library
http://tftpy.sf.net
MIT License
172 stars 120 forks source link

Fix race condition when waiting for ACK #133

Closed marcin-le closed 2 years ago

marcin-le commented 2 years ago

TFTPy is designed in a way that socket timeout is used to calculate timeout when waiting for packet. During that time another unexpected packet may arrive. After that the socket operation is restarted and timeout is calculated from start. This might be a problem because both sides have timeout and these timeout may be different or one host may be significantly faster that another. In such situation the timeout will be never triggered as another host will always retransmit his packet faster.

For most cases it does not matter because TFTP is always responding to packet sent and transmission may continue. The only one exception is no response to duplicate ACK. It is necessary to prevent Sorcerer's Apprentice Syndrome.

This patch introduces additional exception TftpTimeoutExpectACK raised when reaching timeout during waiting on ACK of current block but receiving duplicate ACK of previous block.

msoulier commented 2 years ago

I don't see an update to the test suite. Can we simulate this in the test suite?

marcin-le commented 2 years ago

Full test of concurrent timeouts might be hard to create. For test purposes I have modified source to drop 0.1 percent of packets, but of course this cannot go into production (see code below). I think I can emulate the flow of duplicate ACKs in similar way like the test: testServerNoOptions is working. I am afraid I might not be able to emulate how the exception propagates to listen and start methods. Will such unit test for TftpStateExpectACK be enough?

        dat = TftpPacketDAT()
        dat.data = buffer
        dat.blocknumber = blocknumber
        self.context.metrics.bytes += len(dat.data)
        log.debug("Sending DAT packet %d", dat.blocknumber)
-        self.context.sock.sendto(dat.encode().buffer,
-                                 (self.context.host, self.context.tidport))
+        dat.encode()
+        if randrange(1000) > 0:
+            self.context.sock.sendto(dat.buffer, (self.context.host, self.context.tidport))
        self.context.metrics.last_dat_time = time.time()

        log.info("Sending ack to block %d" % blocknumber)
        ackpkt = TftpPacketACK()
        ackpkt.blocknumber = blocknumber
-        self.context.sock.sendto(ackpkt.encode().buffer,
-                                 (self.context.host,
-                                  self.context.tidport))
+        ackpkt.encode()
+        if randrange(1000) > 0:
+            self.context.sock.sendto(ackpkt.buffer, (self.context.host, self.context.tidport))
marcin-le commented 2 years ago

One unit test for duplicate ACK added

msoulier commented 2 years ago

Hmm. Let me look at this and see if I can put in a test hook.

msoulier commented 2 years ago

Ok, I took your idea and put it in as a test hook with a new test case. Right now that test seems to be chewing a lot of resources on my laptop. I suspect log analysis will show that you were right. I will confirm. Then merging your fix should allow this test to pass. Feel free to pull and try it yourself.