Zombie Ethernet Stack - Githubissues

t35tB0t commented 5 years ago

Stack gets into "corrupted buffers" condition with zombie packets stuck in Tx buffers. socket.close(), del socker_server etc. does not clear out zombie packets - they continue to be reitransmitted on the wire. Buffer space is consumed and new socket sessions have limited to zero buffer space. New socket sessions that do successfully write their data into the Tx buffer do get these packets sent out.

When buffers are too zombified to accept data, we get a ENOMEM error on socket writes. From packets inspection on the line, it appears that Stack has zombie packets hogging the Tx buffers. Micropython will attempt (forever) to re-transmit these packets. CTRL-D reset usually recovers the Stack. Occasionally, a hard reset is required. Micropython continues to respond to pings.

MicroPython v1.10-150-gf8f272429 on 2019-03-03; NUCLEO-F429ZI-SD with STM32F429 (built from master branch, added SD card support but breaks either way) (fails within 10 iterations to a STM32F429ZI)

Stress test code used on remote fast client (WindozPC):

def run(iterations):
    for i in range(iterations):
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.settimeout(2)
            print('Socket connecting...',end='')
            s.connect(('10.0.0.5',80))
            print('Connected...',end='')
            s.send(b'GET / HTTP/1.0\r\n\r\n')
            print('HTTP request sent...',end='')
            s.send(b'\xa5'*16384)
            print('Dummy filler data sent...',end='')
            s.close()
            print('Socket closed.')
        except Exception as e:
            print("Iteration %d, ERROR: %s" % (i,e)) 
            time.sleep(1)
        finally:
            pass

dpgeorge commented 5 years ago

Thanks for the report. I'm not sure if the following patch will fix it but it's worth a try:

--- a/extmod/modlwip.c
+++ b/extmod/modlwip.c
@@ -579,6 +579,15 @@ STATIC mp_uint_t lwip_tcp_send(lwip_socket_obj_t *socket, const byte *buf, mp_ui

     err_t err = tcp_write(socket->pcb.tcp, buf, write_len, TCP_WRITE_FLAG_COPY);

+    // ERR_MEM means there's currently no space in output buffers, so keep trying to write
+    while (err == ERR_MEM) {
+        err = tcp_output(socket->pcb.tcp);
+        MICROPY_PY_LWIP_EXIT
+        poll_sockets();
+        MICROPY_PY_LWIP_REENTER
+        err = tcp_write(socket->pcb.tcp, buf, write_len, TCP_WRITE_FLAG_COPY);
+    }
+
     // If the output buffer is getting full then send the data to the lower layers
     if (err == ERR_OK && tcp_sndbuf(socket->pcb.tcp) < TCP_SND_BUF / 4) {
         err = tcp_output(socket->pcb.tcp);

t35tB0t commented 5 years ago

Thanks for the patch - will try shortly. Meanwhile... resetting the nic seems to have made the stack far more tolerant. It is not getting stuck. We haven't verified if the zombie packets are gone but handling all socket exceptions other than timeout with a nic.active(0); nic.active(1); nic.config(...) seems to have worked around the issue somewhat. The socket timeout error is handled with a socket.close().

t35tB0t commented 5 years ago

UPDATE: Patch broke it worse - probably getting trapped in send wait. The problem is not that the buffers are getting full but more like something gets broken when a buffer full occurs. Resetting the nic object did not work around the issue. Adding sleep delays to slow down block writes to the socket Tx buffers is what was helping stabilize the stack. Slow enough block writes = no zombies. Too fast and the entire stack can get locked up so bad a hard reset (with button) is required. So, avoiding full Tx buffers seems to dodge the crash. But this is no fix. We've padded server Tx code with sleeps to slow the data output rate to a slower (but more stable) 23Mbps. This down from 77Mbps which works with large multi-MB transfers as long as the client and network is well behaved. Conclusion: Socket packet/buffer bug somewhere in stack.

dpgeorge commented 5 years ago

@t35tB0t for my testing I used iperf3 on a PC combined with a version that runs under MicroPython, which you can find at https://micropython.org/resources/uiperf3.py (also on PyPI). Please see if this works for you without any error. You should be able to get around 80Mbit/sec for TCP send/recv on a local wired network.

Otherwise, please can you provide a full working test case which shows the problem you see? I'll need both the host PC and stm32 script.

t35tB0t commented 5 years ago

We'll try iperf next. Sending the mpy server code and stack testing tools here. Use PC browser to do simple HTTP GETs to confirm mpy server is happy, then... ethwebtest.py (run on fast client/PC to generate benign HTTP GETs and report data transfer stats) socket_test.py (run on fast client/PC to generate broken HTTP connections) web.py and favicon.ico (run on mpy/STM32F429ZI Nucleo board)

We suspect that the client is jamming the stack by sending a lot more data after the HTTP request. This is intentionally modelling a jabbering client behavior and may have exposed an issue with the stack. e.g that the rx buffers are filling up and LWIP is being blocked from seeing the client ACKs. So blocked somewhere in the handshakes - I need to analyze the wireshark captures; might need to figure out how to probe LWIP.

mpy-stresstest.zip

t35tB0t commented 5 years ago

Sample out from a test run follows. First two and last three HTTP requests came from a benign browser. The flailing server responses in the middle are due to the socket_test.cAbort() jamming the stack with 10 HTTP GETs with jabbering after the request. Note that the rate posted here is the rate at which mpy is able to write bytes into its own output socket buffer. The final responses down at the bottom come from the last three (normal) client requests after the server has reset everything up to the nic object. If the ENOMEM exception is not trapped with nic is "reactivated" and "reconnected", the server will not recover and we end up with zombie packets pigging up the mpy stack. We're trying to figure a work-around by flushing socket rx buffers but haven't found a way to perform a clean rx "ignore/flush" after receiving a valid HTTP request. As long as the client requests are kept short, mpy hangs in there.

NOTE: An aggressive NMAP scan will also jam the stack unless we handle exceptions with aggressive socket and nic resets... nmap -sT -sV -p 80 -T4 -O -A -v -v -v -v -n 10.0.0.5

OUTPUT FROM socket-test.cAbort()...

Waiting for client... connection accepted from ('10.0.0.80', 33646) <-- This is PC Browser
HTTP request:  b'GET / HTTP/1.1\r\n'
rate=28.3 Mb/sec for dLen=1048576 bytes
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33647) <-- This is PC Browser
HTTP request:  b'GET / HTTP/1.1\r\n'
rate=27.6 Mb/sec for dLen=1048576 bytes
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33648) <-- This is bad client
HTTP request:  b'GET / HTTP/1.0\r\n'
Failed after sending 6144 payload bytes
socket write failed at errFlag=3
[Errno 110] ETIMEDOUT
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33649)<-- This is bad client
HTTP request:  b'GET / HTTP/1.0\r\n'
Failed after sending 7168 payload bytes
socket write failed at errFlag=3
[Errno 110] ETIMEDOUT
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33650)<-- This is bad client
HTTP request:  b''
BAD REQUEST: Length=0
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33651)<-- This is bad client
HTTP request:  b'GET / HTTP/1.0\r\n'
Failed after sending 7168 payload bytes
socket write failed at errFlag=3
[Errno 110] ETIMEDOUT
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33653)<-- This is bad client
HTTP request:  b'GET / HTTP/1.0\r\n'
Failed after sending 0 payload bytes
socket buffer full at errFlag=3
[Errno 12] ENOMEM
Closing socket...
Restarting nic...
nic connected:  ('10.0.0.5', '255.255.255.0', '10.0.0.254', '10.0.0.254')
Restarting server... Server listening!

Waiting for client... connection accepted from ('10.0.0.80', 33655)<-- This is bad client
HTTP request:  b''
BAD REQUEST: Length=0
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33656) <-- This is bad client
HTTP request:  b'GET / HTTP/1.0\r\n'
Failed after sending 10240 payload bytes
socket write failed at errFlag=3
[Errno 110] ETIMEDOUT
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33661) <-- This is PC Browser
HTTP request:  b'GET / HTTP/1.1\r\n'
Failed after sending 0 payload bytes
socket buffer full at errFlag=3
[Errno 12] ENOMEM
Closing socket...
Restarting nic...
nic connected:  ('10.0.0.5', '255.255.255.0', '10.0.0.254', '10.0.0.254')
Restarting server... Server listening!

Waiting for client... connection accepted from ('10.0.0.80', 33662) <-- This is PC Browser
HTTP request:  b'GET / HTTP/1.1\r\n'
rate=27.7 Mb/sec for dLen=1048576 bytes
Closing socket...

Waiting for client... connection accepted from ('10.0.0.80', 33663) <-- This is PC Browser
HTTP request:  b'GET / HTTP/1.1\r\n'
rate=28.1 Mb/sec for dLen=1048576 bytes
Closing socket...

Waiting for client...

(yay! recovered and survived but had to reset nic to clear out stale/jammed connections)

dpgeorge commented 5 years ago

Thanks for the test scripts. I could reproduce most of the issues in one way or another. Simplest way for me to crash the stack was to just run socket_test.py once which would break the stack so the stm32 board didn't respond anymore to requests (or ping or anything). I was using an STM32F769DISC board.

It was an issue with concurrency locking, and should be fixed by 68a5d6fe7746850ce049b8bf295bfce1382383f3

With this fix socket_test.py no longer crashed the stack, and I could run ethwebtest.py without the 200us delay in web.py to slow down the write (and got close to 80Mbit/sec transfers).

t35tB0t commented 5 years ago

Thanks for the fast patch gen. Unfortunately, that wasn't the root cause. Actually, it behaves worse now, with around six iterations of socket_test.cAbort(), hanging mpy in bigdata at: block += socket.write(buf)

Wireshark shows mpy desperately issuing multiple packets with... [TCP Window Full][TCP Spurious Restransmission][PSH,ACK]. The concurrency locking patch has certainly change the behavior to the fault condition is passing through that code section.

dpgeorge commented 5 years ago

Thanks for the fast patch gen. Unfortunately, that wasn't the root cause.

Ok. But the patch was definitely needed, it certainly fixed one of the issues I was seeing (that cAbort() could stop the stack from working at all).

Do you have a NUCLEO_F767 or STM32F769 Discovery board (they have Ethernet) to test with? That would rule out issues related purely to the F4xx.

t35tB0t commented 5 years ago

Running on a STM32F429ZI Nucleo. But can repeat on a 767 tomorrow. Agreed the patch was needed. Before it was throwing ENOMEM exception which we could trap and do a brutish nic reset. Now that exception is not happening. So we're stuck in a stack broken handshake condition.

TCP full on the receiver makes sense with all the client sends. These are not being read out. And I couldn't derive a nice way to flush all client sends after we get what we expect. This is troubling. As is the absence of a timeout exception after socket_test has managed to jam the stack.

Don't get me wrong here. Your mpy work is amazing and probably the best thing to happen in embedded systems in recent times. We want to help get mpy to a level of robustness where external inputs absolutely can not crash the device. Then it has much broader use.

dpgeorge commented 5 years ago

Thanks for the kind words @t35tB0t ! I agree the network support should be as robust as possible.

I will investigate further the issues.

Some questions: are you using the ERR_MEM patch above? Is your STM32 board connected via ETH to a hub, switch or router? Is it full duplex 100M? (I have full duplex 100M to a router, so it only sees packets for it.)

t35tB0t commented 5 years ago

Ran same scripts against both F429 and F767 Nucleos. The F767 is on its own VLAN routed through a PFSENSE Denverton class router with three switches in the path and zero background chatter. The F429 is on a local switch (single hop) with minimal background chatter. Re-built firmware with fresh pull plus enabled SD card support.

Both are for more resilient with all the updates. socket_test.cAbort() does not trigger the severe ENOMEM exceptions on the F767 but still seeing those on the F429. A nic reset on ENOMEM seems to recover from that socket_test. However... running an aggressive nmap profile can cause hard lock ups on both Nucleos and occasional full crash/restarts on the F429. This is with or without the 200usec write delays in bigdata() loop. I added a random delay (0.2 to 2 sec) in the socket_test.cAbort() loop and that avoids some of the multiple ignored requests while mpy is resetting the nic. ie it crashes with fewer loop iterations than without the random delay.

As tested, when successfully sending with no delays, the 1MB payload is sent into the F429's buffers at 61Mbps and in the F767's at 83Mbps. With the F429, simply rapidly F5 re-loading from a browser can trigger ENOMEM error but didn't lock up the F429 (nic restart recovered stack).

nmap profile used is: nmap -sT -sV -p 80 -T4 -A -v -n 10.0.0.5

t35tB0t commented 5 years ago

UPDATE: Need clean up on socket.close(). With a more narrowly defined test script, the problem has also been narrowed down and confirms the zombie packets are still there. In a simplified client, we can trigger this with the following:

Client performs 3-way handshake with mpy Server to establish a connection (SYN/SYN-ACK/ACK)
Client issues HTTP GET request
Client issues FIN-ACK close (even though Server isn't done yet)

Server sends exactly four packets with dataLen=536 each of the long HTTP response and then starts TCP Retransmission attempts, with increasing delays from 3 sec, to 7sec, 15sec, 30sec, 60sec 120sec, etc. In this state, doing a proper simple HTTP GET from a proper brower results in a Server ENOMEM exception and nic gets reset to recover.

Hypothesis: We know that the Server's bigdata() function is getting a proper ETIMEOUT exception, returning to the main loop and closing its socket. But this clearly is NOT purging those four packets out of the send buffers. They are now zombies and will continue to hog buffer space until the nic is reset. This test results in most of the available Tx buffer space being occupied by zombies. So, a subsequent socket handler might get out some short headers but the first attempt to write 1024 bytes to the tx buffer throws the ENOMEM exception. This seems to be an issue of garbage collection within the stack on socket close.

scapy-http-get-zombie.zip packet-trace-z01 The mpy Server code issues a socket.close() and is now waiting in a server.listen() state. A normal HTTP GET after this point will result in an ENOMEM exception after the Server has written short HTTP response headers to the socket and is trying to write 1024 bytes to the socket buffer. The cruft will remain in the stack forever until nic object is closed and re-initialized. Occasionally, very aggressive client behaviors will result in a lock up where stack will no longer respond to external packets or even a hard crash/reset.

t35tB0t commented 5 years ago

Further testing with debug statements in lwip/tcp.c reveals that not all the packets zombies - many are simply orphaned by the socket close and waiting (too long) for various timeouts. Most of them will eventually timeout (20 seconds). Yet, not all do. And with a small number of pbufs they are occupying precious packet space and in many applications we really can't afford to hang on to these packets. More significantly, while the slow tcp timeouts are counting down, further connection attempts can easily crash the stack and even hard reset the system. Something is wrong with concurrency locks, pointer management, etc.

Editing tcp.c and forcing slow-timer bpuf timeouts one slow timer tick plus dumping new connections for five seconds in the server python script did substantially improve mpy's ability to work through a broken connection. However, this is not a desirable solution.

Sometimes when there are pseudo-zombie packets in the stack, a SIG-INT keyboard break exits the mpy Server script without killing those packets. This means that destroying the objects ie socket server does not release its resources. This behavior does not seem correct for mpy. Note that this issue is somewhat similar issue #4484 with the WizNet module. In both STM32 and WizNet, a call to socket.close() or socket.del () results in the same sub-system call. IMHO, a "graceful and proper" close shouldn't be the same as a immediate object delete (disconnect). Both stacks/drivers differentiate between "delete/disconnect" and "close" but, unless we're missing something here, it seems that mpy does not appear to expose both methods.

t35tB0t commented 5 years ago

Still trying to find root cause of stack crash - this is a serious instability in that network traffic can trivially crash mpy. We've enabled all sorts of LWIP debugging and even extended MPY socket object to expose socket.abort() and socket.abandon() methods in desperate attempts to handle socket exceptions with less heap burden; all to no avail. There is definitely a serious bug lurking in there - somewhere.

An easy way to recreate the crash mpy from the mpy command line is by entering the code below and then use NMAP to generate client traffic with nmap -p 80 -T3 -O -A -v -n 10.0.0.5 . After a few tries, mpy will crash with a lock-up requiring a hard reset. Oddly, if the socket.close() is not issued, we were not able to crash the unit. It has all the appearance of a nasty bug in the LWIP memory management; possibly similar to an LWIP issue addressed here: https://github.com/SuperHouse/esp-open-rtos/issues/101

uPython code to setup server and crash while being NMAP'd with -p 80 -T3 -O -A -v -n 10.0.0.5

import time, network, usocket
nic = network.LAN()
nic.config(trace=0)
nic.active(1)
nic.ifconfig(('10.0.0.5','255.255.255.0','10.0.0.254','10.0.0.254'))
server = usocket.socket(usocket.AF_INET, usocket.SOCK_STREAM)
server.setsockopt(usocket.SOCK_STREAM, usocket.SO_REUSEADDR, 1)
server.bind(('0.0.0.0', 80))
server.listen(1)
(socket, sockaddr) = server.accept()
<<<wait for traffic to trigger a socket handler call-back>>>
socket.close()

dpgeorge commented 5 years ago

Thanks for the details @t35tB0t . I have verified that there is at least one bug which is found by the nmap test: using an STM32F769DISC board running web.py, hitting it constantly with the nmap command above, does eventually lead to lwIP locking up. I traced it down to tcp_input(), where the linked list of active TCP PCB's has somehow become a circular list, and it's stuck in an infinite loop searching this circular list for an incoming port number that doesn't exist (probably because the TCP socket with that port was recently closed).

dpgeorge commented 5 years ago

@t35tB0t just to be clear, I wrote my reply above before I saw your new reply just above it. Sounds like it's the same issue. If you'd like to verify it, put a printf in the for-loop starting at line 225, for (pcb = tcp_active_pcbs; pcb != NULL; pcb = pcb->next) ...

t35tB0t commented 5 years ago

OK - added a printf tilde ~ at top of the line225 loop in tcp_in.c; left a bunch of other LWIP debug flags (these does affect timing and actually makes it take longer to trip up). The tail of a long mpy command line log spew due to NMAP traffic follows. There are a few occurrences of tildes ~ indicating we're looping over the active_pcbs. However, not really near the crash. And, it seems that the last debug message is often the "pubf deallocating" message and the same pbuf_alloc(length==4). Note that, further up in the log, the "Closing Socket..." is from mpy script on a normal completion of the socket handler (no exception).

+-+-+-+-+-+-+-+-+-+-+-+-+-+- tcp_input: flags -+-+-+-+-+-+-+-+-+-+-+-+-+-+
tcp_process: Connection RESET
pbuf_free(2000e54c)
pbuf_free: deallocating 2000e54c
pbuf_free(200139c4)
pbuf_free: deallocating 200139c4
pbuf_alloc(length=74)
pbuf_alloc: allocated pbuf 200139c4
pbuf_alloc(length=74) == 200139c4
pbuf_header: old 200139d4 new 200139e2 (-14)
pbuf_header: old 200139e2 new 200139f6 (-20)
pbuf_header: old 200139f6 new 20013a1e (-40)
tcp_input: packed for LISTENing connection.
tcp_parseopt: other
tcp_parseopt: NOP
tcp_parseopt: MSS
tcp_parseopt: other
tcp_parseopt: other
pbuf_alloc(length=4)
pbuf_alloc(length=4) == 2000e4dc
pbuf_header: old 2000e524 new 2000e510 (20)
pbuf_header: old 2000e510 new 2000e4fc (20)
pbuf_header: old 2000e4fc new 2000e4ee (14)
pbuf_free(200139c4)
pbuf_free: deallocating 200139c4
Closing socket...

Waiting for client...pbuf_alloc(length=74)
pbuf_alloc: allocated pbuf 200139c4
pbuf_alloc(length=74) == 200139c4
pbuf_header: old 200139d4 new 200139e2 (-14)
pbuf_header: old 200139e2 new 200139f6 (-20)
pbuf_header: old 200139f6 new 20013a1e (-40)
~tcp_input: packed for LISTENing connection.
tcp_parseopt: MSS
tcp_parseopt: other
tcp_parseopt: other
tcp_parseopt: other
tcp_parseopt: EOL
pbuf_alloc(length=4)
pbuf_alloc(length=4) == 2000e530
pbuf_header: old 2000e578 new 2000e564 (20)
pbuf_header: old 2000e564 new 2000e550 (20)
pbuf_header: old 2000e550 new 2000e542 (14)
pbuf_free(200139c4)
pbuf_free: deallocating 200139c4
pbuf_alloc(length=74)
pbuf_alloc: allocated pbuf 200139c4
pbuf_alloc(length=74) == 200139c4
pbuf_header: old 200139d4 new 200139e2 (-14)
pbuf_header: old 200139e2 new 200139f6 (-20)
pbuf_header: old 200139f6 new 20013a1e (-40)
~~tcp_input: packed for LISTENing connection.
tcp_parseopt: other
tcp_parseopt: NOP
tcp_parseopt: NOP
tcp_parseopt: other
tcp_parseopt: NOP
tcp_parseopt: MSS
pbuf_alloc(length=4)
pbuf_alloc(length=4) == 2000e584
pbuf_header: old 2000e5cc new 2000e5b8 (20)
pbuf_header: old 2000e5b8 new 2000e5a4 (20)
pbuf_header: old 2000e5a4 new 2000e596 (14)
pbuf_free(200139c4)
pbuf_free: deallocating 200139c4

<<<CRASHES HERE - NO FURTHER OUTPUT. NO PING RESPONSE>>

t35tB0t commented 5 years ago

ACK! OK Confirmed - Ran same test as above with all LWIP debugging off and got stuck with a tilde ~~ storm in the tcp_in.c line225_loop. However, the prior run with debugging on shows that we can get stuck in other ways... like some sort of heap allocation/corruption.

dpgeorge commented 5 years ago

Ok, thanks for confirming. It feels like a timing related issue, a packet coming in and being processed during some critical section of lwIP which is not being protected.

dpgeorge commented 5 years ago

BTW, I tried updating to the latest lwIP but the same bug (tcp_active_pcb becoming a circular list) was still there.

dpgeorge commented 5 years ago

Further debugging (with lwIP v2.0.3) shows that tcp_in.c:tcp_listen_input() is adding a TCP PCB to the tcp_active_pcbs list for a second time, and so creating the circular list, at the line TCP_REG_ACTIVE(npcb);.

dpgeorge commented 5 years ago

After a lot more debugging I find that certain lwIP buffers are being free'd twice, leading to the same buffer being used twice and creating the circular list. A double free is occurring due to a dangling reference, and the following seems to be the root cause:

an incoming connecting is put on the accept queue on the uPy side
the connection waits on the uPy accept queue, yet to be accepted and processed
while waiting the initiator cancels the connection
the waiting PCB is closed and freed by lwIP
uPy is not notified of this cancel event and retains the waiting accepted connection
uPy takes the connection off the accept queue and processes it, but this connection is stale and it's memory is already freed and possibly reused for other things

dpgeorge commented 5 years ago

A fix for the above scenario is posted as #4644.

dpgeorge commented 5 years ago

Sometimes when there are pseudo-zombie packets in the stack, a SIG-INT keyboard break exits the mpy Server script without killing those packets. This means that destroying the objects ie socket server does not release its resources.

This should be fixed by the 4th commit in #4644

dpgeorge commented 5 years ago

Fixes for some of the issues found here were merged through #4644.

Continuing with the bug hunting, following on from https://github.com/micropython/micropython/pull/4644#issuecomment-477658125 :

The scapy abort script (attached) was tweaked and wrapped in a batch file for repeating just enough times to fail. It is using a forced exit on a trap upon HTTP response - this is terrible code but it created the desired sequence on the wire so does the trick for today. The WireShark capture below here shows the failure and timings.

I tried again with an STM32F769DISC board and could not find any errors, I could run scapy2 many many times and still do normal HTTP requests from a browser to the STM. Here is a trace of the Eth packets coming in / going out the STM board, during a single run of the scapy2 test (via lan.config(trace=7), and TCP flag annotations added by hand at the end of the lines):

[   22179] ETHRX len=60 dst=ff:ff:ff:ff:ff:ff src=xx:xx:xx:xx:xx:xx type=ARP
[   22186] ETHTX len=42 dst=xx:xx:xx:xx:xx:xx src=02:60:94:35:1b:8b type=ARP

[   22231] ETHRX len=60 dst=02:60:94:35:1b:8b src=xx:xx:xx:xx:xx:xx type=IPv4 srcip=192.168.1.4 dstip=192.168.1.5 TCP srcport=14567 dstport=80 seqnum=1092497586 acknum=0 dataoff=5 flags=2 winsz=8192 SYN 
[   22249] ETHTX len=58 dst=xx:xx:xx:xx:xx:xx src=02:60:94:35:1b:8b type=IPv4 srcip=192.168.1.5 dstip=192.168.1.4 TCP srcport=80 dstport=14567 seqnum=6510 acknum=1092497587 dataoff=6 flags=12 winsz=6400 opts= 02 04 03 20 ACK-SYN

[   22289] ETHRX len=60 dst=02:60:94:35:1b:8b src=xx:xx:xx:xx:xx:xx type=IPv4 srcip=192.168.1.4 dstip=192.168.1.5 TCP srcport=14567 dstport=80 seqnum=1092497587 acknum=0 dataoff=5 flags=4 winsz=0 RST
[   22380] ETHRX len=60 dst=02:60:94:35:1b:8b src=xx:xx:xx:xx:xx:xx type=IPv4 srcip=192.168.1.4 dstip=192.168.1.5 TCP srcport=14567 dstport=80 seqnum=1092497587 acknum=6511 dataoff=5 flags=10 winsz=8192 ACK
[   22398] ETHTX len=54 dst=xx:xx:xx:xx:xx:xx src=02:60:94:35:1b:8b type=IPv4 srcip=192.168.1.5 dstip=192.168.1.4 TCP srcport=80 dstport=14567 seqnum=6511 acknum=1092497587 dataoff=5 flags=14 winsz=6400 RST-ACK

[   22416] ETHRX len=189 dst=02:60:94:35:1b:8b src=xx:xx:xx:xx:xx:xx type=IPv4 srcip=192.168.1.4 dstip=192.168.1.5 TCP srcport=14567 dstport=80 seqnum=1092497587 acknum=6511 dataoff=5 flags=18 winsz=8192 PSH-ACK
[   22434] ETHTX len=54 dst=xx:xx:xx:xx:xx:xx src=02:60:94:35:1b:8b type=IPv4 srcip=192.168.1.5 dstip=192.168.1.4 TCP srcport=80 dstport=14567 seqnum=6511 acknum=1092497722 dataoff=5 flags=14 winsz=6400 ACK-RST

That looks like it corresponds exactly with what scapy2 is sending. Wireshark also sees exactly these 9 packets.

I tried running it at a lower CPU frequency (108MHz instead of 216MHz) and results were the same.

t35tB0t commented 5 years ago

Pulled latest stack improvement commits from main branch (thx to DPG):

MicroPython v1.10-254-g7b5bf5f6f on 2019-03-31; NUCLEO-F429ZI with STM32F429.

The dropped-connection stack jamming bug is still present as tested with scapy2 script. I'll try to re-create on a F767 later this week. Meanwhile, will work on isolating a more precise failure mechanism/script.

t35tB0t commented 5 years ago

Cleaned up the scapy test script (see below). Technically, the client is doing a "TCP connection abandon" midstream so we've renamed it as such. The connection rate can be slowed way down and yet still readily fails the F429 client by causing the stack to jam its Tx buffers with zombie packets.

Packet capture shows that, each time we're jammed using the cAbandon script, we have exactly three packets orphaned by abandoned connections that end up stuck. LWIP tries re-transmit these possibly forever (over 40 minutes in queue and longer than my attention span). Two are with [FIN.PSH,ACK] flags and a small 171 byte payload and one packet with [PSH,ACK] flags and a full 536 byte payload. These packets appear to be stuck in queue for TCP Retransmission forever (over 40 minutes). During this time, ICMP inbound packets are replied to. But TCP SYN connections are ignored. Will repeat with F767 later this week.

scapy-cAbandon.zip Run verbose as: python scapy-cAbandon.py --verbose=True --source=10.0.0.80 --target=10.0.0.5

t35tB0t commented 5 years ago

UPDATE: Stack recovered full functionality after 2886 seconds and 11 or 12 retries for each packet. The timeouts seem to be related to the delays on the original sequence (in the capture above it was an 8 second delay so 12 retries with doubling each time is... too long). The TCP backoff timing with its doubling mechanism should probably have a realistic maximum time limit to trigger packet termination and buffer release. There's probably a max Tx retry count of 12 or so somewhere which I'm guessing is what is finally recovering the stack. Reducing the Tx retry counts will help. However, we still want some re-try resilience here so an added time limit is still indicated. e.g perhaps set tx retries to 4 and add a time limit of 4 seconds?

In any case, we still have a bug where closing a uPy socket does not release these packets immediately.

dpgeorge commented 5 years ago

The new scapy-cAbandon test still doesn't show an error for me, but I think I worked out why not: as seen in the ETH trace of mine above, the STM is receiving a "spurious" RST packet (the 5th line). The packets it receives (excluding ARP) are: SYN then RST then ACK then PSH+ACK. But the scapy-cAbandon test is supposed to be sending just SYN then RST then PSH+ACK. So somehow a RST is being inserted into the stream, I guess by my version of scapy or by the Linux TCP/IP stack. By adding in some debug prints to lwIP I confirm that this spurious RST is gracefully closing the TCP socket (opened via the first SYN), and the subsequent ACK and PSH+ACK are being ignored. And so there is no memory leak.

t35tB0t commented 5 years ago

Sorry, I should have mentioned the RST issue. The OS stack will likely respond to unexpected SYN-ACKs with a connection RST packet. My workstation has other things handling this so I didn't need to bother. On Linux, to prevent unwanted OS RST intervention, one way is to add an iptables firewall rule. e.g. With a scapy client running at 10.0.0.80 and a STM at 10.0.0.5, a narrow filter can be set up from a terminal prompt as shown below (remove with -D option instead of -A):

iptables -A OUTPUT -p tcp -s 10.0.0.80 -d 10.0.0.5 --dport 80 --tcp-flags RST RST -j DROP

dpgeorge commented 5 years ago

Ok, thanks for the hint, I've now managed to get the scapy-cAbandon test to show the issue of stack jamming. The issue is that the abandoned TCP connections are indeed consuming all the lwIP memory pools so it can't create new connections.

This case is handled by lwIP when using the netconn/socket API, but uPy doesn't use that, it uses the low-level PCB interface instead. lwIP has a default 20 second timeout for TCP connections that are closed, and will abort them (freeing up memory) after that timeout if they didn't close cleanly (ie no response from remote).

In #4671 I've implemented a very simple mechanism to abort TCP connections after a timeout (currently set at 2 seconds). This seems to work pretty well for the scapy-cAbandon test.

Also related is the SO_LINGER option which might be good to implement (to control this timeout).

t35tB0t commented 5 years ago

Agreed that the abandoned TCP connections are consuming LWIP memory pools. However, the timeout certainly was not 20 seconds. With an 8 seconds cAbandon delay factor, the stack was jammed over 45 minutes! I'm not sure if they timed out or if LWIP finally gave up after 12 retries? In any case, uPy had long ago closed those connections and we shouldn't have these things lingering so long in this small stack machines.

SO_LINGER seems to be very well suited for this purpose. I'm assuming that, in LWIP, this will let the uPy application code specify that we want all the packet resources released as soon as we close the socket. If we open a socket with SO_LINGER, then we can use appropriate timeouts and a few retires in uPy before harshly closing the connection. An alternative is to add an optional option to socket.close(type=value), where type can be normal -default- or abort or abandon. We already have the tcp_abort() and tcp_abandon() functions in tcp.c and may want to be able to control resetting a connection versus silently closing or gracefully closing with a full handshake.

Also, on socket.del(), I'm asserting that this should not be calling tcp_close() as it presently does. Destroying an object should do exactly that and immediately. So, I propose that the socket.del() destructor instead execute a tcp_abandon(). The most proper way to deal with the close method is probably by exposing support for the SO_LINGER option. But this may not give one the finer-grained control that a close(type=) method might. Note that the tcp_abort() issues a RST packet.

Attached diff is a quick implementation of changing the socket destructor from tcp_close() to tcp_abandon(). Tested with scapy_cAbandon script against the uPy web.py test server modified to delete the socket in exception handler... it works. The client connection abandons trigger a uPy socket exception and this results in execution of "del socket" and that appears to release the resources appropriately. IMHO, an instant end to socket activity is a more proper behavior for object destruction.

[socket_abandon_on_delete.zip] (https://github.com/micropython/micropython/files/3046709/socket_abandon_on_delete.zip)

[web.zip] (https://github.com/micropython/micropython/files/3046712/web.zip) Launch with web.start()

Finally, its not clear that there still isn't a bug related to socket closing. I thought the recent commits released resources on close. And yet, there clearly are both FIN-ACK and PSH-ACK type packets lingering after a socket close. It also is probably useful to assess all the timeout tuning values and adjust them to be better default matches for these STM32 short-stack devices.

dpgeorge commented 5 years ago

However, the timeout certainly was not 20 seconds.

Sorry, I was pretty terse in my comment just above: lwIP's netconn API has a 20 second default timeout, after which closed sockets are aborted. This timeout is not being used by uPy because it doesn't use the netconn API. Instead we need to do it ourselves, see #4671.

Finally, its not clear that there still isn't a bug related to socket closing.

That should be fixed by #4671.

It also is probably useful to assess all the timeout tuning values and adjust them to be better default matches for these STM32 short-stack devices.

I checked for lwIP timeouts and there are none, at least none that can be tuned via config macro variables. All the timeouts (like SO_LINGER) must be implemented by us in the uPy bindings in extmod/modlwip.c.

t35tB0t commented 5 years ago

Compiled a clean pull of 4671 and this appears to ride through client-abandoned mid-request connections.

Initial checks with aggressive NMAP and scapy scripts did not jam the stack. However, we still are trapping an ENOMEM exception. It is understood that the two second timeout is trying to provide a graceful close. Yet, this may come at the expense of being able to serve multiple simultaneous connections. When a connection fails and exits with a serious exception, the abandon-on-delete method skips the 2 second delay and immediately releases the resources the failed connection was using. No ENOMEM errors occur - just the client-abandoned connection ETIMOUT. The object del method when used to close a socket on exception, keeps this kind of traffic from using up all the buffers and thus jamming other connections.

t35tB0t commented 5 years ago

IMHO - this Zombie Packets issue 4590 is closed AFAIK. The stack now appears to be highly immune to NMAP scans, PING floods, SYN floods, Layer7 DoS disconnect attacks, and random combinations of all of the above. (as confirmed on a STM32F429 with pr4671. Will continue testing with F767).

The excellent bug sleuthing and commits by @dpgeorge up to #4671 have addressed the zombie packets and stack jamming effects. Improvements beyond scope of this issue are suggested:

Implementing various stack timeouts in uPy tailored to small memory and network-local devices
Specifically adding SO_LINGER stack option to permit uPy user code control of socket close behavior
Consider exposing user additional socket close options (tcp_abandon with reset option)
Consider changing uPy destructors so action is immediate with no further I/O and to save buffer space

-Cheers

dpgeorge commented 5 years ago

@t35tB0t I'm glad you considered it fixed, and thanks for providing all the reproducible tests, it definitely helps when looking for the root cause.

micropython / micropython

Zombie Ethernet Stack #4590