Closed knieriem closed 5 months ago
Do you have access to wireshark captures of the event? It would allow us to write a test to catch this behaviour and validate the fix.
Is there a specific format of the wireshark captures you would prefer? I could export via the export packet dissections menu, choosing text or json output. Or I could use save, choosing a format like pcapng.
Wireshark capture files: wget_stateClosing_captures.zip
wget_unexpected_stateClosing.pcapng
capture file showing the case when the unexpected stateClosing
panic occurs on the server side
wget_with_recvStateClosingPatch_applied.pcapng
capture file showing the message sequence observed with the added lines in ControlBlock.Recv
Command issued:
wget -O - http://192.168.5.100
Source node: IP addr: 192.168.5.1, MAC: 04:05:06:01:02:03
, running Linux 6.1.21-v8+ on RPi4, arm64
Dest node: IP addr: 192.168.5.100, MAC: 11:22:33:44:55:66
, running soypat/seqs
with http-server example
I'll try to take a look at this tomorrow and see if I can reproduce the bug in tests. Thank you so much for putting in the effort in describing the issue in detail <3
@knieriem Hey so I tried adding a test that replicated the exchange but was unable to make the program panic. I did notice that the states transitioned by your program differ to the test so there's still some work to do. Could you set your logger to slog.LevelDebug-2
, run the program and attach the logs of the TCP connection? That would help me see exactly what is happening to the TCP Transmission control block.
Here's the log of the test I wrote trying to replicate the bug:
=== RUN TestUnexpectedStateClosing
seqs_test.go:702: Established Exchange start
seqs_test.go:702: Established --> <SEQ=1><ACK=127><DATA=1192>[PSH,ACK] --> Closed
seqs_test.go:702: Established <-- <SEQ=127><ACK=1193>[ACK] <-- Closed
seqs_test.go:702: CloseWait <-- <SEQ=127><ACK=1193>[FIN,ACK] <-- Closed
seqs_test.go:702: LastAck --> <SEQ=1193><ACK=128>[FIN,ACK] --> Closed
seqs_test.go:702: Closed <-- <SEQ=128><ACK=1194>[ACK] <-- Closed
Hi, please see the log -- obtained using my 10Base-T1S network -- below.
[Edit: deleted a passage with an assumption that seemed possibly misleading]
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=78
time=2024-05-05T14:46:07.646+02:00 level=DEBUG msg=TCP:recv opt=20 ipopt=0 payload=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:rcv state=SynRcvd rcv.nxt=3774658309 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658308 seg.ack=0 seg.wnd=64240 seg.flags=[SYN] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=INFO msg=TCP:rx-statechange port=80 old=Listen new=SynRcvd rxflags=[SYN]
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=0 seg.ack=3774658309 seg.wnd=1482 seg.flags=[SYN,ACK] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:snd state=SynRcvd pend=0 snd.nxt=1 snd.una=0 snd.wnd=64240
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=0 seg.ack=3774658309 seg.wnd=1482 seg.flags=[SYN,ACK] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.646+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.657+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658309 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658309 seg.ack=1 seg.wnd=64240 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.658+02:00 level=INFO msg=TCP:rx-statechange port=80 old=SynRcvd new=Established rxflags=[ACK]
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=lst:noconn2snd
time=2024-05-05T14:46:07.658+02:00 level=INFO msg="new connection" remote=192.168.5.1:38102
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.SetReadDeadline:start
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.SetWriteDeadline:start
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.Read:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=184
time=2024-05-05T14:46:07.669+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=126
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658435 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658309 seg.ack=1 seg.wnd=64240 seg.flags=[PSH,ACK] seg.data=126
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1 seg.ack=3774658435 seg.wnd=1356 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:snd state=Established pend=0 snd.nxt=1 snd.una=1 snd.wnd=64240
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1 seg.ack=3774658435 seg.wnd=1356 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
Got webpage request!
time=2024-05-05T14:46:07.674+02:00 level=DEBUG-2 msg=TCPConn.Write:start
time=2024-05-05T14:46:07.674+02:00 level=DEBUG-2 msg=TCPConn.Write:start
time=2024-05-05T14:46:07.680+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1 seg.ack=3774658435 seg.wnd=1482 seg.flags=[ACK] seg.data=1192
time=2024-05-05T14:46:07.680+02:00 level=DEBUG-2 msg=tcb:snd state=Established pend=0 snd.nxt=1193 snd.una=1 snd.wnd=64240
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1 seg.ack=3774658435 seg.wnd=1482 seg.flags=[ACK] seg.data=1192
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=tcb:close state=Established
time=2024-05-05T14:46:07.681+02:00 level=DEBUG msg=TCP:delayed-close port=80
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.681+02:00 level=DEBUG msg=TCP:send plen=1246
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=1246
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.694+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658435 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658435 seg.ack=1193 seg.wnd=63176 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:close state=Established
time=2024-05-05T14:46:07.695+02:00 level=DEBUG msg=TCP:delayed-close port=80
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1193 seg.ack=3774658435 seg.wnd=1482 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:snd state=FinWait1 pend=0 snd.nxt=1194 snd.una=1193 snd.wnd=63176
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1193 seg.ack=3774658435 seg.wnd=1482 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=INFO msg=TCP:tx-statechange port=80 old=Established new=FinWait1 txflags=[FIN,ACK]
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.695+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.706+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=tcb:rcv state=Closing rcv.nxt=3774658436 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658435 seg.ack=1193 seg.wnd=63176 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.706+02:00 level=INFO msg=TCP:rx-statechange port=80 old=FinWait1 new=Closing rxflags=[FIN,ACK]
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.706+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.recv:start
panic: unexpected stateClosing
Alright, all looks almost ready for the fix. I've just one question for you- you suggest setting the pending flag to ACK in the added switch case though after entering a TimeWait state no more data should be exchanged. Is sending out the ACK necessary in that case? I've applied your fix in #21 with a slight modification in how the pending flags are calculated. Are you able to test the fix?
Thank you for preparing the PR. I can confirm that the fix in #21 is working properly for my setup. Regarding your question -- I agree that no ACK should be sent (this is also what the state diagrams say).
[By the way, I figured out how to achieve the same message flow behaviour in my setup as in the original, cyw43439 based http-server example: Calling HandleEth more often will change the timing, so that not Closing state, but FinWait2 state is entered. Therefore a way to reproduce the behaviour involving the Closing state on the Pico W could be to call HandleEth slightly less often.]
Since I looked a lot at the wireshark capture logs the last days, I began to wonder why FinWait2 is even entered in the normal, not panic resulting message flow, and why TimeWait isn't entered directly from FinWait1. From my understanding in this particular case an ACK for a previous data segment (that happens to be received just after the server sent its FIN+ACK) is mistaken for a single ACK of the server's FIN+ACK, so that FinWait2 is entered instead of waiting for a matching FIN/ACK segment. If you don't mind, I would describe this in a separate issue.
a previous data segment [...] is mistaken for a single ACK of the server's FIN+ACK, so that FinWait2 is entered instead of waiting for a matching FIN/ACK segment. If you don't mind, I would describe this in a separate issue.
This sounds like it could cause issues going forward. Please do create a new issue detailing observations. Logs welcome! As for the PR will merge and close this issue by end of week if nothing else crops up.
Using
wget
to request a page from a version of your http-server example adapted for T1S, I found that at the end of the wget communication a Closing state is entered, that will result in a panic within aswitch
statement inControlBlock.Recv
.Apparently the condition
seg.ACK == tcb.snd.NXT
ingithub.com/soypat/seqs@v0.0.0-20240421220819-60c7db9451e0/control.go:245
is not fulfilled in this case.When using a browser instead of wget, I don't observe the Closing state: In this case, a direct transition from FinWait1 to TimeWait can be seen:
I added the lines
-- based on a TCP state diagram -- to the switch statement in
ControlBlock.Recv
, which results in the following message flow:Not sure if the added lines are sufficient.