soypat / seqs

seqs: the hottest, most idiomatic userspace TCP/IP implementation on the internet. lwip in go basically
BSD 3-Clause "New" or "Revised" License

unexpected stateClosing during HTTP request using wget #19

Closed knieriem closed 5 months ago

knieriem commented 6 months ago

Using wget to request a page from a version of your http-server example adapted for T1S, I found that at the end of the wget communication a Closing state is entered, which results in a panic within a switch statement in ControlBlock.Recv.

level=INFO msg=TCP:tx-statechange port=80 old=Established new=FinWait1 txflags=[FIN,ACK]
level=INFO msg=TCP:rx-statechange port=80 old=FinWait1 new=Closing rxflags=[FIN,ACK]
panic: unexpected stateClosing

goroutine 1 [running]:
github.com/soypat/seqs.(*ControlBlock).Recv(0x40000cc890, {0xf99d3916, 0x4b4, 0x0, 0xf8da, 0x10})
        github.com/soypat/seqs@v0.0.0-20240421220819-60c7db9451e0/control_user.go:187 +0x744

Apparently the condition seg.ACK == tcb.snd.NXT in github.com/soypat/seqs@v0.0.0-20240421220819-60c7db9451e0/control.go:245 is not fulfilled in this case.
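To illustrate what I mean, here is a small standalone sketch of that check (the function name is mine, not a seqs API; the numbers are illustrative): after the server sends its FIN, snd.NXT has advanced past the FIN's sequence number, but a peer FIN+ACK that was sent before the peer saw our FIN only acknowledges the data, so the equality fails and the simultaneous-close path is taken.

```go
package main

import "fmt"

// acksOurFIN models the seg.ACK == tcb.snd.NXT condition from control.go.
// Name and standalone form are illustrative only, not seqs internals.
func acksOurFIN(segACK, sndNXT uint32) bool {
	return segACK == sndNXT
}

func main() {
	// Illustrative numbers: after sending 1192 bytes of data from seq 1
	// plus a FIN, snd.NXT = 1194. A peer FIN,ACK sent before the peer saw
	// our FIN carries ack = 1193, covering the data but not the FIN.
	fmt.Println("peer ACK covers our FIN:", acksOurFIN(1193, 1194)) // false
}
```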

When using a browser instead of wget, I don't observe the Closing state: in this case, a direct transition from FinWait1 to TimeWait can be seen:

level=INFO msg=TCP:rx-statechange port=80 old=FinWait1 new=TimeWait rxflags=[FIN,ACK]

I added the lines

    case StateClosing:
        // ACK of our FIN during a simultaneous close: move on to TimeWait.
        if seg.Flags.HasAny(FlagACK) {
            tcb.state = StateTimeWait
            pending, err = FlagACK, nil
        }

-- based on a TCP state diagram -- to the switch statement in ControlBlock.Recv, which results in the following message flow:

level=INFO msg=TCP:tx-statechange port=80 old=Established new=FinWait1 txflags=[FIN,ACK]
level=INFO msg=TCP:rx-statechange port=80 old=FinWait1 new=Closing rxflags=[FIN,ACK]
level=INFO msg=TCP:rx-statechange port=80 old=Closing new=TimeWait rxflags=[ACK]
level=INFO msg=lst:freeConnForReuse lport=80 rport=49158

Not sure if the added lines are sufficient.
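For reference, the transition the added case is meant to implement, per the RFC 9293 state diagram, can be sketched standalone like this (hypothetical types and names, not seqs internals):

```go
package main

import "fmt"

// Minimal model of the Closing -> TimeWait edge from the TCP state
// diagram: in Closing, an ACK that covers our FIN moves the connection
// to TimeWait; any other segment leaves the state unchanged.
type state int

const (
	stateClosing state = iota
	stateTimeWait
)

func (s state) String() string {
	return [...]string{"Closing", "TimeWait"}[s]
}

// step applies the edge; ackOfFIN reports whether the received ACK
// acknowledges our FIN (seg.ACK == snd.NXT in seqs terms).
func step(s state, ackOfFIN bool) state {
	if s == stateClosing && ackOfFIN {
		return stateTimeWait
	}
	return s
}

func main() {
	s := step(stateClosing, true)
	fmt.Println("new state:", s) // new state: TimeWait
}
```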

soypat commented 6 months ago

Do you have access to wireshark captures of the event? It would allow us to write a test to catch this behaviour and validate the fix.

knieriem commented 6 months ago

Is there a specific format of the wireshark captures you would prefer? I could export via the export packet dissections menu, choosing text or json output. Or I could use save, choosing a format like pcapng.

knieriem commented 6 months ago

Wireshark capture files: wget_stateClosing_captures.zip

Command issued:

Source node: IP addr: 192.168.5.1, MAC: 04:05:06:01:02:03, running Linux 6.1.21-v8+ on RPi4, arm64

Dest node: IP addr: 192.168.5.100, MAC: 11:22:33:44:55:66, running soypat/seqs with http-server example

soypat commented 6 months ago

I'll try to take a look at this tomorrow and see if I can reproduce the bug in tests. Thank you so much for putting in the effort to describe the issue in detail <3

soypat commented 6 months ago

@knieriem Hey, so I tried adding a test that replicated the exchange but was unable to make the program panic. I did notice that the states transitioned by your program differ from those in the test, so there's still some work to do. Could you set your logger to slog.LevelDebug-2, run the program, and attach the logs of the TCP connection? That would help me see exactly what is happening to the TCP transmission control block.

Here's the log of the test I wrote trying to replicate the bug:

=== RUN   TestUnexpectedStateClosing
    seqs_test.go:702: Established Exchange start
    seqs_test.go:702: Established --> <SEQ=1><ACK=127><DATA=1192>[PSH,ACK] --> Closed
    seqs_test.go:702: Established <-- <SEQ=127><ACK=1193>[ACK]     <-- Closed
    seqs_test.go:702: CloseWait   <-- <SEQ=127><ACK=1193>[FIN,ACK] <-- Closed
    seqs_test.go:702: LastAck     --> <SEQ=1193><ACK=128>[FIN,ACK] --> Closed
    seqs_test.go:702: Closed      <-- <SEQ=128><ACK=1194>[ACK]     <-- Closed
knieriem commented 6 months ago

Hi, please see the log -- obtained using my 10Base-T1S network -- below.

[Edit: deleted a passage with an assumption that seemed possibly misleading]

time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=78
time=2024-05-05T14:46:07.646+02:00 level=DEBUG msg=TCP:recv opt=20 ipopt=0 payload=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:rcv state=SynRcvd rcv.nxt=3774658309 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658308 seg.ack=0 seg.wnd=64240 seg.flags=[SYN] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=INFO msg=TCP:rx-statechange port=80 old=Listen new=SynRcvd rxflags=[SYN]
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=0 seg.ack=3774658309 seg.wnd=1482 seg.flags=[SYN,ACK] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:snd state=SynRcvd pend=0 snd.nxt=1 snd.una=0 snd.wnd=64240
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=0 seg.ack=3774658309 seg.wnd=1482 seg.flags=[SYN,ACK] seg.data=0
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.646+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.646+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.657+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658309 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.657+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658309 seg.ack=1 seg.wnd=64240 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.658+02:00 level=INFO msg=TCP:rx-statechange port=80 old=SynRcvd new=Established rxflags=[ACK]
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=lst:noconn2snd
time=2024-05-05T14:46:07.658+02:00 level=INFO msg="new connection" remote=192.168.5.1:38102
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.SetReadDeadline:start
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.SetWriteDeadline:start
time=2024-05-05T14:46:07.658+02:00 level=DEBUG-2 msg=TCPConn.Read:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=184
time=2024-05-05T14:46:07.669+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=126
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658435 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658309 seg.ack=1 seg.wnd=64240 seg.flags=[PSH,ACK] seg.data=126
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1 seg.ack=3774658435 seg.wnd=1356 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:snd state=Established pend=0 snd.nxt=1 snd.una=1 snd.wnd=64240
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1 seg.ack=3774658435 seg.wnd=1356 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.669+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.669+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
Got webpage request!
time=2024-05-05T14:46:07.674+02:00 level=DEBUG-2 msg=TCPConn.Write:start
time=2024-05-05T14:46:07.674+02:00 level=DEBUG-2 msg=TCPConn.Write:start
time=2024-05-05T14:46:07.680+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1 seg.ack=3774658435 seg.wnd=1482 seg.flags=[ACK] seg.data=1192
time=2024-05-05T14:46:07.680+02:00 level=DEBUG-2 msg=tcb:snd state=Established pend=0 snd.nxt=1193 snd.una=1 snd.wnd=64240
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1 seg.ack=3774658435 seg.wnd=1482 seg.flags=[ACK] seg.data=1192
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=tcb:close state=Established
time=2024-05-05T14:46:07.681+02:00 level=DEBUG msg=TCP:delayed-close port=80
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.681+02:00 level=DEBUG msg=TCP:send plen=1246
time=2024-05-05T14:46:07.681+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=1246
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.694+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=tcb:rcv state=Established rcv.nxt=3774658435 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.694+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658435 seg.ack=1193 seg.wnd=63176 seg.flags=[ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:close state=Established
time=2024-05-05T14:46:07.695+02:00 level=DEBUG msg=TCP:delayed-close port=80
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:pending-out seg.seq=1193 seg.ack=3774658435 seg.wnd=1482 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:snd state=FinWait1 pend=0 snd.nxt=1194 snd.una=1193 snd.wnd=63176
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=tcb:snd seg.seq=1193 seg.ack=3774658435 seg.wnd=1482 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.695+02:00 level=INFO msg=TCP:tx-statechange port=80 old=Established new=FinWait1 txflags=[FIN,ACK]
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg=TCPConn.send:start
time=2024-05-05T14:46:07.695+02:00 level=DEBUG msg=TCP:send plen=54
time=2024-05-05T14:46:07.695+02:00 level=DEBUG-2 msg="Stack:\tHandleEth" plen=54
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.706+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.recv:start
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=tcb:rcv state=Closing rcv.nxt=3774658436 rcv.wnd=1482 challenge=false
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=recv:seg seg.seq=3774658435 seg.ack=1193 seg.wnd=63176 seg.flags=[FIN,ACK] seg.data=0
time=2024-05-05T14:46:07.706+02:00 level=INFO msg=TCP:rx-statechange port=80 old=FinWait1 new=Closing rxflags=[FIN,ACK]
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.stateCheck:hasPending
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=Stack.RecvEth:start plen=64
time=2024-05-05T14:46:07.706+02:00 level=DEBUG msg=TCP:recv opt=0 ipopt=0 payload=0
time=2024-05-05T14:46:07.706+02:00 level=DEBUG-2 msg=TCPConn.recv:start
panic: unexpected stateClosing
soypat commented 6 months ago

Alright, this all looks almost ready for the fix. I have just one question for you: you suggest setting the pending flag to ACK in the added switch case, though after entering the TimeWait state no more data should be exchanged. Is sending out the ACK necessary in that case? I've applied your fix in #21 with a slight modification in how the pending flags are calculated. Are you able to test the fix?

knieriem commented 6 months ago

Thank you for preparing the PR. I can confirm that the fix in #21 is working properly for my setup. Regarding your question -- I agree that no ACK should be sent (this is also what the state diagrams say).

[By the way, I figured out how to achieve the same message flow behaviour in my setup as in the original, cyw43439-based http-server example: calling HandleEth more often changes the timing, so that the FinWait2 state is entered instead of Closing. Therefore, a way to reproduce the behaviour involving the Closing state on the Pico W could be to call HandleEth slightly less often.]


Since I have looked a lot at the wireshark capture logs over the last few days, I began to wonder why FinWait2 is even entered in the normal, non-panicking message flow, and why TimeWait isn't entered directly from FinWait1. From my understanding, in this particular case an ACK for a previous data segment (which happens to be received just after the server sent its FIN+ACK) is mistaken for a lone ACK of the server's FIN+ACK, so that FinWait2 is entered instead of waiting for a matching FIN/ACK segment. If you don't mind, I would describe this in a separate issue.
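The distinction I have in mind can be sketched like this (function name and standalone form are illustrative, not seqs APIs): in FinWait1, only an ACK whose ack number covers our FIN (i.e. equals snd.NXT after sending the FIN) should advance us to FinWait2; an ACK for earlier data should leave us in FinWait1, still waiting for the peer's FIN.

```go
package main

import "fmt"

// finWait1Next decides the next state on receiving a plain ACK while in
// FinWait1. Only an ACK matching snd.NXT acknowledges our FIN; a smaller
// ack number belongs to previously sent data and must not advance us.
func finWait1Next(segACK, sndNXT uint32) string {
	if segACK == sndNXT {
		return "FinWait2" // our FIN is acknowledged
	}
	return "FinWait1" // stale ACK for prior data; keep waiting
}

func main() {
	// Illustrative numbers from the debug log: FIN sent with seq=1193,
	// so snd.NXT=1194 afterwards.
	fmt.Println(finWait1Next(1193, 1194)) // ACK of prior data -> FinWait1
	fmt.Println(finWait1Next(1194, 1194)) // ACK of our FIN    -> FinWait2
}
```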

soypat commented 5 months ago

a previous data segment [...] is mistaken for a single ACK of the server's FIN+ACK, so that FinWait2 is entered instead of waiting for a matching FIN/ACK segment. If you don't mind, I would describe this in a separate issue.

This sounds like it could cause issues going forward. Please do create a new issue detailing your observations. Logs welcome! As for the PR, I'll merge it and close this issue by the end of the week if nothing else crops up.