mirage / mirage-tcpip

TCP/IP networking stack in pure OCaml, using the Mirage platform libraries. Includes IPv4/6, ICMP, and UDP/TCP support.
https://mirage.io
ISC License

Fixing memory leaks when a simultaneous close is happening #489

Closed TheLortex closed 2 years ago

TheLortex commented 2 years ago

When the client and the server want to close a connection at the same time, they both send a FIN.

I have identified situations where mirage-tcpip might keep the pcb forever, stuck in closing states, causing a memory leak. This might be linked to https://github.com/mirage/mirage-tcpip/issues/488.

This PR adds two tests exposing the behavior.

Test 1: close_ack_scenario

Both the client and the server send a FIN, then each ACKs the other's FIN. On the server side, the connection goes through Closing and then into Time_wait.

The problem is that the connection state would stay in Time_wait forever, because no timer was armed on that path. This is fixed by commit Add timewait timeout for Closing -> Time_wait transition.
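To make the fix concrete, here is a minimal sketch of the idea behind it (the state names match the TCP state machine, but `transition`, `on_timeout`, and the Lwt plumbing are illustrative, not the real state.ml):

```ocaml
(* Sketch only: not the actual mirage-tcpip state machine. *)
type state = Established | Fin_wait_1 | Closing | Time_wait

(* 2*MSL in a real stack; shortened here for illustration. *)
let time_wait_timeout = 4.0

(* [on_timeout] stands in for removing the pcb from the connection table. *)
let transition ~on_timeout state event =
  match state, event with
  | Fin_wait_1, `Recv_fin -> Closing  (* simultaneous close: peer's FIN *)
  | Closing, `Recv_ack_of_fin ->
    (* The bug: this path entered Time_wait without arming the timer, so
       the pcb was never reclaimed. The fix arms it here as well. *)
    Lwt.async (fun () ->
        let open Lwt.Syntax in
        let* () = Lwt_unix.sleep time_wait_timeout in
        on_timeout ();
        Lwt.return_unit);
    Time_wait
  | s, _ -> s
```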

Test 2: close_reset_scenario

Now, we assume the client doesn't acknowledge the FIN. The connection is then stuck in a Closing(_) state.

I have encountered a situation where the client would instead send a RST with a valid but not expected sequence number. In that case, check_valid_segment returns ChallengeAck and writes into q.rx_data, but because we are in a closing state that mailbox variable is never read again (Rx.thread in flow.ml has already returned). So the challenge ACK is not sent and the connection stays in this zombie state.

The fix is to use the send_ack mailbox variable, which can deliver an empty ACK even if the rx thread is stopped: enable challenge ack even after FIN
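Roughly, the shape of that change, as a sketch with Lwt_mvar (the field names echo the ones mentioned above, but the types here are simplified stand-ins):

```ocaml
(* Sketch of the two mailboxes; simplified from the real pcb record. *)
type pcb = {
  rx_data : string option Lwt_mvar.t;
  (* payload mailbox: its reader (Rx.thread) returns once a FIN is seen *)
  send_ack : int Lwt_mvar.t;
  (* ack-request mailbox: read for the whole lifetime of the connection *)
}

(* A RST with a valid but unexpected sequence number must be answered with
   a challenge ACK. Delivering the request via [send_ack] works even in a
   Closing state, where a write to [rx_data] would never be consumed. *)
let challenge_ack pcb ~rcv_nxt : unit Lwt.t =
  Lwt_mvar.put pcb.send_ack rcv_nxt
```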

Untested: what if the client stops communicating?

In this Closing(_) state, we are waiting for the client to ACK the server's FIN. If nothing happens, the retransmission mechanism should re-send the FIN. However, the retransmission timer is currently stopped whenever the connection is in the Closing(_) state, so the FIN is never re-sent. This is fixed by commit Add retransmit timer when in a Closing state.
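A sketch of the intended behavior (with hypothetical `get_state` and `resend_fin` callbacks; the real stack drives this from its own timer code):

```ocaml
(* Sketch: keep the FIN retransmission loop alive while in Closing. The
   pre-fix code stopped the timer in this state, so a silent peer left
   the pcb stuck there forever. *)
let rec retransmit_fin ~get_state ~resend_fin ~rto =
  let open Lwt.Syntax in
  let* () = Lwt_unix.sleep rto in
  match get_state () with
  | `Closing ->
    (* The peer has not ACKed our FIN: re-send it and re-arm the timer. *)
    let* () = resend_fin () in
    retransmit_fin ~get_state ~resend_fin ~rto
  | _ ->
    (* Connection moved on (e.g. to Time_wait) or was reset: stop. *)
    Lwt.return_unit
```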

Disclaimer

TCP is hard! I would be very happy if someone could review these changes and make sure nothing is broken by them.

haesbaert commented 2 years ago

This is a great find! :1st_place_medal: I'm back from :palm_tree: next week and will have a look.

balrajsingh commented 2 years ago

I too am away, till July 10; I will definitely look after I get back. But briefly, the problem looks real and the fix looks correct. I just want to look a bit more carefully to make sure it doesn't lead to an ACK storm where both sides keep sending unnecessary ACKs back and forth forever. This should probably also be checked against traces where simultaneous closes can clearly be seen and the connection then goes on to close correctly without an ACK storm.


ansiwen commented 2 years ago

Thanks folks for looking into this and thanks a lot @TheLortex for the great work so far!

Some context: this work resulted from a memory leak that we can only observe with a unikernel running as a Muen subject (= VM). As a KVM instance I can't reproduce the issue. It also only shows up when the requests (from curl in this case) are made directly on the network interface connected to the server hardware Muen is running on. If the request comes from another routed network (from a container, for example), it doesn't show up either. ~The sad news is: if I didn't do something wrong, this fix still doesn't fix our issue. There seems to be more lurking.~ Edit: this patch fixes this! But there could still be more lurking, so it would be great if you folks could look for something more that might be leaking.

A detail that might be relevant is that the clock provided by the Muen kernel has a reduced resolution for security reasons. Not sure this really matters, but just to keep it in mind.

Here is some info from our internal ticket:

I created pcap files and debug logs for both cases. As you can see, the debug logs show that in the leaking case (right side) the pcb entry is not removed:

(screenshot: debug logs, non-leaking vs. leaking case)

At the same time, the pcap traces show that in the leaking case only one RST packet is sent, while in the non-leaking case there are four, but I'm not sure if this is relevant:

without leak:

(screenshot: network trace without leak)

with leak:

(screenshot: network trace with leak)

Here are the pcap files with 1000 requests for both cases (with and without leak): leak-info-1000-without-leak.pcap.zip leak-info-1000-with-leak.pcap.zip

EDIT: I can also move this to a new issue, if you prefer to merge this PR as is.

haesbaert commented 2 years ago

If I read this right, in https://github.com/mirage/mirage-tcpip/blob/bb6e85d29e048a2ecf730e5737e231e4723db596/src/tcp/state.ml#L124 it is always an error to transition to TIME_WAIT without actually arming the timeout. Looking at Stevens, I also don't see any transition to TIME_WAIT where the timeout shouldn't be armed.
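One way to encode that invariant directly in the types would be a smart constructor, sketched here with illustrative names (not the actual state.ml API):

```ocaml
(* Sketch: make "TIME_WAIT without a timeout" unrepresentable by funnelling
   every transition to it through one constructor that arms the timer. *)
module State : sig
  type t
  val established : t
  val time_wait : arm_timer:(unit -> unit) -> t
  val is_time_wait : t -> bool
end = struct
  type t = Established | Time_wait
  let established = Established
  let time_wait ~arm_timer = arm_timer (); Time_wait
  let is_time_wait = function Time_wait -> true | Established -> false
end
```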

ansiwen commented 2 years ago

Not sure if this detail is important, but please note that the last of several RSTs in the non-leaking case has a different Seq value. There is no RST with Seq=805 in the leaking case. Maybe it's broken in both cases, but the additional RST "rescues" the situation?

TheLortex commented 2 years ago

Yes, exactly. The RST with Seq=804 is supposed to trigger a challenge ACK, because it is not the next expected sequence number (the previous FIN also uses Seq=804). That's why only the RST with Seq=805 is handled.
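That rule, as a sketch (illustrative names, not the actual check_valid_segment):

```ocaml
(* Sketch: a RST is honoured only when it carries exactly the next expected
   sequence number; a valid but unexpected one gets a challenge ACK rather
   than being dropped silently. *)
let classify_rst ~rst_seq ~rcv_nxt ~is_valid =
  if rst_seq = rcv_nxt then `Reset              (* e.g. Seq=805 above *)
  else if is_valid rst_seq then `Challenge_ack  (* e.g. Seq=804, re-used by the FIN *)
  else `Drop
```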

hannesm commented 2 years ago

I have not looked into the details of this PR, but @ansiwen reported there's still a leak: is the pcap the same as before this patch, or is there a difference?

ansiwen commented 2 years ago

Not sure about the pcap, I would need to create it, but at least the log output was 100% identical.

ansiwen commented 2 years ago

A short heads-up that this fix does indeed fix the leak we are observing. Will update with more details tomorrow.

ansiwen commented 2 years ago

Because mirage ignored the pin in config.ml, I didn't actually try the patched version before. I have now pinned directly in opam, and then it finally worked!
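(Concretely, by pinning directly in opam I mean something along the lines of `opam pin add mirage-tcpip <git-url>#<commit>`, which sidesteps whatever config.ml resolves to.)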

Here is how the debug log looks now (right) compared to before (left):

(screenshot: debug log, with the fix vs. before)

The network trace looks like this now:

(screenshot: network trace with the fix)

leak-info-1000-baremetal-with-fix4.pcap.zip

haesbaert commented 2 years ago

Can we get another OK for this?

TheLortex commented 2 years ago

@hannesm or @balrajsingh, would one of you be available to review the patch? We're now confident that it fixes Nitrokey's issue.

hannesm commented 2 years ago

Code looks fine (though I'm still not an expert on this TCP/IP stack), and @ansiwen reports it solves the issue. Would you mind writing an entry for CHANGES and cutting a release, @TheLortex? (I can do the release as well if you like.)

hannesm commented 2 years ago

Question for @ansiwen: did you test 1813264 or which commit?

TheLortex commented 2 years ago

> Would you mind writing an entry for CHANGES and cutting a release, @TheLortex? (I can do the release as well if you like.)

Great, I'll take care of that.

@ansiwen tested on 1bd6270b2b4a3f3974a0402b44aa61a984f2d8ab; the additional commit 1813264ba564136b0400f7f4a66034ea05fa24e7 should not normally change the behavior of the stack.

balrajsingh commented 2 years ago

I mainly looked to see if the send_challenge_ack change breaks anything else - this is the central change as I understand it; please let me know if that's not the case.

I think that the fix is good. It is specific to the challenge ACK situation only, when no data needs to be pushed up to the application. Your test is also convincing. I too agree that the change can be merged.

And @TheLortex, yes TCP is hard. So glad that you are looking at it in such detail. Many thanks!

ansiwen commented 2 years ago

> @ansiwen tested on 1bd6270b2b4a3f3974a0402b44aa61a984f2d8ab; the additional commit 1813264ba564136b0400f7f4a66034ea05fa24e7 should not normally change the behavior of the stack.

No, actually I was still testing on b9b88d956279bc8b31732cf063f2541fd21c427d; I will test again tomorrow with the latest commit.

TheLortex commented 2 years ago

I have added a changelog entry and I am ready to cut a patch release today, once @ansiwen confirms that the latest commit still works!

ansiwen commented 2 years ago

I can confirm that 1813264ba564136b0400f7f4a66034ea05fa24e7 still fixes the issue.

dinosaure commented 2 years ago

Let's merge and cut a release then! Thanks all for your work.