Nondeterministic connection breaking on AWS

We have observed that TCP connections created with TCPunch can fail randomly on AWS. We have not yet found the root cause. What we observe is that a TCP message is suddenly lost after the peers have exchanged between 16 and 64 kB of data. Wireshark analysis confirms that the data is sent, but the receiver keeps retrying and the TCP packet never arrives. We have also reproduced the issue between two VMs.
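One way to check whether the problem still reproduces is a transfer test of the following shape. This is a minimal Python sketch, not part of TCPunch: it streams a fixed amount of data over an already-connected socket pair and reports how many bytes arrived before the transfer finished or stalled. The function `stream_until_stall` and its parameters `total_bytes`, `chunk_size`, and `stall_timeout` are hypothetical, and treating 5 s without data as a stall is an assumption.

```python
import socket
import threading


def stream_until_stall(sender: socket.socket, receiver: socket.socket,
                       total_bytes: int = 256 * 1024,
                       chunk_size: int = 4096,
                       stall_timeout: float = 5.0) -> int:
    """Send total_bytes from sender to receiver and return how many bytes
    arrived before the transfer completed or no data came for stall_timeout s."""
    payload = bytes(chunk_size)

    def send_all() -> None:
        sent = 0
        try:
            while sent < total_bytes:
                n = min(chunk_size, total_bytes - sent)
                sender.sendall(payload[:n])
                sent += n
        except OSError:
            pass  # connection was torn down while sending

    # Send from a background thread so a full send buffer cannot block the reader.
    threading.Thread(target=send_all, daemon=True).start()

    receiver.settimeout(stall_timeout)
    received = 0
    try:
        while received < total_bytes:
            data = receiver.recv(65536)
            if not data:
                break  # peer closed the connection
            received += len(data)
    except socket.timeout:
        pass  # no data for stall_timeout seconds: treat as a stall
    return received


if __name__ == "__main__":
    a, b = socket.socketpair()  # local stand-in for a TCPunch connection
    print(stream_until_stall(a, b), "bytes received")
```

On a healthy connection the function returns `total_bytes`. Run on the two ends of a TCPunch connection between AWS VMs, we would expect the failure described above to show up as a smaller count, presumably somewhere in the 16 to 64 kB range.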

As a workaround, we currently try to exchange 64 kB between the two peers after pairing and restart the pairing process if this exchange fails.
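The workaround can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual TCPunch or FMI code: `pair` stands in for whatever call performs the TCPunch pairing and returns a connected socket, and both peers are assumed to run `probe` on their end and to re-pair together when it fails. Each side sends its 64 kB from a background thread while receiving the peer's 64 kB, so neither side can block on a full send buffer.

```python
import socket
import threading
from typing import Callable

PROBE_BYTES = 64 * 1024  # amount exchanged in each direction, as in the workaround


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed during probe")
        buf += chunk
    return bytes(buf)


def probe(sock: socket.socket, timeout: float = 5.0) -> bool:
    """Exchange PROBE_BYTES in both directions; True if both succeed.
    Both peers must call this on their end of the connection."""
    sock.settimeout(timeout)  # per-operation timeout for send and recv
    send_failed = threading.Event()

    def send_probe() -> None:
        try:
            sock.sendall(bytes(PROBE_BYTES))
        except OSError:
            send_failed.set()

    # Send concurrently with receiving so neither peer blocks on a full buffer.
    sender = threading.Thread(target=send_probe, daemon=True)
    sender.start()
    try:
        recv_exact(sock, PROBE_BYTES)
    except OSError:  # covers timeouts and ConnectionError
        return False
    sender.join(timeout)
    return not sender.is_alive() and not send_failed.is_set()


def connect_with_probe(pair: Callable[[], socket.socket],
                       max_attempts: int = 5,
                       timeout: float = 5.0) -> socket.socket:
    """Pair, probe, and re-pair until the probe succeeds or attempts run out.
    `pair` is a hypothetical stand-in for the TCPunch pairing call."""
    for _ in range(max_attempts):
        sock = pair()
        if probe(sock, timeout):
            sock.settimeout(None)  # restore blocking mode for normal use
            return sock
        sock.close()  # broken connection: discard it and pair again
    raise ConnectionError(f"probe failed after {max_attempts} pairing attempts")
```

Capping the number of attempts keeps a persistently broken path from looping forever; how the two peers agree to restart pairing (for example through the TCPunch server) is left open here.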

[ ] Can we still reproduce the problem?
[ ] Can changing the default packet size influence the problem?
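For the second question, one way to vary the packet size on Linux is to clamp the TCP maximum segment size (MSS) with the `TCP_MAXSEG` socket option before the connection is established. The sketch assumes that the relevant "packet size" is the MSS and that the option can be applied to the socket before TCPunch connects it; both are assumptions, and `make_socket_with_mss` is a hypothetical helper.

```python
import socket
from typing import Optional


def make_socket_with_mss(mss: Optional[int] = None) -> socket.socket:
    """Create a TCP socket whose maximum segment size is clamped to `mss` bytes.
    Linux-specific: TCP_MAXSEG must be set before connect()/listen() so that
    the clamped value is also announced to the peer in the initial SYN."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if mss is not None:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    return sock
```

Repeating the failing transfer with a few MSS values (for example 536, 1200, and the default) would show whether the amount of data exchanged before the connection breaks moves with the segment size.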

spcl/fmi, issue #9