Open nashif opened 7 years ago
by Paul Sokolovsky:
So, as I was afraid, this issue is blocking for the implementation of the BSD Sockets layer. The problem is that the sockets layer has to queue incoming packets (per socket/context) until the app requests data from them. At the same time, the Zephyr IP stack always advertises the ability to receive more data. As the peer can send packets very quickly, all receive buffers can soon be used up, and further packets from the peer will be dropped. That means they won't be acknowledged to the peer; it will notice that and use exponential backoff when resending them. As the process is a positive feedback loop, very soon the packet rate from the peer will grind to a halt.
What's interesting is that in my initial testing against http://archive.ubuntu.com this issue didn't visibly pop up, but there are apparently some other issues that make connections to such a far-away remote host die. I decided to do less ambitious testing using a local Apache, and there the issue described above was immediately visible.
I coded a quick and dirty patch that manages the receive window in a simple way, and with it the issue with Apache was gone and I was able to download 200MB (in multiple HTTP requests) without problems. I'm going to post an RFC on the mailing list, along with that patch as a WIP show-off.
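For illustration only (this is not the patch mentioned above, and none of these names exist in Zephyr): a minimal sketch of the kind of receive window accounting being discussed. It assumes a hypothetical per-socket `struct rx_window` that tracks how many bytes are queued for the app and advertises only the remaining space, instead of always advertising a full window.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-socket receive window state; not a Zephyr structure. */
struct rx_window {
	uint16_t max;    /* largest window we are ever willing to advertise */
	uint16_t queued; /* bytes received but not yet read by the app */
};

static void rx_window_init(struct rx_window *w, uint16_t max)
{
	w->max = max;
	w->queued = 0;
}

/* Window value to put into the next outgoing ACK. */
static uint16_t rx_window_advertised(const struct rx_window *w)
{
	return (uint16_t)(w->max - w->queued);
}

/*
 * Called when a data segment of 'len' bytes arrives.
 * Returns 1 if the data fits and was accounted as queued,
 * 0 if it exceeds the advertised window and has to be dropped.
 */
static int rx_window_on_data(struct rx_window *w, uint16_t len)
{
	if (len > rx_window_advertised(w)) {
		return 0;
	}
	w->queued = (uint16_t)(w->queued + len);
	return 1;
}

/* Called when the app has consumed 'len' bytes of queued data. */
static void rx_window_on_read(struct rx_window *w, uint16_t len)
{
	assert(len <= w->queued);
	w->queued = (uint16_t)(w->queued - len);
}
```

With accounting like this, the advertised window shrinks as data piles up in the socket queue, so a well-behaved peer slows down or stops sending before buffers run out, rather than having its packets dropped and falling into retransmission backoff.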
by Andrei Laperie:
We won't be able to fix it by 1.9. Shifting to 1.10. Since there is a workaround for most of the effects of this issue, lowering the priority to Medium.
by Paul Sokolovsky:
There is actually no known workaround (to me). Any big TCP transfer with real-world Internet servers would deadlock without this feature. But of course I agree that this can't be planned for 1.9 any longer, though I hope to prototype an alternative solution (with respect to https://github.com/zephyrproject-rtos/zephyr/pull/81).
Blocks GH-1769
Related to GH-1841
Reported by Paul Sokolovsky:
subsys/net/ip/tcp.c:get_recv_wnd() has the following comment:
Unfortunately, if an app queues received data (doesn't free data fragments), it is our business, because the number of free data buffers decreases, yet the peer keeps bombarding us with more data packets. And receive window handling will definitely be required for GH-1769.
There are 2 approaches to receive window management:
(Imported from Jira ZEP-1999)