zephyriot / zep-jira14


IP stack: No TCP receive window handling #1840

Open nashif opened 7 years ago

nashif commented 7 years ago

Reported by Paul Sokolovsky:

subsys/net/ip/tcp.c:get_recv_wnd() has the following comment:

        /* We don't queue received data inside the stack, we hand off
         * packets to synchronous callbacks (who can queue if they
         * want, but it's not our business).  So the available window
         * size is always the same.  There are two configurables to
         * check though.
         */
        return min(NET_TCP_MAX_WIN, NET_TCP_BUF_MAX_LEN);

Unfortunately, if an app queues received data (i.e. doesn't free the data fragments), it is our business, because the number of free data buffers decreases, yet the peer keeps bombarding us with more data packets. Receive window handling will also definitely be required for GH-1769.
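
To illustrate the point, here is a minimal sketch of a receive window that shrinks while the app holds on to data and grows back when the data is freed. It is not the code in subsys/net/ip/tcp.c: the `rx_window` struct and all `rx_window_*` functions are hypothetical names made up for this example, and the upper bound is simply passed in by the caller.

```c
#include <stdint.h>

/* Hypothetical per-connection receive window state (not a Zephyr API). */
struct rx_window {
	uint16_t max;    /* upper bound, e.g. min(NET_TCP_MAX_WIN, NET_TCP_BUF_MAX_LEN) */
	uint16_t queued; /* bytes received but not yet freed by the app */
};

static void rx_window_init(struct rx_window *w, uint16_t max)
{
	w->max = max;
	w->queued = 0;
}

/* Window to advertise to the peer: whatever is left of the budget. */
static uint16_t rx_window_get(const struct rx_window *w)
{
	return (uint16_t)(w->max - w->queued);
}

/* Data was handed to the app and is still held by it.
 * Returns the number of bytes that fit into the window.
 */
static uint16_t rx_window_on_queue(struct rx_window *w, uint16_t len)
{
	uint16_t avail = rx_window_get(w);

	if (len > avail) {
		len = avail;
	}
	w->queued = (uint16_t)(w->queued + len);
	return len;
}

/* The app freed len bytes, so the window can reopen by that much. */
static void rx_window_on_free(struct rx_window *w, uint16_t len)
{
	if (len > w->queued) {
		len = w->queued;
	}
	w->queued = (uint16_t)(w->queued - len);
}
```

With something along these lines, the value advertised to the peer would drop towards zero as buffers fill up, instead of staying constant.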

There are 2 approaches to receive window management:

(Imported from Jira ZEP-1999)

nashif commented 7 years ago

by Paul Sokolovsky:

So, as I feared, this issue blocks the implementation of the BSD Sockets layer. The problem is that the sockets layer has to queue incoming packets (per socket/context) until the app requests data from them. At the same time, the Zephyr IP stack always advertises the ability to receive more data. As the peer can send packets very quickly, all receive buffers can soon be used up, and further packets from the peer will be dropped. That means they won't be acknowledged; the peer will notice that and use exponential backoff when resending them. As the process still has a positive feedback loop, the packet rate from the peer will very soon grind to a halt.
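
To give a feel for how fast the backoff throttles the peer, here is a small sketch that computes the retransmission timeout after a number of unacknowledged retries, assuming the sender doubles the timeout on each retry up to a cap. The function name `rto_after_retries` is hypothetical, and the initial and maximum values are parameters because they differ between TCP implementations.

```c
#include <stdint.h>

/* Timeout (in ms) a sender would wait after `retries` consecutive
 * unacknowledged retransmissions, doubling each time and capped at max_ms.
 * Illustrative only; real stacks also factor in measured round-trip time.
 */
static uint32_t rto_after_retries(uint32_t initial_ms, unsigned int retries,
				  uint32_t max_ms)
{
	uint32_t rto = initial_ms;

	if (rto > max_ms) {
		rto = max_ms;
	}

	while (retries-- > 0 && rto < max_ms) {
		/* Compare against max_ms / 2 first so rto * 2 cannot overflow. */
		rto = (rto > max_ms / 2) ? max_ms : rto * 2;
	}

	return rto;
}
```

For example, with an assumed initial timeout of 1000 ms and a cap of 60000 ms, three dropped retransmissions already push the wait to 8000 ms, and six reach the cap.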

What's interesting is that in my initial testing against http://archive.ubuntu.com this issue didn't visibly show up, but there are some other apparent issues that make connections with such a faraway remote host die. I decided to do less ambitious testing using a local Apache, and there the issue described above was immediately visible.

I coded a quick and dirty patch that manages the receive window in a simple way; with it, the issue with Apache was gone and I was able to download 200MB (in multiple HTTP requests) without problems. I'm going to post an RFC on the mailing list, with the code above as a WIP show-off.

nashif commented 7 years ago

by Andrei Laperie:

We won't be able to fix it by 1.9. Shifting to 1.10. Since there is a workaround for most of the effects of this issue, lowering the priority to Medium.

nashif commented 7 years ago

by Paul Sokolovsky:

There is actually no workaround known to me. Any big TCP transfer with real-world Internet servers would deadlock without this feature. But of course I agree that this can't be planned for 1.9 any longer, though I hope to prototype an alternative solution (with respect to https://github.com/zephyrproject-rtos/zephyr/pull/81).

nashif commented 7 years ago

Blocks GH-1769

nashif commented 7 years ago

Related to GH-1841