zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.12k stars 6.21k forks source link

TCP: Zero window probe packet incorrect #47966

Closed ssharks closed 1 year ago

ssharks commented 1 year ago

Describe the bug The Zero Window Probe is supposed to send the next byte in line. This is described by:

https://www.wireshark.org/docs/wsug_html_chunked/ChAdvTCPAnalysis.html

TCP ZeroWindowProbe Set when the sequence number is equal to the next expected sequence number, the segment size is one, and last-seen window size in the reverse direction was zero.

If the single data byte from a Zero Window Probe is dropped by the receiver (not ACKed), then a subsequent segment should not be flagged as retransmission if all of the following conditions are true for that segment: The segment size is larger than one. The next expected sequence number is one less than the current sequence number.

This affects “Fast Retransmission”, “Out-Of-Order”, or “Retransmission”.

The current implementation actually sends an ACK packet with the sequence number of the previous byte. This is the behavior that a TCP Keep Alive packet should have.

See: https://github.com/zephyrproject-rtos/zephyr/blob/0649679b98308d89a223407fa61216fc39d402bf/subsys/net/ip/tcp.c#L1308

To Reproduce This issue was discovered during code review in PR https://github.com/zephyrproject-rtos/zephyr/pull/47115

Expected behavior Sending the next byte in line

Impact Depending on the TCP implementation on the receiving side, it could happen the Zero Window Probe is not acknowledged rendering the ZWP mechanism non functional. This is likely the case when the other side does not support keep alive.

Environment (please complete the following information): NA

rlubos commented 1 year ago

The link you've posted does not explain what to do in a case when there is no data to send. Keep-alive based probe, from the experience, actually more often seen in the wild (but I might be wrong here) seems to be more universal in this case.

ssharks commented 1 year ago

TCP Keep-alive and Zero Window Probe are two different things.

In basis TCP should never send an acknowledgement to an ACK message, otherwise you keep sending acknowledgements back and forth, which will never work. If ZWP is implemented as defined in the reference, this will always work, simply because there is always 1 byte of payload.

TCP keep alive is a special case and requires an exception of the rule to to never acknowledge an ACK message. This does not have to be implemented by all TCP stacks. That's why the TCP keep alive packets uses the sequence_number-1. Implementing only sending an acknowledgment to and ACK with the sequence number that is 1 out of order when there is no data to be send, will make the TCP keep alive work, without the risk of creating an endless ACK loop.

rlubos commented 1 year ago

I can only aggree on the fact, that a textbook example of TCP Keep-alive and ZWP are indeed two different things. It doesn't change the fact however, that you can achieve the same result (i. e. window update) by sending a Keep-alive packet instead of a textbook ZWP. And it's not that only some wacky TCP implementations in the wild do that - Linux does the very same thing: image

In basis TCP should never send an acknowledgement to an ACK message, otherwise you keep sending acknowledgements back and forth, which will never work.

This would only be true if both ends kept sending acknowledgments with older sequence than expected, which is wrong behavior. In a real scenario however, both ends will synchronize, so the acknowledment exchange would stop.

TCP keep alive is a special case and requires an exception of the rule to to never acknowledge an ACK message. This does not have to be implemented by all TCP stacks. That's why the TCP keep alive packets uses the sequence_number-1.

I'm sorry but I can't agree with this. Receiving a packet with an older than expected sequence number means that the acknowledgment sent to the peer was lost and it has to be resent. RFC 1122 makes it clear taht even if the packet with the older seq has not data, it shoud be acknowledged:

Unfortunately, some misbehaved TCP implementations fail to respond to a segment with SEG.SEQ = SND.NXT-1 unless the segment contains data.

So to summarize, I'm not really sure if it's worthwhile to spend time reworking the ZWP mechanism, w/o a clear benefit from it. The implementation would only get more complex, and it's still not clear to me how should the ZWP look like if there is no new data to send.

ssharks commented 1 year ago

A ZWP only needs to be send when there is data to be send, in order to make it work properly we need to have an option for tcp_send_data, where we can specify the length to be transmitted = 1 after setting conn->unacked_len to 0.

rlubos commented 1 year ago

A ZWP only needs to be send when there is data to be send

This will be problematic in terms of POLLOUT processing, imagine the following scenario:

  1. Zephyr uses poll()/non-blocking send() to handle TX side and sends a data packet exactly of the size of the peer receive window. Peer acknowledges the data, reporting zero window.
  2. Now Zephyr is blocked in poll() waiting for a non-blocking send() to be possible. In order to achieve that, the window has to be open again. I
  3. f there's no proactive window update from the peer side, the connection gets stuck - TCP won't send ZWP as there's no data queued, and the application does not queue new data as this would break the non-blocking requirement.

Even if poll() is not used, AFAIR the blocking send() will not allow queueing new data unless TX window on our side allows it (and if peer reported zero window condition, our TX window size is also 0).

So unless I miss something, we'd need to allow to queue new data even if the TX window is full. This could lead however to other issues we've faced in the past with the buffer availability.

ssharks commented 1 year ago

Hmmm... did not realize the polling situation. Then you would need to accept the write despite the receive window being 0 after the persist timer has elapsed. This indeed complicates stuff quite a bit.

On the ACK side then it is only required to acknowledge an ACK when the sequence number is 1 behind, not all ACK with previous sequence numbers.

rlubos commented 1 year ago

@ssharks I think we should decide what to do with this ticket. I think that we can agree that while current solution is not 100% compliant with TCP specifications, it is simpler (specifically considering our TCP stack), serves its purpose (triggering ACK from the peer side), and is compatible with major stacks in the wild (Linux). I see two options here:

  1. Let's agree to disagree on the topic and leave our current ZWP mechanism as is (which would be my preferable option). Bringing @jukkar into the loop for an opinion as well.
  2. Insist on rewriting the ZWP logic to implement the model ZWP handling (i.e. include data byte in the probe). I would consider converting the issue into an enhancement then rather than a bug then, as it doesn't seem that the zero-window handling is currently broken (just achieved with different means).
ssharks commented 1 year ago

This is a difficult topic. I agree on the fact that it is working in a lot of cases. Without a major overhaul there is no better solution to it. Let's leave it as is in the meantime. We can spend our energy better at things like collision avoidance.

carlescufi commented 1 year ago

Closing this as per agreement between maintainer and reporter. Please reopen or open a new issue if you decide to rewrite/overhaul the logic in the future.