private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License
527 stars, 156 forks

Max retransmission timeout question [research] #1543

Closed ElNiak closed 7 months ago

ElNiak commented 10 months ago

Hello,

We are currently working on formal verification of temporal properties of the QUIC protocol, by transforming liveness properties into safety properties.

To validate our methodology, we decided to focus on a simple QUIC feature: the idle timeout connection termination.

However, we ran into one "problem" during our experiments: the connection termination on maximum retransmissions is triggered before the idle timeout. We identified this code as the cause: https://github.com/private-octopus/picoquic/blob/1e2979e8db0957c8ee798940091c4d0ef13bf8af/picoquic/sender.c#L1694

We did not find in RFC 9000, RFC 9002, the MPQUIC draft, or draft-bonaventure-iccrg-schedulers-01 any property indicating that the connection should be silently closed after 7 failed retransmissions.

We understand why it is present but since we focus on formal specification based on the RFC, we are interested to know why and from where this feature has been added.

A property that could fit that choice is the following one: "An endpoint MAY discard connection state if it does not have a validated path on which it can send packets; see Section 8.2"

Maybe it is leftover code from MPQUIC? (We see some comments related to that extension.)

Could you give us some information about this implementation choice?

Thank you!

ElNiak

huitema commented 10 months ago

On 8/31/2023 2:37 AM, ElNiak wrote:

> Hello,
>
> We are currently working on formal verification of temporal properties of the QUIC protocol, by transforming liveness properties into safety properties.
>
> To validate our methodology, we decided to focus on a simple QUIC feature: the idle timeout connection termination.
>
> However, we ran into one "problem" during our experiments: the connection termination on maximum retransmissions is triggered before the idle timeout. We identified this code as the cause: https://github.com/private-octopus/picoquic/blob/1e2979e8db0957c8ee798940091c4d0ef13bf8af/picoquic/sender.c#L1694
>
> We did not find in RFC 9000, RFC 9002, the MPQUIC draft, or draft-bonaventure-iccrg-schedulers-01 any property indicating that the connection should be silently closed after 7 failed retransmissions.
>
> We understand why it is present but since we focus on formal specification based on the RFC, we are interested to know why and from where this feature has been added.

Most transport implementations will terminate a connection if a packet cannot be acknowledged after some number of retransmissions. That maximum number varies between implementations, because it is a tradeoff between giving up too soon, breaking the connection when the next retry might have succeeded, and trying for too long, which leaves the application stuck. 7 is actually on the long side of that tradeoff; common values are three or four. With 7 attempts, retransmission will succeed 99% of the time if the packet loss rate is 50%, and of course far more often if the packet loss rate is lower than that.

Comparable setups are often found in TCP stacks, and in fact in several other transport protocols.
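The 99% figure can be checked with a short calculation. The sketch below assumes that each transmission is lost independently with the same probability, which real networks with bursty loss do not guarantee; the function name `delivery_probability` is hypothetical and is not part of picoquic.

```c
#include <assert.h>

/* Probability that at least one of `attempts` transmissions is delivered,
 * assuming each one is lost independently with probability `loss_rate`.
 * Hypothetical helper for illustration; not taken from picoquic. */
static double delivery_probability(double loss_rate, unsigned attempts)
{
    assert(loss_rate >= 0.0 && loss_rate <= 1.0);

    double all_lost = 1.0; /* probability that every attempt so far was lost */
    for (unsigned i = 0; i < attempts; i++) {
        all_lost *= loss_rate;
    }
    return 1.0 - all_lost;
}
```

For example, `delivery_probability(0.5, 7)` evaluates to 1 - 0.5^7 = 0.9921875, while three attempts at the same loss rate give 0.875.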

> A property that could fit that choice is the following one: "An endpoint MAY discard connection state if it does not have a validated path on which it can send packets; see Section 8.2" (https://www.rfc-editor.org/rfc/rfc9000#migrate-validate)
>
> Maybe it is leftover code from MPQUIC? (We see some comments related to that extension.)

No, it is deliberate.

> Could you give us some information about this implementation choice?

See above. I am surprised that this gets in the way of the "timeout" tests. The timeout process is not tied to retransmission. If you want to test timeouts, the endpoints should still acknowledge packets that they have received.

-- Christian Huitema

ElNiak commented 10 months ago

Thanks for the response. The RFC says that an idle timeout is triggered as follows: "If a max_idle_timeout is specified by either endpoint in its transport parameters (Section 18.2), the connection is silently closed and its state is discarded when it remains idle for longer than the minimum of the max_idle_timeout value advertised by both endpoints."
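The "minimum of the max_idle_timeout value advertised by both endpoints" can be sketched as a small function. This is a sketch based on RFC 9000 Section 10.1, not on picoquic's code: a value of 0 means the parameter was omitted or disabled, the sole non-zero value is used when only one endpoint advertises one, and the result is raised to at least three times the current PTO. The function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective idle timeout in milliseconds, following RFC 9000 Section 10.1.
 * A return value of 0 means the idle timeout is disabled.
 * Hypothetical helper for illustration; not taken from picoquic. */
static uint64_t effective_idle_timeout_ms(uint64_t local_ms, uint64_t peer_ms,
                                          uint64_t pto_ms)
{
    assert(pto_ms > 0); /* a probe timeout is always positive */

    uint64_t timeout;
    if (local_ms == 0) {
        timeout = peer_ms;   /* only the peer advertised a value (or nobody) */
    } else if (peer_ms == 0) {
        timeout = local_ms;  /* only the local endpoint advertised a value */
    } else {
        timeout = (local_ms < peer_ms) ? local_ms : peer_ms;
    }

    if (timeout == 0) {
        return 0; /* both endpoints omitted the parameter or sent 0 */
    }

    /* Avoid excessively small idle periods: at least three times the PTO. */
    uint64_t floor_ms = 3 * pto_ms;
    return (timeout < floor_ms) ? floor_ms : timeout;
}
```

With advertised values of 30000 ms and 60000 ms and a PTO of 100 ms, the effective timeout is 30000 ms; with 200 ms and 1000 ms it is raised to 300 ms.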

Our test consists of starting the connection and then, at some random moment in the connection, we stop both responding and sending packets. For us, "idle connection" means "an open connection that is not used".

But during the test, the retransmission threshold was triggered before the idle timeout.

huitema commented 10 months ago

Your test is simulating a broken connection, not an idle connection. When your test nodes enter "idleness", they should still continue obeying protocol rules: receiving data, sending acks, etc. Idleness refers to the application being inactive: for example, a web client that stops requesting new web pages; or, in QUIC terms, an endpoint that stops opening new streams or writing new data on streams.
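The distinction can be illustrated with the idle timer rules of RFC 9000 Section 10.1: the timer restarts when a packet from the peer is received and processed, and when an ack-eliciting packet is sent if no other ack-eliciting packet has been sent since the last receive. The sketch below is an illustration of those rules only, with hypothetical names; it is not picoquic's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal idle timer following RFC 9000 Section 10.1 (hypothetical types). */
typedef struct {
    uint64_t timeout_us;          /* effective idle timeout */
    uint64_t deadline_us;         /* time at which the connection is closed */
    bool ack_eliciting_since_rx;  /* ack-eliciting packet sent since last receive */
} idle_timer_t;

static void idle_timer_init(idle_timer_t *t, uint64_t now_us, uint64_t timeout_us)
{
    assert(timeout_us > 0);
    t->timeout_us = timeout_us;
    t->deadline_us = now_us + timeout_us;
    t->ack_eliciting_since_rx = false;
}

/* A packet from the peer was received and processed: always restart. */
static void idle_timer_on_receive(idle_timer_t *t, uint64_t now_us)
{
    t->deadline_us = now_us + t->timeout_us;
    t->ack_eliciting_since_rx = false;
}

/* An ack-eliciting packet was sent: restart only for the first one sent
 * since the last receive, so retransmissions to a silent peer do not
 * keep pushing the deadline back. */
static void idle_timer_on_ack_eliciting_sent(idle_timer_t *t, uint64_t now_us)
{
    if (!t->ack_eliciting_since_rx) {
        t->deadline_us = now_us + t->timeout_us;
        t->ack_eliciting_since_rx = true;
    }
}

static bool idle_timer_expired(const idle_timer_t *t, uint64_t now_us)
{
    return now_us >= t->deadline_us;
}
```

In a healthy but idle connection, each received packet (for example an ACK) moves the deadline forward. When the peer goes silent, as in the test described above, only the first unanswered ack-eliciting packet restarts the timer, and every later retransmission counts against a separate retransmission limit if the implementation has one.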

Picoquic is fine. You have to fix your test setup.

huitema commented 7 months ago

In any case, this has been fixed in PR #1582: only disconnect on timeout, not on retransmissions.