private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License

packet loss on high RTT connections #1728

Closed: hfstco closed this issue 1 month ago

hfstco commented 1 month ago

If I try to transmit data over a geostationary/high-BDP connection, packet number 3 is lost on the server side every time (sometimes it is packet number 4). CUBIC immediately drops to congestion avoidance (CA), because the loss is reported as a picoquic_congestion_notification_timeout.

In the .qlogs it is packet number 0 (a 1-RTT packet); in the code, the lost_packet_number in ack_state is 3.

I set up a picoquicdemo server with a Let's Encrypt certificate:

    ./picoquicdemo -k /path/to/certs/key.pem -c /path/to/certs/cert.pem -p 50020 -G cubic -q . -n h3

The loss also occurs with the default certs:

    ./picoquicdemo -k ./certs/key.pem -c ./certs/cert.pem -p 50020 -G cubic -q . -n h3

and transmit 1 MB of data to the client:

    ./picoquicdemo -D -G cubic -q . -n h3 quic.server.tld 50020 /1000000

I tried to increase the initial retransmit timer (https://github.com/private-octopus/picoquic/blob/df9d060f3b9de40de8b6458261f949de1dd29857/picoquic/picoquic_internal.h#L56), but the loss occurs even if I set it to 5 seconds. (I will submit the results later.)
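For reference, the change was just raising that constant. A sketch of the edit, assuming the macro at the linked line is PICOQUIC_INITIAL_RETRANSMIT_TIMER and noting that picoquic expresses timers in microseconds:

    /* picoquic_internal.h: initial retransmit timer, in microseconds.
     * Raised to 5 seconds for this experiment. */
    #define PICOQUIC_INITIAL_RETRANSMIT_TIMER 5000000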

The problem is not present in the satellite_cubic test.

branch: picoquic master (fresh clone, with just a single fprintf() added to print the loss to the terminal)

results:

1) netem (server <-> tc <-> client): 50/5 Mbit/s, 300 ms latency, 1x BDP buffer. Attachment: netem.server.qlog

2) konnect (https://europe.konnect.com/en-AT): 50/5 Mbit/s, ~300 ms latency, unknown BDP buffer. Attachment: konnect.server.qlog

3) skydsl (https://www.skydsl.eu/en-DE/Personal/Internet-via-satellite): 50/5 Mbit/s, ~300 ms latency, unknown BDP buffer. Attachment: skydsl.server.qlog

Attachments: results_letsencrypt_certs.zip, results_default_certs.zip

huitema commented 1 month ago

The packet loss on high-latency handshakes is kind of expected. It has not impacted scenarios so far because the client simply repeats the initial packet several times until it is finally acknowledged, and then the server may also repeat its first-flight packets several times until they are acknowledged. As long as the "idle timer" is larger than the RTT, it just works and the RTT is discovered automatically.

It is hard to fix. Maybe, using synchronized clocks and time stamps, the server could estimate a one-way delay. But that requires infrastructure, i.e., clocks synchronized to better than 0.1 s precision. It also requires passing a time stamp in the client's first flight, maybe using a TLS extension or a QUIC transport parameter. Sounds like research.
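To make that concrete, a minimal sketch, assuming a hypothetical timestamp carried in the client first flight and clocks that are already synchronized; nothing like this exists in picoquic today:

    #include <stdint.h>

    /* Hypothetical one-way delay estimate. client_send_time_usec would come
     * from a timestamp in the client first flight (TLS extension or QUIC
     * transport parameter). Any clock error between client and server adds
     * directly to the estimate, hence the need for synchronization well
     * below the delays being measured. */
    uint64_t estimate_one_way_delay_usec(uint64_t client_send_time_usec,
        uint64_t server_receive_time_usec)
    {
        return (server_receive_time_usec > client_send_time_usec) ?
            server_receive_time_usec - client_send_time_usec : 0;
    }

The server could then seed its RTT estimate with twice that value before the first ack arrives.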

I said it did not cause side effects, but you are finding a side effect in Cubic. Maybe the simplest solution is to fix the Cubic implementation so that it does not react to packet losses before the RTT has been computed.

huitema commented 1 month ago

The Cubic fix is simple. The Cubic code has a test when in "start up" mode, lines 247-260 in cubic.c. We need to disable that test and not update the "rtt_filter" state if the handshake is not complete. The simple way to do that is to test the connection state:

        case picoquic_congestion_notification_repeat:
        case picoquic_congestion_notification_ecn_ec:
        case picoquic_congestion_notification_timeout:
            /* Before the handshake completes, the RTT estimate is not yet
             * reliable, so ignore loss signals instead of updating the
             * rtt_filter state. */
            if (cnx->cnx_state < picoquic_state_client_almost_ready) {
                /* Do nothing, maybe add a log message */
            } else {
                /* For compatibility with Linux-TCP deployments, we implement a filter so
                 * Cubic will only back off after repeated losses, not just after a single loss.
                 */
                if ((notification == picoquic_congestion_notification_ecn_ec ||
                    picoquic_hystart_loss_test(&cubic_state->rtt_filter, notification, ack_state->lost_packet_number, PICOQUIC_SMOOTHED_LOSS_THRESHOLD)) &&
                    (current_time - cubic_state->start_of_epoch > path_x->smoothed_rtt ||
                        cubic_state->recovery_sequence <= picoquic_cc_get_ack_number(cnx, path_x))) {
                    path_x->is_ssthresh_initialized = 1;
                    picoquic_cubic_enter_recovery(cnx, path_x, notification, cubic_state, current_time);
                }
            }
            break;

But I cannot test that change now, and I will not be able to work on it for the next 3 days.

hfstco commented 1 month ago

> The packet loss on high-latency handshakes is kind of expected. It has not impacted scenarios so far because the client simply repeats the initial packet several times until it is finally acknowledged, and then the server may also repeat its first-flight packets several times until they are acknowledged. As long as the "idle timer" is larger than the RTT, it just works and the RTT is discovered automatically.

So far, simply increasing the retransmit timer has worked. I don't know why it no longer works. I have to investigate.

> It is hard to fix. Maybe, using synchronized clocks and time stamps, the server could estimate a one-way delay. But that requires infrastructure, i.e., clocks synchronized to better than 0.1 s precision. It also requires passing a time stamp in the client's first flight, maybe using a TLS extension or a QUIC transport parameter. Sounds like research.

One-way delay could work. But yes, time synchronization is the key. We have talked internally about deploying high-precision time synchronization in our testbed. Perhaps we will be able to deliver results at some point. But there are currently 3 more papers on our TODO list. :/

> I said it did not cause side effects, but you are finding a side effect in Cubic. Maybe the simplest solution is to fix the Cubic implementation so that it does not react to packet losses before the RTT has been computed.

Yes, it is not ideal if CUBIC goes straight into CA at the start.

> The Cubic fix is simple. The Cubic code has a test when in "start up" mode, lines 247-260 in cubic.c. We need to disable that test and not update the "rtt_filter" state if the handshake is not complete. The simple way to do that is to test the connection state:

Thank you. 👍 I will give it a try.

hfstco commented 1 month ago

Short update, because I am on vacation next week:

I tried your solution from above without success. The packet loss already happens when cnx->cnx_state == picoquic_state_ready.

[Screenshot 2024-08-09 at 18:52:43]

Maybe I'll get an idea next week.
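One direction that might be worth trying in the meantime: gate the loss reaction on whether an RTT sample exists rather than on the connection state. A rough sketch, where rtt_is_initialized is a hypothetical flag standing in for however picoquic marks the first real RTT sample on the path:

        case picoquic_congestion_notification_repeat:
        case picoquic_congestion_notification_ecn_ec:
        case picoquic_congestion_notification_timeout:
            /* Hypothetical guard: ignore loss signals until at least one
             * RTT sample has been taken, so a timeout computed from the
             * default RTT cannot push Cubic out of slow start. */
            if (!path_x->rtt_is_initialized) {
                /* No RTT sample yet: do nothing, maybe log */
            } else {
                /* Same loss filter as in the snippet above */
                if ((notification == picoquic_congestion_notification_ecn_ec ||
                    picoquic_hystart_loss_test(&cubic_state->rtt_filter, notification, ack_state->lost_packet_number, PICOQUIC_SMOOTHED_LOSS_THRESHOLD)) &&
                    (current_time - cubic_state->start_of_epoch > path_x->smoothed_rtt ||
                        cubic_state->recovery_sequence <= picoquic_cc_get_ack_number(cnx, path_x))) {
                    path_x->is_ssthresh_initialized = 1;
                    picoquic_cubic_enter_recovery(cnx, path_x, notification, cubic_state, current_time);
                }
            }
            break;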

PS: I'll have to think about your contribution to HyStart++. Time was short this week.

huitema commented 1 month ago

I am looking at the "netem" trace that exhibits your issue: an early "timeout" loss followed by an exit from hystart, and then a slow ramp-up driven by the cubic equation:

[image: netem server trace]

There are two separate issues:

1) Packet number 0 is declared lost on timeout at T = 605.63. This seems wrong, if only because packets 0 and 1 will be acknowledged 10 ms later. The loss seems to be triggered by the combination of receiving the last crypto message from the client (the transition to the ready state), receiving the first 1-RTT message from the client, and the first initialization of the RTT.

2) Packet number 0 is acknowledged 10 ms later, but this does not trigger the proper "spurious loss" reaction. Since the loss of that packet triggered a transition from slow start to congestion avoidance, the "spurious loss" handling should undo that transition and reset the Cubic state to slow start and the CWIN to whatever the value was before the spurious loss (see the sketch below).
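The undo could look something like the following sketch. picoquic reports the condition as picoquic_congestion_notification_spurious_repeat; the was_in_slow_start and cwin_before_loss fields are hypothetical state that would have to be saved when entering recovery, and the other names follow the style of cubic.c but should be checked against the actual code:

        case picoquic_congestion_notification_spurious_repeat:
            /* If the loss that forced the exit from slow start turns out to
             * be spurious, undo the transition. The two fields below are
             * hypothetical and would be saved on entering recovery. */
            if (cubic_state->was_in_slow_start) {
                cubic_state->alg_state = picoquic_cubic_alg_slow_start;
                path_x->cwin = cubic_state->cwin_before_loss;
            }
            break;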

Point of note: the main difference between the "netem" scenario and the simulated "satellite" scenario is the presence of 0-RTT packets in the "netem" test. The 0-RTT transfer fails, but that is still probably an important part of any repro.

huitema commented 1 month ago

The "skydsl" trace shows exactly the same pattern:

[image: skydsl server trace]

There are also 0-RTT packets in this trace, and they are also dropped.

huitema commented 1 month ago

The 'satellite' tests do not exhibit this pattern, including the test "h3zero_satellite", which in theory has the same traffic pattern. That's annoying, because it would be much easier to debug if we had a repro in the test environment.

After analysis, it seems that the failed tests differ from the successful tests only in a detail of the RTT computation. In the successful tests, the first ack has its ack-delay set to 0, because it is a simulation and virtual time did not progress between receiving the packet and sending the ack. In the failed tests, the ack delay is set to "24", i.e., 0.192 ms. This small delay is subtracted from the "sample" value when setting the RTT. In theory that should not matter, but there was another bug: the retransmission delay was somehow set to exactly the RTT. That small difference is enough to explain why there is no packet loss in the simulation, while there is one in the "network" test.
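To spell out the arithmetic: the RTT sample is adjusted by subtracting the peer's ack delay before it feeds the smoothed estimator, so the smoothed RTT ends up a little below the real round trip; if the retransmit deadline is then set to exactly the smoothed RTT, it fires just before the real ack arrives. A self-contained sketch in the style of the RFC 9002 estimator, not the exact picoquic code:

    #include <stdint.h>

    typedef struct {
        uint64_t smoothed_rtt; /* microseconds */
        uint64_t rttvar;       /* microseconds */
    } rtt_state_t;

    static uint64_t abs_diff(uint64_t a, uint64_t b) { return a > b ? a - b : b - a; }

    /* Update after the first sample, RFC 9002 style. The ack delay (192 us
     * in the failing traces) is subtracted from the sample, pulling
     * smoothed_rtt slightly below the true round trip. */
    void rtt_update(rtt_state_t* s, uint64_t rtt_sample, uint64_t ack_delay)
    {
        uint64_t adjusted = (rtt_sample > ack_delay) ? rtt_sample - ack_delay : rtt_sample;
        s->rttvar = (3 * s->rttvar + abs_diff(s->smoothed_rtt, adjusted)) / 4;
        s->smoothed_rtt = (7 * s->smoothed_rtt + adjusted) / 8;
    }

    /* A retransmit deadline needs a margin above smoothed_rtt; RFC 9002
     * uses PTO = smoothed_rtt + max(4 * rttvar, granularity) + max_ack_delay. */
    uint64_t pto_usec(const rtt_state_t* s, uint64_t max_ack_delay)
    {
        uint64_t var_term = 4 * s->rttvar;
        if (var_term < 1000) { var_term = 1000; } /* 1 ms granularity */
        return s->smoothed_rtt + var_term + max_ack_delay;
    }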

I think that PR #1731 will fix the RTT computation issue, and thus avoid the spurious loss detection. Can you please review that PR and test the code in your environment?

huitema commented 1 month ago

Fixed in PR #1731