quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0
3.87k stars 396 forks source link

Android is broken on master (GSO is enabled) #2042

Closed stormshield-damiend closed 1 week ago

stormshield-damiend commented 1 week ago

The following commit https://github.com/quinn-rs/quinn/commit/adc4a0684105dfefa31356e531e6c02d7e1a5c53 (MacOs optimizations) introduced a side effect on android.

These two modifications lead to BATCH_SIZE being set to 32 on android platform, enabling GSO :

Tests are done on and Android 14, armv8, phone using Wifi with and MTU of 1500 :

We observed similar issue on an android 9 x86 emulator.

djc commented 1 week ago

cc @larseggert / @mxinden -- have you deployed quinn-udp 0.5.7 to Nightly already?

mxinden commented 1 week ago

Taking a look now.

cc @larseggert / @mxinden -- have you deployed quinn-udp 0.5.7 to Nightly already?

We have not yet. Also, we don't do GSO (yet).

mxinden commented 1 week ago

The following commit adc4a06 (MacOs optimizations) introduced a side effect on android.

Yes, you are right. Sorry for not catching this earlier.

On a more general note, ignoring the fact that this should not have happened in https://github.com/quinn-rs/quinn/commit/adc4a0684105dfefa31356e531e6c02d7e1a5c53, what is the reason to disable GSO on Android?

I am not familiar with quinn-proto, e.g. the above debug_assert.

Ralith commented 1 week ago

what is the reason to disable GSO on Android?

I think this was an accident. Probably it didn't occur to me that android didn't count as linux for target_os purposes when I wrote the original code.

We reach the debug_assert in PacketBuilder::new()

Is this fixed by #2046?

Ralith commented 1 week ago

https://github.com/quinn-rs/quinn/pull/2047 should get this stuff working as intended on non-Linux Unixes in general.

thomaseizinger commented 1 week ago

I just tested latest main is still broken for me. Specifically, I am getting:

I/O error (os error 5)

Which then in turn triggers the following on the first packet:

11-18 13:22:22.241 3732 3850 E connlib : quinn_udp::imp: got transmit error, halting segmentation offload

thomaseizinger commented 1 week ago

I/O error (os error 5)

So interestingly enough, this only happens when my phone is on mobile data. When I am on WiFI, GSO and GRO work just fine. Additionally, halting GSO doesn't seem to have any impact in that case.

I have a suspicion that Android doesn't like to be told to "do GSO" even if the segment length is just 1. So perhaps we need a special-case to check if segment_length == payload_length and not set any GSO related flags in that case.

thomaseizinger commented 1 week ago

I have a suspicion that Android doesn't like to be told to "do GSO" even if the segment length is just 1. So perhaps we need a special-case to check if segment_length == payload_length and not set any GSO related flags in that case.

So with https://github.com/quinn-rs/quinn/pull/2050 applied, I am seeing the following:

  1. The newly added log when setting the initial socket option does not show up (meaning setting the initial socket option works).
  2. Sending individual packets works again.
  3. Attempting to send packets using GSO still fails but due to the new condition of not setting segment size if == contents.len(), resetting max_gso_segments actually works (assuming the calling code respects that value.

Here is a tidied up log excerpt from our application:

11-18 16:41:30.733 V wire::net::send: num_packets=1 segment_size=240
11-18 16:41:30.832 V wire::net::recv: num_packets=1 segment_size=28 trailing_bytes=0
11-18 16:41:31.172 V wire::net::send: num_packets=1 segment_size=240
11-18 16:41:31.769 V wire::net::recv: num_packets=1 segment_size=28 trailing_bytes=0
11-18 16:41:41.363 V wire::net::send: num_packets=2 segment_size=244
11-18 16:41:41.364 E quinn_udp::imp: got transmit error, halting segmentation offload
11-18 16:41:41.364 D connlib_client_shared::eventloop: Tunnel error: I/O error (os error 5)
11-18 16:41:41.365 V wire::net::send: num_packets=1 segment_size=256
11-18 16:41:41.365 V wire::net::send: num_packets=1 segment_size=256
11-18 16:41:41.366 V wire::net::send: num_packets=1 segment_size=88
11-18 16:41:41.367 V wire::net::send: num_packets=1 segment_size=88
11-18 16:41:43.109 V wire::net::recv: num_packets=2 segment_size=1312 trailing_bytes=386

As you can see, we first send and receive some smaller packets. Then, we try to send one with GSO, that fails and we then continue without GSO.