quinn-rs / quinn

Async-friendly QUIC implementation in Rust
Apache License 2.0
3.76k stars · 380 forks

I saw no advantage to download speed compared to a simple python http tcp server #1712

Closed: shiqifeng2000 closed this issue 10 months ago

shiqifeng2000 commented 10 months ago

Using the example client/server, I am testing a bidirectional connection and transferring a video.

Using a simple Python HTTP server, the download speed is 100M/s,

but using the example client, the download speed is 69M/s.

May I have some explanation of the speed difference?

djc commented 10 months ago

Does your simple HTTP server setup include TLS? If not, it's not a fair comparison.

Ralith commented 10 months ago

You also shouldn't expect QUIC to improve performance for simple linear data transfers under ideal network conditions. QUIC provides improved multiplexing, security, and recovery behavior, but can't easily improve on what TCP is already great at.

shiqifeng2000 commented 10 months ago

> You also shouldn't expect QUIC to improve performance for simple linear data transfers under ideal network conditions. QUIC provides improved multiplexing, security, and recovery behavior, but can't easily improve on what TCP is already great at.

I'm told QUIC uses multiplexing for transfers, so if I download 2 or more files, would the performance be better?

Or is it because FEC redundant packeting drags down the speed?

And another question: do we need QUIC at all if we have a reliable network? Which scenario is best for QUIC?

shiqifeng2000 commented 10 months ago

> Does your simple HTTP server setup include TLS? If not, it's not a fair comparison.

I'm using rust/actix-web over HTTPS with a self-signed cert.

[Screenshot from 2023-11-29 14-56-36]

Ralith commented 10 months ago

> I'm told the quic uses multiplexing for transfering, so if I download 2 or more, the performance maybe better?

No. Your link's capacity is fixed.

> Or is it because using fec redundant packeting here drag down the speed?

FEC is not part of QUIC.

> which scenario is best for the quic

Consider using QUIC when TCP's limitations are causing you specific problems.

jaketakula commented 10 months ago

hey @shiqifeng2000, all QUIC implementations will be slower to some degree than TLS 1.3 over TCP. Please take a look at this good article for details on why, and how to speed it up. You can use TCP on the local network and QUIC at network gateways.

https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency

shiqifeng2000 commented 9 months ago

> hey @shiqifeng2000, all QUIC implementations will be slower to some degree than TLS 1.3 over TCP. Please take a look at this good article for details on why, and how to speed it up. You can use TCP on the local network and QUIC at network gateways.
>
> https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency

Thank you, I will take a good look.

shiqifeng2000 commented 9 months ago

hi @jaketakula

> hey @shiqifeng2000, all QUIC implementations will be slower to some degree than TLS 1.3 over TCP. Please take a look at this good article for details on why, and how to speed it up. You can use TCP on the local network and QUIC at network gateways.
>
> https://www.fastly.com/blog/measuring-quic-vs-tcp-computational-efficiency

It seems the article's author has heavily optimized the transport. The optimizations are:

  1. ACK delay
  2. GSO
  3. packet size increase

Regarding our code, is there anything I can do to follow the suggestions above? I tried to add some test code, but saw no effect. Here's the code:

    let mut server_config = quinn::ServerConfig::with_crypto(Arc::new(server_crypto));
    let transport_config = Arc::get_mut(&mut server_config.transport).unwrap();
    transport_config.max_concurrent_uni_streams(0u8.into());
    let mut acks = quinn::AckFrequencyConfig::default();
    acks.ack_eliciting_threshold(10u32.into());
    transport_config.ack_frequency_config(Some(acks));
    transport_config.enable_segmentation_offload(true);

Ralith commented 9 months ago

What OS are you on? Did you adjust the ACK frequency config on the client as well? What's your CPU load? Have you tried messing with the MTU? Have you profiled?

shiqifeng2000 commented 9 months ago

> What OS are you on? Did you adjust the ACK frequency config on the client as well? What's your CPU load? Have you tried messing with the MTU? Have you profiled?

Ubuntu 22.

The code in my test is the ACK adjustment, I assume? `let mut acks = quinn::AckFrequencyConfig::default(); acks.ack_eliciting_threshold(10u32.into());`

The CPU is a 12-core Intel(R) Core(TM) i7-8700 @ 3.20GHz, and the transfer takes about 100-150% of a core. I found multiple MTU fields, so I did not try them. Yes, I have profiled.

The transfer reached 200M~250M for just one bidirectional stream; that's very close to the "ack 10" result in the article. For GSO, I found only the enable_segmentation_offload API, and its docs say:

/// GSO dramatically reduces CPU consumption when sending large numbers of packets with the same
/// headers, such as when transmitting bulk data on a connection. However, it is not supported
/// by all network interface drivers or packet inspection tools. `quinn-udp` will attempt to
/// disable GSO automatically when unavailable, but this can lead to spurious packet loss at
/// startup, temporarily degrading performance.

By further digging, that part should be gso10:

/// This can be lower than the maximum platform capabilities, to avoid excessive
/// memory allocations when calling poll_transmit().
/// Benchmarks have shown that numbers around 10 are a good compromise.
const MAX_TRANSMIT_SEGMENTS: usize = 10;

That should lift me to at least 348M, so maybe this is close? But the client side does not support customizing the transport config, or at least not directly.


By changing the quinn code to enable the client transport setting, client speed rises. I found the receiving speed is noticeably lower than the sending speed.

By changing MAX_TRANSMIT_SEGMENTS to 20, there was no effect; maybe there's some other cap in quinn that limits the speed.


And as for MTU, there are 3 fields in the transport config: `initial_mtu: INITIAL_MTU`, `min_mtu: INITIAL_MTU`, `mtu_discovery_config: Some(MtuDiscoveryConfig::default())`. By changing INITIAL_MTU to 1460, the speed seems to rise a little bit, but it is unstable, so I'm not sure.

I really hope quinn could provide some official tutorial for speeding up transfers.

Ralith commented 9 months ago

> Yes, I have profiled.

What did the profiler tell you?

> The transfer reached 200M~250M

I thought you said you were getting "69M"? It's not clear if you're talking about MiB/s, MB/s, Mb/s or something else. Please label your units completely and consistently.

> For GSO, I found only the enable_segmentation_offload API

GSO is enabled by default when available. That API is only useful for turning it off when in a broken environment.

> but the client side does not support customizing the transport config

Yes, it does. See ClientConfig::transport_config.
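
For reference, a minimal sketch of setting a custom transport config on the client, mirroring the server-side ACK tuning from earlier in this thread (API names follow quinn 0.10-era releases; `tuned_client_config` and the `crypto` parameter are illustrative, not from the original posts — check against your quinn version):

```rust
use std::sync::Arc;

// Sketch only: build a ClientConfig that carries the same ACK-frequency
// tuning as the server side. `crypto` is a pre-built rustls::ClientConfig.
fn tuned_client_config(crypto: rustls::ClientConfig) -> quinn::ClientConfig {
    let mut transport = quinn::TransportConfig::default();

    // Ask the peer to coalesce ACKs, matching the server-side setting.
    let mut acks = quinn::AckFrequencyConfig::default();
    acks.ack_eliciting_threshold(10u32.into());
    transport.ack_frequency_config(Some(acks));

    let mut config = quinn::ClientConfig::new(Arc::new(crypto));
    config.transport_config(Arc::new(transport));
    config
}
```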

> by further digging, that part should be gso10

MAX_TRANSMIT_SEGMENTS is not related to GSO. GSO is enabled by default.

> and as for MTU, there are 3 fields in the transport config

Increasing both initial_mtu and min_mtu is the best bet for testing. Many overheads in QUIC are per-packet, so this should give a reliable speedup for long-running bulk data transfers. Exercise caution raising min_mtu in production, for obvious reasons.
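
A hedged sketch of that MTU adjustment (the 1460 value comes from the experiments above; API names per quinn 0.10-era releases):

```rust
// Sketch only: raise both the initial and the minimum MTU for a controlled
// test. Raising min_mtu assumes every path the endpoint uses can carry
// datagrams of this size; leave it at the default on arbitrary networks.
let mut transport = quinn::TransportConfig::default();
transport.initial_mtu(1460); // start with full-size packets, no ramp-up
transport.min_mtu(1460);     // never probe below this floor
transport.mtu_discovery_config(Some(quinn::MtuDiscoveryConfig::default()));
```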

> I really hope quinn could provide some official tutorial for speeding up transfers.

Quinn's default settings are tuned for a good balance of performance and reliability on a 100Mbps internet link. If there was a simple button to push to make it faster, that would be the default.

shiqifeng2000 commented 9 months ago

My bad for confusing the units. At first I used '69MB/s', that is bytes; the later figures are in Mbps, since I read the article and followed its units, which are bits. 69MB/s means 480~560Mbps of throughput, which is pretty fast according to the article. So the speed is already optimized by default?

By profiling, I mean I have tested and recorded the speed; suggest a tool if I should use one to support the data.

> ClientConfig::transport_config

Well, there is, but I need to replace it with a new one every time; it would be nicer if I could just update the existing one.

> GSO is enabled by default

That's great, but I wonder if there's any way to set GSO to 10 or 20 segments?

> Increasing both initial_mtu and min_mtu is the best bet for testing.

That's really helpful, I will try this in testing.

The later tests in my previous post are bidirectional: both server and client try to send large data to each other, so maybe that's what dragged down the speed. Will test more.

Update: with this setting, I get a speed of 580Mbps:

    transport_config.max_concurrent_uni_streams(0u8.into());
    let mut acks = quinn::AckFrequencyConfig::default();
    acks.ack_eliciting_threshold(10u32.into());
    transport_config.ack_frequency_config(Some(acks));
    transport_config.enable_segmentation_offload(true);
    transport_config.initial_mtu(1460);
    transport_config.min_mtu(1460);


So that's the best I can do, I guess?