private-octopus / picoquic

Minimal implementation of the QUIC protocol
MIT License

Having issues getting the right throughput #1518

Closed: sureshpsu closed this issue 6 months ago

sureshpsu commented 1 year ago

Hi, Greetings. I have a Wi-Fi-only setup that gives 600 Mbps with iperf (UDP), but picoquic gives only 20 Mbps. Why is this? The command I used and its output are below. Please let me know if I am missing something. (I also tried picoquicdemo with "-a perf"; that didn't work either.)

Command on the client:
./picoquicdemo -b 100mb -n test -G cubic -D 192.168.10.231 4433 "1:0:-:2000000000:0"

Output:
Starting Picoquic (v1.1.8.1) connection to server = 192.168.10.231, port = 4433
Set ALPN to h3 based on stored ticket
Set version to 0x00000001 based on stored ticket
Files not saved to disk (-D, no_disk)
Testing scenario: <1:0:-:2000000000:0>
Max stream id bidir remote before start = 0 (0)
Starting client connection. Version = 1, I-CID: d451a2cf428d597b
Max stream id bidir remote after start = 2044 (512)
Max stream id bidir remote after 0rtt = 2044 (512)
Opening stream 0 to GET /2000000000
Waiting for packets.
Client port (AF=2): 36084.
The session was properly resumed!
Zero RTT data is accepted!
Negotiated ALPN: h3
Almost ready!

Connection established. Version = 1, I-CID: d451a2cf428d597b, verified: 1
Stream 0 ended after 2000000000 bytes
All done, Closing the connection.
Received a request to close the connection.
The connection is closed!
Out of 1 zero RTT packets, 1 were acked by the server.
Quic Bit was greased by the client.
Quic Bit was greased by the server.
ECN was received (ect0: 1649061, ect1: 0, ce: 0).
ECN was acknowledged (ect0: 156383, ect1: 0, ce: 0).
Received 2000000052 bytes in 759.590272 seconds, 21.063988 Mbps.
Sent 61 bytes in 759.590272 seconds, 0.000001 Mbps.
max_data_local: 6003145728
max_stream_data_local: 6000001185
max_data_remote: 1048576
max_stream_data_remote: 0
ack_delay_remote: 1000 ... 2145
max_ack_gap_remote: 60
ack_delay_local: 1000 ... 25000
max_ack_gap_local: 40
max_mtu_sent: 1252
max_mtu_received: 1440
Received ticket from test (h3): ticket time = 1688855062920, kx = 17, suite = 1301, 118 ticket, 32 secret.
lifetime = 100000, age_add = 1f325f82, 0 nonce, 97 ticket, 8 extensions.
ticket extensions: 42(ED: ffffffff),
Client exit with code = 0
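The Mbps figure in the "Received ... bytes in ... seconds" line is simply bits received divided by elapsed time, using decimal megabits (10^6 bits). A minimal C sketch of that arithmetic (the helper name `throughput_mbps` is hypothetical, not a picoquic function), which reproduces the logged value:

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in megabits per second, with 1 Mbit = 10^6 bits,
 * computed from a byte count and an elapsed time in seconds. */
double throughput_mbps(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes * 8.0 / seconds / 1e6;
}
```

`throughput_mbps(2000000052, 759.590272)` gives about 21.064, matching the log. At the 600 Mbps that iperf reaches, the same 2 GB transfer would take roughly 26.7 s instead of about 760 s.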

Also, please find the Wi-Fi card details below:

ethtool -k wlp1s0

Features for wlp1s0:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
generic-segmentation-offload: off [requested on]
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]

Thanking you, Warm regards, Suresh

huitema commented 1 year ago

Hard to tell what is happening.

In that scenario, the performance usually depends on the server. What kind of server is it? What server parameters are you using?

The client is asking for 2 GB. You are gathering logs on the client (option -b). Have you tried without logs? Are you gathering logs on the server? Can you turn that off too?

Can you use performance tools and trace CPU load, memory consumption, disk activity?

Does your system support UDP GSO?
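On Linux, one way to answer the UDP GSO question is to ask the kernel to accept the `UDP_SEGMENT` socket option (introduced with UDP GSO in kernel 4.18) on a UDP socket. A minimal sketch, assuming Linux; the helper name `probe_udp_gso` is hypothetical, and the segment size 1440 used in the usage note is borrowed from `max_mtu_received` in the log purely as an example. A successful probe only shows that the kernel accepts the option; it does not show that picoquic is actually using it.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* value from <linux/udp.h>, for older libc headers */
#endif

/* Returns 1 if the kernel accepts UDP_SEGMENT with the given segment size
 * on an IPv4 UDP socket, 0 if it rejects it, -1 if no socket was created. */
int probe_udp_gso(int segment_size)
{
    assert(segment_size > 0 && segment_size <= 65535);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* IPPROTO_UDP is the same level as SOL_UDP (17) on Linux. */
    int ok = setsockopt(fd, IPPROTO_UDP, UDP_SEGMENT,
                        &segment_size, sizeof(segment_size)) == 0;
    close(fd);
    return ok;
}
```

Calling `probe_udp_gso(1440)` on both client and server gives a quick yes/no per machine before digging into driver settings.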

sureshpsu commented 1 year ago

Hi Huitema, I have turned off all the logs on the client and server. Now: 1) I get 300 Mbps over Wi-Fi, while iperf gives 575-650 Mbps; 2) I get 900 Mbps over WiGig, while iperf gives 2.4 Gbps.

I used top to record CPU usage; it was pretty low (12% CPU usage and 0.2 for memory).

My server is a desktop machine (i7) with 10G and 1G Ethernet cards connected to the wireless AP(s), and it does not support UDP GSO.

Both client and server are Ubuntu 22.04/20 machines.

But iperf runs perfectly fine and reaches the expected throughput, for both UDP and TCP.

Another issue: if I introduce 10% loss, the throughput drops to 10 Mbps with picoquicdemo, while iperf still gives much better throughput, around 450 Mbps.

We are working on a conference paper. A meeting or 1:1 call would be great, and any help resolving this issue would be appreciated.

Regards, Suresh

huitema commented 1 year ago

On 7/11/2023 7:16 PM, sureshpsu wrote:

> Hi Huitema, I have turned off all the logs on the client and server. Now: 1) I get 300 Mbps over Wi-Fi, while iperf gives 575-650 Mbps; 2) I get 900 Mbps over WiGig, while iperf gives 2.4 Gbps.

> I used top to record CPU usage; it was pretty low (12% CPU usage and 0.2 for memory).

The picoquic implementation is single threaded. What was the CPU usage of the core on which it was running?
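Because picoquic runs on a single thread, a low machine-wide figure can hide a saturated core: if the 12% was a share of the whole machine, then on, say, an 8-core i7 it would correspond to roughly one fully busy core (0.12 × 8 = 0.96). A direct check is the ratio of the process's CPU time to wall-clock time over a transfer; a value near 1.0 means the one thread never idled. A minimal sketch using POSIX `getrusage` and `clock_gettime`; the struct and function names are hypothetical, and wiring it into picoquicdemo is not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <time.h>

/* A point-in-time reading of this process's CPU time and of wall-clock time. */
struct usage_snapshot {
    double cpu_s;  /* user + system CPU seconds consumed by the process */
    double wall_s; /* monotonic clock, in seconds */
};

struct usage_snapshot usage_snapshot_now(void)
{
    struct rusage ru;
    struct timespec ts;
    int rc1 = getrusage(RUSAGE_SELF, &ru);
    int rc2 = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc1 == 0 && rc2 == 0);
    (void)rc1; /* avoid unused-variable warnings when NDEBUG is set */
    (void)rc2;
    struct usage_snapshot s;
    s.cpu_s = (double)(ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
            + (double)(ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
    s.wall_s = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    return s;
}

/* CPU seconds per wall-clock second since `start`. For a single-threaded
 * program, a value close to 1.0 means its core was busy the whole time,
 * whatever the machine-wide percentage says. */
double usage_ratio_since(struct usage_snapshot start)
{
    struct usage_snapshot now = usage_snapshot_now();
    double wall = now.wall_s - start.wall_s;
    return wall > 0.0 ? (now.cpu_s - start.cpu_s) / wall : 0.0;
}
```

Taking a snapshot when the transfer starts and printing `usage_ratio_since` when it ends gives the per-core load that a machine-wide `top` figure can obscure.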

> My server is a desktop machine (i7) with 10G and 1G Ethernet cards connected to the wireless AP(s), and it does not support UDP GSO.

UDP GSO is key to performance. I am actually surprised that you obtain 900 Mbps without it. The main performance bottleneck is the UDP socket API: without GSO, the socket API accounts for 70 to 80% of the CPU usage.
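Rough arithmetic shows why per-packet socket cost dominates. At 900 Mbps with 1440-byte packets (the `max_mtu_received` value from the client log), the sender must emit 78,125 packets per second; without GSO that is one send call per packet, while with GSO one call can carry a batch of segments. The batch size of 10 used below is an assumption for illustration, not picoquic's actual setting, and header overhead is ignored. A minimal C sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Packets per second needed to carry rate_bps using packets of
 * packet_bytes each (header overhead ignored in this rough estimate). */
double packets_per_second(double rate_bps, uint32_t packet_bytes)
{
    assert(packet_bytes > 0);
    return rate_bps / (8.0 * packet_bytes);
}

/* Send calls per second if each call carries up to segments_per_call
 * packets (1 = no GSO). Rounded up: a partial batch still costs a call. */
uint64_t send_calls_per_second(double rate_bps, uint32_t packet_bytes,
                               uint32_t segments_per_call)
{
    assert(segments_per_call > 0);
    double calls = packets_per_second(rate_bps, packet_bytes) / segments_per_call;
    uint64_t whole = (uint64_t)calls;
    return (calls > (double)whole) ? whole + 1 : whole;
}
```

With these numbers, `send_calls_per_second(900e6, 1440, 1)` is 78,125 while `send_calls_per_second(900e6, 1440, 10)` is 7,813, roughly a tenfold reduction in socket calls for the same throughput.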

> Both client and server are Ubuntu 22.04/20 machines.

I think that Ubuntu 22.04 does support UDP GSO. Can you change that configuration?

> But iperf runs perfectly fine and reaches the expected throughput, for both UDP and TCP.

> Another issue: if I introduce 10% loss, the throughput drops to 10 Mbps with picoquicdemo, while iperf still gives much better throughput, around 450 Mbps.

That's a direct effect of congestion control. 10% loss is very large. Did you try using BBR? It would be much less sensitive to packet losses.
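The loss sensitivity of loss-based congestion control can be illustrated with the well-known Mathis et al. approximation for Reno-style TCP: throughput ≈ (MSS/RTT) · sqrt(3/2) / sqrt(p). CUBIC differs in detail, so this only illustrates the scaling, not picoquic's actual behavior: the ceiling falls as 1/sqrt(p), so going from 0.1% to 10% loss cuts it tenfold regardless of RTT. The 1 ms RTT in the usage note is a hypothetical value; the thread does not report one. A minimal C sketch with hypothetical names:

```c
#include <assert.h>

/* Square root by Newton's method, keeping the sketch free of libm. */
static double newton_sqrt(double x)
{
    assert(x >= 0.0);
    if (x == 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0; /* start at or above sqrt(x) */
    for (int i = 0; i < 100; ++i)
        r = 0.5 * (r + x / r);
    return r;
}

/* Mathis et al. approximation for Reno-style congestion control:
 *   throughput ~= (MSS / RTT) * sqrt(3/2) / sqrt(p)
 * mss_bytes: segment size; rtt_s: round-trip time in seconds;
 * p: packet loss rate, 0 < p < 1. Returns the estimate in Mbps. */
double mathis_mbps(double mss_bytes, double rtt_s, double p)
{
    assert(mss_bytes > 0.0 && rtt_s > 0.0 && p > 0.0 && p < 1.0);
    double bps = (mss_bytes * 8.0 / rtt_s) * newton_sqrt(1.5) / newton_sqrt(p);
    return bps / 1e6;
}
```

With 1440-byte packets and a 1 ms RTT, `mathis_mbps(1440, 0.001, 0.10)` is about 44.6 Mbps versus about 446 Mbps at 0.1% loss. This is why a controller that does not treat every loss as a congestion signal, such as the BBR suggestion above, degrades far less.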

> We are working on a conference paper. A meeting or 1:1 call would be great, and any help resolving this issue would be appreciated.

Pretty busy this week. We could have a conference call next week.

-- Christian Huitema

sureshpsu commented 1 year ago

Hi Huitema, Greetings. Thanks for your prompt response. Before we schedule a meeting, I would like to fix my setup. I have the following questions as well:
1) How many streams are needed to make the performance comparable to iperf?
2) Which Linux server distributions support UDP GSO? Will SUSE or CentOS work, or should we switch to a Windows machine?
3) Do you also have any benchmark comparisons of picoquic performance?

Thanking you, Warm regards, Suresh

huitema commented 1 year ago

On 7/12/2023 11:05 PM, sureshpsu wrote:

> Hi Huitema, Greetings. Thanks for your prompt response. Before we schedule a meeting, I would like to fix my setup. I have the following questions as well: 1) How many streams are needed to make the performance comparable to iperf?

That's more complicated than creating streams. On the server side, you will want to assign connections to a set of worker threads, and implement something like RSS (Receive Side Scaling) to direct incoming packets to the right threads. That will require some work on scheduling, multithreading, etc.
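The RSS-like dispatch step can be sketched as a function that maps a packet's destination connection ID (DCID) to a worker index. This is a minimal sketch under stated assumptions, not a picoquic API: the DCID is assumed to be already parsed, packets still addressed to a client-chosen DCID are hashed (FNV-1a is an arbitrary choice), and server-issued CIDs are assumed to carry the owning worker's index in their first byte. A production server would normally obfuscate such an index rather than expose it in clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a hash, used only as a cheap, stable spreading function. */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical dispatcher: returns the worker thread for a packet.
 * - client_chosen != 0: the DCID was picked by the client (early packets
 *   before the server's CID is in use), so hash it; every packet with
 *   that DCID lands on the same worker.
 * - client_chosen == 0: the DCID was issued by this server, which by the
 *   assumed convention stored the owning worker's index in byte 0. */
unsigned pick_worker(const uint8_t *dcid, size_t len, int client_chosen,
                     unsigned n_workers)
{
    assert(dcid != NULL && len > 0 && n_workers > 0 && n_workers <= 256);
    if (client_chosen)
        return fnv1a32(dcid, len) % n_workers;
    return dcid[0] % n_workers;
}
```

The receiving thread would then hand each packet to the chosen worker's queue; that queueing and the per-worker connection state are the scheduling and multithreading work referred to above, and are not shown.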

On the client, you would want to support multiple connections. The easy way would be to split the load between multiple processes, e.g., starting several picoquicdemo processes each downloading 1GB, and assuming that the OS will place them on different cores.
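The multi-process approach on the client can be scripted. A minimal POSIX sketch that starts N copies of one command in parallel and waits for all of them. `run_parallel` is a hypothetical helper, and `argv` would be whatever picoquicdemo command line you already use, adjusted so that each copy requests its share of the data. Core placement is left to the OS scheduler, as the answer suggests.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 64

/* Starts n_procs copies of the command in argv (argv[0] is looked up in
 * PATH), waits for all of them, and returns how many exited with status 0. */
int run_parallel(int n_procs, char *const argv[])
{
    assert(n_procs > 0 && n_procs <= MAX_PROCS);
    assert(argv != NULL && argv[0] != NULL);
    pid_t pids[MAX_PROCS];
    int started = 0;
    for (int i = 0; i < n_procs && i < MAX_PROCS; ++i) {
        pid_t pid = fork();
        if (pid == 0) {
            execvp(argv[0], argv);
            _exit(127); /* reached only if exec failed */
        }
        if (pid > 0)
            pids[started++] = pid; /* fork failures (pid < 0) are skipped */
    }
    int succeeded = 0;
    for (int i = 0; i < started; ++i) {
        int status = 0;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ++succeeded;
    }
    return succeeded;
}
```

The aggregate throughput is then the sum of the per-process "Received ... Mbps" figures, provided the processes ran concurrently.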

> 2) Which Linux server distributions support UDP GSO? Will SUSE or CentOS work, or should we switch to a Windows machine?

I am not a great Linux specialist. I assumed that the recent Ubuntu versions did that.

> 3) Do you also have any benchmark comparisons of picoquic performance?

There are many on the web, e.g., https://dial.uclouvain.be/memoire/ucl/object/thesis:37954

-- Christian Huitema

sureshpsu commented 1 year ago

Hi Huitema, Greetings. I am having issues with picoquic and also have a few questions. Is it possible to set up a meeting this week to discuss them? If so, please let me know a good time to talk.

Thanking you, Warm regards, Suresh

huitema commented 1 year ago

Sorry, still very busy. If you are using picoquicdemo, can you share the command line options that you are using on server and client?

huitema commented 1 year ago

UDP GSO was introduced in Linux kernel version 4.18. From https://askubuntu.com/questions/517136/list-of-ubuntu-versions-with-corresponding-linux-kernel-version:

| Ubuntu version | Code name | Linux kernel version |
|---|---|---|
| 23.10 | Mantic Minotaur | 6.5 |
| 23.04 | Lunar Lobster | 6.2 |
| 22.10 | Kinetic Kudu | 5.19 |
| 22.04 | Jammy Jellyfish | 5.15 |
| 21.10 | Impish Indri | 5.13 |
| 21.04 | Hirsute Hippo | 5.11 |
| 20.10 | Groovy Gorilla | 5.8 |
| 20.04 | Focal Fossa | 5.4 |
| 19.10 | Eoan Ermine | 5.3 |
| 19.04 | Disco Dingo | 5.0 |
| 18.10 | Cosmic Cuttlefish | 4.18 |

You may have to turn it on, and some drivers may not be able to support it. See: https://sandilands.info/sgordon/segmentation-offloading-with-wireshark-and-ethtool
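Since UDP GSO arrived in kernel 4.18, a quick programmatic check is to compare the running kernel's release string (as returned by POSIX `uname`) against 4.18. A minimal sketch with hypothetical helper names. A new enough kernel is necessary but not sufficient: as the previous paragraph notes, the feature may still need to be turned on, and some drivers may not support it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a kernel release string such as "5.15.0-76-generic" is at
 * least major.minor, 0 if it is older, and -1 if it cannot be parsed. */
int kernel_at_least(const char *release, int major, int minor)
{
    assert(release != NULL);
    int kmaj = 0, kmin = 0;
    if (sscanf(release, "%d.%d", &kmaj, &kmin) != 2)
        return -1;
    return (kmaj > major || (kmaj == major && kmin >= minor)) ? 1 : 0;
}

/* Checks the running kernel against 4.18, the release that added UDP GSO. */
int running_kernel_has_udp_gso(void)
{
    struct utsname u;
    if (uname(&u) < 0)
        return -1;
    return kernel_at_least(u.release, 4, 18);
}
```

Per the table above, every listed Ubuntu release from 18.10 onward, including 20.04 (5.4) and 22.04 (5.15), passes this check.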