paullouisageneau / libdatachannel

C/C++ WebRTC network library featuring Data Channels, Media Transport, and WebSockets
https://libdatachannel.org/
Mozilla Public License 2.0

Maintaining high throughput #832

Open rycwo opened 1 year ago

rycwo commented 1 year ago

Thank you for this great library!

I have been running some tests to see how far I can stretch WebRTC data channels. I've seen the benchmarks and read through a number of the other issues, which have been very insightful. In my testing I've run into a scenario that is giving unexpectedly poor performance. I'm struggling to understand why this is the case — I was hoping somebody could shed some light for me.

I've prepared a patch that modifies the example client slightly to demonstrate the issue:

periodic-send.patch

Essentially, the changes simulate sending messages in periodic bursts over the data channel (see the attached patch for the exact behaviour).

I've taken care to ensure I am not blocking rtc::DataChannel::onOpen (I am blocking rtc::DataChannel::onBufferedAmountLow in the patch, but using a detached thread there to send doesn't seem to make a difference).
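
For reference, the sending side roughly follows the usual buffered-amount backpressure pattern. Below is a minimal sketch of that pattern (not the attached patch itself; `kMessageSize` and `kLowThreshold` are placeholder values):

```cpp
// Minimal buffered-amount backpressure sketch (illustrative, not the patch).
#include <rtc/rtc.hpp>

#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kMessageSize = 64 * 1024;     // placeholder payload size
constexpr std::size_t kLowThreshold = 1024 * 1024;  // placeholder low-water mark

void attachSender(std::shared_ptr<rtc::DataChannel> dc) {
	auto payload = std::make_shared<std::vector<std::byte>>(kMessageSize);
	dc->setBufferedAmountLowThreshold(kLowThreshold);

	// Refill the send buffer until it rises above the threshold again, then
	// return, so the callback itself never blocks for long.
	auto pump = [dc, payload] {
		while (dc->isOpen() && dc->bufferedAmount() <= kLowThreshold)
			dc->send(payload->data(), payload->size());
	};

	dc->onOpen(pump);
	dc->onBufferedAmountLow(pump);
}
```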

Testing steps:

  1. Start an example signaling server on a remote AWS instance
  2. Start the modified client on the same remote AWS instance
  3. Start the example web client on my local machine

What appears to happen: within a few seconds, after the first couple hundred or so messages, sending becomes progressively slower, to the point where throughput bottoms out at less than 1 Mbps!

Below are a couple of screenshots from chrome://webrtc-internals/ while running the test.

Stats graphs for transport: image

Stats graphs for candidate-pair: image

You can clearly see the logarithmic drop-off in the bytesReceived_in_bits/s.

I am also able to recreate the same slowdown by connecting to the patched client with an unmodified example client.

Any idea why this might be happening? Is this related to flow control/congestion? If so, why does the first set of messages go through so quickly?

At its slowest point, onBufferedAmountLow seems to be called between every message, which seems to suggest the receiver isn't consuming messages quickly enough? If so, given there's a bit of a break between the first and second sets of messages, shouldn't the receiving buffer clear out in time?

Is there any way for me to debug this further? Maybe get a sense of libdatachannel/usrsctp buffer sizes and settings?

What can I do to fix/mitigate the slowdown?

Any help would be appreciated, thanks again!

paullouisageneau commented 1 year ago

Thank you for reporting. Is the issue specific to the AWS instance? I can't reproduce it on my side; the throughput looks as expected.

For instance, this is the behavior I get with the patched client over an emulated 20Mbit/s bottleneck with 20ms RTT:

image

rycwo commented 1 year ago

Thanks for the fast response! Does this mean you would not expect the performance to degrade like I was showing?

I'm also not able to reproduce the issue locally.

image

But I carried out the same test on a remote Linode and I'm seeing the same slowdown (albeit with slightly different numbers, but a similar, steep logarithmic graph nonetheless).

image

I should mention that new connections always start with high throughput before dropping down. Not sure if this gives any more insight into the problem?

paullouisageneau commented 1 year ago

Interesting, what kind of connection do you use on the browser side?

rycwo commented 1 year ago

Wired Ethernet to my desktop, but it's reproducible over Wi-Fi. In both remote tests, the connection is Transatlantic (Linode/AWS in US East ↔ UK).

paullouisageneau commented 1 year ago

> Wired Ethernet to my desktop, but it's reproducible over Wi-Fi. In both remote tests, the connection is Transatlantic (Linode/AWS in US East ↔ UK).

OK, and what is the bandwidth? What is the typical ping time to US East?

rycwo commented 1 year ago

It looks like I'm getting about 97 ms RTT:

```
14 packets transmitted, 14 received, 0% packet loss, time 13020ms
rtt min/avg/max/mdev = 96.697/97.093/97.549/0.218 ms
```

On the browser side, the bandwidth is about 160 Mbps.

paullouisageneau commented 1 year ago

I'm unable to reproduce the behavior by emulating equivalent parameters. Is it possible that your ISP throttles unknown UDP flows to something like 1 Mbit/s? It kind of looks like it, and such practices sadly exist.

Would you be able to test from another Internet connection to help pinpoint what's causing the issue?

rycwo commented 1 year ago

That's frustrating. I'll try to test from another internet connection located in the UK. Is there any way I can find out whether any ISP throttling is happening on my current connection?

In the meantime, running the same test between Linode and AWS seems to give relatively consistent/good throughput. I don't have any graphs to show since I'm running the tests on the command-line, but it's peaking at about 100 Mbits/s which is consistent with what I would expect.

rycwo commented 1 year ago

I've managed to carry out the same test from a different connection in the UK (to the US East Linode):

image

And another connection in Canada (to the US East Linode):

image (1)

It would be helpful if there were a way for me to confirm your suggestion that my ISP may be throttling UDP flows. Wouldn't this also affect other WebRTC applications (e.g. Google Meet, Zoom)? Or are media streams potentially handled differently?

rycwo commented 1 year ago

With `iperf3 -c <host> -uR -i 1 -b 100M -t 60` (Linode US East to UK), I am able to get a sustained 100 Mbps bitrate:

iperf3 output ``` [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-1.00 sec 12.0 MBytes 101 Mbits/sec 0.130 ms 254/8979 (2.8%) [ 5] 1.00-2.00 sec 11.9 MBytes 100 Mbits/sec 0.157 ms 0/8632 (0%) [ 5] 2.00-3.00 sec 11.9 MBytes 100 Mbits/sec 0.121 ms 2/8633 (0.023%) [ 5] 3.00-4.00 sec 11.9 MBytes 100 Mbits/sec 0.131 ms 3/8633 (0.035%) [ 5] 4.00-5.00 sec 11.9 MBytes 100 Mbits/sec 0.116 ms 0/8633 (0%) [ 5] 5.00-6.00 sec 11.9 MBytes 100 Mbits/sec 0.117 ms 1/8632 (0.012%) [ 5] 6.00-7.00 sec 11.9 MBytes 100 Mbits/sec 0.128 ms 3/8633 (0.035%) [ 5] 7.00-8.00 sec 11.9 MBytes 100 Mbits/sec 0.117 ms 3/8633 (0.035%) [ 5] 8.00-9.00 sec 11.9 MBytes 100 Mbits/sec 0.116 ms 1/8631 (0.012%) [ 5] 9.00-10.00 sec 11.9 MBytes 100 Mbits/sec 0.141 ms 0/8634 (0%) [ 5] 10.00-11.00 sec 11.9 MBytes 100 Mbits/sec 0.156 ms 1/8631 (0.012%) [ 5] 11.00-12.00 sec 11.9 MBytes 100 Mbits/sec 0.129 ms 1/8632 (0.012%) [ 5] 12.00-13.00 sec 11.9 MBytes 100 Mbits/sec 0.145 ms 0/8634 (0%) [ 5] 13.00-14.00 sec 11.9 MBytes 100 Mbits/sec 0.129 ms 3/8633 (0.035%) [ 5] 14.00-15.00 sec 11.9 MBytes 100 Mbits/sec 0.116 ms 1/8632 (0.012%) [ 5] 15.00-16.00 sec 11.9 MBytes 100 Mbits/sec 0.131 ms 1/8633 (0.012%) [ 5] 16.00-17.00 sec 11.9 MBytes 100 Mbits/sec 0.120 ms 1/8633 (0.012%) [ 5] 17.00-18.00 sec 11.9 MBytes 100 Mbits/sec 0.119 ms 2/8632 (0.023%) [ 5] 18.00-19.00 sec 11.9 MBytes 100 Mbits/sec 0.130 ms 3/8633 (0.035%) [ 5] 19.00-20.00 sec 11.9 MBytes 100 Mbits/sec 0.117 ms 0/8633 (0%) [ 5] 20.00-21.00 sec 11.9 MBytes 99.9 Mbits/sec 0.134 ms 1/8628 (0.012%) [ 5] 21.00-22.00 sec 11.9 MBytes 100 Mbits/sec 0.151 ms 4/8636 (0.046%) [ 5] 22.00-23.00 sec 11.9 MBytes 100 Mbits/sec 0.129 ms 1/8633 (0.012%) [ 5] 23.00-24.00 sec 11.9 MBytes 100 Mbits/sec 0.118 ms 0/8631 (0%) [ 5] 24.00-25.00 sec 11.9 MBytes 100 Mbits/sec 0.147 ms 5/8634 (0.058%) [ 5] 25.00-26.00 sec 11.9 MBytes 100 Mbits/sec 0.148 ms 2/8633 (0.023%) [ 5] 26.00-27.00 sec 11.9 MBytes 100 Mbits/sec 0.141 ms 0/8632 (0%) [ 5] 27.00-28.00 sec 11.9 MBytes 100 Mbits/sec 0.153 ms 2/8632 (0.023%) [ 5] 28.00-29.00 sec 11.9 MBytes 100 Mbits/sec 0.128 ms 1/8634 (0.012%) [ 5] 29.00-30.00 sec 11.9 MBytes 100 Mbits/sec 0.132 ms 2/8632 (0.023%) [ 5] 30.00-31.00 sec 11.9 MBytes 100 Mbits/sec 0.126 ms 0/8633 (0%) [ 5] 31.00-32.00 sec 11.9 MBytes 100 Mbits/sec 0.125 ms 3/8633 (0.035%) [ 5] 32.00-33.00 sec 11.9 MBytes 100 Mbits/sec 0.134 ms 3/8632 (0.035%) [ 5] 33.00-34.00 sec 11.9 MBytes 100 Mbits/sec 0.125 ms 0/8633 (0%) [ 5] 34.00-35.00 sec 11.9 MBytes 100 Mbits/sec 0.119 ms 3/8633 (0.035%) [ 5] 35.00-36.00 sec 11.9 MBytes 100 Mbits/sec 0.115 ms 1/8632 (0.012%) [ 5] 36.00-37.00 sec 11.9 MBytes 100 Mbits/sec 0.113 ms 0/8633 (0%) [ 5] 37.00-38.00 sec 11.9 MBytes 100 Mbits/sec 0.124 ms 2/8632 (0.023%) [ 5] 38.00-39.00 sec 11.9 MBytes 100 Mbits/sec 0.121 ms 2/8633 (0.023%) [ 5] 39.00-40.00 sec 11.9 MBytes 99.9 Mbits/sec 0.135 ms 4/8632 (0.046%) [ 5] 40.00-41.00 sec 11.9 MBytes 100 Mbits/sec 0.137 ms 0/8633 (0%) [ 5] 41.00-42.00 sec 11.9 MBytes 100 Mbits/sec 0.123 ms 2/8633 (0.023%) [ 5] 42.00-43.00 sec 11.9 MBytes 100 Mbits/sec 0.143 ms 1/8632 (0.012%) [ 5] 43.00-44.00 sec 11.9 MBytes 100 Mbits/sec 0.121 ms 0/8633 (0%) [ 5] 44.00-45.00 sec 11.9 MBytes 100 Mbits/sec 0.122 ms 1/8633 (0.012%) [ 5] 45.00-46.00 sec 11.9 MBytes 100 Mbits/sec 0.141 ms 0/8632 (0%) [ 5] 46.00-47.00 sec 11.9 MBytes 100 Mbits/sec 0.127 ms 1/8633 (0.012%) [ 5] 47.00-48.00 sec 11.9 MBytes 100 Mbits/sec 0.139 ms 0/8632 (0%) [ 5] 48.00-49.00 sec 11.9 MBytes 100 Mbits/sec 0.124 ms 0/8633 (0%) [ 5] 49.00-50.00 sec 11.9 
MBytes 100 Mbits/sec 0.130 ms 2/8632 (0.023%) [ 5] 50.00-51.00 sec 11.9 MBytes 100 Mbits/sec 0.129 ms 0/8632 (0%) [ 5] 51.00-52.00 sec 11.9 MBytes 100 Mbits/sec 0.126 ms 0/8634 (0%) [ 5] 52.00-53.00 sec 11.9 MBytes 100 Mbits/sec 0.125 ms 0/8632 (0%) [ 5] 53.00-54.00 sec 11.9 MBytes 100 Mbits/sec 0.138 ms 0/8632 (0%) [ 5] 54.00-55.00 sec 11.9 MBytes 100 Mbits/sec 0.145 ms 2/8633 (0.023%) [ 5] 55.00-56.00 sec 11.9 MBytes 100 Mbits/sec 0.123 ms 3/8633 (0.035%) [ 5] 56.00-57.00 sec 11.9 MBytes 100 Mbits/sec 0.130 ms 0/8633 (0%) [ 5] 57.00-58.00 sec 11.9 MBytes 100 Mbits/sec 0.135 ms 1/8632 (0.012%) [ 5] 58.00-59.00 sec 11.9 MBytes 100 Mbits/sec 0.144 ms 0/8633 (0%) [ 5] 59.00-60.00 sec 11.9 MBytes 100 Mbits/sec 0.122 ms 3/8633 (0.035%) - - - - - - - - - - - - - - - - - - - - - - - - - [ ID] Interval Transfer Bitrate Jitter Lost/Total Datagrams [ 5] 0.00-60.14 sec 717 MBytes 100 Mbits/sec 0.000 ms 0/519131 (0%) sender [ 5] 0.00-60.00 sec 715 MBytes 100 Mbits/sec 0.122 ms 332/518303 (0.064%) receiver iperf Done. ```

Not sure if this gives any more insight?

rycwo commented 1 year ago

Adding more information from further tests carried out today.

Running the libdatachannel example client with changes to print the receiving bitrate gives the following:

Output ``` DataChannel from tqCE open Bytes received: 0.26 Mbps Bytes received: 36.18 Mbps Bytes received: 22.81 Mbps Bytes received: 7.34 Mbps Bytes received: 4.98 Mbps Bytes received: 2.36 Mbps Bytes received: 2.62 Mbps Bytes received: 1.57 Mbps Bytes received: 1.84 Mbps Bytes received: 2.36 Mbps Bytes received: 2.10 Mbps Bytes received: 2.88 Mbps Bytes received: 3.93 Mbps Bytes received: 4.98 Mbps Bytes received: 3.67 Mbps Bytes received: 2.62 Mbps Bytes received: 1.31 Mbps Bytes received: 1.57 Mbps Bytes received: 2.36 Mbps Bytes received: 2.62 Mbps Bytes received: 2.62 Mbps Bytes received: 3.41 Mbps Bytes received: 2.88 Mbps Bytes received: 3.93 Mbps Bytes received: 4.46 Mbps Bytes received: 6.03 Mbps Bytes received: 6.55 Mbps Bytes received: 7.34 Mbps Bytes received: 2.62 Mbps Bytes received: 1.84 Mbps Bytes received: 1.84 Mbps Bytes received: 1.57 Mbps Bytes received: 1.84 Mbps Bytes received: 1.57 Mbps Bytes received: 1.57 Mbps Bytes received: 1.84 Mbps Bytes received: 2.62 Mbps Bytes received: 3.41 Mbps Bytes received: 4.72 Mbps Bytes received: 3.15 Mbps Bytes received: 2.36 Mbps Bytes received: 3.15 Mbps Bytes received: 2.36 Mbps Bytes received: 1.57 Mbps Bytes received: 1.05 Mbps Bytes received: 1.57 Mbps Bytes received: 1.84 Mbps Bytes received: 2.88 Mbps Bytes received: 2.62 Mbps Bytes received: 2.88 Mbps Bytes received: 3.41 Mbps Bytes received: 2.88 Mbps Bytes received: 2.10 Mbps Bytes received: 1.31 Mbps ```
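
For reference, the bitrate printing boils down to counting bytes in the channel's onMessage callback and flushing a counter roughly once a second; a minimal sketch of such a counter (not the exact modification used) could look like this:

```cpp
// Sketch of a per-second receive bitrate printer (illustrative only).
#include <rtc/rtc.hpp>

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>
#include <variant>

struct BitrateCounter {
	std::size_t bytes = 0;
	std::chrono::steady_clock::time_point last = std::chrono::steady_clock::now();
};

void printReceiveBitrate(std::shared_ptr<rtc::DataChannel> dc) {
	auto counter = std::make_shared<BitrateCounter>();
	dc->onMessage([counter](rtc::message_variant message) {
		// Count binary payload bytes as they arrive.
		if (auto *bin = std::get_if<rtc::binary>(&message))
			counter->bytes += bin->size();

		// Roughly once a second, print the rate and reset the counter.
		auto now = std::chrono::steady_clock::now();
		std::chrono::duration<double> elapsed = now - counter->last;
		if (elapsed.count() >= 1.0) {
			std::printf("Bytes received: %.2f Mbps\n",
			            counter->bytes * 8 / (elapsed.count() * 1e6));
			counter->bytes = 0;
			counter->last = now;
		}
	});
}
```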

Running a test aiortc client I wrote:

aiortc-example.py

Output ``` DataChannel from LRoS opened Bytes received: 13.11 Mbps Bytes received: 31.46 Mbps Bytes received: 35.65 Mbps Bytes received: 36.95 Mbps Bytes received: 17.04 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 5.49 Mbps Bytes received: 20.19 Mbps Bytes received: 20.70 Mbps Bytes received: 22.29 Mbps Bytes received: 24.64 Mbps Bytes received: 25.16 Mbps Bytes received: 15.72 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 5.23 Mbps Bytes received: 28.57 Mbps Bytes received: 29.10 Mbps Bytes received: 29.87 Mbps Bytes received: 31.18 Mbps Bytes received: 10.21 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps Bytes received: 0.00 Mbps ```

Both tests were carried out between a US East server and my local UK machine.

The libdatachannel client demonstrates the slowdown, but the aiortc client behaves as I would hope/expect (the performance isn't outstanding, but the speed doesn't slow down to < 1 Mbps). This seems to suggest ISP throttling is not playing a part in the slowdown.

I noticed libdatachannel, Firefox, and ~Chrome~ (edit: Chrome might be using dcSCTP?) all use usrsctp, while aiortc uses its own "pure Python" SCTP implementation. Is it possible usrsctp is causing issues on the receiving end under certain conditions?

Update:

Enabling verbose logging on either the sending or the receiving libdatachannel client seems to prevent the gradual slowdown (albeit at a slightly lower overall speed). I assume the slowdown then has something to do with sending too much at once, but I thought that's what bufferedAmountLowThreshold/onBufferedAmountLow is supposed to help users manage. :thinking:
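
(For anyone reproducing this: the verbose logging mentioned here is the library's own logger, which can be enabled before creating the connection, e.g.:)

```cpp
#include <rtc/rtc.hpp>

int main() {
	// Turn on libdatachannel's verbose logging before any connection is set up.
	rtc::InitLogger(rtc::LogLevel::Verbose);
	// ... create the PeerConnection / DataChannel as in the example client ...
}
```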

rycwo commented 1 year ago

Further update:

Playing around with the data channel configuration seems to work around the issue, but I'm not entirely sure why.

| Ordered? | Reliable? | Sustained throughput? |
| --- | --- | --- |
| :heavy_check_mark: | :heavy_check_mark: | :x: |
| :x: | :heavy_check_mark: | :x: |
| :heavy_check_mark: | :x: | :heavy_check_mark: |
| :x: | :x: | :heavy_check_mark: |

As long as the data channel is not reliable, I am not seeing the slowdown.
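
For reference, these settings are configured through `DataChannelInit::reliability` when creating the channel. A sketch for the unordered/unreliable case (assuming the `Reliability` struct with `type`/`unordered`/`rexmit` fields used around this version of the library; check `rtc/reliability.hpp` for your release):

```cpp
// Sketch: create an unordered channel limited to 0 retransmissions.
// Field names are from the Reliability struct of this era; newer releases
// may expose these settings differently.
#include <rtc/rtc.hpp>

#include <memory>

std::shared_ptr<rtc::DataChannel> makeUnreliableChannel(rtc::PeerConnection &pc) {
	rtc::DataChannelInit init;
	init.reliability.unordered = true;                       // unordered delivery
	init.reliability.type = rtc::Reliability::Type::Rexmit;  // bounded retransmissions
	init.reliability.rexmit = 0;                             // i.e. fully unreliable
	return pc.createDataChannel("test", init);
}
```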

paullouisageneau commented 1 year ago

Nice investigation. I think the iperf results with UDP at a constant bitrate rule out throttling: if that were the case, you would see consistently high packet loss after a while, as everything over the allowed rate would be dropped.

Verbose logging preventing the issue might hint at the UDP buffer being too small to sustain traffic bursts. You could try increasing the maximum socket buffer sizes:

```
$ sudo sysctl -w net.core.wmem_max=1048576
$ sudo sysctl -w net.core.rmem_max=1048576
```
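
(The current limits can be checked with `sysctl net.core.rmem_max net.core.wmem_max`. The kernel silently clamps any SO_RCVBUF/SO_SNDBUF request above these maximums, so they need to be raised before the client creates its sockets.)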
WintonChan commented 1 year ago

I have the same problem. I ran the benchmark test with the node-datachannel benchmarking tool, using version 0.18. When the RTT is 50 ms, the data channel throughput is only 8.804520; when the RTT is 100 ms, the throughput is 18.937009. Could the new version have a bug?

Any help would be appreciated, thanks!