rycwo opened this issue 1 year ago
Thank you for reporting. Is the issue specific to the AWS instance? I can't reproduce on my side as the throughput looks as expected.
For instance, this is the behavior I get with the patched client over an emulated 20Mbit/s bottleneck with 20ms RTT:
Thanks for the fast response! Does this mean you would not expect the performance to degrade like I was showing?
I'm also not able to reproduce the issue locally.
But I carried out the same test on a remote Linode and I'm seeing the same slowdown (with slightly different numbers, but a similarly steep logarithmic drop-off).
I should mention that new connections always start with high throughput before dropping down, not sure if this gives any more insight into the problem?
Interesting, what kind of connection do you use on the browser side?
Wired Ethernet to my desktop, but it's reproducible over Wi-Fi. In both remote tests, the connection is Transatlantic (Linode/AWS in US East ↔ UK).
OK, and what is the bandwidth? What is the typical ping time to US East?
It looks like I'm getting about 97 ms RTT:
14 packets transmitted, 14 received, 0% packet loss, time 13020ms
rtt min/avg/max/mdev = 96.697/97.093/97.549/0.218 ms
On the browser side the bandwidth is about 160 Mbps:
I'm unable to reproduce the behavior by emulating equivalent parameters. Is it possible that your ISP throttles unknown UDP flows to something like 1Mbit/s? It kind of looks like it and such practices sadly exist.
Would you be able to test from another Internet connection to help pinpoint what's causing the issue?
That's frustrating, I'll try to test with another internet connection located in the UK. Is there any way I can find out if there is any ISP throttling happening from my current connection?
In the meantime, running the same test between Linode and AWS seems to give relatively consistent/good throughput. I don't have any graphs to show since I'm running the tests on the command-line, but it's peaking at about 100 Mbits/s which is consistent with what I would expect.
I've managed to carry out the same test from a different connection in the UK (to the US East Linode):
And another connection in Canada (to the US East Linode):
It would be helpful if there was a way for me to confirm your suggestion that my ISP may be throttling UDP flow. Wouldn't this also affect other WebRTC applications (e.g. Google Meet, Zoom)? Or are media streams potentially handled differently?
With `iperf3 -c <host> -uR -i 1 -b 100M -t 60` (Linode US East to UK) I am able to get a sustained 100 Mbps bitrate:
Not sure if this gives any more insight?
Adding more information from further tests carried out today.
Running the libdatachannel example client with changes to print the receiving bitrate gives the following:
Running a test aiortc client I wrote:
Both tests were carried out between a US East server and my local UK machine.
The libdatachannel client demonstrates the slowdown, but the aiortc client behaves as I would hope/expect (the performance isn't outstanding, but the speed doesn't slow down to < 1 Mbps). This seems to suggest ISP throttling is not playing a part in the slowdown.
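For context, the bitrate printing in my modified client is nothing elaborate; it boils down to something like this (simplified sketch, not the exact change in the patch):

```cpp
// Simplified sketch of the receive-side bitrate counter: count bytes arriving
// in onMessage and report the incoming bitrate once per second.
#include <rtc/rtc.hpp>
#include <atomic>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <memory>
#include <thread>
#include <variant>

void attachBitrateCounter(std::shared_ptr<rtc::DataChannel> dc) {
    auto received = std::make_shared<std::atomic<size_t>>(0);

    dc->onMessage([received](rtc::message_variant msg) {
        if (auto *bin = std::get_if<rtc::binary>(&msg))
            *received += bin->size();
    });

    std::thread([received]() {
        for (;;) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            const size_t bytes = received->exchange(0);
            std::cout << "Receiving " << (bytes * 8 / 1e6) << " Mbit/s" << std::endl;
        }
    }).detach();
}
```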
I noticed libdatachannel, Firefox, and ~~Chrome~~ (edit: Chrome might be using dcSCTP?) all use usrsctp, while aiortc uses its own "pure Python" SCTP implementation. Is it possible usrsctp is causing issues on the receiving end under certain conditions?
Update:
Enabling verbose logging on either the sending or the receiving libdatachannel client seems to prevent the gradual slowdown (albeit at a slightly lower overall speed). I assume the slowdown then has something to do with sending too much at once, but I thought that's what `bufferedAmountLowThreshold`/`onBufferedAmountLow` is supposed to help users manage. :thinking:
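For reference, my understanding of the intended flow-control pattern is roughly the following (sketch only; the threshold, chunk size, and function name are arbitrary choices of mine, not taken from the library examples):

```cpp
// Sketch of send-side flow control with bufferedAmount: queue messages until
// the locally buffered amount exceeds a threshold, then wait for
// onBufferedAmountLow before queueing more. Sizes here are arbitrary.
#include <rtc/rtc.hpp>
#include <cstddef>
#include <functional>
#include <memory>

void sendWithBackpressure(std::shared_ptr<rtc::DataChannel> dc) {
    const size_t chunkSize = 64 * 1024;         // 64 KiB per message (arbitrary)
    const size_t maxBuffered = 1 * 1024 * 1024; // pause above 1 MiB buffered (arbitrary)

    rtc::binary chunk(chunkSize, std::byte{0});

    auto pump = std::make_shared<std::function<void()>>([dc, chunk, maxBuffered]() {
        while (dc->isOpen() && dc->bufferedAmount() <= maxBuffered)
            dc->send(chunk);
    });

    dc->setBufferedAmountLowThreshold(maxBuffered);
    dc->onBufferedAmountLow([pump]() { (*pump)(); }); // resume once the buffer drains
    dc->onOpen([pump]() { (*pump)(); });              // start sending when the channel opens
}
```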
Further update:
Playing around with data channel configuration seems to workaround the issue, but I'm not entirely sure why.
| Ordered? | Reliable? | Sustained throughput? |
|---|---|---|
| :heavy_check_mark: | :heavy_check_mark: | :x: |
| :x: | :heavy_check_mark: | :x: |
| :heavy_check_mark: | :x: | :heavy_check_mark: |
| :x: | :x: | :heavy_check_mark: |
As long as the data channel is not reliable, I am not seeing the slowdown.
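For reference, the unreliable configuration I tested amounts to something like the following (sketch; the `Reliability` field names have changed between libdatachannel releases, so this may need adjusting for the version in use):

```cpp
// Sketch: open a data channel with partial reliability (no retransmissions)
// and, optionally, unordered delivery. Field names assume a recent
// libdatachannel release; older releases use the type/rexmit fields instead.
#include <rtc/rtc.hpp>
#include <memory>

std::shared_ptr<rtc::DataChannel> createUnreliableChannel(rtc::PeerConnection &pc) {
    rtc::DataChannelInit init;
    init.reliability.unordered = true;    // allow out-of-order delivery
    init.reliability.maxRetransmits = 0;  // unreliable: drop rather than retransmit
    return pc.createDataChannel("bench", init);
}
```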
Nice investigation! I think the iperf results with a constant UDP bitrate rule out throttling: if that were the case, you would see consistently high packet loss after a while, as everything over the allowed rate would be dropped.
Verbose logging preventing the issue might hint at the UDP buffer being too small to sustain traffic bursts. You could try increasing the maximum socket buffer sizes:
$ sudo sysctl -w net.core.wmem_max=1048576
$ sudo sysctl -w net.core.rmem_max=1048576
I have the same problem. I ran the node-datachannel benchmarking tool with version 0.18. With an RTT of 50 ms the reported data channel throughput is only 8.804520, and with an RTT of 100 ms it is 18.937009. Could the new version have a bug?
Thank you for this great library!
I have been running some tests to see how far I can stretch WebRTC data channels. I've seen the benchmarks and read through a number of the other issues, which have been very insightful. In my testing I've run into a scenario that is giving unexpectedly poor performance. I'm struggling to understand why this is the case — I was hoping somebody could shed some light for me.
I've prepared a patch that modifies the example client slightly to demonstrate the issue:
periodic-send.patch
Essentially the changes simulate sending periodic bursts of messages (similar to what `benchmark.cpp` does). I've taken care to ensure I am not blocking `rtc::DataChannel::onOpen` (I am blocking in `rtc::DataChannel::onBufferedAmountLow` in the patch, but using a detached thread there to send doesn't seem to make a difference).

Testing steps:
What appears to happen: within a few seconds, after the first couple hundred or so messages, sending becomes progressively slower, to the point where throughput bottoms out at less than 1 Mbps!
Below are a couple of screenshots from `chrome://webrtc-internals/` while running the test.

Stats graphs for transport:
Stats graphs for candidate-pair:
You can clearly see the logarithmic drop-off in `bytesReceived_in_bits/s`.

I am also able to recreate the same slowdown by connecting to the patched client with an unmodified example client.
Any idea why this might be happening? Is this related to flow control/congestion? If so, how come the first set of messages send nice and quickly?
At its slowest point, `bufferedAmountLow` seems to be called between every message, which seems to suggest the receiver isn't receiving the messages quickly enough? If so, given there's a bit of a break between the first and second sets of messages, shouldn't the receiving buffer clear out in time?

Is there any way for me to debug this further? Maybe get a sense of libdatachannel/usrsctp buffer sizes and settings?
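The closest thing I've found to extra visibility so far is turning up the library log level, along these lines (sketch):

```cpp
// Sketch: enable verbose library logging before setting up the connection,
// which at least exposes some of the SCTP-level activity.
#include <rtc/rtc.hpp>

int main() {
    rtc::InitLogger(rtc::LogLevel::Verbose);
    // ... create the rtc::PeerConnection and data channel as in the example client ...
    return 0;
}
```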
What can I do to fix/mitigate the slowdown?
Any help would be appreciated, thanks again!