sipsorcery-org / sipsorcery

A WebRTC, SIP and VoIP library for C# and .NET. Designed for real-time communications apps.
https://sipsorcery-org.github.io/sipsorcery
Other
1.44k stars 438 forks source link

Data channels quickly slow down to a crawl under pressure #1088

Open lostmsu opened 6 months ago

lostmsu commented 6 months ago

Please see https://github.com/sipsorcery-org/sipsorcery/pull/1087 for the test project.

P.S. I think a unit test for transmitting a few GiB of data should be put in place.

lostmsu commented 6 months ago

OK, I had some midnight debugging, and I believe the main reason is incorrect reset of _congestionWindow in both places where it is set to _defaultMTU.

In the bit that refers to T3-rtx timer I believe this line above it is the cause: https://github.com/sipsorcery-org/sipsorcery/blob/54662858cafbde57ac3a5602509067c6f6dc5dc3/src/net/SCTP/SctpDataSender.cs#L573

Here T3-rtx timer is considered expired when RTO milliseconds passed since the last time the chunk was sent at. I believe this is wrong, because from my understanding of RFC T3-rtx timer is not the chunk property, but rather the destination property. I might be mistaken here, maybe LastSentAt is a clever trick that does exactly that, but it is not immediately clear from the code.

The second bit that enters retransmit mode I believe simply used the wrong formula from RFC. It should have used cwnd = ssthresh instead. E.g. the line should be _congestionWindow = _slowStartThreshold.

Additionally, I believe in CalculateCongestionWindow all comparisons to _congestionWindow must be replaced with "or equal" comparisons again from reading RFC9620. For example, (7.2.1)

MUST use the slow-start algorithm to increase cwnd only if the current congestion window is being fully utilized

I interpret this as outstandingBytes >= _congestionWindow while the code has outstandingBytes > _congestionWindow.

ris-work commented 6 months ago

@lostmsu, thank you for finding this out. I would like to have the patches applied to the one I am currently using. It says that the commits don't belong to a branch. Are the patches under the same license as SipSorcey?

I also noticed that on a not-really-bad server, with dd if=/dev/zero bs=1k piped to it, the throughput is about 768kB/s (kilobytes) and the CPU usage is... not bad, but ok. I have no idea how to monkeypatch C# projects locally so it might take some time. Rust gives me around 2-3 MB/s, but due to some issues, I had to move from webrtc-rs with TURN relay candidates with coturn to SipSorcery. I could not pinpoint the details, but node-wrtc works well under the same circumstances though it is unmaintained, at about 1.3 Mb/s (which is pretty okay for a language like JS.).

lostmsu commented 6 months ago

@ris-work same license

If you want to experiment, ignore the first commit, I edited it and force-pushed as the second commit. You would need to get their branch and cherry-pick the commits from it (the branch has many breaking changes of various stability).