tobiasfl / tobias-master-thesis-webrtc


Implementing ROSIEEE and FSE-NG #8

Closed: tobiasfl closed this issue 6 months ago

tobiasfl commented 2 years ago

Hi, Joakim's code was very helpful and I can now register callbacks for setting the cwnd of the SCTP connections. Next I will implement ROSIEEE, but I am a bit confused about one thing. Here is a picture of the algorithm from the paper so you don't have to find it: [image]

I am able to get both the RTT and rate change values from the update call which will be made in send_side_bandwidth_estimation.cc, but I don't think there is a way for me to get the number "i" of the last received RTCP packet. Would it work if "i" in my case is a counter that is incremented for each update call?

safiqul commented 2 years ago

I think it's just a counter; they used an array to hold the last i values to calculate these.

You can do the same thing, I think!


tobiasfl commented 2 years ago

I've been having some trouble implementing ROSIEEE; the problem seems to be the way they smooth the max rate in the pseudocode in the paper. This is what the formula looks like: [image]

The way I understand the sigma notation, if for instance i=6 and N=5, this is calculated as: SRmax_i = (1/N) * (Rmax_1 + Rmax_2 + Rmax_3 + Rmax_4 + Rmax_5 + Rmax_6). However, this leads to the smoothed rate growing a lot since more than N values are summed, which in the end leads to a crash because of integer overflow in my implementation of the algorithm. The way I understand an N-point moving average, which this is supposed to be, you typically compute for instance: SRmax_i = (1/N) * (Rmax_2 + Rmax_3 + Rmax_4 + Rmax_5 + Rmax_6). Is it a typo in the research paper or am I not reading the notation right? Is it possible to find the code they used in the paper somewhere?
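For what it's worth, here is a minimal sketch of the fixed-window reading I have in mind (my own naming and types, not from the paper):

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// N-point moving average of the max-rate samples, the way I read the formula:
// only the last N values of Rmax contribute, older ones fall out of the
// window, so the sum stays bounded and can't overflow.
class SmoothedMaxRate {
 public:
  explicit SmoothedMaxRate(size_t n) : n_(n) {}

  // Called once per bandwidth update; my counter "i" is just the number of
  // calls so far, it never has to be stored explicitly.
  int64_t Update(int64_t rmax_bps) {
    window_.push_back(rmax_bps);
    if (window_.size() > n_) window_.pop_front();
    int64_t sum = 0;
    for (int64_t r : window_) sum += r;
    return sum / static_cast<int64_t>(window_.size());
  }

 private:
  const size_t n_;
  std::deque<int64_t> window_;
};
```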

safiqul commented 2 years ago

ROSIEEE is probably full of bugs. We should only look at their recent versions.

check algorithm 1: https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=936957

Is it possible to find the code they used in the paper somewhere?

I don't think so, but I can ask if you want. They implemented ROSIEEE in OMNET AFAIR.

tobiasfl commented 2 years ago

Did you mistakenly send a different paper? That one seems to be about something else.

safiqul commented 2 years ago

so sorry for the wrong paper, just don't read it :)

I accidentally removed the number 4 from the following URL

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9369574

tobiasfl commented 2 years ago

I can finally focus on this again now. Regarding the implementation of FSE-NG: the last paper you sent seems to adapt the algorithm to work with SCReAM, since that is what they test in the paper, while the earlier paper tests with NADA. Is it still relevant for me to implement it and test with GCC then? If so, should I do the NADA or the SCReAM variation? According to the paper SCReAM is window-based, so I assume NADA is more similar to GCC?

safiqul commented 2 years ago

Good observation; we should take the NADA version!


tobiasfl commented 2 years ago

Hi, I have spent the day testing the FSE-NG implementation. It seems to work alright when testing with different combinations of priorities and types of flows, e.g. 1 or 2 SCTP flows and 2 RTP flows starting at different times. The bandwidth is mostly utilized and it is split fairly among the RTP flows, though the SCTP flows get a bit less bandwidth because of the way FSE-NG works: it only sets the max_cwnd. (At least the SCTP flows have no way of killing the RTP flows this way.)

However, when I tried having a TCP flow running in the background (using iperf), both the RTP and SCTP flows get killed, as can be seen in this plot: [image]

The plot is probably a bit confusing since the labelling is a bit weird, but you can at least clearly see that after the 100000 ms mark (when I started the background traffic) all the flows get less than 0.5 Mbit/s even though the link has a capacity of 10 Mbit/s! I think the reason for this is that FSE-NG only uses update calls from GCC, which are delay-based, to set the max_cwnd of the SCTP flow. This means that neither has a chance of competing with the TCP flow: the RTP flows probably never had a chance, and the SCTP flow gets dragged down with them, since FSE-NG only allows it a max_cwnd lower than or equal to the rates given by GCC.

Do you have any other ideas for cases I can test to further show that FSE-NG is not good enough, at least in its current version?
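To make the setup concrete, this is roughly the shape of my coupling logic; a heavily simplified sketch with my own type names, not the actual FSE-NG algorithm from the paper or the real Chromium interfaces:

```cpp
#include <algorithm>
#include <cstdint>
#include <map>

// Rough shape of my coupling: rate-based (RTP/GCC) flows share the aggregate
// estimate according to priority, while window-based (SCTP) flows only get a
// cap on max_cwnd derived from it, so they can never starve the RTP flows,
// but they also can't exceed what GCC thinks is available.
struct RtpFlow { int priority = 1; int64_t allocated_bps = 0; };
struct SctpFlow { int64_t max_cwnd_bytes = 0; };

class FseNgSketch {
 public:
  void RegisterRtp(int id, int priority) { rtp_[id] = {priority, 0}; }
  void RegisterSctp(int id) { sctp_[id] = {}; }

  // Called from the GCC update with the new aggregate rate estimate and RTT.
  void OnRateUpdate(int64_t aggregate_bps, double rtt_s) {
    int prio_sum = 0;
    for (const auto& f : rtp_) prio_sum += f.second.priority;
    for (auto& f : rtp_)
      f.second.allocated_bps =
          aggregate_bps * f.second.priority / std::max(prio_sum, 1);
    // SCTP flows are only capped: max_cwnd <= estimated rate * RTT, in bytes.
    const int64_t cap = static_cast<int64_t>(aggregate_bps / 8.0 * rtt_s);
    for (auto& f : sctp_) f.second.max_cwnd_bytes = cap;
  }

 private:
  std::map<int, RtpFlow> rtp_;
  std::map<int, SctpFlow> sctp_;
};
```

In my version the SCTP flows only ever receive an upper bound derived from the GCC estimate, which is exactly why they get dragged down together with the RTP flows in the plot above.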

mwelzl commented 2 years ago

This is really interesting, and your interpretation makes sense. However, you're guessing: "the RTP flows probably never had a chance, and the SCTP flow gets dragged down with them". RTP flows should survive the fight against TCP; GCC is designed to ensure that (whether it works or not is a different question). So, the first thing to test would be: take the same scenario, but without FSE-NG (or any other form of FSE). Just default Chromium. Let's make sure that we're not "solving" GCC's problem.

safiqul commented 2 years ago

I also agree with Michael - we should check if this is really a GCC problem. I know that they solved this by adjusting the threshold (gamma or something like that, see the GCC draft). I reported this bug to the GCC developers.

tobiasfl commented 2 years ago

I've now tried the same scenario without any FSE type running: [image]

It seems both SCTP and GCC struggle to compete with TCP; SCTP (the orange line) at least gets a little bit of bandwidth compared to when FSE-NG was running. As a sanity check I also tried running a single RTP flow against a TCP flow (with iperf, which I hope is not doing anything extra) that starts a while after the RTP flow, with the interface limited to 10 Mbit/s and 50 ms delay. I tried this both with my Chromium built from source without FSE, as well as an installed version of Chromium and normal Chrome, just to be safe. In all cases the RTP flow gets absolutely killed as soon as I start iperf.

Chromium built from source code without any FSE enabled: [image]

Normal Chromium installation, checking the rate in chrome://webrtc-internals: [image]

Normal Chrome installation (when I stopped iperf you can see the rate go up again): [image]

I also tried running only a single SCTP flow with Chromium built from source code without any FSE enabled (it goes up again when I stopped iperf): [image]

I was afraid maybe iperf was not running a normal TCP connection and perhaps doing something more extreme, so I also tried having a file transfer running in the background with netcat while trying a single RTP stream again. However, it still gets killed when the file transfer begins: [image]

Am I doing something wrong / testing with unrealistic conditions, or is it really possible that both GCC and SCTP's congestion control struggle this much against TCP? And if so, I don't really see how just coupling the congestion controls could fix this. :/

mwelzl commented 2 years ago

Well, in this case, we could do the opposite of ROSIEEE and give the SCTP behavior to all :-) but that's not a fix - it's bad design on top of bad design, leading to something that altogether works a tiny bit better. Like minus and minus gives plus :-D

We'll all meet in person tomorrow. Let's use this opportunity to chat about this problem.

safiqul commented 2 years ago

Which chromium version are you using? Let's talk tomorrow.

tobiasfl commented 2 years ago

Small update: I created this script to use htb instead of tbf:

IF=enp0s31f6
DELAY=50ms
BW=10mbit

#first delete any previous config
tc qdisc del dev $IF root

tc qdisc add dev $IF root handle 1: htb default 1

tc class add dev $IF parent 1: classid 1:1 htb rate $BW ceil $BW

tc qdisc add dev $IF parent 1:1 netem limit 1000 delay $DELAY loss 0.0%

tc filter add dev $IF protocol ip parent 1: prio 1 matchall flowid 1:1

However, I still get the same results: both SCTP and GCC struggle to compete with TCP, which seems sketchy given that SCTP should be able to compete with TCP as we discussed. Does anything in the script seem off to you? When I ping I do get the 50 ms delay, and when I run iperf with both TCP and UDP they are limited to approx. 10 Mbit/s. Could the queue limit have something to say?

Just out of curiosity I tried setting the packet loss to 2% instead of 0% and ran one RTP flow versus a TCP flow, with 10 Mbit/s bandwidth and 50 ms delay. Then RTP got all the bandwidth it wanted and seemed unaffected: [image]

However, TCP unsurprisingly settled at a much lower rate of 2.20 Mbit/s, given that it experiences packet loss during the whole connection (normally it steals the whole 10 Mbit/s).
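(For what it's worth, 2.20 Mbit/s is roughly what the classic loss-based TCP throughput estimate predicts for my settings, assuming MSS 1500 bytes, RTT 50 ms and p = 2%: rate ≈ (MSS / RTT) * (1.22 / sqrt(p)) ≈ (1500 * 8 / 0.05) * (1.22 / 0.141) ≈ 2.1 Mbit/s, so at least the TCP side behaves about as the theory says.)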

mwelzl commented 2 years ago

Brrr. Well, nothing to do but to dig deeper:

The cwnd of SCTP is probably very hard to get, but if you just plot the throughput over time, using time intervals that are roughly in the order of an RTT, you should see a behaviour resembling the TCP sawtooth. Here's an example: figs. 7 and 8 in https://folk.universitetetioslo.no/michawe/research/publications/mic2002.pdf - promise not to laugh, a terrible pdf with no embedded fonts, and ... just ... bad. I was young! :) Anyway, these plots were produced by simply looking at throughput, with these scripts: https://folk.universitetetioslo.no/michawe/research/tools/bandwidth-monitor/index.html

Very very simple. Not a real cwnd plot, but you should see this kind of behaviour nevertheless. When alone, for sure, you should see it, also for SCTP - and when not alone... well, can we see what happens then? How does the plot change? This could give us a hint about what's going on here.
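The principle is really just binning bytes into fixed, RTT-sized intervals; a rough sketch of the idea (not the actual bandwidth-monitor script, which works on tcpdump output):

```cpp
#include <cstdio>

// The idea behind the plots: bucket the bytes seen on the wire into fixed
// intervals of roughly one RTT and print the rate per bucket.
// Reads "timestamp_seconds packet_bytes" pairs from stdin and writes
// "bin_start_seconds rate_mbps" lines to stdout.
int main() {
  const double kBinSeconds = 0.1;  // ~one RTT; adjust to your path
  double t = 0.0, bin_start = -1.0;
  long bytes = 0, acc = 0;
  while (std::scanf("%lf %ld", &t, &bytes) == 2) {
    if (bin_start < 0.0) bin_start = t;
    while (t >= bin_start + kBinSeconds) {  // emit any finished bins
      std::printf("%.3f %.3f\n", bin_start, acc * 8.0 / kBinSeconds / 1e6);
      acc = 0;
      bin_start += kBinSeconds;
    }
    acc += bytes;
  }
  if (acc > 0) std::printf("%.3f %.3f\n", bin_start, acc * 8.0 / kBinSeconds / 1e6);
  return 0;
}
```

Feed it "timestamp bytes" pairs extracted from a capture and plot the two output columns.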

An idea: could it be that the SCTP rate is simply limited to a maximum??? I think I once heard this somewhere!

safiqul commented 2 years ago

Default queue size is 1000 packets; look at the scripts I sent you - try to set the queue size to a BDP.

safiqul commented 2 years ago

An idea: could it be that the SCTP rate is simply limited to a maximum??? I think I once heard this somewhere!

Good catch; I remember that they also limit the SCTP rate. I told you about this 😂 - I encountered this problem before.

tobiasfl commented 2 years ago

Good catch; I remember that they also limit the SCTP rate. I told you about this 😂 - I encountered this problem before.

Ahh ok :laughing: Is it generally limited in Chromium or are they just limiting it for WebRTC?

safiqul commented 2 years ago

For WebRTC, I think.

mwelzl commented 2 years ago

@safiqul - ok, right, you told me: but I remembered! :-)

tobiasfl commented 2 years ago

I tried running one SCTP flow vs. TCP and plotting the actual SCTP cwnd by printing the value in the code. I have not found any place where they actually limit the SCTP rate, but it does look like a maximum limit is what stops it from competing: as you can see, the cwnd can't go any higher than 102156. The point where it goes to the top is when I started the TCP flow.

Plot of SCTP cwnd: [image]

However, when converted to send rate it is very low, since the RTT was ridiculously high (up to 2124 ms when the cwnd was at its highest).

Plot of SCTP cwnd converted to send rate: [image]

I should probably check the TCP cwnd and RTT as well? I have the data in a tcpdump file, I'll have a look at it later.
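(The conversion I use is simply rate ≈ cwnd * 8 / RTT, so at the peak that is roughly 102156 * 8 / 2.124 ≈ 0.38 Mbit/s, which is why the rate plot looks so low even though the cwnd is at its maximum.)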

mwelzl commented 2 years ago

In an earlier message, you wrote "Could the queue limit have something to say?". YES. The queue limit (I don't know, is it the same as in TEACUP? Safiqul - you seemed to know how TEACUP does it, maybe you could share it here?) should be set to a value that is at most one bandwidth * delay product (BDP), or TCP (or SCTP) will start acting quite strangely. Note that "delay" here is the RTT. When you have exactly one BDP of queuing and the queue gets full, it will exactly double the RTT.

So, the ridiculously high RTT is simply something that we should never have. With your settings above: 10 mbit, 50ms delay, you should have (10 000 000 / 8) * 0.1 = 125000 bytes of queuing for a BDP, i.e. 83 packets, and when this queue is full, your maximum RTT should be around 200ms. I think picking something like 30 packets is a reasonable queue length for tests, then, I wouldn't even go as high as a BDP.

Also, regarding the cwnd above, a stupid question: is this in bytes or packets? (just to avoid that we're looking at CRAZILY high numbers...)

safiqul commented 2 years ago

In an earlier message, you wrote "Could queue limit have something to say?". YES. The queue limit (I don't know, is it the same as in TEACUP? Safiqul - you seemed to know how TEACUP does it, maybe you could share it here?)

@mwelzl I already sent him the teacup settings.

@tobiasfl check pfifo limit from below:

[ganon] run: tc class add dev ifb0 parent 1: classid 1:1 htb rate 200mbit ceil 200mbit
[ganon] run: tc qdisc add dev ifb0 parent 1:1 handle 1001: pfifo limit 34
[ganon] run: tc filter add dev ifb0 protocol ip parent 1: handle 1 fw flowid 1:1
[ganon] run: tc class add dev 10Gb parent 1: classid 1:1 htb rate 1000mbit ceil 1000mbit
[ganon] run: tc qdisc add dev 10Gb parent 1:1 handle 1001: netem limit 1000 delay 1ms loss 0.0%
[ganon] run: tc filter add dev 10Gb protocol ip parent 1: handle 1 fw flowid 1:1 action mirred egress redirect dev ifb0
[ganon] run: tc class add dev ifb1 parent 1: classid 1:1 htb rate 200mbit ceil 200mbit
[ganon] run: tc qdisc add dev ifb1 parent 1:1 handle 1001: pfifo limit 34
[ganon] run: tc filter add dev ifb1 protocol ip parent 1: handle 1 fw flowid 1:1
[ganon] run: tc class add dev 10Ga parent 1: classid 1:1 htb rate 1000mbit ceil 1000mbit
[ganon] run: tc qdisc add dev 10Ga parent 1:1 handle 1001: netem limit 1000 delay 1ms loss 0.0%
[ganon] run: tc filter add dev 10Ga protocol ip parent 1: handle 1 fw flowid 1:1 action mirred egress redirect dev ifb1
[ganon] run: iptables -t mangle -A POSTROUTING -s 172.16.10.0/24 -d 172.16.11.0/24 -j MARK --set-mark 1

Also, regarding the cwnd above, a stupid question: is this in bytes or packets? (just to avoid that we're looking at CRAZILY high numbers...)

+1, I also want to know.

tobiasfl commented 2 years ago

Also, regarding the cwnd above, a stupid question: is this in bytes or packets? (just to avoid that we're looking at CRAZILY high numbers...)

I am pretty sure it is in bytes; next to the struct definition in the code they have commented "actual cwnd", and for instance when they make sure the value is not below the MTU, they set it this way: net->cwnd = net->mtu - sizeof(struct sctphdr); which implies it is bytes they are dealing with.

Anyway, I set the netem qdisc queue limit to 34 packets like in the teacup settings and like @mwelzl suggested above, and now things are finally making more sense. (I did not use the pfifo qdisc since I did not know how to still add the delay at the same time; hope that doesn't matter, it seems to be the same when I apply the packet limit to netem.) GCC finally competes against TCP; in the graph below I started the TCP flow after the 12:46:00 mark, right before the rate increase: [image]

SCTP also did better, however it does seem to get less than TCP. Would tweaking the queue limit more fix this, or does it look alright? (Getting around 2 Mbit/s is pretty bad when just competing with one TCP flow on a 10 Mbit/s link :unamused:, however it is not using the full bandwidth before the TCP flow is started either :man_shrugging:.)

Here is the SCTP cwnd (in bytes): [image]

And the rate, converted with the RTT and cwnd: [image]

I will spend some time testing the FSE-NG again now that the tc configuration finally is more sensible. Any other things I should test first?

safiqul commented 2 years ago

It is worth checking if the SCTP flow can get 10 Mbit/s. Just use one data channel for file transfer, without any video, before you start with FSE-NG.

tobiasfl commented 2 years ago

Hmm, for some reason it does not get higher than around 4 Mbit/s even when running completely alone. Is the max limit specified somewhere, or did you find it in the code? Rate on the left and cwnd on the right: [image] When running iperf with TCP alone, it reports that it gets 9.68 Mbit/s.

safiqul commented 2 years ago

Probably there is a max limit but I don't know where :( ... Because of the limit, I always asked you to test with capacity <= 5 Mbit/s.

Another test: set the bw to 3 Mbit/s and see what happens! It should be able to get approx. 3.

tobiasfl commented 2 years ago

Alright, yeah, I'll go back to 3 Mbit/s then; I was mainly trying with 10 because they had a case with that in the FSE-NG paper. Should the queue limit still be around 30 with lower bandwidth?

safiqul commented 2 years ago

test with two queue limits: a BDP and half a BDP -> what's your rtt?

safiqul commented 2 years ago

I was mainly trying with 10 because they had a case with that in the FSE-NG paper.

I think they implemented FSE-NG in OMNET; so it was just simulation!

tobiasfl commented 2 years ago

I used rtt of 50ms with 10mbps, should that be the same with 3mbps?

safiqul commented 2 years ago

50 ms is fine for testing! If the default packet size = 1500 bytes, RTT = 50 ms and bw = 3 Mbit/s, the queue limit for one BDP is 12 packets.
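(Spelling out the arithmetic: 3 000 000 bit/s * 0.05 s / 8 = 18750 bytes for one BDP, and 18750 / 1500 ≈ 12 packets.)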

tobiasfl commented 2 years ago

SCTP is still a bit weird with 3 Mbit/s: when I use 12 as the queue limit it only uses 1.5 Mbit/s, and 1 Mbit/s when competing with TCP: [image]

However, I tried with 3 Mbit/s and 24 as the queue limit; then it uses approx. 3 Mbit/s, but gets 0.5 Mbit/s or less when I start TCP: [image]

safiqul commented 2 years ago

Strange; could you please check the pcap file? Are there any losses? Given that the bw is 3 Mbit/s and the SCTP flow is only getting approx. 1.5, I do not understand why the rate is fluctuating.

Try with Firefox too! See if it exhibits the same behavior.

tobiasfl commented 2 years ago

could you please check the pcap file? are there any losses?

The SCTP packets are encrypted, so I don't think I can see that? However, it seems like Chromium dumps the SCTP packets in the log file, so I can try to extract them and check later.

Tried with Firefox instead; it gets a bit more then, around 2 Mbit/s.

safiqul commented 2 years ago

Can't you see the loss stats using chrome://webrtc-internals?

Check the following tool:

https://github.com/nplab/WebRTC-Data-Channel-Playground

tobiasfl commented 2 years ago

Can't you see the loss stats using chrome://webrtc-internals?

Does not seem like it.

But they explain how to convert the dumped packets in the log file to pcap in the link you sent, so I can try that.

tobiasfl commented 2 years ago

As we spoke about last meeting, I have run a single SCTP flow with bandwidth = 3 Mbit/s, delay = 50 ms and queue limit = 12. I managed to turn on some extra logging in the Chromium SCTP code so that it logs things like cwnd, rtt, ssthresh etc. automatically. I have also created a pcap file with all the SCTP packets. (If you'd like a look yourself, the log files plus the pcap file are uploaded here (you have to download them to see them): https://github.com/safiqul/tobias-master-thesis-webrtc/tree/main/sctp_testing) Here is a plot of the cwnd if that is interesting: [image]

However, I'm not exactly sure how I can use all this information to find what is wrong with my tc configuration. Last meeting you said that the peak of the cwnd should be BDP + queue; just so I'm clear on this, does that mean the following?

peak cwnd should be == (bits_per_sec * RTT_in_seconds) + (MSS * 8 * queue_length)

If this is what you meant, the peak cwnd in my last test was only 23883 while the formula above says that it should be 29400..

I also noticed that dumped sctp packets are only 1230 bytes according to the pcap file, does that mean that MSS actually equals 1230 or have I misunderstood something?

Also if you were wondering, from the log file I can see that the reason the cwnd is lowered is in fact packet loss.

mwelzl commented 2 years ago

That plot looks a bit strange - I wonder about the strange movements on the top, e.g. at x=20000. The first thing I'd do is zoom in - look at a much shorter X range, to see if this looks "normal"... is it a linear increase?

About your equation, I don't get the "* 8 *" in the "MSS * 8 * queue_length" part. The cwnd isn't in bytes... maybe that's the small mistake that makes the formula diverge from the top end of cwnd?

I also noticed that dumped sctp packets are only 1230 bytes according to the pcap file, does that mean that MSS actually equals 1230 or have I misunderstood something?

The total packet size is what matters for the queue. Strange that it's only 1230? Should be 1500 or so.... but who knows, maybe some special decision for WebRTC... anyway, MSS = packet size minus headers. So this could also account for a small mistake in your cwnd calculation above.

Also if you were wondering, from the log file I can see that the reason the cwnd is lowered is in fact packet loss.

Aha! That's good to know, that's how it should be.

tobiasfl commented 2 years ago

That plot looks a bit strange - I wonder about the strange movements on the top, e.g. at x=20000. The first thing I'd do is zoom in - look at a much shorter X range, to see if this looks "normal"... is it a linear increase?

I zoomed in, it looks linear to me, also the big spike near the end. [image]

The cwnd isn't in bytes... maybe that's the small mistake that makes the formula diverge from the top end of cwnd?

Yes, the "*8" was there because I think the cwnd is in bytes in the log; I'm pretty sure they store it in bytes in the source code, so I think it is printed directly like that and not converted. However, I tried recalculating and I had forgotten a 0, so the answer was actually 294000, so I am totally off anyway..

I calculated that with MSS = 1230 the queue size should actually be 15; it got a bit higher but still did not use the whole 3 Mbit/s of bandwidth. I'm gonna try doing some different stuff in the tc script and see if it suddenly works, sick of spending time on this..

mwelzl commented 2 years ago

From a quick look, my first feeling is that there's something limiting cwnd (processing delay, some other program logic, ...) because how else could we get the peak after t=150000?

Now, let's calculate: 3 Mbit * 100 ms RTT (according to your script) = 25 packets. Add 12 packets more to this, then you have 37 packets. To express this in cwnd, you need to multiply it with the MSS. I tried with your 1230 from above (but is this really it?), and that gives me 45510, which is WAY above everything this plot shows. (Half that, if you really did use an RTT of 50ms, is 22755, which MIGHT be realistically fitting the large peak that we see after t=150000 ... it's a little larger, but it may also exceed the top by 1 or 2 packets... but netem is adding one-way delay, so I don't think you do use an RTT of 50ms?)

We need to understand what's going on or this all just won't make sense. Regarding this:

sick of spending time on this..

I'm sorry to say that it's a part of the experience... this is how research is, or any in-depth technical investigation. You need to pull out your magnifying glass and look very deep at the details, and maybe find that even the code that you're working with is actually broken. This yields more meaningful documentation than ... well, many papers and other documents out there.

tobiasfl commented 2 years ago

I'm sorry to say that it's a part of the experience... this is how research is, or any in-depth technical investigation.

Yeah, I guess it would be boring if it was too easy as well :smile: I'll keep at it.

I tried with your 1230 from above (but is this really it?)

tcpdump says the outgoing packets are 1225 bytes (the dumped SCTP packets in the pcap file are 1230, though); is this a confirmation that it at least is not 1500, and is either 1225 or 1230? I can take a look in the code to see where they configure this stuff as well, to be sure. If so, I should probably increase the queue limit to 15 packets, right? (3 000 000 * 0.050) / (1230 * 8) = 15

but netem is adding one-way delay, so I don't think you do use an RTT of 50ms?)

Could you explain this a bit further? Yes, netem is adding a one-way delay of 50 ms; would this not imply an RTT of at least 50 ms? Here is a plot of the RTT if you find that interesting; is it supposed to reach those peaks when the queue is full? [image]

So the current possibilities are:

my first feeling is that there's something limiting cwnd (processing delay, some other program logic, ...) because how else could we get the peak after t=150000?

I struggle a bit to understand how this is possible, though, when what seems to stop the cwnd from growing further is packet loss, i.e. that it sends faster than the netem queue is emptied. Also, when trying with Firefox the same problems seem to happen, which also points to it being my tc config. Nevertheless, it is probably a good idea for me to dig more into the SCTP code, especially where they configure stuff.

tobiasfl commented 2 years ago

I checked out the Firefox SCTP source code, it seems to be based on the same original library so sadly I can't rule out there being an error in the code solely based on the fact that Firefox displays the same problems I have with Chromium..

safiqul commented 2 years ago

I checked out the Firefox SCTP source code, it seems to be based on the same original library so sadly I can't rule out there being an error in the code solely based on the fact that Firefox displays the same problems I have with Chromium..

It's not worth checking other browsers because they use the same webrtc code from google, and SCTP code was written by Michael Tuexen and his team.

safiqul commented 2 years ago

have you checked Netperfmeter from https://github.com/nplab/WebRTC-Data-Channel-Playground ?

safiqul commented 2 years ago

one more question: I know that you are just running one flow. Have you really checked that your flow is not competing with others? try to capture all the outgoing traffic from your interface.

mwelzl commented 2 years ago

@tobiasfl :

tcpdump says the outgoing packets are 1225, (the SCTP dumped packets in pcap file are though 1230) is this a confirmation that it at least is not 1500 and is either 1225 or 1230?

Ah yes, you said this before; yes, this is a confirmation of that.

but netem is adding one-way delay, so I don't think you do use an RTT of 50ms?) Could you explain this a bit further?

Oh YES: our "TEACUP" testbed configures the OWD both ways, so both Safiqul and I have become used to thinking: "RTT = 2 * configured OWD", but this is of course nonsense here! You're right, and I'm sorry!

tobiasfl commented 2 years ago

have you checked Netperfmeter from https://github.com/nplab/WebRTC-Data-Channel-Playground ?

Yes, I tried it earlier but it did not work for me; have you tried it recently? I suspect there might be something deprecated, since the repo has not been updated in 2 years. However, if we suspect the problem might be my test app, I could copy the source code and try to change any deprecated stuff.

one more question: I know that you are just running one flow. Have you really checked that your flow is not competing with others? try to capture all the outgoing traffic from your interface.

Double-checked that now; there are some occasional extra packets that I think are related to the WebSockets used for signalling. I doubt they have any serious impact, but I should probably look a bit more into it just to be safe.

tobiasfl commented 2 years ago

Is this at all relevant? https://stackoverflow.com/a/56330726 I did apply the optimization tips for message size and low/max threshold to my application, but it did not change anything.

mwelzl commented 2 years ago

Finally, some delayed answers:

Here is a plot of rtt if you find that interesting, is it supposed to reach those peaks when the queue is full?

No. I calculated that the BDP is 25 packets, but the queue only has 12. With a full BDP of queuing (i.e., 25 packets in this case), the RTT would double (i.e., go up to approx. 100 ms). This goes even above 100 with a much smaller queue. How did you measure the RTT? BTW, the plot nicely shows that the RTT indeed drops down to a bit over 50 ms, which seems right.
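(Back-of-the-envelope, assuming ~1230-byte packets: a full 12-packet queue can only add about 12 * 1230 * 8 / 3 000 000 ≈ 39 ms of queuing delay, so with the ~50 ms floor the plot shows, the peaks should stay somewhere around 90 ms rather than going above 100.)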

my first feeling is that there's something limiting cwnd (processing delay, some other program logic, ...) because how else could we get the peak after t=150000? I struggle a bit to understand how this is possible though when what seems to stop it from growing the cwnd further is packet loss i.e. that it sends faster than the netem queue is emptied?

You're right - if that's really the case, it perhaps points at tc problems instead. But it's weird... a tc setting shouldn't ever produce such unpredictable behavior. Are you using virtual machines? Maybe that's the problem, some timing issue there? I suggest running a very simple TCP transfer to see if the tc setting is right! No browser, no SCTP, nothing. Just TCP, and set it to Reno. This should give you a "normal" cwnd, and if SCTP from the browser is totally different, we know it's the browser. If the TCP cwnd plot doesn't look normal at all, then we know it's your setup.
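Something as small as the following would do; a rough sketch, Linux-only, and it assumes a plain TCP sink (e.g. `nc -l -p 5001 > /dev/null`) is listening on the other machine:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdio>
#include <cstdlib>
#include <cstring>

// Minimal Reno bulk sender: connects to <ip> <port>, forces the "reno"
// congestion control and prints the kernel's cwnd (in packets) and RTT
// from TCP_INFO every so often while sending as fast as it can.
int main(int argc, char** argv) {
  if (argc != 3) {
    std::fprintf(stderr, "usage: %s <ip> <port>\n", argv[0]);
    return 1;
  }
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  const char cc[] = "reno";
  setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, cc, std::strlen(cc));  // Linux-only
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_port = htons(std::atoi(argv[2]));
  inet_pton(AF_INET, argv[1], &addr.sin_addr);
  if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    std::perror("connect");
    return 1;
  }
  char buf[1400];
  std::memset(buf, 'x', sizeof(buf));
  for (long i = 0; send(fd, buf, sizeof(buf), 0) > 0; ++i) {
    if (i % 200 == 0) {  // sample the connection state now and then
      tcp_info info{};
      socklen_t len = sizeof(info);
      getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len);
      std::printf("cwnd=%u packets rtt=%u us\n", info.tcpi_snd_cwnd, info.tcpi_rtt);
    }
  }
  close(fd);
  return 0;
}
```

If that cwnd trace shows a clean sawtooth, the tc setup is fine and we can blame the browser; if not, it's the setup.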

I let Safiqul read the stackoverflow page :)