tobiasfl / tobias-master-thesis-webrtc


Start implementing fse-algorithm 1? #7

Closed tobiasfl closed 6 months ago

tobiasfl commented 3 years ago

I've gotten Chromium installed on the other laptop; it works OK. I have experienced crashes there as well, but it seems that when I make a tiny change in the code and rebuild, it stops crashing for a while when I test it afterwards. I have been printing out and plotting the new_bitrate and current_target inside SendSideBandwidthEstimation::UpdateTargetBitrate. When I did not set a constant rate and just plotted the current_target, it looks like this, which I think makes sense: image

I've also tried setting the current_target to a constant rate of 300 kbps. With a constant rate, the current_target remains at 300 kbps since it is not set anywhere else in the code. However, the new_bitrate being passed into UpdateTargetBitrate is sometimes more than double the value of 300 kbps; does that make sense? Here is the plot (not very pretty): image

I also have a screenshot of the webrtc-internals data in the browser, taken while the current_target was set to a constant rate of 300 kbps. It seems to fluctuate around the same ballpark as 300 kbps, but very loosely. Do you think it is too different from the constant rate and I should look more into why, or can we assume it is a normal mismatch between the target and what can actually be sent? (Chromium is quite laggy when I'm running it; could that make it more "jittery"?) image

Should I continue with trying to implement FSE algorithm 1 and update the FSE with new_bitrate from inside SendSideBandwidthEstimation::UpdateTargetBitrate? Or do I have to find a different place in the congestion controller to do that?

safiqul commented 3 years ago

> I've gotten Chromium installed on the other laptop; it works OK. I have experienced crashes there as well, but it seems that when I make a tiny change in the code and rebuild, it stops crashing for a while when I test it afterwards. I have been printing out and plotting the new_bitrate and current_target inside SendSideBandwidthEstimation::UpdateTargetBitrate. When I did not set a constant rate and just plotted the current_target, it looks like this, which I think makes sense: image

Is the min rate 350k or 300? I am just curious.. maybe setting it below the minimum forces the CC to set the rate back to 350? What is the rate change between the first and second?

> I've also tried setting the current_target to a constant rate of 300 kbps. With a constant rate, the current_target remains at 300 kbps since it is not set anywhere else in the code. However, the new_bitrate being passed into UpdateTargetBitrate is sometimes more than double the value of 300 kbps; does that make sense? Here is the plot (not very pretty): image

This is the initial ramp-up; I think both NADA and GCC do this in order to get up to a higher rate when there is no delay. To know for sure, just check how GCC calculates a new rate - hint: look at the rate update equation.
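From memory, the draft's multiplicative increase looks roughly like this (double-check the constants against draft-ietf-rmcat-gcc-02):

```latex
% GCC delay-based controller, "increase" state (multiplicative mode);
% \Delta t is the time since the last update, in seconds:
\eta = 1.08^{\min(\Delta t,\, 1.0)}, \qquad \hat{A}(i) = \eta \cdot \hat{A}(i-1)
```

At roughly 8% growth per second, the estimate can more than double within about ten seconds of delay-free ramp-up, which would explain new_bitrate overshooting a pinned 300 kbps current_target.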

> I also have a screenshot of the webrtc-internals data in the browser, taken while the current_target was set to a constant rate of 300 kbps. It seems to fluctuate around the same ballpark as 300 kbps, but very loosely. Do you think it is too different from the constant rate and I should look more into why, or can we assume it is a normal mismatch between the target and what can actually be sent? (Chromium is quite laggy when I'm running it; could that make it more "jittery"?) image

Looks good; staying close to 300 kbps..

> Should I continue with trying to implement FSE algorithm 1 and update the FSE with new_bitrate from inside SendSideBandwidthEstimation::UpdateTargetBitrate? Or do I have to find a different place in the congestion controller to do that?

Yes, continue with FSE algorithm 1. Good job! It's becoming interesting now :)..
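Roughly, the hook you describe could look like the sketch below - purely an illustration, since the real UpdateTargetBitrate signature differs, and Fse, Instance() and fse_flow_id_ are invented names, not WebRTC or thesis API:

```cpp
#include <cstdint>

// Invented stand-in for the shared flow state exchange: each flow reports
// its congestion-controlled rate CC_R and gets back a coupled rate FSE_R.
struct Fse {
  static Fse& Instance() { static Fse fse; return fse; }
  int64_t Update(int flow_id, int64_t cc_rate_bps) {
    // RFC 8699 steps 3a/3b would go here; pass-through for illustration.
    (void)flow_id;
    return cc_rate_bps;
  }
};

// Skeletal stand-in for SendSideBandwidthEstimation, not the real class.
class SendSideBandwidthEstimation {
 public:
  void UpdateTargetBitrate(int64_t new_bitrate_bps) {
    // Couple with the FSE before the usual min/max clamping is applied.
    current_target_bps_ = Fse::Instance().Update(fse_flow_id_, new_bitrate_bps);
  }

 private:
  int fse_flow_id_ = 0;
  int64_t current_target_bps_ = 0;
};
```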

tobiasfl commented 3 years ago

Update: I spent the last couple of days basically making a new test app based on the old one plus examples online. It took longer than expected; there were some issues related to the old one using deprecated APIs, and the screen-sharing implementation I did before summer did not work properly. I figured that if we are, for instance, going to add screen sharing over the data channel etc. like we discussed before summer, it would be easier if everything was up to date now. It was also quite fun, so I didn't mind.

Ran Chromium with the current implementation of the FSE and the new test app to see what happens. The results are probably not that interesting, since I gave both flows the same priority and their desired rate in each UPDATE call was set to their CC_R, which means they just end up getting back the same FSE_R they sent in. I also did not limit the bandwidth on my laptop yet. The flows that start late in both pictures are the screen-sharing flows. image image

So now I should probably limit the bandwidth on my PC and experiment a bit with the FSE implementation. However, there are some considerations I wonder a bit about:

safiqul commented 3 years ago

> I spent the last couple of days basically making a new test app based on the old one plus examples online. It took longer than expected; there were some issues related to the old one using deprecated APIs, and the screen-sharing implementation I did before summer did not work properly. I figured that if we are, for instance, going to add screen sharing over the data channel etc. like we discussed before summer, it would be easier if everything was up to date now. It was also quite fun, so I didn't mind.

Does it mean that the current test app uses both screen sharing and video as two separate video streams? If you think it's too complex.. we can think about it.. But you know the reason why we wanted to do this.

> Ran Chromium with the current implementation of the FSE and the new test app to see what happens. The results are probably not that interesting, since I gave both flows the same priority and their desired rate in each UPDATE call was set to their CC_R, which means they just end up getting back the same FSE_R they sent in. I also did not limit the bandwidth on my laptop yet. The flows that start late in both pictures are the screen-sharing flows.

I do not get it.. explain what the red and orange lines are? We should probably discuss this in person!

> Should the screen-sharing flow have a video or something playing? In both instances above it was simply sending a stream of a very static app like Spotify or the code editor, which probably means that flow didn't need a higher rate than it got in the plots above.

Good question; a video should be used here to get a higher rate. If you are using screen sharing over a data channel, I would recommend using a large file transfer for the testing. I think we should use screen sharing for either a demo or advanced testing.

> I'm a bit confused about what role the desired rate plays here. The way I understand it from the coupled congestion control RFC, it is basically meant for situations where the congestion controller gives a higher rate than the application/flow(?) actually needs, and consequently any extra bandwidth the controller calculates should rather be shared with other flows? Is there a way for us to know the actually needed bitrate, or will desired_rate >= CC_R always hold when it comes to GCC?

Yes, you are right. For full HD quality, a video flow won't need more than 6-7 Mbps (for example). If you do not consider the desired rate here, you would probably slow down the other flow if you simply divide the aggregate by n (number of flows).
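A sketch of that distribution idea (my paraphrase of RFC 8699's intent, not its pseudocode verbatim; all names invented): each flow gets a priority-weighted share of S_CR capped at its DR, and whatever the capped flows leave over is redistributed among the rest.

```cpp
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// Paraphrase of the DR- and priority-aware distribution: give each flow a
// priority-weighted share of the aggregate S_CR, cap it at the flow's
// desired rate DR, and redistribute what the capped flows leave over.
struct Flow {
  double priority;      // P(i)
  double desired_rate;  // DR(i), bps
  double fse_rate = 0;  // FSE_R(i), bps, filled in by Distribute()
};

void Distribute(double s_cr, std::vector<Flow>& flows) {
  double leftover = s_cr;
  std::vector<Flow*> open;  // flows not yet capped at their DR
  for (Flow& f : flows) { f.fse_rate = 0; open.push_back(&f); }
  while (leftover > 1e-9 && !open.empty()) {
    double sum_p = 0;
    for (Flow* f : open) sum_p += f->priority;
    double distributed = 0;
    std::vector<Flow*> still_open;
    for (Flow* f : open) {
      double share = leftover * f->priority / sum_p;
      double given = std::min(share, f->desired_rate - f->fse_rate);
      f->fse_rate += given;
      distributed += given;
      if (f->fse_rate < f->desired_rate) still_open.push_back(f);
    }
    leftover -= distributed;
    open = std::move(still_open);
    if (distributed <= 1e-9) break;  // everyone is DR-limited
  }
}

int main() {
  // Case 2 shape: flow 1 capped at a 1 Mbps DR, flow 2 effectively uncapped.
  std::vector<Flow> flows = {{1.0, 1e6}, {1.0, 1e9}};
  Distribute(3e6, flows);  // 3 Mbps aggregate
  for (const Flow& f : flows)
    std::printf("FSE_R = %.0f bps\n", f.fse_rate);  // 1 Mbps and 2 Mbps
}
```

With S_CR = 3 Mbps, DR(1) = 1 Mbps and equal priorities, this yields 1 Mbps and 2 Mbps rather than a naive 1.5/1.5 split.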

> I assume that each GCC controller is running in a separate thread; since the FSE is shared between them, I think it could lead to race conditions. Do I have to implement mutual exclusion in the FSE code before testing, or is the chance of it happening so low that it can currently be ignored?

It's better to handle this. If you are at IFI tomorrow, feel free to drop by my office if you want to have a serious discussion.
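A minimal sketch of the mutual exclusion in question, assuming each flow's GCC controller calls into the shared FSE from its own thread (class and method names are placeholders, not the thesis code's API):

```cpp
#include <cstdint>
#include <mutex>
#include <unordered_map>

// Placeholder FSE showing the locking pattern only: a single mutex
// serializes Register/Update calls arriving from different GCC threads.
class FlowStateExchange {
 public:
  void Register(int flow_id, int64_t initial_rate_bps) {
    std::lock_guard<std::mutex> lock(mutex_);
    fse_rates_[flow_id] = initial_rate_bps;
    s_cr_bps_ += initial_rate_bps;
  }

  int64_t Update(int flow_id, int64_t cc_rate_bps) {
    std::lock_guard<std::mutex> lock(mutex_);
    // Step 3a: S_CR = S_CR + CC_R(f) - FSE_R(f); distribution omitted here.
    s_cr_bps_ += cc_rate_bps - fse_rates_[flow_id];
    fse_rates_[flow_id] = cc_rate_bps;  // stand-in for the real step 3b
    return fse_rates_[flow_id];
  }

 private:
  std::mutex mutex_;  // protects s_cr_bps_ and fse_rates_
  int64_t s_cr_bps_ = 0;
  std::unordered_map<int, int64_t> fse_rates_;
};
```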

tobiasfl commented 3 years ago

> Does it mean that the current test app uses both screen sharing and video as two separate video streams? If you think it's too complex.. we can think about it.. But you know the reason why we wanted to do this.

Sorry, I was a bit unclear here; yes, currently I have implemented it with two separate RTP video streams 😀

For some of the other stuff it's probably best if I drop by your office tomorrow, if that fits. 👍

safiqul commented 3 years ago

Anytime, before 14:00 :). See you tomorrow.

tobiasfl commented 3 years ago

Went by your office after lunch but you were not there, so I'm posting here instead; just a small question. I'm trying to connect my other computer to the WebRTC test app through the local network, but it seems not to work. I think the network is somehow stopping me from doing this. Is there a way to circumvent it, or can I for instance limit the bandwidth on the loopback and just run everything on the same computer for primitive testing, at least for today?

safiqul commented 3 years ago

It should work; did you use Wireshark to check your packets?


tobiasfl commented 3 years ago

I tried using Wireshark to check the packets now. I have barely used it before, though, so I'm not sure I'm using it correctly, but it seems none of the TCP packets arrive at the computer hosting the server. I tried to ping it from the other computer, though, and that worked...

Update: I found a hack to do it, so using that now.

safiqul commented 3 years ago

Will reply to you later :(.. running now!


safiqul commented 3 years ago

Can you ping the server?


safiqul commented 3 years ago

Where did you try it? IFI usually blocks all ports except 80 and 443!

tobiasfl commented 3 years ago

I tried it at IFI on the 9th floor, so that must be the reason. Thank you! Sadly I'm working tomorrow, but hopefully it'll work when I try it on Wednesday. 👍

safiqul commented 3 years ago

No worries; have a nice evening!


tobiasfl commented 3 years ago

Hi, I have a little question regarding how the FSE updates GCC flows. Should the FSE do the "clamping" that is currently done in the GCC code? It makes sure that the new bitrate being set is at least a configured minimum bitrate and no larger than an upper limit, where the upper limit is the minimum of three values: a receiver limit signaled through REMB RTCP messages, the delay-based estimate, and a configured maximum bitrate.
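As a paraphrase of that clamping with invented names (not the actual WebRTC source):

```cpp
#include <algorithm>
#include <cstdint>

// Paraphrase of the clamping described above: the upper limit is the
// minimum of the REMB-signalled receiver limit, the delay-based estimate,
// and the configured maximum; the result is also floored at the minimum.
int64_t ClampTargetBitrate(int64_t new_bitrate_bps,
                           int64_t min_configured_bps,
                           int64_t receiver_limit_bps,  // from REMB RTCP
                           int64_t delay_based_bps,
                           int64_t max_configured_bps) {
  int64_t upper = std::min({receiver_limit_bps, delay_based_bps,
                            max_configured_bps});
  return std::max(min_configured_bps, std::min(new_bitrate_bps, upper));
}
```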

I certainly think we should never give anyone a higher rate than signaled by the receiver. But the way it is now, we would simply throw away the excess bandwidth, since that limit is applied in the update call. Would it make sense to send in the upper limit value as the desired rate? Then the FSE would take it into account earlier and be able to share extra bandwidth with other flows instead. Tell me if the question is unclear hehe.

safiqul commented 3 years ago

I wonder what the configured max bitrate is here - is it the max rate that a connection can achieve? You are right: in practice, one should not give a higher rate than a receiver expects. It's probably doing something strange here, assuming that it can just jump to a higher rate because we are not capacity-limited. Setting the upper limit value as the desired rate is good if you know that it is the max rate?


tobiasfl commented 3 years ago

Update: Found a bug in my FSE code which led to the second stream sometimes being stuck in a while loop; having fixed it, things are more stable. I have run tests for case 1 (both flows having priority 1 and a DR of 1 Gbps); here is one of the plots: image Both flows are certainly getting the same FSE rate and fluctuate around 1.5 Mbps, which is good. However, I have noticed a pattern at the point where flow 2 starts: flow 1 throttles its rate significantly as soon as flow 2 starts. Is this to be expected, or should they really both get 1.5 Mbps almost immediately if we assume flow 1 has gotten to 3 Mbps already? It is more apparent in this plot, where I waited longer before starting flow 2: image I also think it's a bit strange that the single flow was allowed to get all the way to 4 Mbps when the interface is limited to 3 Mbps, but I assume that is just GCC behaviour and has nothing to do with the FSE, since only a single flow is running at that point.

The traffic control settings for the interface were rate 3mbit, burst 50kb and latency 20ms. I know the screenshots of the plots are not very easy to read; the pictures of the plots are in this repo in the test_data folder. The pcap files are tcpdumps, and the txt files are the data used to create the plots, in the form \<flownum>-\<CC or FSE rate> \<timestamp> \<current rate>. Do I have to look more into why this happens, or can I continue testing case 2 (one flow having double the priority of the other)?

safiqul commented 3 years ago

Are you at IFI? Just drop by if you are.

Else, see inline:

> Both flows are certainly getting the same FSE rate and fluctuate around 1.5 Mbps, which is good. However, I have noticed a pattern at the point where flow 2 starts: flow 1 throttles its rate significantly as soon as flow 2 starts. Is this to be expected, or should they really both get 1.5 Mbps almost immediately if we assume flow 1 has gotten to 3 Mbps already?

Looks okay to me, 3 Mbps :)

> Do I have to look more into why this happens, or can I continue testing case 2 (one flow having double the priority of the other)?

If I remember correctly, case 2 was about limiting one flow to DR=1, right? Should the other one not get more than the DR-limited flow? Case 3: try with two different priorities, e.g. 1 and 2.

tobiasfl commented 3 years ago

Yeah, you're right, case 2 was one flow limited to DR=1.

tobiasfl commented 2 years ago

Found a problem in the algorithm when only a single flow is running for some time before the second one starts, and I am wondering how it is supposed to be handled. Assume that only a single flow f is running and its desired rate (DR) is 1024 kbit, which means its previous FSE_R is 1024 kbit and the sum of calculated rates (S_CR) is also 1024 kbit. Also assume CC_R is at a level higher than 1024 kbit, say 1600 kbit in this example. Then, according to step 3a of the RFC 8699 algorithm, S_CR gets a new value in the following way:

S_CR = S_CR + CC_R(f) - FSE_R(f)

In this example this leads to S_CR being:

S_CR = 1024 kbit + 1600 kbit - 1024 kbit = 1600 kbit

Consequently, from the rest of the algorithm, the new FSE_R(f) will be 1024 kbit even though S_CR is 1600 kbit, because DR is only 1024 kbit. If we assume CC_R and DR stay the same for the next update call, the problem becomes visible: when S_CR, currently 1600 kbit, is recalculated, it grows even though CC_R is unchanged!

S_CR = 1600 kbit + 1600 kbit - 1024 kbit = 2176 kbit

When I let the first flow run for some time (e.g. 10 seconds), S_CR ends up much larger than the actual bandwidth. In case 2, where flow 2 has a DR of infinity, all that extra bandwidth is then handed to flow 2 and the graph looks like this: image
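To make the growth concrete, here is a small self-contained run of this scenario with CC_R and DR held constant (my own paraphrase of step 3a plus the DR cap, not code from the RFC):

```cpp
#include <algorithm>
#include <cstdio>

// Illustration of the S_CR inflation described above: one
// application-limited flow, CC_R fixed at 1600 kbit and DR at 1024 kbit,
// applying step 3a (the S_CR update) and the DR cap each round.
int main() {
  double dr = 1024, cc_r = 1600;  // kbit
  double fse_r = dr;              // the flow starts at its desired rate
  double s_cr = dr;
  for (int round = 1; round <= 5; ++round) {
    s_cr = s_cr + cc_r - fse_r;   // step 3a
    fse_r = std::min(s_cr, dr);   // single flow: capped at DR
    std::printf("round %d: S_CR = %.0f kbit, FSE_R = %.0f kbit\n",
                round, s_cr, fse_r);
  }
  // S_CR climbs by (CC_R - DR) = 576 kbit every round while FSE_R stays
  // pinned at 1024 kbit -- exactly the runaway described above.
}
```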

In my code I've now fixed this by handling the Update call a bit differently when only one flow is running. Is this a mistake in the RFC, or have I misunderstood something about the algorithm?

No stress if the explanation is a bit too long-winded and unclear; it can also wait until Wednesday, and we can discuss it in person.

:)

safiqul commented 2 years ago

Ah, right.. we do not recommend turning the FSE on when there is only one flow running: even the test cases for the FSE were designed for multiple flows. It doesn't make sense to use it for just one flow.

I think you do not want to let S_CR just grow for one flow when the flow is application-limited. But I am glad that you fixed the one-flow case - so you won't have to turn it on in the middle.


tobiasfl commented 2 years ago

That clears it up for me, thank you! :)

tobiasfl commented 2 years ago

I now have some rough data for all 3 test cases, plus a run without the FSE for comparison. The plots, the source txt files for the plots, and tcpdumps from the tests can be found in this repo in the test_data folder. All tests were run with a 3mbit bandwidth limit, 20ms latency and a burst of 25kbit. (It seems the burst configuration is mandatory; I have not found a way to use the token-bucket filter without it in the command.)
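For reference, a token-bucket filter of this shape would be configured along these lines (the interface name is an assumption; tbf requires burst together with latency or limit, which is why the burst cannot be omitted):

```
sudo tc qdisc add dev eth0 root tbf rate 3mbit burst 25kbit latency 20ms
```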

Normal congestion control without the FSE: image It is very clear here that the second flow never gets up to a fair share of the bandwidth.

Case 1, both flows having a desired rate of 1gbit and the same priority: image Seems pretty consistent with the bandwidth limit on the interface, with both flows getting around 1.5mbit.

Case 2, the first flow having a desired rate of 1mbit, the second flow still having a desired rate of 1gbit, and both having the same priority: image The first flow very clearly gets limited to 1mbit while the second gets the rest of the aggregate. If you look closely, the first one gets more than 1mbit before the second one starts; this is obviously because the FSE does not limit it until the second flow starts. However, the second flow's bitrate fluctuates much more than in the other plots; I assume this is to be expected, since it experiences the other flow's increases and decreases through the aggregate.

Case 3, both flows having a desired rate of 1gbit, the first one having priority 1 and the second priority 2: image This seems to work as expected: when the second flow starts, it gets 2/3 of the bandwidth while the first one gets 1/3.

Is there anything more I have to resolve before starting to implement a prototype of the algorithm from the German papers? It is very clear from all the plots that the running flow's rate is throttled for some reason when a new one is added; do we have to find out why and where in the code, or can this be ignored for now?

safiqul commented 2 years ago

You can ignore this for now; let's discuss it in person.

The plots look good! Look at ROSIEE, and it's now time to think about how we can design an algorithm for coupling data+video flows :)