Let me summarize; I believe there are two issues at play here:
1) Ping rate per reflector: each individual reflector will only give useful delay responses up to a certain aggregate rate (that is true whether we use ICMP echo requests, ICMP timestamps, or NTP time requests), so as good citizens we should not overwhelm individual reflectors (and might consider even adding reflectors to the pool). By default ping uses 1 Hz, and non-superusers can increase this up to 5 Hz, so I expect most reflectors will be willing to tolerate rates in that range (as long as the aggregate of all concurrent users of the reflector does not cross the total rate limit).
2) Effective delay sampling rate in the control loop: here, in the spirit of responding early to bufferbloat, we should strive to respond as early as sanely possible (so full busy looping/polling probably is not okay).
I think we can achieve a high 2) without causing a high 1) by interleaving delay probes across multiple reflectors.
To be good netizens, we need to monitor the responses and back off/switch to different reflectors if we see signs of overload/throttling.
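For illustration, the interleaving could look something like this minimal sketch (the reflector list and interval here are hypothetical): with four reflectors each probed at 1 Hz but offset by 250 ms, the control loop sees a fresh delay sample every 250 ms while each reflector only ever sees 1 ping/s.

```bash
#!/bin/bash
# Minimal sketch: stagger one pinger per reflector so the aggregate sample
# rate is n/interval while each reflector only sees 1/interval pings.
reflectors=(1.1.1.1 8.8.8.8 9.9.9.9 208.67.222.222)
interval=1.0   # seconds between pings per reflector
n=${#reflectors[@]}

for i in "${!reflectors[@]}"; do
    (
        # offset this pinger's start by i*interval/n seconds
        sleep "$(awk "BEGIN{print $i*$interval/$n}")"
        exec ping -i "$interval" "${reflectors[$i]}"
    ) &
done
wait
```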
By the way @richb-hanover and @moeller0, I managed to implement a sleep function in the present bash implementation. It works perfectly well. Basically, if load < 50% for >= 60 s, it puts the pingers to sleep, and then wakes them up as soon as the load goes above 50%. I don't think the wake-up delay will hurt that much. I've tested it and it seems to work well.
This should save a lot of unnecessary pings.
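A rough sketch of that logic, assuming the pinger PIDs are tracked in an array (names here are hypothetical, not the script's actual variables; pausing via SIGSTOP/SIGCONT is used for brevity, whereas the pingers could equally be killed and restarted):

```bash
# Hypothetical sketch of the pinger sleep logic: pause the pingers after
# sustained low load, resume them as soon as load rises again.
# Called once per main-loop cycle as: check_pinger_sleep "$load"
sleep_threshold=50   # % load below which we consider sleeping
sleep_after=60       # seconds of sustained low load before sleeping
asleep=false
low_load_since=""

check_pinger_sleep() {
    local load=$1
    if (( load < sleep_threshold )); then
        : "${low_load_since:=$SECONDS}"   # remember when low load began
        if ! $asleep && (( SECONDS - low_load_since >= sleep_after )); then
            kill -STOP "${ping_pids[@]}"   # put pingers to sleep
            asleep=true
        fi
    else
        low_load_since=""
        if $asleep; then
            kill -CONT "${ping_pids[@]}"   # wake pingers immediately
            asleep=false
        fi
    fi
}
```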
I am not sure our load estimates can be used here... Say your shaper is configured for 100 Mbit/s but the bottleneck sits at 49% of that: you will never really exceed 49% apparent load (averaged over your cycle time), but delay might spike through the roof... Thinning out the ping frequency (to, say, 1/10 of the normal rate) might be a compromise here that avoids going completely blind to bufferbloat.
Hmm, I take the point, but the steady state baseline I had in mind is the compromise value you would otherwise set CAKE to absent adaptive rate control:
It is a safe harbour that should work 95% of the time (save in the green region). In this state, there should almost never be bufferbloat.
Does that change things for you? If not, I did also wonder about setting an additional load threshold for this sleep function, e.g. 5%.
Or, as you say, we could also reduce the ping frequency. That would work better with one-shot pings than with the persistent ping processes I can just start and stop, although given the ping process PID I suppose I could kill the process completely and set it up again with the new interval. In any case, this is also a consideration.
Ah, okay, if the rate is set to the hard minimum during these epochs that could work reasonably well. Still I assume that during primetime most links would be in active ping mode, and hence might still overwhelm the common reflectors... at least the load around the globe will spread out a bit ;)
Hehe, true. Well, it's not quite the hard minimum; it's the baseline, which is the compromise. The baseline is set such that roughly 95% of the time the actual rate is higher than the baseline, whereas the minimum is set such that essentially 100% of the time the actual rate is higher than the minimum. I did wonder about setting the CAKE bandwidth to the minimum in sleep mode. If you recall, we had the idea of decaying to a base rate that sits between the minimum and maximum. That seems to work well on my connection.
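For concreteness, that decay could look something like the following one-liner (a sketch; the variable names and scaling are made up, not taken from the actual script):

```bash
# Hypothetical sketch: each idle cycle, move cur_rate a fraction of the way
# back toward a base rate that sits between the minimum and maximum.
# decay_factor is scaled by 1000, e.g. 950 = retain 95% of the gap per cycle.
decay_factor=950
cur_rate=$(( base_rate + ( (cur_rate - base_rate) * decay_factor ) / 1000 ))
```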
I think without delay measurements it really should go back to the hard minimum...
OK thanks. So either maintain base rate + reduced frequency pings, or back to hard minimum + no pings. I'm guessing the former is probably best. Any thoughts?
Both sound okay....
OK I implemented switch to hard minimum on sleep now.
@moeller0 I have implemented having separate ping streams write to a common fifo. It seems to work well.
Each pinger loop writes out:
```bash
echo $timestamp $reflector $seq $rtt_baseline $rtt $rtt_delta > /tmp/CAKE-autorate/ping_fifo
```
And the ping results are read in by the main loop here:
```bash
while read -r timestamp reflector seq rtt_baseline rtt rtt_delta
```
The CPU usage of this approach is not that much higher than with fping, and it facilitates improved control over the individual ping streams (they can be individually stopped and started via the appropriate ping_pid).
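For illustration, a single pinger feeding the FIFO might look roughly like the following sketch; the parsing and baseline handling here are guesses at the approach, not the script's actual code.

```bash
#!/bin/bash
# Hypothetical sketch: one pinger process parsing ping output and writing
# space-separated records into the shared FIFO. Each echo is one short
# write (< PIPE_BUF), which POSIX guarantees to be atomic on a FIFO.
reflector=1.1.1.1
ping_fifo=/tmp/CAKE-autorate/ping_fifo
rtt_baseline=20.0   # ms; in practice maintained as a moving minimum/EWMA

ping -D -i 0.2 "$reflector" | while read -r line; do
    [[ $line == *time=* ]] || continue                # skip non-reply lines
    timestamp=${line%%]*}; timestamp=${timestamp#[}   # epoch seconds from -D
    seq=${line##*icmp_seq=}; seq=${seq%% *}
    rtt=${line##*time=};     rtt=${rtt%% *}
    rtt_delta=$(awk "BEGIN{print $rtt - $rtt_baseline}")
    echo "$timestamp $reflector $seq $rtt_baseline $rtt $rtt_delta"
done > "$ping_fifo"
```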
Looks great, but please consider:
```
timestamp reflector seq ul_delay_baseline ul_delay ul_delay_delta dl_delay_baseline dl_delay dl_delay_delta
```
and just instantiate both with the RTT for ICMP echo data. That way the main code can be made to control dl and ul independently when true OWD data is available, and it still does the right thing for RTTs.
BTW, using a FIFO here seems clever and the correct data structure.
To check against partial writes, you could write:
```
timestamp reflector seq ul_delay_baseline ul_delay ul_delay_delta dl_delay_baseline dl_delay dl_delay_delta timestamp
```
If the leading and trailing timestamps differ, the record is corrupted and should be ignored...
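On the reader side, that double-timestamp check might look like this minimal sketch (field names follow the record format proposed above):

```bash
# Sketch: reader-side detection of partial/torn records. The writer puts
# the same timestamp at both ends of the record; a mismatch (or a short
# read leaving timestamp_check empty) means the record is corrupted.
while read -r timestamp reflector seq ul_delay_baseline ul_delay ul_delay_delta \
              dl_delay_baseline dl_delay dl_delay_delta timestamp_check
do
    [[ "$timestamp" == "$timestamp_check" ]] || continue   # drop corrupted record
    # ... process the validated record ...
done < /tmp/CAKE-autorate/ping_fifo
```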
@Marctraider any chance you could test the new code in main for performance and CPU use? Please keep your old code if you actually use this regularly, just in case the new code performs less well.
I've changed a lot again, but I think for the better. Now individual pingers write to a common fifo, which is read in and processed by the main loop. So the main loop now processes all ping results more or less as soon as they are issued.
From my testing all seems fine, and CPU use with the changes has actually been reduced (despite the main processing loop now working with every single ping result as it comes in, which should give better performance). But there could be a few issues needing ironing out.
@moeller0 thank you for your suggestions. Good point about partial writes, and the error detection based on the repeated timestamp seems like a neat idea. Are you thinking partial writes could occur because the ping processes get stopped mid-write when put to sleep owing to a sustained base rate?
On a related point, as you suggested, I skip ping results if the incoming result is more than 500 ms old.
Will that be enough on its own to recover from my ping sleep elegantly do you think? See from here:
Or do you think I should really delete the FIFO and recreate it on re-entering the main loop? I am not sure how much the initial skipping of the few stale results between the main loop breaking and the pings stopping will matter. During the skipping, the minimum rate will be sustained, because the main loop won't drive the rate up until that skipping has passed. But depending on how fast the skipping is, a few new results could come in during it, giving a short period of catch-up in the recovery from the sleep. Could be that I'm overthinking this.
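For reference, the stale-record skip could be as simple as the following sketch, assuming millisecond epoch timestamps (the 500 ms threshold is the one mentioned above):

```bash
# Sketch: discard ping results older than 500 ms, e.g. the few records
# left in the FIFO from before the pingers were put to sleep.
max_age_ms=500
while read -r timestamp reflector seq rtt_baseline rtt rtt_delta
do
    now_ms=$(( $(date +%s%N) / 1000000 ))
    (( now_ms - timestamp > max_age_ms )) && continue   # stale, skip
    # ... process fresh result ...
done < /tmp/CAKE-autorate/ping_fifo
```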
6069321
Yes, I will soon test the new script, and definitely keep the old one as backup (I've been using that primarily for now), but further optimizations are always welcome.
My ultimate goal, without setting the RTT threshold too low, is for the script to react so fast that a spike of 30 ms or bigger is almost impossible while abruptly disconnecting my fastest line (while an iperf test is ongoing), without impairing total bandwidth significantly.
That sounds like a nice and concrete goal to test for. Hopefully it is achievable with the right settings with the latest version.
Actually @Marctraider, I forgot to mention: there was a significant breakthrough moment in which it dawned on a few of the regular contributors on the OpenWrt thread, including myself and @moeller0, that every detection of bufferbloat is in and of itself a kind of capacity estimate of the line, and so ideally the next CAKE bandwidth at that point is set based on the actual bandwidth used in the previous iteration, and not just the previously set bandwidth.
Following the corresponding change in the script, now, on bufferbloat, the rate is set according to:
```bash
cur_rate=$(( ($load*$cur_rate*$rate_adjust_bufferbloat)/100000 ))
```
https://github.com/lynxthecat/CAKE-autorate/blob/main/CAKE-autorate.sh#L53
That is, the new rate is set based on $rate_adjust_bufferbloat * the actual transfer rate ($load*$cur_rate), rather than on $rate_adjust_bufferbloat * the previously set rate ($cur_rate).
That way we actually use the line capacity estimate ($load*$cur_rate), as it were, to set the new CAKE bandwidth, and so the newly set CAKE bandwidth should correctly undershoot from the higher actual line capacity.
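As a worked example, assuming $load is a percentage and $rate_adjust_bufferbloat is a factor scaled by 1000 (so 900 means 0.9; the exact scaling in the script may differ):

```bash
load=60                       # only 60% of the set shaper rate was achieved
cur_rate=50000                # current shaper rate in kbit/s
rate_adjust_bufferbloat=900   # i.e. multiply by 0.9

cur_rate=$(( (load*cur_rate*rate_adjust_bufferbloat)/100000 ))
echo "$cur_rate"              # 27000, i.e. 50000 * 0.6 * 0.9
```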
For your use case:
> My ultimate goal, without setting the RTT threshold too low, is for the script to react so fast that a spike of 30 ms or bigger is almost impossible while abruptly disconnecting my fastest line (while an iperf test is ongoing), without impairing total bandwidth significantly.
I think this should handle that drop much better, because now when you disconnect the fast line, the actual rate transferred ($load*$cur_rate) will drop, and by setting the new rate based on that, it should be much more appropriate, i.e. less than the available line capacity despite the huge change.
This will be set as soon as bufferbloat is detected, and you can monitor what happens in real time by just enabling $output_monitoring_lines in config.sh.
I am looking forward to seeing whether the recent changes (including especially the above change) give you better performance.
> ... for the script to react so fast that a spike of 30 ms or bigger is almost impossible ...
This sounds like an amusing research side project: how can we create a reproducible bandwidth changing application/tool/setup?
This would be a huge improvement over watching our LTE modem/Cable modem/etc. and trying to guess how the speed actually changed, and how autorate responded...
I know there's a project called DummyNet that claimed to do something like this. I've also heard it called a "Flakeway" (for "flakey gateway").
It would be cool if we had a tool that said, "cut bandwidth in half". I wonder if we could simulate this simply by adding a fixed delay to packets going through...
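For what it's worth, on a Linux box in the test path something like the following tc sketch might approximate "cut bandwidth in half" (the interface and rates are placeholders, and a real harness would script a whole schedule of changes):

```bash
#!/bin/bash
# Hypothetical sketch: emulate a variable-rate link with a token bucket
# filter, then halve the bandwidth mid-test.
dev=eth0

tc qdisc add dev "$dev" root tbf rate 50mbit burst 32k latency 400ms
sleep 30
tc qdisc change dev "$dev" root tbf rate 25mbit burst 32k latency 400ms
sleep 30
tc qdisc del dev "$dev" root   # restore the link

# A fixed delay could instead be layered in with netem, e.g.:
#   tc qdisc add dev "$dev" root netem delay 50ms
```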
Something like that would really help create a standard test for these different tools. I would love a bash script that runs a battery of stress tests against the different approaches. I am convinced mine is working very well now, but this does seem pretty niche. I mean, OpenWrt is already niche. CAKE is then even more niche. And CAKE-autorate is further niche still.
But it is often said that QoS does not work on variable connections like LTE. I think this project has shown it can.
I want to split #17 - This is a new and important topic that should be considered separately from the CPU usage discussion. I wrote:
I'd like to look back to a comment from @Marctraider
As we continue our research, I hope we explore the other direction: how slowly can we send those pings while still maintaining good latency and speed control? Finding a good lower limit to the ping rate would have these advantages:
Do we have any information about how rapidly/how much LTE or Cable Modem speeds vary in real life? Maybe we could collect it ourselves...
We could collect long-term data (over the course of hours, maybe even days) recording how often the CAKE-autorate algorithm needs to adjust the parameters, and the effect those changes made on latency (or the effect on latency had those changes not been made...).
There were two thoughtful replies, which the authors are invited to append to this discussion so they don't get buried in the CPU usage discussion:
@marctraider wrote https://github.com/lynxthecat/sqm-autorate/issues/17#issuecomment-1061313873
@moeller0 wrote https://github.com/lynxthecat/sqm-autorate/issues/17#issuecomment-1061596391