zehome / MLVPN

Multi-link VPN (ADSL/SDSL/xDSL/Network aggregation / bonding)
http://www.mlvpn.fr/
BSD 2-Clause "Simplified" License

Proposal: waterfall-by-priority load balancing algorithm #82

Open stapelberg opened 8 years ago

stapelberg commented 8 years ago

Hey,

Currently, MLVPN supports a weighted round-robin load balancing algorithm. This is fine if the cost of all the links is roughly the same, and all links will be used equally (or proportionally to their configured bandwidth).

However, I’m in a situation where it is desirable to use a minimum number of links first and then gradually use more links as needed. Think about a setup with three links (link1, link2, link3), where link1 is the preferred link and link3 should only be used as a last resort.

Whenever the needed bandwidth can be satisfied by using link1, only link1 should be used. If necessary, link2 can be utilized, and if absolutely necessary, link3 can be utilized as well.

My proposed algorithm for this situation works as follows (a rough sketch in C follows the list):

  1. Introduce a (statically configured) priority field for each link.
  2. On each link, set a counter to 0.
  3. When a packet needs to be sent, grab the first link by priority.
  4. Check whether the counter exceeds the configured bandwidth. If yes, grab the next link by priority and check again.
  5. If no, send the packet and increment the counter with the number of bytes.
  6. Each second, reset all link counters to 0.
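
To make the steps concrete, here is a minimal sketch of the selection loop in C. It is illustrative only (not MLVPN code); the struct and function names are made up.

```c
/* Illustrative sketch of the proposed waterfall-by-priority selection.
 * Names (struct link, pick_link, reset_counters) are hypothetical. */
#include <stddef.h>
#include <stdint.h>

struct link {
    int      priority;     /* statically configured, lower = preferred (step 1) */
    uint64_t bandwidth;    /* configured limit in bytes per second */
    uint64_t sent_bytes;   /* per-link counter, reset every second (step 2) */
};

/* links[] is assumed to be sorted by ascending priority (step 3). */
struct link *pick_link(struct link *links, size_t nlinks, size_t pkt_len)
{
    for (size_t i = 0; i < nlinks; i++) {
        if (links[i].sent_bytes + pkt_len <= links[i].bandwidth) {
            links[i].sent_bytes += pkt_len;   /* step 5: account for the packet */
            return &links[i];
        }
        /* step 4: this link is over its budget, try the next one by priority */
    }
    return NULL;   /* all links saturated; caller may queue or drop */
}

/* step 6: called from a 1-second timer */
void reset_counters(struct link *links, size_t nlinks)
{
    for (size_t i = 0; i < nlinks; i++)
        links[i].sent_bytes = 0;
}
```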

I have implemented this in a proof-of-concept implementation (outside of MLVPN) and it works fine.

I’m willing to put in the work to implement this within MLVPN.

Would you accept such a contribution?

stapelberg commented 8 years ago

@zehome Did you get a chance to think about this? :) Please let me know if you have any questions.

markfoodyburton commented 8 years ago

I'd be really interested to see such a thing, at least to experiment with. Cheers, Mark.

stapelberg commented 8 years ago

@zehome Any news? If I don’t hear from you, I’ll just implement this and send a pull request if it works well.

zehome commented 8 years ago

OK, so I've tried those ideas in the past.

It's kind of the approach used by multipath mosh, but instead of focusing on bandwidth, it's focused on latency.

First, in order to do that, you'll need to implement a flow recognition mechanism, keyed on something like (srcip, srcport, dstip, dstport, proto).

Then, for each flow, you'll need to measure bandwidth. That way you'll be able to decide which link to use for each flow when a packet arrives.

You'll also need to handle flow timeouts in order to keep memory usage low and stop it from growing through the roof.
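
For illustration only, a flow table along those lines might look like the hypothetical C sketch below; the names, fields and timeout value are invented, and this is not the partial implementation mentioned next.

```c
/* Hypothetical flow-table sketch: one entry per 5-tuple flow, with byte
 * accounting for per-flow bandwidth and a last-seen timestamp for expiry. */
#include <stdint.h>
#include <time.h>

struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

struct flow_entry {
    struct flow_key key;
    uint64_t        bytes;       /* bytes seen, basis for per-flow bandwidth */
    time_t          last_seen;   /* updated on every packet of the flow */
    int             link_index;  /* link currently assigned to this flow */
};

#define FLOW_TIMEOUT_SECS 30     /* illustrative: expire flows idle this long */

static int flow_expired(const struct flow_entry *f, time_t now)
{
    return (now - f->last_seen) > FLOW_TIMEOUT_SECS;
}
```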

I've already partly implemented the flow recognition system, but it's not finished/polished because the idea I was working on (per-flow re-ordering) is not working properly.

Anyway, we can always try to implement it and see in practice if it's a gain for your application.

BTW, mlvpn will never be as efficient as mptcp, which does everything mlvpn should do.

stapelberg commented 7 years ago

First, in order to do that, you'll need to implement a flow recognition mechanism, keyed on something like (srcip, srcport, dstip, dstport, proto).

Sorry, I don’t follow. Why are you saying flow recognition is a prerequisite for waterfall-by-priority?

I’ve implemented pull request #88 and tested it with two links: one gigabit ethernet link, one wifi link. With the pull request applied, I get precisely the traffic rates I’m expecting. E.g., I can successfully set ethernet to 1 MB/s and will see that 5 MB/s go over wifi. Conversely, when setting ethernet to 100 MB/s, no traffic goes over wifi.

Am I missing a testcase?

markfoodyburton commented 7 years ago

I just tried this patch set. I think you are taking the numbers in the bandwidth setting 'literally' (which is fair enough, but until now these numbers have effectively been an indication of relative bandwidth, if I understand correctly. It would be helpful to specify what units you expect the numbers in, for instance :-) )

Secondly, I noted that without setting the loss tolerance to 100%, the thing failed almost immediately - is that expected?

So - having adjusted the bandwidth numbers somewhat, I ended up getting something that tried to work. However, much as I tried to adjust things, I ended up with a lot of traffic on one link (which was set to carry quite little traffic) and little on the other - all of this tested under a heavy load. Could it be that the bandwidth measure you describe above sends everything through one link during the first part of the second, and by the time the ACKs come back, we've run out of time in the second and move on, meaning the second link never fires up?

I GUESS using a re-order buffer could help here, but I guess it might have to be very large...

I haven't looked at the code, but I wonder if a moving window approach could be used?

Finally - @zehome - I notice you singing the praises of mptcp, which looks brilliant. The advantage of mlvpn is only that it's a single (and pretty simple) setup, whereas, if I understand right, for an equivalent mptcp version you would need a VPN, a SOCKS proxy, and some luck. Having said that, would it be possible to 'wrap' mlvpn around mptcp and create a 'best of both worlds'?

(I should say, in my case, I have a 4.7M and a 1.8M link (don't ask). I get a max throughput of about 4.1M and 1.3M on the two links, so I'm losing about 10% of my throughput, which means the overall benefit is verging on the marginal in my case :-( - I'd love to get closer to the full bandwidth :-) )

Cheers Mark.

stapelberg commented 7 years ago

I just tried this patch set.

Thanks for giving the pull request a spin!

I think you are taking the numbers in the bandwidth setting 'literally' (which is fair enough, but until now these numbers have effectively been an indication of relative bandwidth, if I understand correctly. It would be helpful to specify what units you expect the numbers in, for instance :-) )

That’s already done (but not yet merged) in https://github.com/zehome/MLVPN/pull/80/files. The unit is bytes/s.

Secondly, I noted that without setting the loss tolerance to 100%, the thing failed almost immediately - is that expected?

No. What exactly do you mean by “fail”?

So - having adjusted the bandwidth numbers somewhat, I ended up getting something that tried to work. However, much as I tried to adjust things, I ended up with a lot of traffic on one link (which was set to carry quite little traffic) and little on the other - all of this tested under a heavy load. Could it be that the bandwidth measure you describe above sends everything through one link during the first part of the second, and by the time the ACKs come back, we've run out of time in the second and move on, meaning the second link never fires up?

You can answer that question better than I can by measuring the bandwidth on your underlying links, e.g. using iftop.

I’d recommend moving the discussion about mptcp to a separate issue.

markfoodyburton commented 7 years ago

ok - missed the bytes/s - it's what I guessed :-)

I'm not sure, it may have been experimental error :-) - but when I had the loss tolerance set to e.g. 85%, the link was instantly marked as lossy ... at the time, my bandwidth settings would have been very low (as I had them in KB/s) - but I don't see why that should have caused links to be marked as lossy....?

For the bandwidth, I know what the throughput is; my issue is not there. What I'm suggesting is that if you are treating things per 'each second' (as you had indicated you might - I didn't look at your code), then I wonder if you are flooding the first link during the first part of the second (potentially causing the packet losses), until you reach the point where you have sent as many packets as you want for the whole second, and only then pass on to the second link. However, because it takes some time to send those packets, we're almost at the end of the second by then, so we don't actually manage to send much out on the second link before we end the second and start again.

(again, I didn't look at your code, so perhaps it doesn't work this way....)

I thought maybe the re-order buffer may help, but it seems to me you would need a very large re-order buffer, which potentially has other negative impacts?

Another approach could be to have more of a 'window/filter' system, to try and keep the 'average' bandwidth correct over time...
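
As one example of such a 'window/filter', a token-bucket style accounting refills send credit continuously instead of once per second, which spreads packets across the second rather than bursting at its start. The sketch below is purely illustrative (not tested against MLVPN, names invented).

```c
/* Hypothetical token-bucket sketch: credit is refilled continuously at the
 * configured rate, so sends are paced instead of bursting once per second. */
#include <stddef.h>
#include <stdint.h>
#include <time.h>

struct bucket {
    double          rate_bps;   /* configured bandwidth, bytes per second */
    double          tokens;     /* bytes we may currently send */
    double          burst;      /* cap on accumulated credit, e.g. rate/10 */
    struct timespec last;       /* time of the last refill */
};

static double elapsed_secs(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

/* Returns 1 if pkt_len bytes may be sent on this link right now, else 0. */
int bucket_allow(struct bucket *bk, size_t pkt_len)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);

    bk->tokens += bk->rate_bps * elapsed_secs(&bk->last, &now);
    if (bk->tokens > bk->burst)
        bk->tokens = bk->burst;
    bk->last = now;

    if (bk->tokens >= (double)pkt_len) {
        bk->tokens -= (double)pkt_len;
        return 1;
    }
    return 0;
}
```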

stapelberg commented 7 years ago

I suggest you have a look at the code; it should be a quick read and will give you a good grasp of what’s happening (in detail).

That said, your description is correct: packets are being sent on each link to the extent configured, i.e. if you configure your bandwidth too high, you might run into delays/packet losses — I haven’t tested that particular bit.

It should be easy to verify whether this is happening: just configure half of your actual bandwidth as the limit?

Let me know what you find out. I’ll also do some more experimentation once I get around to it.

markfoodyburton commented 7 years ago

I did try that; it seems not to give me a very good result. I think the issue is that you end up with a packet distribution like aaaaaabbbbb, whereas, ideally, you want ababababa.

One of the experiments I tried, to make the bandwidth more 'accurately' reflect what was set (rather than @zehome's approach with the sort of 'buckets', which ends up with a 'quantum' effect), was to use a random number approach, e.g. setting the probability of using one interface rather than the other.
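
Something along these lines (a hypothetical sketch of the idea, not the code that was actually tested):

```c
/* Illustrative weighted-random pick: each packet chooses a link with
 * probability proportional to its configured bandwidth. Assumes at least
 * one link has a non-zero bandwidth. */
#include <stdint.h>
#include <stdlib.h>

struct wlink {
    uint64_t bandwidth;   /* configured bytes per second, used as a weight */
};

int pick_link_random(const struct wlink *links, int nlinks)
{
    uint64_t total = 0;
    for (int i = 0; i < nlinks; i++)
        total += links[i].bandwidth;

    uint64_t r = (uint64_t)rand() % total;   /* rand() is enough for a sketch */
    for (int i = 0; i < nlinks; i++) {
        if (r < links[i].bandwidth)
            return i;
        r -= links[i].bandwidth;
    }
    return nlinks - 1;   /* not reached; keeps the compiler happy */
}
```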

So then I'm guessing I'd have ended up with aabbbababaaa ... whatever. The result was worse than the bucket approach.

I believe that's because, again, when I tried to send too many packets at a time, things got congested and nasty.

Potentially a small re-order buffer may help in this instance, but I think if the plan is to send, basically, a second's worth of traffic, then the re-order buffer would have to be very large?

So - what I'm saying is - I think - you want to only use one interface when traffic is low, but as soon as you have more traffic than that interface can handle, you need to intersperse the two interfaces, rather than 'burst sending'.... it's just a guess though :-)

Cheers Mark.

stapelberg commented 7 years ago

The very point of this issue/pull request is to do aaaaabbbb, i.e. entirely fill up interface after interface.

I’ll do some more research when I get around to it.

stapelberg commented 7 years ago

Had some time to play around with this.

In my experiments, the links had 0 loss, so I’m not sure why you’re seeing something different.

Further, when configuring bandwidth limits equal to/below/above the actual link speed (i.e. 10 Mbit/s, 5 Mbit/s, 50 Mbit/s, all on a 10 Mbit/s link), mlvpn behaves as expected.

I’ll try to use this setup next week in a situation with many different links, perhaps then I’ll be able to reproduce what you’re seeing.

markfoodyburton commented 7 years ago

It equally could be my VERY POOR quality ADSL that is causing problems with the approach.... Don't get me wrong, I think what you're proposing makes a lot of sense and has value, it just COULD be that in my case it's not as appropriate as other approaches...

markfoodyburton commented 6 years ago

Hi, I've recently re-visited this (I'm adding a 4G connection with a limited monthly quota to an existing pair of ADSLs).

There is a difference between UDP and TCP. On my home network, I see a fair mix of both, and I want both to work 'well'. Most notably, TCP will effectively adjust itself to the available bandwidth. This is where MPTCP has an advantage, as it can change the way this adjustment happens. Equally, (as noted above) sending things across one link, then the other (AAABBB rather than ABABAB) seems to lead to poor performance for TCP.

For TCP, the SRTT is normally held relatively close to the 'idle' state. It is possible to set a 'threshold' trigger point and use that to add more bandwidth to the link; however, I find this approach problematic as the threshold is very sensitive.

Hence, outside of MPTCP, I conclude (sadly) that using SRTT to manage tunnel selection (as in #84) is unlikely to work well for TCP connections (though it is very good for UDP connections). Leaving it as a 'backup' mechanism if no bandwidths are given would seem a good approach.

Rather, I believe it is better to use 'bandwidth' where possible.

Currently MLVPN chooses bandwidth 'statically'; allowing a dynamic approach based on incoming bandwidth seems to work very nicely.

To give a reasonably fast response to traffic, if I find that we exceed a given link's bandwidth, I add the full amount from the next path; I do not attempt to 'only' pass what is needed on that path. This will be efficient (and, I believe, close to optimal) for TCP connections, while for UDP connections, if they are bandwidth limited, it will effectively "overuse" the more expensive path by a small amount. I am personally happy with this compromise. Also note that, to enable TCP to use the full bandwidth, the calculation must include something like a 20% over-estimate of the bandwidth needed (see the sketch below).
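
Roughly, the path-enabling logic could look like the following simplified sketch; it is not the actual patch, and the names and the exact 20% factor are illustrative.

```c
/* Illustrative sketch: enable whole paths, in priority order, until the
 * (over-estimated) demand fits within the enabled capacity. */
#include <stddef.h>
#include <stdint.h>

#define HEADROOM 1.20   /* over-estimate demand by ~20% so TCP can ramp up */

struct dpath {
    uint64_t capacity;   /* measured/estimated bytes per second */
    int      active;     /* 1 if the path currently carries traffic */
};

/* paths[] is assumed to be ordered from cheapest to most expensive. */
void update_active_paths(struct dpath *paths, size_t npaths, uint64_t demand_bps)
{
    double needed = (double)demand_bps * HEADROOM;
    double provided = 0.0;

    for (size_t i = 0; i < npaths; i++) {
        if (provided < needed) {
            paths[i].active = 1;             /* add the full path, not a slice */
            provided += (double)paths[i].capacity;
        } else {
            paths[i].active = 0;             /* not needed right now */
        }
    }
}
```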

What I am seeing looks pretty impressive in terms of performance across 2 ADSL links. In a few weeks, I should be able to try it with a 4G link included in the mix, and see what it does there.

I have a patch for this, but it's currently rather experimental. I'll clean it up at some point and post it if people are interested.

zehome commented 6 years ago

very nice.

I've seen similar behavior to yours; good aggregation is a very hard thing to achieve.

I'm currently working on another project which leverages mptcp and its awesome multi-connection balancing algorithms, but with the ease of use of mlvpn.

Currently, solutions using shadowsocks-redir are very hacky; there are better solutions which I intend to use (TPROXY).

markfoodyburton commented 6 years ago

Keep me in the loop, I'd love to see that. Meantime, MLVPN for me is itself pretty awesome: not only easy to use, but easy to hack :-)

darksonblade commented 3 weeks ago

Oj