zehome / MLVPN

Multi-link VPN (ADSL/SDSL/xDSL/Network aggregation / bonding)
http://www.mlvpn.fr/
BSD 2-Clause "Simplified" License

Ideas for managing tunnel selection #84

Open markfoodyburton opened 8 years ago

markfoodyburton commented 8 years ago

Hi, thinking about #64 and #82, I have tried two experiments. First, instead of using a 'round robin' selection mechanism, I used a weighted random selection. As far as I can tell, in my setup the effect is minimal; if anything it's somewhat worse. With a round-robin mechanism, the tunnels take turns most of the time. With a random approach you are quite likely to get several packets in a row on the same tunnel. Although the average throughput may be right, this gives the tunnels 'peaks' of traffic that they can't deal with, which in the end has a negative impact on performance.

I don't think this will surprise anybody, but I wanted to try :-)
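The burstiness argument can be illustrated with a small simulation. This is only a sketch under stated assumptions: `sketch_tunnel`, `pick_wrr`, `pick_random` and `longest_burst` are hypothetical names, the round-robin picker is a generic "smooth weighted round robin" rather than MLVPN's actual scheduler, and a fixed-seed LCG stands in for a real random source so the result is repeatable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunnel state for this sketch; not MLVPN's mlvpn_tunnel_t. */
typedef struct {
    int weight;   /* configured share, e.g. 50 */
    int current;  /* running credit used by the round-robin picker */
} sketch_tunnel;

/* Smooth weighted round robin: every pick adds each tunnel's weight to its
 * credit, chooses the largest credit, then charges the winner the total. */
static size_t pick_wrr(sketch_tunnel *t, size_t n)
{
    int total = 0;
    size_t best = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        t[i].current += t[i].weight;
        total += t[i].weight;
        if (t[i].current > t[best].current)
            best = i;
    }
    t[best].current -= total;
    return best;
}

/* Small deterministic LCG so the sketch behaves the same on every run. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Weighted random selection: each pick is independent of the previous one. */
static size_t pick_random(const sketch_tunnel *t, size_t n, uint32_t *rng)
{
    int total = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        total += t[i].weight;
    assert(total > 0);
    int r = (int)(lcg_next(rng) % (uint32_t)total);
    for (size_t i = 0; i < n; i++) {
        if (r < t[i].weight)
            return i;
        r -= t[i].weight;
    }
    return n - 1;
}

/* Longest run of consecutive packets sent on the same tunnel. */
static int longest_burst(sketch_tunnel *t, size_t n, int packets, int use_random)
{
    uint32_t rng = 12345u;
    int longest = 0, run = 0;
    size_t prev = n; /* sentinel: no previous pick yet */
    for (int p = 0; p < packets; p++) {
        size_t pick = use_random ? pick_random(t, n, &rng) : pick_wrr(t, n);
        run = (pick == prev) ? run + 1 : 1;
        if (run > longest)
            longest = run;
        prev = pick;
    }
    return longest;
}
```

With two equally weighted tunnels, the round-robin picker strictly alternates, so `longest_burst` stays at 1, while the random picker produces runs of several packets on one tunnel even though the long-run split is the same.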

Second, on my ADSL connections I notice that as the links get heavily used, the SRTT can go through the roof. Furthermore, the slower line has a longer SRTT, very approximately related to its bandwidth (or at least, that is my guess).

I implemented a mechanism which adjusts the weights for the tunnels based on the SRTT. This is surprisingly successful. To avoid the 'random' effect above, the ratio needs to stay fairly steady, i.e. it needs to change slowly. It looks like you need a rolling average over at least 20000 packets. Over that time, the impact of the return packet going one route or the other should also be minimised.

Doing this would mean you don't need to supply bandwidth numbers (which are subject to change), and the ratio used adjusts to circumstances.

HOWEVER: this favours one tunnel over another based on their speed (round trip time) rather than their bandwidth, which is clearly wrong for anything like a mixed-medium environment. It only has any hope of working when the links share a common medium (e.g. several ADSL lines).

Nonetheless, this approach seems to work very nicely indeed. Highly unscientifically, I seem to get much steadier and marginally higher download and upload speeds. What's interesting is that the connection seems to 'self adjust' both to the ADSL fluctuations and to the load imposed on it. During heavy download, the faster download tunnel seems to be favoured for the return path (giving us better throughput overall), while during upload, as the uplinks become saturated, their speeds even out, and we seem to make better use of the available upload bandwidth!

The guts of my approach are below if anybody wants to try this out themselves. If you like the approach, let me know and I'll open a pull request.

```c
/* Recompute each tunnel's weight from its smoothed round-trip time (SRTT):
 * the lower a tunnel's SRTT relative to the total, the higher its weight. */
static void mlvpn_rtun_recalc_weight_srtt(void)
{
    mlvpn_tunnel_t *t;
    double total_srtt = 0;
    int unstable = 0;

    LIST_FOREACH(t, &rtuns, entries)
    {
        /* An SRTT of 1000 or more is treated as not yet stable. */
        if (t->srtt >= 1000)
            unstable++;
        total_srtt += t->srtt;
    }
    if (unstable) {
        log_debug("srtt", "srtt seems not stable (yet)");
        return;
    }
    LIST_FOREACH(t, &rtuns, entries)
    {
        /* Should not happen, but make sure we never divide by 0. */
        if (t->srtt > 0 && total_srtt > 0)
        {
            /* Move slowly: take a rolling average over 20000 samples.
             * The posted line read "(t->srtt/)"; the stray '/' is dropped
             * here on the assumption that nothing followed it. */
            double target = (total_srtt - t->srtt) * 100 / total_srtt;
            t->weight = ((t->weight * 19999.0) + target) / 20000.0;
            log_debug("wrr", "%s weight = %f (%f %f)", t->name, t->weight,
                      t->srtt, total_srtt);
        }
    }
}
```
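To get a feel for how slowly the 20000-sample rolling average moves, here is a standalone sketch of the same arithmetic outside MLVPN. The names `srtt_target_weight`, `smooth_weight` and `updates_to_halfway` are hypothetical helpers introduced for illustration; only the formula and the 20000 constant come from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WINDOW 20000.0  /* rolling-average length used in the snippet above */

/* Target weight for tunnel i: the share of the total SRTT contributed by the
 * other tunnels, as a percentage. A lower SRTT gives a higher weight. */
static double srtt_target_weight(const double *srtt, size_t n, size_t i)
{
    double total = 0.0;
    assert(i < n);
    for (size_t k = 0; k < n; k++)
        total += srtt[k];
    assert(total > 0.0);
    return (total - srtt[i]) * 100.0 / total;
}

/* One rolling-average step: the new sample counts for 1/20000 of the result. */
static double smooth_weight(double weight, double target)
{
    return (weight * (SKETCH_WINDOW - 1.0) + target) / SKETCH_WINDOW;
}

/* Count the updates needed for `weight` to close half the gap to `target`. */
static long updates_to_halfway(double weight, double target)
{
    double gap = weight > target ? weight - target : target - weight;
    double half = gap / 2.0;
    long updates = 0;
    while (gap > half) {
        weight = smooth_weight(weight, target);
        gap = weight > target ? weight - target : target - weight;
        updates++;
    }
    return updates;
}
```

For two tunnels with SRTTs of 40 and 60, the targets are 60 and 40. Each update shrinks the remaining gap by a factor of 19999/20000, so closing half the gap takes roughly 0.69 × 20000, a little under 14000 updates. Note that with more than two tunnels the targets sum to (n − 1) × 100 rather than 100, which only matters if the scheduler expects weights to add up to 100.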

zehome commented 8 years ago

Yes indeed, that's the right solution to load balance the links without looking for packet loss.

It would be very interesting to see that merged in mlvpn, I think.

markfoodyburton commented 8 years ago

OK, but there is a massive health warning from my side. I've ended up reverting to setting the bandwidths manually, because with low traffic volume the measurements are random rubbish and you end up with a 'badly' set ratio. Once traffic increases, it takes some time, but you end up with a good flow, sometimes. But sometimes it doesn't settle at all. I haven't got to the bottom of that... My conclusion: as an "I have no clue" setup, it's OK. If you know anything about your links, you are probably better off setting them manually :-(

Ok - so having given the health warning, I'm happy to work on the patches, and see if I can't make them better....

I can't promise when, but I'll work on it at some point, and try and feed some sensible patches back.

Cheers Mark.

markfoodyburton commented 7 years ago

I took a look at MPTCP. Interestingly, they have done the work of finding better algorithms for path selection. One of their schemes seems to be a little like what I tried (only they probably know what they are talking about :-) )

It strikes me that it would be really very valuable to implement some of the same algorithms in MLVPN, and were we to do that, it seems to me the major difference between an MLVPN and an MPTCP approach would be removed.

For instance, https://www.ietf.org/archive/id/draft-walid-mptcp-congestion-control-04.txt gives all the technical details about how the algorithm selects its path.

This isn't very hard to implement inside mlvpn, I don't think (though, right now, I'd be pretty sure @zehome would do a better job than me :-))

zehome commented 7 years ago

There are numerous ways in which mlvpn and MPTCP differ.

First, path detection: MPTCP can use a "mesh" of paths with autodetection. This is hard to implement.

Second, the "coupled congestion" handling. MPTCP does a much, much better job than mlvpn in this regard, in particular with the "balia" TCP congestion algorithm.

Those are very, very hard to implement in MLVPN, because MLVPN doesn't track sessions, doesn't have SYN/ACK capabilities, and doesn't have timeouts on TCP streams. I've tried to implement part of it, but it's kind of redundant with what MPTCP actually does.

Right now, it's probably a better idea to use a transparent SOCKS proxy with MPTCP if you want something with a much better feature set on Linux.

markfoodyburton commented 7 years ago

@zehome again, thank you for taking the time, and I think your answer is really valuable for people to understand what mlvpn is (and is not).

I'd love to see a project that made using MPTCP as easy as mlvpn, but mlvpn gives me close to good performance, and it's pretty much rock solid, so .... (There's OVH's OverTheBox thing, which looks like a complete solution, but deploying it doesn't look so easy, except if you buy one of their boxes I guess :-) )

markfoodyburton commented 7 years ago

FWIW:

I've taken another look at all this. Specifically I've tried to clean up the SRTT approach above, but I have also tried to see if I can improve on it.... Here's what I found:

There are - for me - two aims: 1/ to make bandwidth selection automatic 2/ to make bandwidth selection 'dynamic' so the tunnel responds to varying conditions on each line.

First, I have two ADSL connections, with typical 1M uploads and 4-5M downloads. I have (after some work) got the SRTT 'automatic' adjustment mechanism working reasonably. It uses the measured SRTT (which is correctly measuring round trip time), to adjust the current weight of each path (which is taken into account dynamically, and optimally).

However, for an ADSL line, this works (to a degree) for 'download', but it is inaccurate for ADSL upload. In short, because it measures round trip time, it does not treat the two directions independently.

Measuring 'send time' (i.e. in one direction only) is not (as far as I know) possible without extra information. As an experiment, I did the following: I used the keepalive request to transmit the current 'advertised' bandwidth to both ends. Knowing this, and the timestamp, I then estimate the 'time delta' between the two ends, based on the advertised bandwidth (from my modems). That is, assuming that the transmission speed is proportional to the bandwidth (which it's probably not), we can attribute the measured round trip time to each 'leg' of the journey.

This is painful, and requires knowing the bandwidth (which basically defeats the object of the exercise).

Next, using the time delta calculated this way, we can work out the send time for each tunnel, and use this to adjust the weights.
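One way to read the "attribute the round trip time to each leg" step is sketched below. This is a guess at the arithmetic, not the code from the experiment: `estimate_send_time` and `estimate_clock_delta` are hypothetical names, and the sketch assumes that the time spent on a leg is inversely proportional to that leg's advertised bandwidth and that the two timestamps come from unsynchronised clocks.

```c
#include <assert.h>

/* Split a measured round trip time between the two legs, assuming the time
 * on each leg is inversely proportional to that leg's advertised bandwidth.
 * Returns the estimated time for the outgoing (upload) leg. */
static double estimate_send_time(double rtt_ms, double bw_up, double bw_down)
{
    assert(bw_up > 0.0 && bw_down > 0.0);
    /* (1/bw_up) / (1/bw_up + 1/bw_down) simplifies to bw_down / (bw_up + bw_down). */
    return rtt_ms * bw_down / (bw_up + bw_down);
}

/* Estimate the offset between the two ends' clocks from one keepalive:
 * whatever remains of the timestamp difference after removing the
 * estimated one-way send time is attributed to clock skew. */
static double estimate_clock_delta(double local_tx_ms, double remote_rx_ms,
                                   double send_time_ms)
{
    return remote_rx_ms - local_tx_ms - send_time_ms;
}
```

For a line advertised at 1M up and 4M down with a 100 ms round trip, this attributes 80 ms to the upload leg and 20 ms to the download leg. Once a clock delta has been estimated, later timestamps can be corrected by it to get a per-packet send time, which is the quantity then fed into the weight calculation.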

The results are "OK". But .... if I leave the calculation 'unsmoothed', the results are somewhat less performant than simply setting the bandwidth manually (which we have to do anyway for this to work). On the other hand, if I "smooth" the weight calculation, I end up with the bandwidths that I put in in the first place, so there is little point. Furthermore, any 'smoothing' means that the system is less dynamic in responding to changes in the bandwidth.

It seems to me that adding this complexity gains little over the normal mechanism of setting the bandwidth. It was a nice experiment, but don't bother :-)

However, the SRTT approach on its own seems like a reasonable plan. It is not totally accurate for ADSL lines, but it is not such a terrible approximation either. One option would be to include an 'srtt' approach as a 'backup' if the .conf file doesn't include bandwidth information (as a 'default setup'). In most cases this seems to get you 'close' to an optimum solution.... For the 'server side' it may actually be a pretty good solution, meaning you don't have to set bandwidths on the remote server.
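The 'backup' idea could look something like the sketch below. It is an illustration only: `sketch_link` and `recalc_weights` are hypothetical, a bandwidth of 0 is assumed to mean "not set in the .conf file", and the smoothing from the earlier snippet is left out to keep the selection logic visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-link state for this sketch; not MLVPN's own struct. */
typedef struct {
    double bandwidth; /* configured bandwidth, 0 if absent from the config */
    double srtt;      /* measured smoothed round trip time */
    double weight;    /* resulting share, as a percentage */
} sketch_link;

/* Use configured bandwidths when every link has one; otherwise fall back
 * to SRTT-derived weights as a default setup. */
static void recalc_weights(sketch_link *l, size_t n)
{
    double total_bw = 0.0, total_srtt = 0.0;
    int all_have_bw = 1;

    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        if (l[i].bandwidth <= 0.0)
            all_have_bw = 0;
        total_bw += l[i].bandwidth;
        total_srtt += l[i].srtt;
    }
    for (size_t i = 0; i < n; i++) {
        if (all_have_bw)
            l[i].weight = l[i].bandwidth * 100.0 / total_bw;
        else if (total_srtt > 0.0)
            l[i].weight = (total_srtt - l[i].srtt) * 100.0 / total_srtt;
        /* else: no usable measurement yet, keep the previous weight */
    }
}
```

Falling back for all links as soon as one bandwidth is missing is a design choice made for this sketch: mixing a bandwidth-derived weight on one link with an SRTT-derived weight on another would put two unrelated scales into the same ratio.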

I'll try and prepare sensible patches for these and send them....

markfoodyburton commented 7 years ago

see pull request https://github.com/zehome/MLVPN/pull/69