zehome / MLVPN

Multi-link VPN (ADSL/SDSL/xDSL/Network aggregation / bonding)
http://www.mlvpn.fr/
BSD 2-Clause "Simplified" License

MLVPN across different link bandwidths (new version) #113

Open markfoodyburton opened 6 years ago

markfoodyburton commented 6 years ago

I've recently made a very interesting finding, and I would appreciate other people replicating it to make sure it's not just me!

By installing MCTCP on both the client and server, and using a different re-ordering algorithm, I have achieved extremely good bonding, even with very different underlying bandwidths. UDP has always been fine, but the addition of the MCTCP kernel means that TCP is handled 'nicely' and is responsive.

Please try out the branch: new-reorder

I believe that a lot of the bandwidth issues reported in e.g. #112, #106, etc. could be due to the poor performance with TCP connections. Using a combination of a new re-order mechanism and MCTCP, this issue SEEMS to have gone away... Mileage may vary, I would love your feedback!

Some things to note: 1/ There is a 'quota' mechanism included. This may answer #82 in some ways too. In the config, you can specify a quota in Bytes per second. A 120GB/month limit I believe is 300 bytes per second. quota = 300

As traffic increases (over a 3-second averaging period), more tunnels are added. All tunnels with no quota are used 'fairly', no matter the bandwidth required; after that, tunnels listed lower in the conf file are chosen first.

The algorithm allows some headroom on a tunnel; this can mean that steady download streams under-use the slower channels if the faster channels are not themselves fully utilised.

2/ The reorder mechanism is dynamic; you should add reorder_buffer = yes to your (global) conf. There is no need to specify a size.

The reorder mechanism caps at 64 entries; this could be changed, but it seems a reasonable upper limit. It uses 4 as a lower limit. The re-order buffer is not flushed when tunnels go down/up; I don't know if this is optimal or not.

The current reorder algorithm uses a packet count to decide when to ignore 'holes' in the stream. Another idea may be to use a time window (which is already used as a backup). It would be good to experiment with this.
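As a sketch of that decision (invented names only, not the actual branch code), it boils down to something like:

    /* Illustration of the hole-skipping rule described above: give up on a
     * missing sequence number once enough later packets have been buffered,
     * with an age check as the time-window backup. */
    #include <stdbool.h>
    #include <stdint.h>

    struct hole_state {
        uint64_t missing_seq;      /* first sequence number we are waiting for */
        unsigned pkts_since_hole;  /* packets buffered while waiting           */
        uint64_t hole_seen_ms;     /* when the gap was first noticed           */
    };

    static bool should_skip_hole(const struct hole_state *h,
                                 unsigned max_wait_pkts,  /* packet-count rule  */
                                 uint64_t max_wait_ms,    /* time-window backup */
                                 uint64_t now_ms)
    {
        if (h->pkts_since_hole >= max_wait_pkts)
            return true;
        if (now_ms - h->hole_seen_ms >= max_wait_ms)
            return true;
        return false;
    }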

Clearly the quota averaging period, max reorder buffer size, etc. should be made into config parameters at some point.
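Putting the two settings together, a minimal conf sketch would look something like the following (the section and tunnel names are just placeholders; adapt them to your own config):

    [general]
    # ... existing global settings ...
    reorder_buffer = yes    # dynamic re-order buffer, no size needed

    [tun0]
    # unlimited link: no quota, so the weighting prefers it
    # ... host/port settings as usual ...

    [tun1]
    # metered link: top up the notional spending limit each second
    quota = 300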

Please let me know if this helps.

industrial64 commented 6 years ago

Hey Mark,

Thanks so much for continuing to help with MLVPN! I gave your new-reorder a try, and while it provided good upstream aggregation, the reorder value isn't changing from 1 (while watching debug).

Links are: 50/10Mbps VDSL and 500/20Mbps Cable (EuroDOCSIS 3). We have been using reorder buffer values between 128 and 256 under debian-stable. The global conf setting (above) is on both client and server.

This is what I see in the debug log:

    2018-06-05T18:49:06 [ DBG/net] < VDSL-C2 recv 108 bytes (type=3, seq=1496231, reorder=1)
    2018-06-05T18:49:06 [ DBG/net] > VDSL-C2 sent 1420 bytes (size=1392, type=3, seq=0, reorder=1)
    2018-06-05T18:49:06 [ DBG/net] < Cable-C1 recv 92 bytes (type=3, seq=1496187, reorder=1)
    2018-06-05T18:49:06 [ DBG/net] > Cable-C1 sent 1420 bytes (size=1392, type=3, seq=0, reorder=1)
    2018-06-05T18:49:06 [ DBG/rtt] (VDSL-C2) No timestamp added, time too long! (1528224546275 > 1000)
    2018-06-05T18:49:06 [ DBG/net] > VDSL-C2 sent 1420 bytes (size=1392, type=3, seq=0, reorder=1)
    2018-06-05T18:49:06 [ DBG/rtt] (Cable-C1) No timestamp added, time too long! (1528224546275 > 1000)
    2018-06-05T18:49:06 [ DBG/net] > Cable-C1 sent 1420 bytes (size=1392, type=3, seq=0, reorder=1)

Upon restarting both server and client, we briefly see speeds above 115Mbps, but in iperf tests like the one below, all subsequent runs yield 114-115Mbps (49-51Mbps on the VDSL and the remainder on the Cable link).

iPerf download

    [ ID] Interval       Transfer     Bandwidth
    [ 3]  0.0- 2.0 sec  64.5 MBytes   271 Mbits/sec
    [ 3]  2.0- 4.0 sec  44.5 MBytes   187 Mbits/sec
    [ 3]  4.0- 6.0 sec  25.8 MBytes   108 Mbits/sec
    [ 3]  6.0- 8.0 sec  27.5 MBytes   115 Mbits/sec
    [ 3]  8.0-10.0 sec  27.1 MBytes   114 Mbits/sec
    [ 3] 10.0-12.0 sec  27.2 MBytes   114 Mbits/sec

Any suggestions? I am able and willing to test every bit of your updates; I really want to see MLVPN mature, and this is the way to do it :) The only critical feature still missing is 'bindtodev' for binding to the interface rather than to its IP address; I thought @zehome did this in 2015, but I don't think it was integrated.

F.Y.I. - one important change @zehome made for Linux systems (which I noticed was missing) is the '/lib/systemd/system/mlvpn@.service' modification to include the configuration file name (--name %i) in the 'ps -ax' process title:

    ExecStart=/usr/local/sbin/mlvpn --config /usr/local/etc/mlvpn/%i.conf --name %i --user mlvpn
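For anyone wanting to apply the same change, a sketch of how that line might sit in the unit file; only the ExecStart line above is from @zehome, the rest is a generic guess at the stock mlvpn@.service and may differ from what ships on your system:

    [Unit]
    Description=MLVPN connection %i
    After=network.target

    [Service]
    Type=simple
    ExecStart=/usr/local/sbin/mlvpn --config /usr/local/etc/mlvpn/%i.conf --name %i --user mlvpn
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target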

industrial64 commented 6 years ago

Forgot to mention: with reorder_buffer_size between 64 and 256 I can get good aggregation, right around 500-520Mbps downstream.

I found that after applying a quota to the lesser (VDSL) link, speeds recovered to 500Mbps.

However, while testing the quota mechanism, I found that no matter what value I used (your default of 300, or higher values of 3000/6000/12000), the downstream speeds on the VDSL link remained between 8 and 9.5Mbps.

In summary, putting the quota into the configuration enabled high speeds once again. But it doesn't use more than 9Mbps on the 50Mbps circuit:

    [ 4] 0.0-100.0 sec 5.59 GBytes 480 Mbits/sec

markfoodyburton commented 6 years ago

Thanks for testing this, and sorry it doesn't seem to be working out, but maybe it will help find out what's going on... So:

When you say the 'reorder-value', what do you mean? There is an internal variable called list_size_av which indicates the current list size... but the (old) reorder-value isn't used. It would be very interesting to see how this list_size_av changes for you over time.

In this implementation, there is a notion of the size of the list when (finally) a packet arrives that allows things to be removed. That gives us some sort of measure of the list size you need to re-order stuff. It's crude, and I don't know if it works :-).

I then average over time, multiply the number by 2 (again, totally arbitrary), and use it to keep the re-order buffer 'trimmed'. It would probably be better to base this number on the maximum RTT, but I have not done that yet...

What I have found is that with a re-order buffer that's too large (in my case, more than about 10), TCP streams seem to suffer and the bandwidth drops. Too small (less than about 10 in my case), and I end up dropping too many packets, and again the bandwidth drops... Hence the plan is to be 'dynamic': increase the size of the reorder buffer when it's needed, and drop it back again when it's not needed.
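Roughly, the sizing heuristic looks like this (a sketch with invented names and an arbitrary smoothing weight, not the actual branch code):

    /* Track the list size seen when a packet finally lets us drain, smooth it
     * over time, double it, and clamp it between the limits mentioned above. */
    static unsigned target_buffer_size(double *list_size_av,
                                       unsigned observed_list_size)
    {
        /* exponentially weighted moving average (weight chosen arbitrarily) */
        *list_size_av = 0.9 * (*list_size_av) + 0.1 * (double)observed_list_size;

        unsigned target = (unsigned)(2.0 * (*list_size_av));  /* x2 headroom */
        if (target < 4)
            target = 4;    /* lower limit */
        if (target > 64)
            target = 64;   /* current hard cap */
        return target;
    }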

(BTW, a reorder buffer of 256 seems huge? If we assume that one end sends things into all its channels 'in order', and the channels themselves (generally) deliver in order, then, theoretically I assume, the maximum buffer size would be the number of packets that can be sent within the difference in the RTTs, something like that. Is that really 256 in your case?)

One thing to check: you could hack the reorder buffer and hard-wire it to e.g. 256 to see what effect that has for you (see line 178; replace the b->list_size_av with e.g. 256).

BTW, a little point on the quota. The quota has two effects: first, it causes the weighting algorithm to TRY not to use the link, and secondly it prevents 'overspending' on a link. The number is the number of kilobits (to be consistent with the rest of the file) that we add (per second) to a notional 'quota' (spending limit) of allowable throughput on a link. In other words, if you have a 120GByte/month 4G SIM card, this mechanism stops you from using 'too much'.
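As a sketch (not the actual code, and the names are invented), the bookkeeping is essentially a token bucket:

    #include <stdbool.h>
    #include <stdint.h>

    /* Every second the link's notional allowance grows by the configured
     * number of kilobits, and sending spends from it. */
    struct link_quota {
        uint64_t quota_kbps;      /* e.g. quota = 300  ->  300 kbit/s top-up */
        int64_t  allowance_bits;  /* accumulated spending limit              */
    };

    static void quota_tick_1s(struct link_quota *q)
    {
        q->allowance_bits += (int64_t)q->quota_kbps * 1000;
    }

    static bool quota_allows(struct link_quota *q, uint64_t pkt_bits)
    {
        if (q->allowance_bits < (int64_t)pkt_bits)
            return false;                     /* would overspend: avoid this link */
        q->allowance_bits -= (int64_t)pkt_bits;
        return true;
    }

As a back-of-envelope check on the kilobits reading: 300 kbit/s sustained over a 30-day month is 300,000 bit/s x 2,592,000 s, roughly 97GB, which lines up with the 120GB/month ballpark mentioned at the top of the thread.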

In your case, if the link is 'unlimited', then you can remove the quota; you should still "benefit" (or perhaps not) from the re-order mechanism.

industrial64 commented 6 years ago

Hey Mark! Thanks for your response :)

When you say the 'reorder-value', what do you mean? There is an internal variable called list_size_av which indicates the current list size... but the (old) reorder-value isn't used. It would be very interesting to see how this list_size_av changes for you over time.

I was talking about the reorder_buffer_size; we had been playing with values between 64 and 256 and found that 128 was a good fit, while 64 seemed to limit speeds to around 350-390Mbps. More detail on the links and their RTT is definitely important here:

Cable: 500/23Mbps @ 12-15ms
VDSL: 50/10Mbps @ 5-7ms

In this implementation, there is a notion of the size of the list when (finally) a packet arrives that allows things to be removed. That gives us some sort of measure of the list size you need to re-order stuff. It's crude, and I don't know if it works :-).

Copy that. Do you see any performance/stability issues with always using reordering, i.e. putting it in place at 64 (even for links that are evenly matched in speed and paired in RTT/jitter)?

I then average over time, multiply the number by 2 (again, totally arbitrary), and use it to keep the re-order buffer 'trimmed'. It would probably be better to base this number on the maximum RTT, but I have not done that yet...

I wish I could help you here! :)

What I have found is that with a re-order buffer that's too large (in my case, more than about 10), TCP streams seem to suffer and the bandwidth drops. Too small (less than about 10 in my case), and I end up dropping too many packets, and again the bandwidth drops... Hence the plan is to be 'dynamic': increase the size of the reorder buffer when it's needed, and drop it back again when it's not needed.

At 128 (reorder_buffer_size) I didn't see a huge TCP performance drop (testing on iperf), but did notice a significant reduction in performance with it off/0 (115Mbps Max downstream) and with it at 64 (290-350Mbps Max downstream)

(BTW, a reorder buffer of 256 seems huge? If we assume that one end sends things into all its channels 'in order', and the channels themselves (generally) deliver in order, then, theoretically I assume, the maximum buffer size would be the number of packets that can be sent within the difference in the RTTs, something like that. Is that really 256 in your case?)

I know :) It didn't seem to impede or make performance suffer though, at least from what I saw during testing. I keep it at 128 for this link pair; it hits up to 520Mbps (with 49Mbps on the VDSL and the rest on the Cable link).

One thing to check: you could hack the reorder buffer and hard-wire it to e.g. 256 to see what effect that has for you (see line 178; replace the b->list_size_av with e.g. 256).

Sounds like a plan, I'll give it a shot today :)

BTW, a little point on the quota. The quota has two effects: first, it causes the weighting algorithm to TRY not to use the link, and secondly it prevents 'overspending' on a link. The number is the number of kilobits (to be consistent with the rest of the file) that we add (per second) to a notional 'quota' (spending limit) of allowable throughput on a link. In other words, if you have a 120GByte/month 4G SIM card, this mechanism stops you from using 'too much'.

Right on, it is a necessary feature for certain link types today. When I tested it, however, I noticed the speed on the VDSL was always between 8 and 9.5Mbps downstream, whether I had the quota value set to 300, 3000, 6000, or 12000.

In your case, if the link is 'unlimited', then you can remove the quota; you should still "benefit" (or perhaps not) from the re-order mechanism.

Agreed, though it is still nice to use it with deficient links (with frequent spikes of latency/jitter/loss), to avoid using them unless absolutely necessary (during heavy packet ingress or egress).

Once again, thanks for your contribution! I am really hoping MLVPN comes to be seen as production-ready in most engineers' eyes, and you are a big driver of that continued development.

markfoodyburton commented 6 years ago

(I'm not sure I'm anything more than a dirty hacker, @zehome knows a lot more about this stuff than me, but I have a vested interest, living in the countryside :-) - and I love MLVPN because it's so hackable :-))

The issue with the quota system is that, over a 3-second period, I look at what sort of traffic I have and then base the 'weighting' on the requirement (with a decent overhead). The net result (I think) is that TCP connections see a slowly increasing bandwidth, which they are slow to respond to; all in all, this seems to peg bandwidth lower than it could be. So it may be a good idea to remove the quota system while you're testing (for now).
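In rough pseudo-C, the demand side of it is something like this (illustrative only; the names and the headroom figure are made up):

    #include <stdint.h>

    /* Count bytes over a ~3-second window, convert to bits per second, add
     * some headroom, and use the result to weight the tunnels. */
    struct demand_estimator {
        uint64_t bytes_this_window;
        uint64_t window_start_ms;
        uint64_t est_bps;           /* smoothed demand estimate */
    };

    static void demand_note_bytes(struct demand_estimator *d, uint64_t n)
    {
        d->bytes_this_window += n;
    }

    static uint64_t demand_update(struct demand_estimator *d, uint64_t now_ms)
    {
        uint64_t elapsed = now_ms - d->window_start_ms;
        if (elapsed >= 3000) {                            /* 3-second window */
            uint64_t bps = d->bytes_this_window * 8000 / elapsed;
            d->est_bps = bps + bps / 4;                   /* headroom (arbitrary 25%) */
            d->bytes_this_window = 0;
            d->window_start_ms = now_ms;
        }
        return d->est_bps;
    }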

One other thing to check: is MLVPN eating up loads of CPU for you? (Especially this new version, it could... it may be 'slow' to re-order the packets.)

Do you ever see this printout?

    log_debug("reorder", "got old (insert) consider increasing buffer (%d behind)\n",(int)(b->min_seqn - TAILQ_LAST(&b->list,list_t)->pkt.seq));

Cheers Mark.

industrial64 commented 6 years ago

(I'm not sure I'm anything more than a dirty hacker, @zehome knows a lot more about this stuff than me, but I have a vested interest, living in the countryside :-) - and I love MLVPN because it's so hackable :-))

Same here, although as you can tell from the link speeds, I am trying to achieve 500Mbps+ speeds with MLVPN. The problem we face in North America is lopsided links; it's not uncommon to find a 150/2Mbps circuit. Symmetrical speeds are rarely available for consumer and SMB locations unless you're in an American metro area with Verizon FiOS. I don't expect to push my luck much further on the downstream, but upstream bonding of disparate links is the ultimate advantage and goal.

The issue with the quota system is that, over a 3-second period, I look at what sort of traffic I have and then base the 'weighting' on the requirement (with a decent overhead). The net result (I think) is that TCP connections see a slowly increasing bandwidth, which they are slow to respond to; all in all, this seems to peg bandwidth lower than it could be. So it may be a good idea to remove the quota system while you're testing (for now).

Sadly, if I remove the quota rule for the VDSL link (using your new-reorder branch), speeds hit a ceiling at 115Mbps and the auto_reorder doesn't appear to kick in; I tried running it for over 60 seconds to see if it would adjust, but no luck.

One other thing to check: is MLVPN eating up loads of CPU for you? (Especially this new version, it could... it may be 'slow' to re-order the packets.)

I haven't noticed a difference, though I can assume the 'auto re-ordering' isn't functioning correctly, as speeds do not grow past 115Mbps. On an i7-4500U (with HT) I have never seen utilization pass 70% on one of the four cores at 500Mbps+. On the server end there is an ungodly amount of CPU power (new twin Xeon E5s), and utilization never goes past 30% at 500Mbps+.

Do you ever see this printout?

    log_debug("reorder", "got old (insert) consider increasing buffer (%d behind)\n",(int)(b->min_seqn - TAILQ_LAST(&b->list,list_t)->pkt.seq));

Absolutely:

    [ DBG/reorder] got old (insert) consider increasing buffer (9 behind)
    [ DBG/reorder] got old (insert) consider increasing buffer (54 behind)
    [ DBG/reorder] got old (insert) consider increasing buffer (208 behind)

...lots more where those came from :)

If there is anything you want to peek at, or if you want to compare actual/complete configs, feel free to shoot me an email: github@sixgov.com. Very best!

industrial64 commented 6 years ago

Hey Mark, one final note from testing today: I upped the quota value on just the server side to 30000 (for the VDSL) and commented out the quota (VDSL) on the client side, and with this configuration the auto-reorder appears to be functioning. Bit of a head-scratcher ;)

Line 178 (from above) is unmodified, and this is basically your new-reorder with a modified updown.sh. I can happily report that CPU utilization is nearly identical on the client device, but has risen from 30% to 50% on the server side; not outrageous, and understandable given the overhead of an additional inspection routine.

From the output below, you can see there is a recalculation hit: traffic drops to roughly 380Mbps during those intervals, which very much look like set intervals of about 5-6 seconds:

    [ 3] 72.0-74.0 sec  89.0 MBytes  373 Mbits/sec
    [ 3] 74.0-76.0 sec   135 MBytes  565 Mbits/sec
    [ 3] 76.0-78.0 sec   121 MBytes  509 Mbits/sec
    [ 3] 78.0-80.0 sec  91.1 MBytes  382 Mbits/sec
    [ 3] 80.0-82.0 sec   133 MBytes  559 Mbits/sec
    [ 3] 82.0-84.0 sec   121 MBytes  508 Mbits/sec
    [ 3] 84.0-86.0 sec  90.0 MBytes  377 Mbits/sec
    [ 3] 86.0-88.0 sec   137 MBytes  574 Mbits/sec
    [ 3] 88.0-90.0 sec   124 MBytes  522 Mbits/sec
    [ 3] 90.0-92.0 sec  87.9 MBytes  369 Mbits/sec
    [ 3] 92.0-94.0 sec   132 MBytes  556 Mbits/sec
    [ 3] 94.0-96.0 sec   143 MBytes  600 Mbits/sec
    [ 3] 96.0-98.0 sec  91.0 MBytes  382 Mbits/sec
    [ 3] 98.0-100.0 sec  136 MBytes  571 Mbits/sec

    total:
    [ 3] 0.0-100.0 sec  5.67 GBytes  487 Mbits/sec

During these dips I don't see any drop in bandwidth on the VDSL; if there is anything I should keep an eye on to troubleshoot, just let me know.

The big improvement here is that I have never before seen TCP performance (on iperf, with default options) above 520Mbps, and I am now routinely hitting up to 600Mbps with the VDSL doing a steady 45Mbps of work (right where it should be, without incurring any drop-tail) :)

markfoodyburton commented 6 years ago

So, if I understand where you are:

1/ With some 'fiddling' you get 'good' results (did I understand right?) (So we should keep going!)
2/ CPU usage has increased; I should look at ways of optimising things (My idea is to 'guess' at where the packet will land in the buffer, and search from there)
3/ Setting no quota should be working for you, and isn't (presumably a bug...)
4/ The current '64' packet limit for the reorder buffer is too small.

Setting that 64 to e.g. 1024 or something would make the buffer bigger, at the cost of extra CPU. But I really think I should base the size on the current RTT. Indeed, my feeling is that I should drop packets that are older than the "current largest RTT", rather than counting packets in the buffer... I will need to be careful about CPU usage again :-)

markfoodyburton commented 6 years ago

(BTW, forgot to mention: you can pre-set the permitted 'quota' for a tunnel, so you can give yourself some 'headroom' from when you start.)

-p, --permitted :[bkm] Preset tunnel initial permitted bandwidth (Bytes - Default,Kbytes or Mbytes)

e.g. -p tun1:1000m

markfoodyburton commented 6 years ago

I've pushed an update to 'new-reorder' with the following fixes:

  1. I've changed the weighting algorithm; I hope this fixes the issue with not setting a quota.
  2. I've changed the reorder buffer itself to be limitless (!), but to remove things after the RTT time.

(This is similar to @zehome's algorithm I think, but it prunes things even if there are still packets coming in. Of course, the role of the reorder algorithm is not just to re-order, but to 'drop' a missing packet as soon as possible; using RTT gives me good results so far, and better results than the packet count...)
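Roughly, the pruning now works like this (a sketch with invented names, assuming the list is kept in arrival order with the oldest packet at the head; the branch code itself may differ in detail):

    #include <stdint.h>
    #include <sys/queue.h>

    struct rpkt {
        uint64_t seq;
        uint64_t arrived_ms;
        TAILQ_ENTRY(rpkt) entry;
    };
    TAILQ_HEAD(rlist, rpkt);

    /* Hand up anything that has sat in the buffer longer than the largest
     * measured RTT, accepting the missing sequence numbers ahead of it as lost. */
    static void prune_by_age(struct rlist *list, uint64_t now_ms,
                             uint64_t max_rtt_ms,
                             void (*deliver)(struct rpkt *))
    {
        struct rpkt *p;
        while ((p = TAILQ_FIRST(list)) != NULL) {
            if (now_ms - p->arrived_ms < max_rtt_ms)
                break;                     /* everything behind it is younger */
            TAILQ_REMOVE(list, p, entry);
            deliver(p);                    /* deliver despite the hole in front */
        }
    }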

Let me know your mileage :-)

industrial64 commented 6 years ago

So, if I understand where you are:

1/ With some 'fiddling' you get 'good' results (did I understand right?) (So we should keep going!)

Absolutely! Performance on TCP tests has dramatically improved, up to the ceiling speeds of the links; the 500/20 occasionally goes up to 600Mbps, as the ISP uses bit-bucket bursting (with an unknown traffic allotment for the bucket).

2/ CPU usage has increased; I should look at ways of optimising things (My idea is to 'guess' at where the packet will land in the buffer, and search from there)

Correct, after this new update, the usage looks a little bit lower (between 38-45% @ 490-570Mbps iperf TCP)

3/ Setting no quota should be working for you, and isn't (presumably a bug...)

Feels a bit like it :)

4/ The current '64' packet limit for the reorder buffer is too small.

Setting that 64 to e.g. 1024 or something would make the buffer bigger, at the cost of extra CPU. But I really think I should base the size on the current RTT. Indeed, my feeling is that I should drop packets that are older than the "current largest RTT", rather than counting packets in the buffer... I will need to be careful about CPU usage again :-)

Agreed, an upper limit like 256 would be fine for almost all circumstances/scenarios. As you said earlier, a 256 buffer size is massive, especially in my case of a link RTT delta of 10-12ms. This weekend I'll do some more tests in rural areas with line-of-sight wireless circuit pairing/aggregation. The RTTs of those links are usually absurd and jump around a lot, so this should help show the true colours of your techniques here :)


-p, --permitted :[bkm] Preset tunnel initial permitted bandwidth (Bytes - Default,Kbytes or Mbytes)

e.g. -p tun1:1000m

Right on, I'll start playing with this, always nice to see a new option for MLVPN ;)


I've pushed an update to 'new-reorder' with the following fixes:

  1. I've changed the weighting algorithm; I hope this fixes the issue with not setting a quota.

Sadly it doesn't; removing the quota (on the VDSL) from just the server causes traffic to plateau at 115Mbps - very odd :) What's also odd is that when I put a quota of 3200 on the VDSL, the link is capped at 6.8Mbps; when I double it to 6400 it stays at 6.8Mbps, and it's the same 6.8Mbps at 12800, but when I set the quota to 32000 I end up getting the full 48.5Mbps of speed out of the VDSL.

  2. I've changed the reorder buffer itself to be limitless (!), but to remove things after the RTT time. (This is similar to @zehome's algorithm I think, but it prunes things even if there are still packets coming in. Of course, the role of the reorder algorithm is not just to re-order, but to 'drop' a missing packet as soon as possible; using RTT gives me good results so far, and better results than the packet count...)

Let me know your mileage :-)

Hopefully some of the above will prove useful in continuing to troubleshoot the quota :)

The only big caveat I have with MLVPN currently is the missing 'bindtodev' (binding to the device instead of to the IP address of the link/port). Thanks again!!

markfoodyburton commented 6 years ago

For the quota, I wonder if it's to do with the different RTTs. For the dev-bind, I'll rebase at some point... @industrial64, could you get in touch directly? It might be easier to debug 1-to-1: mark at helenandmark dot org.

markfoodyburton commented 6 years ago

(Note to self as much as anything else. Sorry to hijack this thread :-) ) It turns out that I have been battling against a really dirty ADSL line. I have updated my implementation to count loss better. (Sorry for the noise on my repo; you shouldn't try to follow it, as I'm testing between 2 machines and using git to sync (not a good idea, don't do it!).)

Anyway, I have 3 conclusions:

1/ The current algorithms collapse with any noise on any tunnel, as it has a negative impact on all tunnels.
2/ The re-order algorithms can try to recover, but it's not going to be good. Indeed, at this point, I don't see a good way of making use of 'noisy' bandwidth...
3/ It turns out that all my 'clean' ADSLs ALWAYS deliver ALL traffic IN ORDER! I'd very much like to know if other people, around the world, have connections that deliver out of order.

TalalMash commented 2 years ago

(Note to self as much as anything else. Sorry to hijack this thread :-) ) It turns out that I have been battling against a really dirty ADSL line. I have updated my implementation to count loss better. (Sorry for the noise on my repo; you shouldn't try to follow it, as I'm testing between 2 machines and using git to sync (not a good idea, don't do it!).)

Anyway, I have 3 conclusions:

1/ The current algorithms collapse with any noise on any tunnel, as it has a negative impact on all tunnels.
2/ The re-order algorithms can try to recover, but it's not going to be good. Indeed, at this point, I don't see a good way of making use of 'noisy' bandwidth...
3/ It turns out that all my 'clean' ADSLs ALWAYS deliver ALL traffic IN ORDER! I'd very much like to know if other people, around the world, have connections that deliver out of order.

My WISP is using MikroTik equipment with VLAN and a PPPoE server; the packets are always out of order, unfortunately, and there is no bufferbloat at all.