porech / engarde

A Go network utility to create a reliable IP tunnel over multiple connections.

Trouble setting SQM/QoS traffic shaping on individual WAN interfaces. #27

Closed: Marctraider closed this issue 4 years ago

Marctraider commented 4 years ago

To get some form of QoS going, there seem to be two ways to go about it:

  1. SQM on the WireGuard tunnel. Since I control my own VPS, I can easily set up an SQM interface on the VPS as well:

Both sides apply egress bandwidth throttling. On the router side I set it to 5 Mbps for upstream (basically a bit below the slowest of my 3 WAN interfaces).

On the VPS side I set the egress cap to ~50 Mbps, just for safety while retaining at least some bandwidth; basically lower than the ingress (downstream) capacity of my fastest WAN interface on the router side.

This yields perfect SQM on a max-bandwidth test; latency in the tunnel doesn't even budge (see the tc sketch after this list). The theory here is that as long as all 3 WANs work at maximum quality/capacity, and the bandwidth cap is set ~15% below the slowest available WAN interface, the tunnel remains fine. The downside is you have to be quite conservative on both sides, and if some WAN interfaces fail on the router side, the bandwidth cap might become too loose and no longer function (unless you set it really tight, like 10 Mbps/5 Mbps maybe).

  2. SQM only on the router side, applied to the individual WAN interfaces. The theory here is that we can never choke/starve any WAN interface, and anything that arrives at or goes out through these WAN interfaces via the tunnel just gets sent/received on a best-effort basis. Assuming we don't use TCP ECN, any excess packets simply get dropped. Applying QoS this way would in theory be best, as you can utilize the maximum potential of the fastest available WAN over engarde. In theory this sounds right up engarde's alley and should be no problem.
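
For reference, option 1 boils down to roughly this with plain tc and cake instead of the sqm-scripts wrapper (rates illustrative, matching the numbers above):

```
# Router side: cap wg0 egress below the slowest WAN's upstream.
tc qdisc replace dev wg0 root cake bandwidth 5mbit diffserv3 nat ack-filter

# VPS side: cap wg0 egress below the fastest WAN's downstream.
tc qdisc replace dev wg0 root cake bandwidth 50mbit diffserv3 nat
```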

The problem is, as soon as I apply SQM (let's say 1/1 Mbps) to a single individual WAN interface, the tunnel unexpectedly drops in speed the longer a bandwidth test runs, as if the single capped interface drags down the WireGuard tunnel with it. The tighter I cap that single interface, the more regression shows up in the tunnel speed as well.

I'm currently using https://github.com/tohojo/sqm-scripts, which is also used on OpenWrt.

For ingress it creates ifb devices to mirror the incoming data so that it can be shaped with tc qdisc.
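
In essence that part does something like this (a simplified sketch of the mechanism, not the actual script; device name and rate are illustrative):

```
# Mirror incoming packets on wg0 to an ifb device and shape there.
ip link add ifb4wg0 type ifb
ip link set ifb4wg0 up
tc qdisc add dev wg0 handle ffff: ingress
tc filter add dev wg0 parent ffff: protocol all matchall \
    action mirred egress redirect dev ifb4wg0
tc qdisc add dev ifb4wg0 root cake bandwidth 50mbit ingress
```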

I'm currently using the config below on both sides, applied to wg0 (WireGuard) directly, but obviously if I try method 2 I adjust the parameters to suitable settings.

SCRIPT=layer_cake.qos EQDISC_OPTS="diffserv3 metro nat dual-srchost no-split-gso ack-filter"

I tried older SQM methods as well (fq_codel etc.), but they all show the same behavior.

Not sure if this is just the nature of the beast and it's simply impossible to apply SQM to a redundant setup like this.

Another way of testing is to apply SQM to a WAN interface, then ping -I $iface x.x.x.x (so it bypasses the tunnel entirely). No matter how low I cap the interface, pinging becomes troublesome, while there should be plenty of bandwidth left for it.
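
Concretely, that test is just the following (interface name and target are illustrative):

```
# Cap one WAN hard, then ping out of that interface directly,
# bypassing the tunnel:
tc qdisc replace dev eth1 root cake bandwidth 1mbit
ping -I eth1 1.1.1.1
```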

MAYBE it can be solved by adding (virtual) veth in/out devices in front of my WAN interfaces, but that's quite a complex config; a rough sketch below. A positive side effect would be that even ingress packets could be manipulated in the iptables mangle table before they pass the (to-be-SQM'ed) veth out-interface.
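
Something along these lines, though the routing glue is the hard part (everything here is hypothetical and untested, names made up):

```
# Hypothetical veth pair "in front of" one WAN: traffic would be
# policy-routed through veth-out -> veth-in -> eth1, so shaping and
# mangle rules can act on veth-out before the physical WAN.
ip link add veth-out type veth peer name veth-in
ip link set veth-out up
ip link set veth-in up
tc qdisc replace dev veth-out root cake bandwidth 5mbit besteffort
# ...plus the actual policy routing / forwarding setup, which is the
# complex part.
```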

ale-rinaldi commented 4 years ago

Hello, I see you closed this one; have you solved the issue?

Marctraider commented 4 years ago

Not really, but I'm quite content now with just shaping the tunnel on both ends (including DSCP marking on both ends; I don't even think this is possible otherwise, as shaping the individual WAN interfaces would shape only the encapsulated UDP tunnel stream, not what's inside).
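
The DSCP part is just mangle rules on the inner traffic before it enters the tunnel, something like this (port and DSCP class are examples, not my actual rules):

```
# Mark latency-sensitive inner traffic before it's encapsulated, so
# cake's diffserv tins on wg0 can act on it:
iptables -t mangle -A POSTROUTING -o wg0 -p udp --dport 5060 \
    -j DSCP --set-dscp-class EF
```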

I'll try to come up with a solution for this in the future, probably with a veth dummy interface in between. Shaping the WAN directly would be really basic, because with just a UDP stream there's nothing to really work with in terms of tins.

I'm pretty sure that hypothetically adding 3 physical routers in between, one per internet connection, just for shaping, would solve the issue, but that is quite impractical and adds a layer of delay and complexity.

So somehow it would have to be done virtually.

I doubt it's an issue with engarde or WireGuard. In the end I don't think WAN interface shaping is practical in this redundant setup. I'm chasing ghosts ;-)

Best would be if e.g. cake's SQM supported automatic bandwidth probing; there is currently an implementation for ingress, but not egress, and it's also a bit buggy.
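
For reference, what exists today is cake's autorate-ingress keyword, which estimates capacity from traffic arriving at the qdisc, with the bandwidth value acting as an initial estimate (device name and rate illustrative):

```
tc qdisc replace dev ifb4wg0 root cake autorate-ingress bandwidth 5mbit
```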

ghost commented 4 years ago

I had the best result with the following setup; latency won't budge:

- cake + layer_cake with nat and src-dsthost arguments on the engarde interface, speed set at the highest link of the 3.
- fq_codel + simple for the remaining 2 slower connections.

I also noticed that enabling fq_codel, even without shaping, on the active interfaces can improve the connection to engarde with cake. I'm guessing it has to do with AQM or buffer length adjustment?
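
In sqm-scripts terms that's roughly the following, if I remember the standalone config layout right (interface names and rate are illustrative, and I'm writing cake's dual-srchost keyword here):

```
# /etc/sqm/engarde0.iface.conf - cake on the tunnel, fastest link's rate:
SCRIPT=layer_cake.qos
UPLINK=20000        # kbit/s
EQDISC_OPTS="nat dual-srchost"

# /etc/sqm/wan1.iface.conf (and likewise wan2) - the two slower links:
SCRIPT=simple.qos   # htb + fq_codel
```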

Marctraider commented 4 years ago

Nice!

A small problem with QoS on the tunnel is: to keep a reliable connection you need to cap bandwidth to about half of your total bandwidth across all connections, and in the case of a badly fluctuating (but fastest) link like mine, it's even more difficult to find the sweet spot.

A great solution would be cake's auto shaper, but it sadly doesn't work well: it works fine initially, but after a while it gets throttled so badly the connection becomes unusable.

Have you, by the way, noticed (with iperf) that jitter seems to go up when QoS is applied to the tunnel?

My best conf was QoS one-way on both sides (VPS and router).

I haven't run any QoS at all for a long time now.

ghost commented 4 years ago


Yes! I did notice jitter with cake but not with fq_codel, and yeah, I had to use crontab to reapply cake with autorate-ingress due to the speed going down. Interesting approach with using QoS on the VPS; I should try this. Fortunately Engarde already has lower bufferbloat than my native connection; I think that's a common thing with UDP VPNs/tunnels? +50ms vs 300ms on ADSL.
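
The crontab workaround is just periodically re-applying the qdisc to reset the estimator, something like this (schedule, device and floor rate are illustrative):

```
# Re-seed cake's autorate estimate every 15 minutes:
*/15 * * * * tc qdisc replace dev ifb4wg0 root cake autorate-ingress bandwidth 8mbit
```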