High packet loss after some time of using wireguard+udp2raw setup

LoveCPro commented 3 months ago

Description:

I am experiencing significant packet loss after some time using the wireguard+udp2raw setup. Initially, there is no packet loss, but after a period (sometimes a few days, sometimes just an hour), packet loss starts to occur, reaching up to 50%. During this time, CPU and memory utilization remain low. When I restart the udp2raw client process, the packet loss stops temporarily, but it starts again after some time.

Previously, when using only WireGuard, I encountered tunnel interruptions specifically when transmitting HTTPS traffic. This issue was resolved by introducing udp2raw.

Expected Behavior:

The tunnel should maintain a stable connection with no packet loss.

Environment:

OS: OpenWrt

Client command: /usr/bin/udp2raw -c -l 127.0.0.1:1111 -r ${serverip}:1111 --raw-mode faketcp --cipher-mode xor --log-level 0 -a --wait-lock

Server command: /usr/bin/udp2raw -s -l 0.0.0.0:1111 -r 127.0.0.1:1111 --raw-mode faketcp --cipher-mode xor

Questions:

Are there any known methods to improve this situation? Could this issue be related to firewall settings, and if so, what steps can I take to investigate and resolve it?

wangyu- commented 3 months ago

Well, this might be a udp2raw bug. But it is (maybe more likely) caused by your ISP punishing long-term connections.

Are there any known methods to improve this situation?

Currently udp2raw doesn't have a packet loss monitoring feature.

What you can do is monitor with your own script, and if the packet loss is considered "high", use udp2raw's --fifo feature to tell udp2raw to re-dial a new connection.
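A minimal sketch of such a monitoring script follows. It assumes the udp2raw client was started with --fifo pointing at the same path as FIFO below, and that PEER is an address reachable only through the tunnel. The path, the address, and the 20% threshold are hypothetical values, not anything udp2raw prescribes. The reconnect command written to the FIFO is the one mentioned later in this thread.

```shell
#!/bin/sh
# Sketch: measure loss through the tunnel and ask udp2raw to re-dial.
# FIFO, PEER, and THRESHOLD are hypothetical; adjust them to your setup.
FIFO="${FIFO:-/tmp/udp2raw.fifo}"   # must match the path given to udp2raw --fifo
PEER="${PEER:-10.0.0.1}"            # an address reachable only via the tunnel
THRESHOLD="${THRESHOLD:-20}"        # percent loss regarded as "high"

# Read ping's summary text on stdin and print the integer part of the
# "N% packet loss" figure. Prints nothing if no such line is found.
parse_loss() {
    sed -n 's/.*[ ,]\([0-9]*\)[.0-9]*% packet loss.*/\1/p'
}

# loss_is_high <loss> <threshold>
# Succeeds when a loss value was measured and it is at or above the threshold.
loss_is_high() {
    [ -n "$1" ] && [ "$1" -ge "$2" ]
}

# Run one measurement and request a reconnect if the loss is high.
check_once() {
    loss=$(ping -c 20 -q "$PEER" 2>/dev/null | parse_loss)
    if loss_is_high "$loss" "$THRESHOLD" && [ -p "$FIFO" ]; then
        # Blocks until a reader (udp2raw) has the FIFO open.
        echo reconnect > "$FIFO"
    fi
}
```

Calling check_once from a loop with a sleep, or from cron, would make the check periodic. If ping produces no summary at all, the sketch does nothing rather than guessing.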

wangyu- commented 3 months ago

The advantage of --fifo vs. "restart udp2raw" is:

With a fifo reconnect, your upper-level UDP connections stay valid after the reconnect, and your traffic is only interrupted for a few RTTs.

With restarting udp2raw, your WireGuard will wait for a timeout first, then re-establish a new connection. Traffic will be interrupted for much longer.

LoveCPro commented 3 months ago

To minimize packet loss as much as possible, do you have any recommendations? I am not familiar with the --seq-mode parameter. Could you please explain if this parameter can help reduce packet loss? Any additional suggestions for optimizing the setup to avoid packet loss would be greatly appreciated.

wangyu- commented 3 months ago

You can try different --seq-mode values and see; it might or might not help. I cannot provide any guarantee.

If the packet loss is indeed caused by your ISP's QoS strategy of punishing long-term connections, there are not many options on your side.

lars18th commented 1 month ago

If the packet loss is indeed caused by your ISP's QoS strategy of punishing long-term connections, there are not many options on your side.

I suggest providing a method to establish a new FakeTCP connection. This would cover the case where the ISP is punishing long-term connections. For example, you could start a second connection without closing the former, at some point switch the data over to the new one, and then close the initial connection. With this strategy, from the point of view of the ISP's tracker, a new connection is established. And perhaps this will reset the timeout.

What do you think?

wangyu- commented 1 month ago

I suggest providing a method to establish a new FakeTCP connection. This would cover the case where the ISP is punishing long-term connections. For example, you could start a second connection without closing the former, at some point switch the data over to the new one, and then close the initial connection.

I have considered the same before. But the problem is: the new connection is not guaranteed to be better than the old one.

Sometimes, besides long-term connection punishment, the ISP might also have load-balancing mechanisms across multiple physical links. The new connection might accidentally land on a bad link (this is not uncommon in my experience). So the current design is "don't touch it, unless udp2raw believes the connection is really, really bad", and at the same time we allow the user to trigger a reconnect via --fifo.

Ideally, udp2raw could try to establish multiple underlying connections, take some measurements, and automatically pick the best one. But complexity arises in how to smartly pick the best connection (e.g. one connection might have lower packet loss but higher latency; how do we choose in this case?). For simplicity I don't want to go this way at the moment.
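To make the loss-versus-latency ambiguity concrete, here is a small sketch that ranks two candidate connections by a weighted score. Nothing here is part of udp2raw: the score formula, the weight, and the sample numbers are all assumptions chosen for illustration. The point is only that the "best" connection changes with the weight.

```shell
#!/bin/sh
# score <loss_percent> <rtt_ms> <weight>
# Lower is better. <weight> is a hypothetical knob: how many milliseconds of
# latency one percentage point of packet loss is considered to be "worth".
score() {
    awk -v loss="$1" -v rtt="$2" -v w="$3" 'BEGIN { printf "%d\n", rtt + w * loss }'
}

# better <lossA> <rttA> <lossB> <rttB> <weight>
# Prints A or B, whichever has the lower score (A wins ties).
better() {
    a=$(score "$1" "$2" "$5")
    b=$(score "$3" "$4" "$5")
    if [ "$a" -le "$b" ]; then echo A; else echo B; fi
}
```

With connection A at 2% loss and 180 ms and connection B at 5% loss and 60 ms, `better 2 180 5 60 10` prints B (scores 200 vs. 110), while `better 2 180 5 60 50` prints A (scores 280 vs. 310). Which weight is right depends on the traffic, which is the complexity described above.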

lars18th commented 1 month ago

Hi @wangyu- ,

Thank you for the response. Your point of view is acceptable. I feel it would be best to use a cron schedule that calls --fifo to restart the connection, or to use some external tool that checks the connection over the tunnel and triggers a restart under certain circumstances. But in any case, when doing the echo reconnect >fifo.file, are you first establishing a new connection and then closing the old one? Or do you simply close and reopen? The idea is to not block the traffic in any way. That way, we could schedule cron jobs as often as every 60 minutes, for example.

What do you think?
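A sketch of the cron-driven variant, for illustration. The script location and the FIFO path in the sample crontab entry are hypothetical; the only thing taken from udp2raw is the reconnect command written to the file passed to --fifo.

```shell
#!/bin/sh
# Sketch of a scheduled reconnect. Hypothetical crontab entry (once an hour):
#   0 * * * * /usr/local/bin/udp2raw-reconnect.sh /tmp/udp2raw.fifo
# Both paths above are assumptions, not defaults of udp2raw.

# send_command <fifo> <command>
# Returns 2 if <fifo> is not a named pipe, otherwise the status of the write.
send_command() {
    if [ ! -p "$1" ]; then
        echo "not a FIFO: $1" >&2
        return 2
    fi
    # Opening a FIFO for writing blocks until a reader (udp2raw) has it open.
    printf '%s\n' "$2" > "$1"
}

# Only act when a FIFO path was passed on the command line.
if [ -n "${1:-}" ]; then
    send_command "$1" reconnect
fi
```

Checking that the path is really a named pipe avoids creating a regular file by accident when udp2raw is not running.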

wangyu- commented 1 month ago

Or do you simply close and reopen? The idea is to not block the traffic in any way.

It is indeed a simple close and reopen. But the reconnect is usually very fast and the upper-level connection stays valid. You typically get an outage of about 200 ms.

At the moment, I feel it's not a super big problem. I don't want to complicate the design by starting a new connection in parallel and retiring the old connection when the new one is ready.

Your suggestion is also not perfect: if the new connection is very bad because of ISP load balancing, you will have some outage anyway. A perfect solution will unavoidably need some smart way of measuring and comparing the health of the tunnels, and of picking the best among the parallel connections.

If you really care about high availability, I think the preferred solution is:

You start something like 10 pairs of udp2raw + tinyfecvpn tunnels, you monitor their health, you pick the best connection out of the 10 and route traffic via it, and you restart the bad udp2raw connections when needed. If done correctly, the old traffic will stay valid without an outage.
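The selection step could be sketched roughly as below, assuming you already collect one loss measurement per tunnel yourself. The tunnel names, the input format, and the restart limit are hypothetical, and health is reduced to packet loss only, which is a simplification. Actually switching routes and restarting udp2raw is left out.

```shell
#!/bin/sh
# plan_tunnels <restart_limit_percent>
# Reads "name loss_percent" lines on stdin (measurements you gather yourself).
# Prints "use <name>" for the lowest-loss tunnel and "restart <name>" for every
# other tunnel whose loss is at or above the limit. The tunnel chosen for use
# is never listed for restart, so traffic always has somewhere to go.
plan_tunnels() {
    sort -k2,2n | awk -v limit="$1" '
        NR == 1     { print "use " $1; next }
        $2 >= limit { print "restart " $1 }
    '
}
```

For example, feeding it the lines `t1 20`, `t2 2`, `t3 35`, `t4 5` with a limit of 15 prints `use t2`, `restart t1`, `restart t3`.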

lars18th commented 1 month ago

Hi,

Perhaps you're right and simplification is preferable.

However, the suggestion of udp2raw + tinyfecvpn is not the best. First of all, you need to take into account that tinyfecvpn is a VPN tool, not a tunneling tool. Therefore it's overkill and complex to provide a redundant tunnel (with FEC) just to run WireGuard (which is itself a VPN) on top of it. In my opinion, this would be preferable: N x udp2raw + UDPspeeder + WireGuard. UDPspeeder would then operate over multiple links, but in a simple way, perhaps in an active-passive mode. When the active connection is "disconnected" (in the sense that some round-trip timeout expires), it switches to the next passive link. And with a simple script we can catch the switch and restart the underlying tunnel. Do you think this is complex?

wangyu- commented 1 month ago

I already explained that the idea of periodically reconnecting, or of starting a new connection in parallel and retiring the old connection when the new one is ready, doesn't work well without a smart way of measuring and comparing the health of the tunnels and picking the best among the parallel connections.

Perhaps in an active-passive mode. When the active connection is "disconnected" (in the sense that some round-trip timeout expires), it switches to the next passive link. And with a simple script we can catch the switch and restart the underlying tunnel.

This whole thread is talking about high packet loss. It's about higher packet loss (e.g. 20%) vs. lower (e.g. 2%). The connection is not completely dead. So active-standby plus an expiring round-trip timeout obviously is not the solution. In the beginning you were talking about the ISP punishing long-term connections (with higher packet loss), but now you are talking about a dead (or nearly dead) link?

Essentially, what is needed is to try new connections in parallel and pick the best-quality link in some smart way. It's not as simple as some round-trip timeout expiring.

lars18th commented 1 month ago

Hi @wangyu- ,

I feel your response is right. Perhaps we're talking about different things. And obviously it's best to maintain simplicity and modularity.

Based on the above, what do you think about providing some statistics for the current tunnel? Perhaps --fifo could then be used to obtain this information, so that the corresponding actions can be taken outside the tool. What do you think?

And on another topic: does anyone know of a multiplexing tool for UDP tunnels? Not with the idea of increasing throughput, but for the purpose of handling faulty tunnels. It could then be used on top of udp2raw and/or UDPspeeder.

wangyu- / udp2raw