net4people / bbs

Forum for discussing Internet censorship circumvention

WireGuard with obfuscation support #88

Open el3xyz opened 3 years ago

el3xyz commented 3 years ago

Hey all,

Thanks to David Fifield for the invitation to this forum. WireGuard is known to be one of the most secure and fastest (thanks to its kernel-space implementation) VPN protocols. Unfortunately, it is quite easily tracked and blocked by DPI due to the following issues:

I've added some obfuscation support to make WG detection slightly more difficult:

Code can be found here: https://github.com/el3xyz/wireguard-linux-compat

However, this approach is vulnerable to detection based on statistical modelling, and I'm looking for ways to improve it. One problem is that all traffic going to a single IP address (or a few) is easily detected, but this should be addressed by split tunneling. What are the other issues?

Cheers

wkrp commented 3 years ago

Thanks for starting this discussion and especially for providing an implementation. There have been a few threads in the past on circumvention/obfuscation for WireGuard (2016, 2018), but as far as I know they never went anywhere. A sample of running code is a great way to move the discussion forward. In my opinion, WireGuard with additional blocking resistance is a highly realizable goal with many paths to success—we don't need to wait for a perfect plan before starting to prototype implementations.

Some of the challenges of making WireGuard blocking-resistant are that it's based on UDP datagrams rather than TCP streams, and that it's usually implemented in the kernel. Existing circumvention systems tend to focus on TCP (though not exclusively), and are usually implemented in userspace.

Yegor Ievlev has posted a recipe showing how to interface kernel WireGuard with a userspace Shadowsocks (which does support UDP proxying). The client configures its WireGuard with an endpoint of 127.0.0.1:51820, which is the ss-tunnel UDP listener. The packets travel over the Shadowsocks tunnel. The server ss-server receives the packets and forwards them to the WireGuard server (which does not have to be on the same host).

server$ ss-server -s 0.0.0.0 -s ::0 -p 443 -k <password> -m aes-128-gcm -U
client$ ss-tunnel -s <shadowsocks-server> -p 443 -l 51820 -L <wireguard-server>:51820 -k <password> -m aes-128-gcm -U

Recent versions of the Pluggable Transports specification consider UDP proxying—see Section 1.5. I don't have enough personal experience with it to know how it works.

In your thread on the WireGuard mailing list, Jason Donenfeld suggests an alternative to setting endpoint to a local UDP port: use a Netfilter module in the kernel, or NFQUEUE to delegate to a userspace program, possibly even reusing the WireGuard link's existing keys instead of using a separate set of keys for the obfuscation layer:

iptables -t mangle -A OUTPUT -m wg_obfs --out-iface wg0 -j WG_OBFS_OUT && \
iptables -t mangle -A INPUT -m wg_obfs --in-iface wg0 -j WG_OBFS_IN

Another consideration is what the protocol obfuscation should look like. I think your approach of making packets look random—analogous to obfs4 and Shadowsocks—is a great place to start. There's no strong theoretical basis for why such an approach should work, but in practice it has proven effective.

You might consider expanding the random padding scheme to permit packets that are all padding. That will break the 1:1 correspondence between the unobfuscated and obfuscated packet streams.
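
To make the all-padding idea concrete, here is a minimal sketch of one possible framing. It is not the format used by the patch: the 2-byte length field, `MAX_PAD`, and the function names are assumptions for illustration, and in a real scheme the whole frame (including the length field) would be encrypted or masked so that it does not become a new fingerprint.

```python
import os
import random
import struct
from typing import Optional

MAX_PAD = 64  # hypothetical upper bound on padding length


def pad(payload: bytes, rng: random.Random) -> bytes:
    """Frame a payload as: 2-byte length | payload | random-length padding."""
    padding = os.urandom(rng.randint(0, MAX_PAD))
    return struct.pack("!H", len(payload)) + payload + padding


def chaff(rng: random.Random) -> bytes:
    """Build an all-padding packet: length field 0, then 1..MAX_PAD random bytes."""
    return struct.pack("!H", 0) + os.urandom(rng.randint(1, MAX_PAD))


def unpad(frame: bytes) -> Optional[bytes]:
    """Return the payload, or None for an all-padding packet (to be dropped)."""
    (length,) = struct.unpack("!H", frame[:2])
    if length == 0:
        return None  # WireGuard messages are never empty, so 0 can mean "chaff"
    if len(frame) < 2 + length:
        raise ValueError("truncated frame")
    return frame[2 : 2 + length]
```

The receiver silently drops anything for which `unpad` returns `None`, so the sender is free to emit chaff whenever it likes.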

wkrp commented 3 years ago

Lately I have been working a lot with the Turbo Tunnel, the main claim of which is that circumvention tunnels should conceptually be a sequence of discrete packets, not a continuous data stream. (Even if those packets end up being encapsulated into a stream-like cover protocol.) The Turbo Tunnel transports I have implemented so far interact with user programs over a TCP interface (e.g. SOCKS). But I think it should be possible to adapt the idea slightly to proxy protocols, like WireGuard, that are natively packet-based.

One of the benefits of a Turbo Tunnel design is that it permits transmitting a userspace data stream over an obfuscated channel that is potentially unreliable or out of order. The inner session and reliability protocol (KCP or QUIC, for example) breaks the stream into packets and takes care of concerns like in-order delivery and retransmission (essentially, implementing facilities that are normally provided by the kernel, in userspace). But with WireGuard, there would be no need for a separate inner session and reliability layer. The packets come straight from the kernel, which will do its own session and reliability management.

Here's an example of the abstract procedure, with end-to-end stream delivery on the left and end-to-end packet delivery on the right. The packet procedure is actually simpler, because it can take advantage of kernel facilities, rather than reimplementing them.

--- sender ----------------------------    --- sender ---------------------------
1. Accept a stream on a local TCP port.    1. Accept packets on a local UDP port.
2. Break it into a sequence of packets     ..
   with session metadata.
3. Encapsulate the packets into an         2. Encapsulate the packets into an
   obfuscated channel.                        obfuscated channel.
--- receiver --------------------------    --- receiver -------------------------
4. Decapsulate packets from the            3. Decapsulate packets from the
   obfuscated channel.                        obfuscated channel.
5. Reassemble the packets into a stream.   ..
6. Forward the stream to a TCP port.       4. Forward the packets to a UDP port.
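
As a rough illustration of the packet procedure on the right when the cover channel happens to be stream-like, the sketch below frames datagrams into a byte stream and recovers them on the other side, however the stream is chunked in transit. The 2-byte length prefix and the names `encapsulate` and `Decapsulator` are assumptions for this example, not taken from any existing Turbo Tunnel transport, and no obfuscation is applied here.

```python
import struct
from typing import Iterator, List


def encapsulate(packets: List[bytes]) -> bytes:
    """Sender step 2: prefix each datagram with a 2-byte big-endian length."""
    return b"".join(struct.pack("!H", len(p)) + p for p in packets)


class Decapsulator:
    """Receiver step 3: recover datagrams from arbitrarily chunked stream bytes."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> Iterator[bytes]:
        """Add received bytes and yield every datagram that is now complete."""
        self._buf += chunk
        while len(self._buf) >= 2:
            (n,) = struct.unpack("!H", self._buf[:2])
            if len(self._buf) < 2 + n:
                break  # wait for the rest of this datagram
            yield bytes(self._buf[2 : 2 + n])
            del self._buf[: 2 + n]
```

Each datagram yielded by `feed` would then be forwarded to the configured UDP port (step 4); there is no reassembly into a stream because the kernel on each end handles sessions and reliability itself.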

dnstt may not be the best target for such an adaptation, just because the capacity per DNS query is so small. (Though maybe it would be possible to reduce the MTU on the WireGuard interface.) Though it's still in development, Champa would be a convincing demonstration of the idea, as it's a polling-based HTTP channel, optionally through an intermediary, quite unlike the native UDP of WireGuard. Probably it would be easiest to augment both client and server with a UDP listener and a UDP forwarding address: any UDP payload received by the listener is encapsulated into the tunnel, and any UDP payload received through the tunnel is forwarded to the configured address (with a source address of its own listening port). The KCP and smux layers would be removed. The details of encapsulation and everything else may remain the same.

el3xyz commented 3 years ago

Yegor Ievlev has posted a recipe showing how to interface kernel WireGuard with a userspace Shadowsocks

May I ask why this is needed? To convert TCP sessions into a series of UDP datagrams, so that RST will no longer work? In theory WG allows an interesting trick: we can track the number of TCP retransmits and change the obfuscation method if there are too many. It's also possible to use a raw socket instead of UDP and construct any protocol header (SCTP, QUIC, TCP, you name it).

wkrp commented 3 years ago

May I ask why is this needed? To convert TCP sessions into a series of UDP datagrams, so RST will no longer work?

The idea is not to wrap Shadowsocks in WireGuard, but to wrap WireGuard in Shadowsocks. Shadowsocks is not the important part per se—it's just an example of a successful blocking-resistant tunnel protocol. WireGuard provides nice features and security guarantees; Shadowsocks provides blocking resistance: put them together to get a blocking-resistant VPN protocol. Abstractly, the outer tunnel protocol could be TCP, UDP, or anything else. Yegor's post shows how to get the kernel's WireGuard packets into userspace so that an ordinary program can work on them, by setting endpoint to a local UDP port.

One advantage of doing obfuscation in a separate program is that you are not constrained to following the kernel's packet-sending schedule. You can delay a packet before sending it, or send extra "chaff" packets according to your own schedule, allowing you to shape the traffic profile however you need. I don't know if a Netfilter module or NFQUEUE permits that level of traffic modification. I don't mean to emphasize the traffic analysis point too much, though, because experience shows it's not yet necessary for effective circumvention (the per-packet content obfuscation you've implemented is certainly more important).
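
A minimal sketch of what "your own schedule" could mean: a fixed-rate sender that emits one packet per tick, taking a real packet if one is waiting and chaff otherwise. The function name, the `CHAFF` placeholder, and the fixed-interval policy are assumptions chosen for illustration; a real shaper would pick its timing and chaff sizes to match whatever profile it is imitating.

```python
from collections import deque
from typing import List, Tuple

CHAFF = b""  # placeholder; a real sender would emit random-length padding


def shape(
    arrivals: List[Tuple[float, bytes]], interval: float, slots: int
) -> List[Tuple[float, bytes, bool]]:
    """Plan `slots` sends, one every `interval` seconds.

    `arrivals` holds (arrival_time, packet) pairs as handed over by the kernel.
    Each tick sends the oldest packet that has already arrived, or chaff if
    none is waiting. Returns (send_time, packet, is_real) tuples.
    """
    queue = deque(sorted(arrivals, key=lambda a: a[0]))
    schedule = []
    for k in range(slots):
        now = k * interval
        if queue and queue[0][0] <= now:
            _, packet = queue.popleft()
            schedule.append((now, packet, True))
        else:
            schedule.append((now, CHAFF, False))
    return schedule
```

The observable timing is then independent of when the kernel actually produced each packet, at the cost of added latency and bandwidth.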

wkrp commented 3 years ago

Code can be found here:

I got this code working as follows. You need both the patched wireguard-linux-compat (I used commit 721242f0) and the patched wireguard-tools (I used bfc5f2d7) that knows about the altered device name.

On both peers I installed the kernel module and tools, and generated keys.

$ sudo apt install build-essential linux-headers-amd64
$ cd
$ git clone https://github.com/el3xyz/wireguard-linux-compat
$ cd wireguard-linux-compat/src
$ make DEV=wireguard_obf
$ sudo make install
$ cd
$ git clone https://github.com/el3xyz/wireguard-tools
$ cd wireguard-tools/src
$ make DEV=wireguard_obf
$ cd
$ (umask 077; ~/wireguard-tools/src/wg genkey > privatekey)
$ ~/wireguard-tools/src/wg pubkey < privatekey > publickey

Then set up each peer to refer to each other.

peera$ sudo ip link add dev wgobf0 type wireguard_obf
peera$ sudo ip address add dev wgobf0 192.168.2.1 peer 192.168.2.2
peera$ sudo ~/wireguard-tools/src/wg set wgobf0 \
           listen-port 51820 \
           private-key privatekey \
           peer [peerb-publickey] \
           allowed-ips 0.0.0.0/0 \
           endpoint [peerb-ip]:51820
peera$ sudo ip link set up dev wgobf0
peerb$ sudo ip link add dev wgobf0 type wireguard_obf
peerb$ sudo ip address add dev wgobf0 192.168.2.2 peer 192.168.2.1
peerb$ sudo ~/wireguard-tools/src/wg set wgobf0 \
           listen-port 51820 \
           private-key privatekey \
           peer [peera-publickey] \
           allowed-ips 0.0.0.0/0 \
           endpoint [peera-ip]:51820
peerb$ sudo ip link set up dev wgobf0

You can verify that the wireguard_obf module was loaded with dmesg, and see that the wgobf0 network interface exists with ip address and ip link.

Then, try pinging one peer from the other:

peerb$ ping -c 5 192.168.2.1
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=140 ms
64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=69.7 ms
64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=69.6 ms
64 bytes from 192.168.2.1: icmp_seq=4 ttl=64 time=70.9 ms
64 bytes from 192.168.2.1: icmp_seq=5 ttl=64 time=69.7 ms

--- 192.168.2.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 11ms
rtt min/avg/max/mdev = 69.627/83.887/139.502/27.811 ms

The obfuscated UDP payloads look like this:

->
0000   3c a8 9c 8f e7 9d e1 62 6e 1e b1 03 66 34 d5 ed
0010   84 6f d1 1f 3b 1d af 0f 9d 53 6e 3b 5c bc 3c e8
0020   77 24 c7 0b 4b 15 e6 bf f1 5c d5 c8 fe bf 66 88
0030   bb fd 41 36 0b 3d a0 fc 9c c4 70 a8 7d 6b ec 84
0040   e4 16 27 d3 d1 bf 1e 61 f2 7d 22 43 b0 6a 69 ec
0050   59 c6 78 89 fe dd ba f0 ca 5a 3e 45 d2 b0 12 d6
0060   dc dc 7c 17 15 0d 5d b7 cc f1 e9 a3 a1 de 01 bb
0070   eb 21 d9 86 89 8d 73 02 d1 fe 6a dc ae e8 40 19
0080   dd f6 da e8 04 b8 ca 19 8e 82 96 56 81 cd 31 96
0090   e2 01 fd 19 34 fd 51 5a 74 13 a0 4c 19 db a0 00
00a0   b2 6b 54 fd 00 00 3e 08 00 80 1f f7 00 f0 c1 08
00b0   e8 c8 26 77 e8 2a 94 49 ae 8e 66 d2 6f be 6d 68
00c0   6e 02 e6 f7 bc 58 91 36 31 cd ce 71
<-
0000   31 ee 26 f7 0c 1c 95 8a 22 50 68 38 86 fe 41 03
0010   83 91 d8 62 dc 98 03 91 22 ac f5 f0 97 9c 2f db
0020   05 40 cf 4d 85 02 2a a5 a3 50 cb 40 d0 25 a0 fa
0030   3f 29 6d dd 06 93 fc 34 c6 aa 89 5b 2f 8e 36 41
0040   26 27 b9 9e 75 63 a0 5b b1 0f 81 ea f8 31 fe c7
0050   fc 08 ff f9 7e c6 3f fe 37 a3 df 67 3a 1f c3 97
0060   c7 82 c2 90 73 2d 64 ec 97 ff 7d c1 83 14 0c 03
0070   87 fe 7f 90 a9 be 6a 28 8e 67 f5 dc 99 e5 92 b8
0080   84 e0 20 cf 4d 16 32 e1 5b 6f a3 f0 9d ad 80 f5
0090   56 62 21 26 3b 8d a0 d8
->
0000   f1 33 e5 a4 0b c0 fa 5a fc 89 ff 07 df 74 f9 e0
0010   c9 a2 51 a4 95 bf d0 18 09 34 0c 2c 75 fc a7 a2
0020   6c 91 f1 77 0c f4 84 e6 ac d3 a3 bf 10 9d e4 b7
0030   4e 26 04 f9 43 30 7d f8 d9 4c 88 da 10 56 6f a9
0040   5b cf 10 cc 20 ff 3c 60 08 51 a0 f3 d8 df a9 7a
0050   54 5e c9 cf ca 8b 54 3b ae f9 6c 56 0a 5b f9 0a
0060   b1 02 ee 2b 90 b5 3a ec f9 b0 30 13 68 5e 17 f7
0070   cf 1b 50 0c 0b 32 d4 ad de 7a f7 0f 1d 73 c0 48
<-
0000   1d fb a1 b1 9d 56 70 4e 3e c0 d8 f9 1f 20 ac ac
0010   ba ac 08 2c 86 1f 78 0a 3d 56 ba 39 b9 12 c6 d7
0020   5e e4 0b 62 2d 88 fc 11 b4 ad 82 58 ba 1d ef 8a
0030   e9 f9 60 7b ae 24 49 bb f5 36 2f 55 1e 95 69 52
0040   8f fb 79 d3 cf 65 89 27 1d ea 84 4f f1 60 ec 85
0050   22 ad 22 ed 11 46 b1 2c d9 c1 53 a3 f8 1b fb 31
0060   e9 25 91 55 f6 49 9b 98 e6 df 25 fb c7 f0 4d 2f
0070   6a 36 a8 21 fa c8 cc b8 3c 19 69 9a a2 43 98 ab

Compare to non-obfuscated WireGuard. Note the ^\x01\x00 pattern that was mentioned in the thread on WireGuard blocking in Russia.

-> 
0000   01 00 00 00 59 b5 e8 ea 9d be dd 93 c7 f8 27 0b
0010   2f 22 d8 e1 26 b2 12 9e b2 4b 86 ce a9 95 af ea
0020   16 a0 9c d1 86 7a 12 56 05 89 ec ed b3 5b 4c 4b
0030   1d d1 c8 48 9e 48 be b7 b4 7d fe 38 1f ec 9b eb
0040   0b 41 f6 86 72 13 c9 b0 27 94 cf 2e cc a2 9d 83
0050   ce 92 59 43 fc d1 c9 7a aa 71 fa 7b 80 d7 3f be
0060   5a c8 10 8a ba 0f 8e 0a 16 58 17 cd 8e c5 a1 f2
0070   11 27 08 e7 b5 f1 b3 20 e4 0c b1 5a 7b d5 b4 aa
0080   4e aa 8f 3c 00 00 00 00 00 00 00 00 00 00 00 00
0090   00 00 00 00
<-
0000   02 00 00 00 98 c1 29 c7 59 b5 e8 ea f3 13 5b 1b
0010   0c 2e ae dd b3 89 58 b7 10 fe 14 43 7c a8 14 80
0020   00 6f f2 b3 20 0b 39 52 c9 ae e8 1d f8 a5 d6 6f
0030   4b 24 e3 8e 97 b2 7a bb 91 ac a6 70 61 f9 af 74
0040   26 de 2a 55 58 48 49 5f 9f 63 8e 84 00 00 00 00
0050   00 00 00 00 00 00 00 00 00 00 00 00
->
0000   04 00 00 00 98 c1 29 c7 00 00 00 00 00 00 00 00
0010   46 fd 49 2c f1 c9 31 17 ba 98 66 a1 86 3d 93 8c
0020   07 e7 cf 38 f3 f4 42 77 f0 de bc f6 f1 df 29 1c
0030   7b 31 75 6d 80 7a f7 d6 cd 50 da 6e ec f0 df 56
0040   bc dd 46 b0 a1 d9 55 ee 29 79 4e 63 ea e4 43 0c
0050   e6 cc ad fb 91 fd e8 c2 90 8e 3b 50 63 ad 19 cd
0060   e1 d9 b2 b7 de d8 74 cb 7f c8 f6 1b 1b 05 32 ca
0070   71 be 99 16 63 82 51 e8 b6 12 fd 18 5e 6c d6 4d
<-
0000   04 00 00 00 59 b5 e8 ea 00 00 00 00 00 00 00 00
0010   73 d5 c9 48 87 34 1c 02 84 80 94 11 2f 07 92 37
0020   8d 52 45 28 ed 13 5a 12 0f f3 34 6a 43 df 84 9d
0030   0d f4 4e b4 d7 cf 69 2d d4 11 2c eb ec 2b 1a 2e
0040   44 d0 68 1b 10 9f d7 12 39 cc d3 56 cd df e9 d3
0050   69 93 5e 55 3e 4f 6e c5 0e 73 b3 d3 37 36 a0 99
0060   d2 fa aa a0 eb ce eb c7 aa 01 7e 33 29 2c 18 ff
0070   32 2a 12 95 9d 70 48 49 b1 36 d3 b6 8a 9c 2a 5b
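
To show why the plain dumps above are easy to match, here is a sketch of a trivial classifier built only from the fixed WireGuard header: a one-byte message type followed by three zero bytes, plus the fixed handshake sizes (148 and 92 bytes, matching the first two plain dumps) and the fact that data messages are a multiple of 16 bytes. This is an illustration, not a description of any particular censor's rule; the function name is made up.

```python
from typing import Optional

# Fixed message sizes in the WireGuard protocol, keyed by message type.
FIXED_SIZES = {1: 148, 2: 92, 3: 64}
NAMES = {
    1: "handshake initiation",
    2: "handshake response",
    3: "cookie reply",
    4: "transport data",
}


def classify(payload: bytes) -> Optional[str]:
    """Return the WireGuard message name if a UDP payload looks like one."""
    if len(payload) < 4 or payload[1:4] != b"\x00\x00\x00":
        return None  # the three reserved bytes after the type are always zero
    kind = payload[0]
    if kind in FIXED_SIZES:
        return NAMES[kind] if len(payload) == FIXED_SIZES[kind] else None
    # Data: 16-byte header + payload padded to 16 bytes + 16-byte tag.
    if kind == 4 and len(payload) >= 32 and len(payload) % 16 == 0:
        return NAMES[4]
    return None
```

The obfuscated payloads shown earlier fail the very first check, because their leading bytes look random.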

xiaokangwang commented 3 years ago

I believe there are several additional topics about the obfuscation of WireGuard connections.

  1. Choosing a kernel-mode implementation increases deployment cost; a user-mode implementation may achieve a similar result while streamlining deployment. A kernel-mode implementation only works on one particular operating system and requires full, unrestricted access to the device. It won't work if the user runs a different kernel, or if the application does not have unrestricted access, as on Android. WireGuard does not require kernel-level access for its functionality; the kernel-mode driver is only used to improve performance. That can be great for an established standard, but for experimental software the cost-benefit balance shifts. Unless there is a general-purpose eBPF software development framework within the kernel that allows a proxy protocol to be updated independently of the kernel, an in-kernel anti-censorship proxy protocol is very unlikely to be attractive compared to a user-mode, cross-platform one. This is especially true because this kind of patch is very unlikely to be accepted into the kernel, so the developer will need to update the code to adapt to every change in the kernel API.

  2. General-purpose obfuscation of UDP communication can be accomplished with proxy chaining. It can be set up with either server address rewriting or a SOCKS5 forward proxy. SOCKS5 requires client software support. If the server address may change, or the client needs to communicate with more than one peer (as in WebRTC), only SOCKS5 can be used. There are existing tools that can obfuscate a general UDP stream, like UDPSpeeder or udp2raw. If a transport protocol needs SOCKS5 in order to work with proxy chaining, enabling it may require modifying the application's source code. Fortunately, that is rather easy when you have access to the source code; something like this will be enough for WebRTC. (This was created for a yet-to-be-published prototype WebRTC tunnel proxy.)

  3. As for the obfuscation of UDP (DGRAM) data, I would like to summarize how this is currently done in various tools, in the hope that it assists the discussion. V2Ray has some support for UDP obfuscation. Two kinds of obfuscation are supported for the mKCP transport. Header-based obfuscation adds a fake identifier (like some magic number) to the packet so that, to a network device that sabotages UDP ("QoS"), it appears to be traffic generated by another program. Encryption-based obfuscation ("seed") just encrypts the packet so that it looks like random data. VLite has a packet obfuscation framework that supports dynamic configuration of obfuscation settings. An obfuscation scheme is defined as a sequence of transform layers. Each layer is supplied with input data and a stage parameter. The layers perform a forward transform ("Mask") in sequence when sending a packet, and an inverse transform ("UnMask") in reverse sequence when receiving a packet (Source Code). In this way, each stage can be kept simple and reusable, and a more complex obfuscation scheme can be built from reusable transform layers. Geph4 (sosistab) pads the packet to a random length and encrypts the handshake packet with one of the keys derived from the current time and the endpoint public key. The data packets are encrypted. UDPSpeeder does not support encryption (just XOR masking); however, it reshapes the traffic with Forward Error Correction. This makes a model trained on the underlying traffic less useful against the traffic transferred through the tunnel. Udp2raw can not only encrypt the traffic but also optionally rewrite the packets into TCP or ICMP packets. In this way, the traffic will not trigger UDP sabotaging rules ("QoS") in transit networks.

  4. Tunnel protocol architecture is better when it focuses on creating individually useful components that can be stacked together, and reduces coupling where possible. In the V2Ray ecosystem, components are designed to be interchangeable and stackable where possible. After the integration of VLite into V2Ray, a user will be able to set up something that resembles a Turbo Tunnel: an outbound with mKCP transport transforms a stream into a datagram stream, which is chain-proxied through a VLite outbound that provides connection stability and quality assistance. If the VLite outbound is configured to output streams, that stream can then be sent over (chain-proxied through) an outbound with any proxy protocol currently supported by V2Ray, be it VMess, Shadowsocks, SOCKS, or HTTP proxy, over any supported transport. Every component has its own standalone utility and can be individually developed, tested, and used, yet they can be combined to create a complex stack. This makes it easy to create individual components while allowing more advanced constructions.
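
The layered Mask/UnMask arrangement described in item 3 can be sketched roughly as follows. This is not VLite's actual API: the class names, the two example layers, and the choice of SHAKE-256 as a keystream source are assumptions for illustration, and the XOR layer reuses one keystream for every packet, so it hides patterns without providing real confidentiality.

```python
import hashlib
from typing import List, Protocol


class Layer(Protocol):
    def mask(self, data: bytes) -> bytes: ...
    def unmask(self, data: bytes) -> bytes: ...


class XorLayer:
    """XOR with a keystream derived from a shared key (illustration only)."""

    def __init__(self, key: bytes) -> None:
        self.key = key

    def mask(self, data: bytes) -> bytes:
        stream = hashlib.shake_256(self.key).digest(len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    unmask = mask  # XOR is its own inverse


class HeaderLayer:
    """Prepend a fixed fake identifier so the packet resembles another protocol."""

    def __init__(self, magic: bytes) -> None:
        self.magic = magic

    def mask(self, data: bytes) -> bytes:
        return self.magic + data

    def unmask(self, data: bytes) -> bytes:
        if not data.startswith(self.magic):
            raise ValueError("missing header")
        return data[len(self.magic):]


def send(layers: List[Layer], packet: bytes) -> bytes:
    """Apply each layer's forward transform in order."""
    for layer in layers:
        packet = layer.mask(packet)
    return packet


def receive(layers: List[Layer], packet: bytes) -> bytes:
    """Apply each layer's inverse transform in reverse order."""
    for layer in reversed(layers):
        packet = layer.unmask(packet)
    return packet
```

Because every layer only needs `mask` and `unmask`, new layers (padding, length shaping, a different cipher) can be added or reordered without touching the others, which is the stackability argument made in item 4.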

el3xyz commented 3 years ago

Thanks for sharing this, Xiaokang Wang

Actually, I agree with all your points. Here are my 2 cents:

  1. I started this weekend project because my experience with Shadowsocks and V2Ray was not quite satisfying. With WG I get a 2-3x higher transfer rate and longer battery life. I was running both tunnels on a rooted Android phone, with WG in the kernel.

  2. I wasn't aiming for any sort of large-scale deployment (actually just myself). It was quite interesting to hear a lot of different opinions on this, including ones from the WG folks.

  3. Doing everything in the kernel has its own benefits (besides performance). For instance, one can analyze connection quality just by examining the number of retransmits for every TCP connection routed through the VPN and take action accordingly (switch destination, enable FEC, etc.). The same can be done, of course, in a userspace layer, but less reliably.
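
The retransmit-driven switching in item 3 could look roughly like the sketch below. Everything here is hypothetical: the class name, the method names in the usage example, and the threshold values are placeholders, and how the per-connection counters are collected (in the kernel or otherwise) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObfuscationSelector:
    """Rotate to the next obfuscation method when retransmits get too frequent."""

    methods: List[str]
    threshold: float = 0.1   # assumed: switch above 10% retransmitted segments
    min_segments: int = 100  # assumed: ignore samples too small to judge
    index: int = 0

    def current(self) -> str:
        return self.methods[self.index]

    def report(self, segments_sent: int, retransmits: int) -> str:
        """Feed one measurement window; return the method to use next."""
        enough = segments_sent >= self.min_segments
        if enough and retransmits / segments_sent > self.threshold:
            self.index = (self.index + 1) % len(self.methods)
        return self.current()
```

For example, with `ObfuscationSelector(["random", "fec", "tcp-header"])`, a window of 1000 segments with 300 retransmits moves from `"random"` to `"fec"`, while a window with only 20 retransmits leaves the method unchanged.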

klzgrad commented 3 years ago

eBPF seems useful for adding a small amount of varying custom obfuscation and also a big improvement on usability. Though last I looked it was not very easy to mutate packets in this way, if not entirely infeasible.

xiaokangwang commented 3 years ago

eBPF seems useful for adding a small amount of varying custom obfuscation and also a big improvement on usability. Though last I looked it was not very easy to mutate packets in this way, if not entirely infeasible.

Yes, in its current state it is not very easy to implement a proxy this way. Additional support and rework are required on the kernel side to make it work without significant effort and workarounds.

aabdellah commented 3 years ago

@el3xyz Thanks for your great work, this has helped me a lot.

I just have a few questions:

dereference23 commented 2 years ago

I've built dkms modules and renamed tools for Debian and Arch Linux.

The tools have nwg and nwg-quick names. Configuration files can be placed into /etc/notwireguard. systemd units also work.

https://github.com/dereference23/notwireguard-linux-compat/releases https://github.com/dereference23/notwireguard-tools/releases/

database64128 commented 2 years ago

Those interested in userspace WireGuard proxies can take a look at #117.

wkrp commented 2 years ago

In your thread on the WireGuard mailing list, Jason Donenfeld suggests an alternative to setting endpoint to a local UDP port: use a Netfilter module in the kernel, or NFQUEUE to delegate to a userspace program

In September 2022 an implementation of the Netfilter module idea was posted to the WireGuard mailing list.

Iptables WireGuard obfuscation extension

Jason once suggested use a netfilter module for obfuscation. Here is one.

https://github.com/infinet/xt_wgobfs

It uses SipHash 1-2 to generate pseudo-random numbers in a reproducible way. Sender and receiver share a siphash secret key. Sender creates and receiver re-creates identical siphash output, if input is same. These siphash outputs are used for obfuscation.

  • The first 16 bytes of WG message is obfuscated.
  • The mac2 field is also obfuscated if it is all zeros.
  • Padding WG message with random bytes, which also has random length. They are from kernel get_random_bytes_wait() though.
  • Drop 80% of keepalive message at random. Again randomness is from kernel.
  • Change the Diffserv field to zero.
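
As a sketch of the general "shared key, reproducible pseudo-random output" idea in the announcement, the code below masks the first 16 bytes of a message with the output of a keyed hash that both sides can recompute. It is not the module's algorithm: keyed BLAKE2b stands in for SipHash (which Python's standard library does not expose), and the choice of the next 16 untouched bytes as hash input is an assumption made so that the receiver can rebuild the same mask.

```python
import hashlib

HEADER_LEN = 16  # bytes of the WireGuard message to obfuscate


def toggle(key: bytes, packet: bytes) -> bytes:
    """Obfuscate or de-obfuscate the first 16 bytes of a WireGuard message.

    The mask is derived from the shared key and the 16 bytes after the header,
    which are left unchanged, so applying the function twice restores the
    original packet.
    """
    if len(packet) < 2 * HEADER_LEN:
        raise ValueError("packet too short")  # smallest WireGuard message is 32 bytes
    head, rest = packet[:HEADER_LEN], packet[HEADER_LEN:]
    # Stand-in for SipHash: keyed BLAKE2b truncated to 16 bytes of output.
    mask = hashlib.blake2b(rest[:HEADER_LEN], key=key, digest_size=HEADER_LEN).digest()
    return bytes(a ^ b for a, b in zip(head, mask)) + rest
```

Since the bytes after the header are ciphertext that differs from packet to packet, the mask differs too, and the fixed type and reserved bytes no longer appear at a constant offset with a constant value.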

RomanValov commented 1 year ago

Hi @el3xyz. I'm playing around with your patches and want to say thanks for the work done. I have a question about the cookie message and how the obfuscator (key) is obtained in two scenarios:

  1. handshake initiation received (link)
  2. handshake response received (link)

In the first case you're passing the obfuscator (key) as part of the handshake initiation message, and I found your comments stating that you do this to avoid a peer lookup on the server. But in the second case you're actually doing a peer lookup on the server side.

To me both cases look pretty much the same. Could you share a bit of your motivation:

  1. Why do you think it's bad to do a peer lookup at handshake initiation?
  2. Why do you think it's bad to pass the obfuscator key in the handshake response message?