rapiz1 / rathole

A lightweight and high-performance reverse proxy for NAT traversal, written in Rust. An alternative to frp and ngrok.
Apache License 2.0

feat(transport): add websocket transport #290

Closed: rucciva closed this 9 months ago

rucciva commented 9 months ago

As per #134, this adds support for a websocket transport. It is especially useful in environments with strict policies that only allow incoming/outgoing connections over HTTP(S).
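Why websocket helps with HTTP-only policies: a WebSocket connection starts as an ordinary HTTP/1.1 request carrying an `Upgrade: websocket` header (RFC 6455), so firewalls and HTTP reverse proxies see regular HTTP(S) traffic. The sketch below only builds that opening request as a string. The host and path are hypothetical, and the key is the sample nonce from RFC 6455; this is not necessarily what rathole itself sends.

```rust
/// Builds the client's opening handshake request for a WebSocket
/// connection (RFC 6455, section 4.1). A real client must generate a
/// fresh random 16-byte nonce per connection and base64-encode it.
fn websocket_upgrade_request(host: &str, path: &str, key_b64: &str) -> String {
    format!(
        "GET {path} HTTP/1.1\r\n\
         Host: {host}\r\n\
         Upgrade: websocket\r\n\
         Connection: Upgrade\r\n\
         Sec-WebSocket-Key: {key_b64}\r\n\
         Sec-WebSocket-Version: 13\r\n\
         \r\n"
    )
}

fn main() {
    // Hypothetical host and path; the key is the sample nonce from RFC 6455.
    let request = websocket_upgrade_request("tunnel.example.com", "/", "dGhlIHNhbXBsZSBub25jZQ==");
    print!("{request}");
}
```

The server answers with `HTTP/1.1 101 Switching Protocols` and a `Sec-WebSocket-Accept` header derived from the key; from then on both sides exchange WebSocket frames over the same TCP (or TLS) connection.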

rucciva commented 9 months ago

Hi @rapiz1, the Docker build seems to fail with the latest rust alpine image (even 0.4.8 fails). When I revert to rust:1.69-alpine, it works. Should I update the Dockerfile and pin the base image version?

rapiz1 commented 9 months ago

I really appreciate your effort to implement this and test it! Do you have any idea why it doesn't compile with a newer rust version?

rucciva commented 9 months ago

I'm not sure why. I have too little experience with Rust to solve that (I just started learning Rust last week), but it's probably related to an upgrade in the Alpine image?

These are the errors I got when rebuilding 0.4.8:

186.6    Compiling rathole v0.4.8 (/home/rust/src)
311.0 error: linking with `cc` failed: exit status: 1
311.0   |
311.0   = note: LC_ALL="C" PATH="/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/bin:/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/bin/self-contained:/usr/local/cargo/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" VSLANG="1033" "cc" "-m64" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/rcrt1.o" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crti.o" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtbeginS.o" "/tmp/rustcaJEWIT/symbols.o" "/home/rust/src/target/release/deps/rathole-85237e278a19f6f8.rathole.d5d17b4f8f7e821a-cgu.0.rcgu.o" "-Wl,--as-needed" "-L" "/home/rust/src/target/release/deps" "-L" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib" "-Wl,-Bstatic" "-lssl" "-lcrypto" "-lunwind" "-lc" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/libcompiler_builtins-263be272f87964bd.rlib" "-Wl,-Bdynamic" "-Wl,--eh-frame-hdr" "-Wl,-z,noexecstack" "-nostartfiles" "-L" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib" "-L" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained" "-o" "/home/rust/src/target/release/deps/rathole-85237e278a19f6f8" "-Wl,--gc-sections" "-static-pie" "-Wl,-z,relro,-z,now" "-Wl,-O1" "-Wl,--strip-all" "-nodefaultlibs" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtendS.o" "/usr/local/rustup/toolchains/1.72.1-x86_64-unknown-linux-musl/lib/rustlib/x86_64-unknown-linux-musl/lib/self-contained/crtn.o"
311.0   = note: /usr/lib/gcc/x86_64-alpine-linux-musl/12.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lssl: No such file or directory
311.0           /usr/lib/gcc/x86_64-alpine-linux-musl/12.2.1/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find -lcrypto: No such file or directory
311.0           collect2: error: ld returned 1 exit status
311.0           
311.0 
311.3 error: could not compile `rathole` (bin "rathole") due to previous error

What about building with a Debian-based image instead? Can we still get a statically linked binary?

rapiz1 commented 9 months ago

I think it's due to Alpine removing TLS-related libraries. Actually, TLS in the Docker image never worked, because we never shipped the TLS library at link time. On the other hand, statically linking against a TLS library is not commonly recommended and has troublesome nuances. So I think we can disable the TLS-related features (including tls and your ws) in the Docker image build, given that it never worked. Releasing could be more problematic, because we use Docker to build for different platforms... and we still have the issue of linking the TLS library. Did you try to trigger a release for this PR?
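Disabling TLS-related features in the Docker build works because Cargo features are compile-time switches. A minimal sketch, assuming hypothetical feature names `noise`, `tls` and `websocket` (rathole's real names may differ): `cfg!` reports which features the binary was built with. In an actual crate the TLS code and its OpenSSL dependency would be gated with `#[cfg(feature = "tls")]` and an optional dependency, so a build without that feature never asks the linker for `-lssl`/`-lcrypto`.

```rust
/// Returns the transports compiled into this binary.
/// The feature names are hypothetical stand-ins, not necessarily
/// rathole's actual Cargo feature names.
fn compiled_transports() -> Vec<&'static str> {
    let mut transports = vec!["tcp"]; // always available in this sketch
    if cfg!(feature = "noise") {
        transports.push("noise");
    }
    if cfg!(feature = "tls") {
        transports.push("tls");
    }
    if cfg!(feature = "websocket") {
        transports.push("websocket");
    }
    transports
}

fn main() {
    println!("compiled transports: {:?}", compiled_transports());
}
```

Built with no features enabled, this prints only `tcp`; built with `--features tls` (assuming the crate declares such a feature), the list would also include `tls`.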

rucciva commented 9 months ago

Rebuilding 0.4.8 with a Debian-based image seems to succeed:

 [+] Building 482.4s (15/15) FINISHED                                                                                             docker:rancher-desktop
 => [internal] load build definition from Dockerfile                                                                                               0.0s
 => => transferring dockerfile: 397B                                                                                                               0.0s
 => [internal] load .dockerignore                                                                                                                  0.0s
 => => transferring context: 141B                                                                                                                  0.0s
 => [internal] load metadata for docker.io/library/rust:latest                                                                                     8.1s
 => [auth] library/rust:pull token for registry-1.docker.io                                                                                        0.0s
 => [builder 1/7] FROM docker.io/library/rust@sha256:911acdfd39276ead0dfb583a833f1db7d787ad0d5333848378d88f19e5fc158c                            267.2s
 => => resolve docker.io/library/rust@sha256:911acdfd39276ead0dfb583a833f1db7d787ad0d5333848378d88f19e5fc158c                                      0.0s
 => => sha256:911acdfd39276ead0dfb583a833f1db7d787ad0d5333848378d88f19e5fc158c 988B / 988B                                                         0.0s
 => => sha256:6a2ac38604fce995fd586c8d760147f71d9113dcbe84a7fcddcb30c60a1ec7ee 1.38kB / 1.38kB                                                     0.0s
 => => sha256:5789de4d5ecc8b55d521f243992f1b6493ced13a837fb5887859e50b72748a31 6.10kB / 6.10kB                                                     0.0s
 => => sha256:167b8a53ca4504bc6aa3182e336fa96f4ef76875d158c1933d3e2fa19c57e0c3 49.56MB / 49.56MB                                                  67.7s
 => => sha256:b47a222d28fa95680198398973d0a29b82a968f03e7ef361cc8ded562e4d84a3 24.03MB / 24.03MB                                                  24.9s
 => => sha256:debce5f9f3a9709885f7f2ad3cf41f036a3b57b406b27ba3a883928315787042 64.11MB / 64.11MB                                                 109.8s
 => => sha256:1d7ca7cd2e066ae77ac6284a9d027f72a31a02a18bfc2a249ef2e7b01074338b 211.04MB / 211.04MB                                               240.9s
 => => sha256:2f47d826831b715d0a34a7f72c69942043b7e90e909e3e0565ebddaeec280c1a 190.45MB / 190.45MB                                               239.7s
 => => extracting sha256:167b8a53ca4504bc6aa3182e336fa96f4ef76875d158c1933d3e2fa19c57e0c3                                                          4.9s
 => => extracting sha256:b47a222d28fa95680198398973d0a29b82a968f03e7ef361cc8ded562e4d84a3                                                          1.5s
 => => extracting sha256:debce5f9f3a9709885f7f2ad3cf41f036a3b57b406b27ba3a883928315787042                                                          5.8s
 => => extracting sha256:1d7ca7cd2e066ae77ac6284a9d027f72a31a02a18bfc2a249ef2e7b01074338b                                                         14.5s
 => => extracting sha256:2f47d826831b715d0a34a7f72c69942043b7e90e909e3e0565ebddaeec280c1a                                                         11.0s
 => [internal] load build context                                                                                                                  0.0s
 => => transferring context: 3.10kB                                                                                                                0.0s
 => CACHED [stage-1 1/2] WORKDIR /app                                                                                                              0.0s
 => [builder 2/7] RUN apt update && apt install libssl-dev                                                                                        11.5s
 => [builder 3/7] WORKDIR /home/rust/src                                                                                                           0.0s 
 => [builder 4/7] COPY . .                                                                                                                         0.3s
 => [builder 5/7] RUN cargo build --locked --release --features client,server,noise,hot-reload                                                   193.0s
 => [builder 6/7] RUN mkdir -p build-out/                                                                                                          0.9s
 => [builder 7/7] RUN cp target/release/rathole build-out/                                                                                         0.8s
 => [stage-1 2/2] COPY --from=builder /home/rust/src/build-out/rathole .                                                                           0.2s
 => exporting to image                                                                                                                             0.2s
 => => exporting layers                                                                                                                            0.2s
 => => writing image sha256:5e0c5a54612f0ed20d8edb9349d6ee77fe80756f83de014ca172eee6f1e4a68d                                                       0.0s

rapiz1 commented 9 months ago

BTW, in the long term we would still like to find a cross-platform, statically linked TLS solution. rustls is the best bet, but it still needs some time to become practical.

rapiz1 commented 9 months ago

RUN cargo build --locked --release --features client,server,noise,hot-reload

@rucciva I suspect it's this line that caused the difference, rather than the base image: tls is not used at all.

rucciva commented 9 months ago

So I think we can disable TLS related features(including tls, and your ws) in the docker image build, given that it never worked

Ah, that's unfortunate. Can we build multiple Docker image tags instead? One that includes all features (including the related TLS dependency, maybe using the debian-slim flavor), plus the existing one?

RUN cargo build --locked --release --features client,server,noise,hot-reload

@rucciva I suspect it's this line that caused the difference, instead of the base image. tls is not used at all

That confuses me. As far as I understand, those features don't include tls, right? But the build fails due to a TLS-related error.
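One possible source of the confusion, offered only as a hypothesis because the thread does not show rathole's Cargo.toml: Cargo's `--features` flag enables the listed features in addition to the crate's `default` features, and only `--no-default-features` turns the defaults off. If the default set included TLS, a `--features client,server,noise,hot-reload` build would still compile and link it. The sketch models just that rule (ignoring features enabled transitively); the default list in `main` is hypothetical.

```rust
use std::collections::BTreeSet;

/// Models how `cargo build --features ...` picks the enabled feature set:
/// requested features are added to the crate's default features unless
/// `--no-default-features` is passed. Transitive features are ignored.
fn resolve_features<'a>(
    defaults: &[&'a str],
    requested: &[&'a str],
    no_default_features: bool,
) -> BTreeSet<&'a str> {
    let mut enabled: BTreeSet<&'a str> = requested.iter().copied().collect();
    if !no_default_features {
        enabled.extend(defaults.iter().copied());
    }
    enabled
}

fn main() {
    // Hypothetical default list: the thread does not show rathole's Cargo.toml.
    let defaults = ["server", "client", "tls", "noise", "hot-reload"];
    let requested = ["client", "server", "noise", "hot-reload"];
    println!("--features only:            {:?}", resolve_features(&defaults, &requested, false));
    println!("plus --no-default-features: {:?}", resolve_features(&defaults, &requested, true));
}
```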

rucciva commented 9 months ago

one that include all features (including related tls dependency, maybe using debian-slim flavor) and the existing one?

For example, something like this:

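FROM rust:bookworm AS builder
# OpenSSL headers and libraries needed when building the TLS features
RUN apt-get update && apt-get install -y libssl-dev
WORKDIR /home/rust/src
COPY . .
# Override at build time with: docker build --build-arg FEATURES=... .
ARG FEATURES
RUN cargo build --locked --release --features ${FEATURES:-client,server,noise,hot-reload}
RUN mkdir -p build-out/
RUN cp target/release/rathole build-out/

# Minimal runtime image (no shell or package manager)
FROM gcr.io/distroless/cc-debian12
WORKDIR /app
COPY --from=builder /home/rust/src/build-out/rathole .
# Run as an unprivileged user
USER 1000:1000
ENTRYPOINT ["./rathole"]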

The GitHub Action could look like this, and you can pull the image from here. It is bigger, at 27 MB (due to libc, libssl, and libgcc) compared to 4 MB when using Alpine, but the upside is that it can contain all features, including tls and websocket.
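The `${FEATURES:-client,server,noise,hot-reload}` expression in the Dockerfile's RUN line is ordinary POSIX shell parameter expansion: `ARG FEATURES` exposes the build argument to the RUN step's environment, and the `:-` form substitutes the default when the variable is unset or empty. A small Rust model of that rule, purely illustrative (the function name is hypothetical):

```rust
/// Default used by the Dockerfile when FEATURES is not supplied.
const DEFAULT_FEATURES: &str = "client,server,noise,hot-reload";

/// Mirrors `${FEATURES:-...}`: the default applies when the build arg is
/// unset (None) or empty (Some("")); that second case is what the `:` adds.
fn effective_features(build_arg: Option<&str>) -> Vec<&str> {
    let raw = match build_arg {
        Some(value) if !value.is_empty() => value,
        _ => DEFAULT_FEATURES,
    };
    raw.split(',').collect()
}

fn main() {
    println!("{:?}", effective_features(None));
    println!("{:?}", effective_features(Some("client,tls")));
}
```

For example, `docker build --build-arg FEATURES=client,tls .` would pass `client,tls` to `--features` (assuming both are declared features), while a plain `docker build .` falls back to the default list.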

rapiz1 commented 9 months ago

Yes the dockerfile looks good to me. Go ahead!

rucciva commented 9 months ago

Hi @rapiz1, on second thought, should we make the Dockerfile include all features by default, or maybe only produce a Docker image with all features?

My reasoning is as follows:

[Screenshot 2023-10-01 at 13 32 46]

rapiz1 commented 9 months ago

We can go with one Docker image containing all features. I see you changed the base image to Debian, so tls should work.

rucciva commented 9 months ago

Great, I think I've made the last change on this PR. You can see the Docker job here. I've also made sure it runs fine.

[Screenshot 2023-10-01 at 13 38 03]

rapiz1 commented 9 months ago

Thank you very much for your contribution!

fernvenue commented 9 months ago

Is it possible to use websocket for transport and noise for encryption?

rucciva commented 9 months ago

Is it possible to use websocket for transport and noise for encryption?

Hi @fernvenue, in theory it is possible to encrypt websocket frames using noise; I've done that in Go. The existing noise transport is also a wrapper around the tcp transport, so it might be possible to wrap the websocket transport instead.

But why would you want that? IMHO, websocket is needed here mainly to share an existing HTTPS port, or where the only allowed protocol is HTTP(S). If you can allocate a dedicated port to rathole, then it is best to go with the existing noise transport.
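To make "noise wraps tcp, so it could wrap websocket instead" concrete: if each transport is a wrapper around any byte stream, layers compose by nesting. The sketch uses `std::io::Write` wrappers with two toy, hypothetical layers: `Framed` stands in for websocket framing (a 2-byte length prefix, not the real WebSocket frame format) and `Sealed` stands in for noise (it appends a checksum and is not encryption). A `Vec<u8>` stands in for the TCP stream. rathole's actual transports are built on async I/O and structured differently.

```rust
use std::io::{self, Write};

/// Toy framing layer (stand-in for websocket): 2-byte big-endian length prefix.
struct Framed<W: Write> {
    inner: W,
}

impl<W: Write> Write for Framed<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let len = u16::try_from(buf.len())
            .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
        self.inner.write_all(&len.to_be_bytes())?;
        self.inner.write_all(buf)?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Toy sealing layer (stand-in for noise): appends a 1-byte checksum.
/// This is NOT encryption.
struct Sealed<W: Write> {
    inner: W,
}

impl<W: Write> Write for Sealed<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let checksum = buf.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
        let mut sealed = buf.to_vec();
        sealed.push(checksum);
        self.inner.write_all(&sealed)?;
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

fn main() -> io::Result<()> {
    // "noise + tcp": one layer of processing per write.
    let mut noise_over_tcp = Sealed { inner: Vec::new() };
    noise_over_tcp.write_all(b"hi")?;
    // "noise + websocket + tcp": two layers per write.
    let mut noise_over_ws = Sealed { inner: Framed { inner: Vec::new() } };
    noise_over_ws.write_all(b"hi")?;
    println!("noise+tcp bytes:    {:?}", noise_over_tcp.inner);
    println!("noise+ws+tcp bytes: {:?}", noise_over_ws.inner.inner);
    Ok(())
}
```

Swapping the inner layer requires no change to the outer one, which is the property that would let a noise layer sit on top of the websocket transport.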

fernvenue commented 9 months ago

@rucciva thanks for your reply.

If you can allocate dedicated port to rathole than it is best to go with the existing noise transport.

Yeah, I can allocate a dedicated port. I thought websocket would be a better option for performance, especially latency?

rucciva commented 9 months ago

@rucciva thanks for your reply.

If you can allocate dedicated port to rathole than it is best to go with the existing noise transport.

Yea, I can allocate dedicated port, I thought websocket will be a better option for performance, especially for latency?

I'm not sure about that, since websocket is a wrapper around the tcp transport (when not using tls), just like the noise transport. If we add noise on top of websocket, there is more processing (noise + websocket + tcp) than with the existing noise transport (noise + tcp). Maybe you can test this first by comparing the latency of rathole using the plain tcp transport vs. the websocket transport (without tls). If the websocket transport is better, then it might be worth considering adding noise on top of websocket.
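For a rough sense of what the websocket layer adds per message, frame header sizes are fixed by RFC 6455 (section 5.2): 2 bytes for payloads up to 125 bytes, 4 bytes up to 65,535, 10 bytes beyond that, plus a 4-byte masking key on every client-to-server frame. The helper below only encodes those rules; how rathole's websocket transport sizes its frames is not stated in the thread. Since the byte overhead is small, any latency cost would more plausibly come from per-frame work (header parsing, and the client XOR-masking every payload byte) than from extra bytes on the wire, which is what the proposed benchmark would check.

```rust
/// Size in bytes of a WebSocket frame header (RFC 6455, section 5.2)
/// for a given payload length. Client-to-server frames must be masked,
/// which adds a 4-byte masking key.
fn ws_header_len(payload_len: u64, masked: bool) -> u64 {
    let base = match payload_len {
        0..=125 => 2,      // 7-bit length fits in the second byte
        126..=65_535 => 4, // marker 126 + 16-bit extended length
        _ => 10,           // marker 127 + 64-bit extended length
    };
    if masked { base + 4 } else { base }
}

fn main() {
    for payload in [64u64, 1_500, 16_384, 100_000] {
        let header = ws_header_len(payload, true);
        let pct = header as f64 / payload as f64 * 100.0;
        println!("payload {payload:>7} B: masked header {header:>2} B ({pct:.3}% overhead)");
    }
}
```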

fernvenue commented 9 months ago

Hi @rucciva, I just ran a benchmark; here are the results:

$ iperf3 -c ::1 -p 5202
Connecting to host ::1, port 5202
[  5] local ::1 port 42420 connected to ::1 port 5202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   868 MBytes  7.28 Gbits/sec    0   1023 KBytes       
[  5]   1.00-2.00   sec   861 MBytes  7.22 Gbits/sec    0    895 KBytes       
[  5]   2.00-3.00   sec   865 MBytes  7.26 Gbits/sec    0   1023 KBytes       
[  5]   3.00-4.00   sec   858 MBytes  7.19 Gbits/sec    0   1023 KBytes       
[  5]   4.00-5.00   sec   866 MBytes  7.27 Gbits/sec    0   1023 KBytes       
[  5]   5.00-6.00   sec   860 MBytes  7.21 Gbits/sec    0   1023 KBytes       
[  5]   6.00-7.00   sec   859 MBytes  7.20 Gbits/sec    0   1023 KBytes       
[  5]   7.00-8.00   sec   851 MBytes  7.14 Gbits/sec    0    895 KBytes       
[  5]   8.00-9.00   sec   859 MBytes  7.20 Gbits/sec    0    895 KBytes       
[  5]   9.00-10.00  sec   848 MBytes  7.11 Gbits/sec    0   1023 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.39 GBytes  7.21 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  8.38 GBytes  7.17 Gbits/sec                  receiver

iperf Done.
$ iperf3 -c ::1 -p 5202 -u
Connecting to host ::1, port 5202
[  5] local ::1 port 43213 connected to ::1 port 5202
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   1.00-2.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   3.00-4.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   5.00-6.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   6.00-7.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   7.00-8.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   8.00-9.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec  4  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes  1.05 Mbits/sec  0.000 ms  0/40 (0%)  sender
[  5]   0.00-10.04  sec  80.0 KBytes  65.3 Kbits/sec  0.090 ms  0/40 (0%)  receiver

iperf Done.
$ iperf3 -c ::1 -p 5202
Connecting to host ::1, port 5202
[  5] local ::1 port 42788 connected to ::1 port 5202
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   821 MBytes  6.89 Gbits/sec    0   1023 KBytes       
[  5]   1.00-2.00   sec   812 MBytes  6.82 Gbits/sec    0    895 KBytes       
[  5]   2.00-3.00   sec   812 MBytes  6.82 Gbits/sec    0   1023 KBytes       
[  5]   3.00-4.00   sec   816 MBytes  6.85 Gbits/sec    0    895 KBytes       
[  5]   4.00-5.00   sec   820 MBytes  6.88 Gbits/sec    0    895 KBytes       
[  5]   5.00-6.00   sec   816 MBytes  6.85 Gbits/sec    0    895 KBytes       
[  5]   6.00-7.00   sec   819 MBytes  6.87 Gbits/sec    0    895 KBytes       
[  5]   7.00-8.00   sec   826 MBytes  6.93 Gbits/sec    0   1023 KBytes       
[  5]   8.00-9.00   sec   838 MBytes  7.03 Gbits/sec    0    895 KBytes       
[  5]   9.00-10.00  sec   841 MBytes  7.06 Gbits/sec    0    895 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.03 GBytes  6.90 Gbits/sec    0             sender
[  5]   0.00-10.04  sec  8.02 GBytes  6.86 Gbits/sec                  receiver

iperf Done.
$ iperf3 -c ::1 -p 5202 -u
Connecting to host ::1, port 5202
[  5] local ::1 port 42907 connected to ::1 port 5202
[ ID] Interval           Transfer     Bitrate         Total Datagrams
[  5]   0.00-1.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   1.00-2.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   3.00-4.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   5.00-6.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   6.00-7.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   7.00-8.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   8.00-9.00   sec   128 KBytes  1.05 Mbits/sec  4  
[  5]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec  4  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-10.00  sec  1.25 MBytes  1.05 Mbits/sec  0.000 ms  0/40 (0%)  sender
[  5]   0.00-10.04  sec  80.0 KBytes  65.3 Kbits/sec  0.067 ms  0/40 (0%)  receiver

iperf Done.
$ echo 'GET http://[::1]:8080' | vegeta attack -rate 0 -duration 30s -max-workers 48 > ./tcp.report
$ vegeta report ./tcp.report
Requests      [total, rate, throughput]         474269, 15809.45, 15808.44
Duration      [total, attack, wait]             30.001s, 29.999s, 1.908ms
Latencies     [min, mean, 50, 90, 95, 99, max]  135.036µs, 2.07ms, 1.873ms, 3.789ms, 4.482ms, 6.065ms, 30.415ms
Bytes In      [total, mean]                     291675435, 615.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:474269  
Error Set:
$ echo 'GET http://[::1]:8080' | vegeta attack -rate 0 -duration 30s -max-workers 48 > ./websocket.report
$ vegeta report ./websocket.report 
Requests      [total, rate, throughput]         441000, 14700.08, 14698.54
Duration      [total, attack, wait]             30.003s, 30s, 3.138ms
Latencies     [min, mean, 50, 90, 95, 99, max]  145.951µs, 2.26ms, 2.022ms, 4.105ms, 4.896ms, 6.771ms, 45.308ms
Bytes In      [total, mean]                     271215000, 615.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:441000  
Error Set:

Now we have an answer to this question: if we want the best performance, it's better to keep using TCP as the transport instead of WebSocket, for both bandwidth and latency.
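To put numbers on that conclusion, the relative differences can be computed from the two vegeta reports above, which are labeled by file name (tcp.report and websocket.report). The iperf3 runs are not labeled in the output, so they are left out. A minimal sketch:

```rust
/// Relative change of `candidate` versus `baseline`, in percent.
fn pct_change(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // (metric, tcp.report value, websocket.report value), copied from vegeta output.
    let metrics = [
        ("throughput (req/s)", 15808.44, 14698.54),
        ("mean latency (ms)", 2.07, 2.26),
        ("p50 latency (ms)", 1.873, 2.022),
        ("p99 latency (ms)", 6.065, 6.771),
    ];
    for (name, tcp, ws) in metrics {
        println!("{name:<20} tcp={tcp:>10} websocket={ws:>10} change={:+.1}%", pct_change(tcp, ws));
    }
}
```

On these figures, the websocket run shows about 7.0% lower throughput and about 9.2% higher mean latency (about 11.6% higher at p99) than the tcp run.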

rucciva commented 9 months ago

That's awesome, @fernvenue, thanks for the benchmark.

I agree with your results. For maximum performance without compromising security, the existing noise transport should be preferred.