xtaci / kcptun

A Quantum-Safe Secure Tunnel based on QPP, KCP, FEC, and N:M multiplexing.
MIT License

v20240828 very easily hits "io closed" #967

Open Frank-pv opened 1 month ago

Frank-pv commented 1 month ago

Problem description: I have long been using the author's kcptun to accelerate traffic, and I am very grateful for the author's contribution to open source. My skills are limited, so all I can do is report the problem.

1. After upgrading to the latest version, the server frequently hits "io closed"; older versions did not. Concretely, after two iperf speed tests (saturating the bandwidth), no new connection can be established at all (the tunnel stalls).
2. The configuration follows the one the author provided in https://github.com/xtaci/kcptun/issues/923.
3. I suspect a commit from July 27 introduced this, and the problem has persisted since: https://github.com/xtaci/kcptun/commit/4193bb63530df6bf7a9f35545fd97eafa7dd92f4

  1. Check `-key xxx` at least three times. — Checked, consistent.
  2. Make sure `-nocomp`, `-datashard`, `-parityshard`, `-key`, `-crypt`, `-smuxver`, `-QPP`, `-QPPCount` match on both ends. — Checked, consistent.
  3. Is the forwarding target address `--target` set correctly on the server side? — Verified.
  4. On the client side, are you connecting to the client's listening port correctly? — Verified.
  5. If unsure about item 3, try `telnet target port` on the server.
  6. Does a firewall block UDP, or cap the UDP send rate? — Direct connection.
  7. Are both ends running the same version? — Checked, consistent.
  8. *Is it the latest version? — Yes.*
  9. What operating systems are the two ends running? — Ubuntu — CentOS, Rocky — OpenWrt.
  10. What do the output logs on both ends show?

Server:

```
2024/08/30 11:36:06 remote address: 192.168.198.235:52815
2024/08/30 11:36:06 smux version: 2 on connection: [::]:39832 -> 192.168.198.235:52815
2024/08/30 11:36:20 remote address: 192.168.198.235:54825
2024/08/30 11:36:20 smux version: 2 on connection: [::]:39842 -> 192.168.198.235:54825
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(3) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(5) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(7) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(9) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(11) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(13) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(15) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(17) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(19) out: 127.0.0.1:3389
2024/08/30 11:36:29 stream opened in: 192.168.198.235:54825(21) out: 127.0.0.1:3389
2024/08/30 11:36:36 io: read/write on closed pipe
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(23) out: 127.0.0.1:3389
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(25) out: 127.0.0.1:3389
```

Client:

```
2024/08/30 11:36:06 remote address: 192.168.198.235:52815
2024/08/30 11:36:06 smux version: 2 on connection: [::]:39832 -> 192.168.198.235:52815
2024/08/30 11:36:20 remote address: 192.168.198.235:54825
2024/08/30 11:36:20 smux version: 2 on connection: [::]:39842 -> 192.168.198.235:54825
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(3) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(5) out: 127.0.0.1:3389
2024/08/30 11:36:20 stream opened in: 192.168.198.235:54825(7) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(9) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(11) out: 127.0.0.1:3389
2024/08/30 11:36:23 stream opened in: 192.168.198.235:54825(13) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(15) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(17) out: 127.0.0.1:3389
2024/08/30 11:36:26 stream opened in: 192.168.198.235:54825(19) out: 127.0.0.1:3389
2024/08/30 11:36:29 stream opened in: 192.168.198.235:54825(21) out: 127.0.0.1:3389
2024/08/30 11:36:36 io: read/write on closed pipe
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(23) out: 127.0.0.1:3389
2024/08/30 11:36:53 stream opened in: 192.168.198.235:54825(25) out: 127.0.0.1:3389
```

Attached configuration:

Server:

```json
{
  "smuxver": 2,
  "listen": "[::]:39810-39900",
  "target": "127.0.0.1:3389",
  "key": "123456789",
  "crypt": "aes",
  "mode": "fast",
  "mtu": 1400,
  "sndwnd": 2048,
  "rcvwnd": 2048,
  "datashard": 10,
  "parityshard": 0,
  "dscp": 46,
  "nocomp": true,
  "acknodelay": false,
  "nodelay": 1,
  "interval": 20,
  "resend": 2,
  "nc": 1,
  "sockbuf": 16777217,
  "smuxbuf": 16777217,
  "streambuf": 4194304,
  "keepalive": 10,
  "pprof": false,
  "quiet": false,
  "tcp": false,
  "log": "/tmp/kcptun.log"
}
```

Client:

```json
{
  "smuxver": 2,
  "localaddr": "127.0.0.1:60002",
  "remoteaddr": "192.168.199.7:39810-39900",
  "key": "123456789",
  "crypt": "aes",
  "mode": "fast",
  "mtu": 1400,
  "sndwnd": 256,
  "rcvwnd": 2048,
  "datashard": 10,
  "parityshard": 0,
  "dscp": 46,
  "nocomp": true,
  "acknodelay": false,
  "nodelay": 1,
  "interval": 20,
  "resend": 2,
  "nc": 1,
  "conn": 1,
  "sockbuf": 16777217,
  "smuxbuf": 16777217,
  "streambuf": 4194304,
  "keepalive": 10,
  "autoexpire": 600,
  "quiet": true,
  "tcp": false,
  "log": "/tmp/kcptun.log"
}
```

Screenshot of the symptom: [image]

xtaci commented 1 month ago

Does the data still transfer completely and intact?

xtaci commented 1 month ago

For example, try actually transferring a large file. Does that show any problems?

xtaci commented 1 month ago

Does it appear after roughly 30 seconds?

Frank-pv commented 1 month ago

> Does it appear after roughly 30 seconds?

1. The data transfer is intact (though I can't rule out that this is just the result of TCP integrity checking inside the tunnel).
2. The problem does not appear after 30 s; two or three traffic bursts are enough to trigger the bug, and at that point no new TCP request can be initiated at all.
3. With Shadowsocks layered on top, downloading a file through the tunnel completes intact.

xtaci commented 1 month ago

[image]

Which parameters reproduce this? I can't reproduce it here.

Frank-pv commented 1 month ago

> [image]
>
> Which parameters reproduce this? I can't reproduce it here.

Server:

```
[root@kvm-199-7 kcp]# uname -a
Linux kvm-199-7 5.14.0-427.22.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Jun 19 17:35:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@kvm-199-7 kcp]# cat kcp.conf
{ "smuxver": 2, "listen": "[::]:39810-39900", "target": "127.0.0.1:3389", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 2048, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "pprof":false, "quiet":false, "tcp":false, "log": "/tmp/kcptun.log" }
```

Client:

```
Linux ubuntu-virtual-machine 6.5.0-41-generic #41~22.04.2-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 3 11:32:55 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
root@ubuntu-virtual-machine:/home/ubuntu/kcp# cat kcp.conf
{ "smuxver": 2, "localaddr": "127.0.0.1:60002", "remoteaddr": "192.168.199.7:39810-39900", "key": "123456789", "crypt": "aes", "mode": "fast", "mtu": 1400, "sndwnd": 256, "rcvwnd": 2048, "datashard": 10, "parityshard": 0, "dscp": 46, "nocomp": true, "acknodelay": false, "nodelay": 1, "interval": 20, "resend": 2, "nc": 1, "conn": 1, "sockbuf": 16777217, "smuxbuf": 16777217, "streambuf":4194304, "keepalive": 10, "autoexpire":600, "quiet":true, "tcp":false, "log": "/tmp/kcptun.log" }
```

Test video attached: https://github.com/user-attachments/assets/9a515513-7e67-4fe3-897d-af8ba8e2e73f

Frank-pv commented 1 month ago

config.zip contains the configuration files.

xtaci commented 1 month ago

[image]

Frank-pv commented 1 month ago

> [image]

It is indeed odd: on the same machine there really is no problem, but with different machines the problem reproduces again.

xtaci commented 1 month ago

Could it be getting RST by some internal firewall?

Frank-pv commented 1 month ago

> Could it be getting RST by some internal firewall?

We can rule that out, because the two ends are directly connected.

xtaci commented 1 month ago

Hard to judge then. Consider testing with other tools; it doesn't have to be iperf3.

xtaci commented 1 month ago

[image]

On my side (client: WSL Ubuntu; server: FreeBSD) the stall does not occur either.

xtaci commented 1 month ago

Let me look into it further.

Frank-pv commented 1 month ago

> Hard to judge then. Consider testing with other tools; it doesn't have to be iperf3.

Understood. This problem has bothered me for a long time; please take a look when you have time. For now I have switched back to the previous version. https://github.com/user-attachments/assets/994cc98d-6234-4c73-9051-70cc349984c8

xtaci commented 1 month ago

You can try my latest commit; compile it yourself.

xtaci commented 1 month ago

This is roughly what happens:

  1. After Ctrl+C interrupts the connection, the data still sitting in streambuf keeps being sent off the queue, and the remote end keeps echoing it back.
  2. So the next time a connection is opened, it appears stuck: the previous connection's packets are still piled up in the send queue (this release changed closeWait to wait 30 seconds before initiating the close, mainly to handle connections that do a HALF_CLOSE), so smux's stream SYN is not sent and handled promptly, and opening the connection hangs.

xtaci commented 1 month ago

You can try the following (on the latest version, changing only std/copy.go in kcptun):

  1. Change the closeWait time to 0 or 1.
  2. In the startup parameters, increase the smux buffer size to prevent the backlog.

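For the second point, the knobs are the same keys already present in the configs above; a sketch of the relevant fragment (the values here are illustrative, not recommendations):

```json
{
  "smuxbuf": 33554432,
  "streambuf": 1048576
}
```

Raising `smuxbuf` gives the whole session more aggregate room, while `streambuf` is the per-stream buffer that bounds how much a single dead stream can pile up.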
xtaci commented 1 month ago

So what I will try next is the following:

Distinguish the server-side copy from the client-side copy.

The client-side copy needs to close immediately, without waiting.

xtaci commented 1 month ago

https://github.com/xtaci/kcptun/commit/142ac6b48f218b17c80cbb7c1477eceb2bca97da

xtaci commented 1 month ago

Put it this way: the problem can be mitigated, but because everything is multiplexed over a single link, head-of-line blocking is inevitable; Ctrl+C cannot cancel data already piled up in the kcp send queue.

https://zh.wikipedia.org/wiki/%E9%98%9F%E5%A4%B4%E9%98%BB%E5%A1%9E

Frank-pv commented 1 month ago

> You can try the following (on the latest version, changing only std/copy.go in kcptun):
>
> 1. Change the closeWait time to 0 or 1.
> 2. In the startup parameters, increase the smux buffer size to prevent the backlog.

Verified: with the latest commit and closeWait set to 0, this problem no longer occurs.

[image]

Frank-pv commented 1 month ago

> So what I will try next is the following:
>
> Distinguish the server-side copy from the client-side copy.
>
> The client-side copy needs to close immediately, without waiting.

I tested a bit: the backlog seems to happen on the server side, not the client side.

xtaci commented 1 month ago

Right, that is exactly the problem: after Ctrl+C, what is already in the server's kcp queue cannot be cancelled. You can lower the per-stream buffer to mitigate this.

xtaci commented 1 month ago

The main issue is that some peculiar servers use TCP HALF CLOSE: they close only the sending direction and keep receiving. In that case a window of time (30 s) still has to be reserved for receiving. Of course, for your specific application you can disable closeWait, or it could become a parameter written into the startup config. Let me think about it.

xtaci commented 1 month ago

3ec90cd66d42521db3232c347b89f488c5d94276 @Frank-pv It is parameterized now. [image]