v2fly / v2ray-core

A platform for building proxies to bypass network restrictions.
https://v2fly.org
MIT License

v2ray's DS may have performance issues & overall v2ray performance improvements #373

Closed · RPRX closed this 3 years ago

RPRX commented 4 years ago

https://github.com/badO1a5A90/v2ray-doc/blob/master/performance_test/DS/20201030.md

I don't know whether other Go programs behave this way, but v2ray always has.


Update 2020/11/01:

Thanks to a pointer from @xiaokangwang, I found that v2ray has a read performance optimization for plain TCP, which I honestly hadn't noticed before:

https://github.com/v2fly/v2ray-core/blob/74f96a83c8df4bde05f4ff2a70900f34b142aea4/common/buf/io.go#L61-L71

Here the reader is type-asserted to syscall.Conn, and the resulting ReadVReader lets the kernel write directly into memory designated by v2ray, saving one memory copy.

And because of this fix https://github.com/v2fly/v2ray-core/commit/47660bfee269dbe31105acf05e5a1797169ed9f1 , this optimization was removed for DS. (Actually, runtime.GOOS could be checked first here.)

But note that as soon as there is one extra layer of wrapping, such as HTTP obfuscation, WebSocket, TLS, or the PROXY protocol, this optimization no longer takes effect...
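
In rough terms the selection looks like the sketch below, a simplified paraphrase of the linked io.go (not a verbatim copy; the helper names follow that file and it sits in the context of the buf package): the readv fast path is only taken when the reader can be asserted to syscall.Conn, and that assertion is exactly what any extra wrapping layer defeats.

// Simplified sketch of the reader selection in common/buf/io.go (paraphrased).
func NewReader(reader io.Reader) Reader {
    if useReadv {
        // A TLS/WebSocket/PROXY-protocol wrapper is not a syscall.Conn,
        // so this assertion fails and the plain copying reader is used.
        if sc, ok := reader.(syscall.Conn); ok {
            if rawConn, err := sc.SyscallConn(); err == nil {
                // The kernel fills v2ray-owned buffers directly via readv,
                // saving one copy. (This is also where a runtime.GOOS check
                // could gate the fast path per platform.)
                return NewReadVReader(reader, rawConn)
            }
        }
    }
    return &SingleReader{Reader: reader}
}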

So I will adjust XTLS accordingly to achieve even better performance.

RPRX commented 4 years ago

In addition, v2ray's overall performance also... needs some investigation to find out where the problem is and to optimize CPU and memory usage (I have a few guesses; I'll add them later).

lucifer9 commented 4 years ago

Other Go programs don't seem to behave this way... https://github.com/lucifer9/goben You can run this to check. In my test on an i3-8100, DS is roughly twice as fast as TCP over 127.0.0.1, and whether or not the socket is abstract makes basically no difference.

RPRX commented 4 years ago

@lucifer9

It may be because of this fix https://github.com/v2fly/v2ray-core/commit/47660bfee269dbe31105acf05e5a1797169ed9f1

Try benchmarking on Linux after rolling it back.
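
For a quick local comparison, something along these lines should work (a rough sketch, assuming a v4.x checkout where the main package lives in ./main; adjust to your own build setup):

git clone https://github.com/v2fly/v2ray-core && cd v2ray-core
git revert --no-edit 47660bfee269dbe31105acf05e5a1797169ed9f1
go build -o v2ray ./main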

ghost commented 4 years ago

Based on tests following https://github.com/badO1a5A90/v2ray-doc/blob/master/performance_test/DS/20201030.md , I found that domain sockets have a huge impact on performance. I ran many similar tests myself and can confirm it. I then experimented further, mainly comparing how nginx and v2ray handle domain sockets, and found:

1. Whether v2ray receives on a domain socket (listen) or outputs to one (fallbacks), performance is worse than TCP: output is less than a quarter of TCP's, and receiving is about half of TCP's.
2. When nginx outputs to a domain socket (proxy_pass), performance is basically on par with TCP. When nginx receives on a domain socket (listen), the result is rather strange, so I won't comment on it here.

Here are the experiments I ran:

nginx configuration

server {
    listen 81;
    listen unix:/dev/shm/test.sock;
    root /dev/shm/nginx;
}
server {
    listen 82;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 83;
    location / {
        proxy_pass http://unix:/dev/shm/test.sock;
    }
}
server {
    listen 86;
    location / {
        proxy_pass http://127.0.0.1:998;
    }
}
server {
    listen 998;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 87;
    location / {
        proxy_pass http://unix:/dev/shm/test2.sock;
    }
}
server {
    listen unix:/dev/shm/test2.sock;
    location / {
        proxy_pass http://127.0.0.1:81;
    }
}
server {
    listen 88;
    location / {
        proxy_pass http://unix:/dev/shm/test3.sock;
    }
}
server {
    listen unix:/dev/shm/test3.sock;
    location / {
        proxy_pass http://unix:/dev/shm/test.sock;
    }
}
server {
    listen 89;
    location / {
        proxy_pass http://127.0.0.1:999;
    }
}
server {
    listen 90;
    location / {
        proxy_pass http://unix:/dev/shm/test4.sock;
    }
}
server {
    listen 91;
    location / {
        proxy_pass http://unix:/dev/shm/test5.sock;
    }
}

v2ray configuration:

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 84,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "port": 85,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/dev/shm/test.sock",
                        "xver": 0
                    }
                ]
            }
        },
        {
            "port": 999,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "listen": "/dev/shm/test4.sock",
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 81,
                        "xver": 0
                    }
                ]
            }
        },
        {
            "listen": "/dev/shm/test5.sock",
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/dev/shm/test.sock",
                        "xver": 0
                    }
                ]
            }
        }
    ]
}

Under /dev/shm/nginx there is a 300 MB file named file (memory was running low; my machine only has 1 GB of RAM), and /dev/shm has about 200 MB of free space left.

Run each of the following:

# test direct TCP speed
wget 127.0.0.1:81/file -O /dev/null

# compare nginx and v2ray domain socket output speed
    # nginx tcp-> nginx
wget 127.0.0.1:82/file -O /dev/null
    # nginx ds-> nginx
wget 127.0.0.1:83/file -O /dev/null
    # v2ray tcp-> nginx
wget 127.0.0.1:84/file -O /dev/null
    # v2ray ds-> nginx
wget 127.0.0.1:85/file -O /dev/null

# compare nginx and v2ray domain socket receiving speed
    # nginx tcp-> nginx tcp-> nginx
wget 127.0.0.1:86/file -O /dev/null
    # nginx ds-> nginx tcp-> nginx
wget 127.0.0.1:87/file -O /dev/null
    # nginx ds-> nginx ds-> nginx
wget 127.0.0.1:88/file -O /dev/null
    # nginx tcp-> v2ray tcp-> nginx
wget 127.0.0.1:89/file -O /dev/null
    # nginx ds-> v2ray tcp-> nginx
wget 127.0.0.1:90/file -O /dev/null
    # nginx ds-> v2ray ds-> nginx
wget 127.0.0.1:91/file -O /dev/null

Each command was repeated several times, and speed was measured with wget's built-in speed readout, like this:

# wget 127.0.0.1:81/file -O /dev/null
--2020-10-31 20:05:39--  http://127.0.0.1:81/file
Connecting to 127.0.0.1:81... connected.
HTTP request sent, awaiting response... 200 OK
Length: 314572800 (300M) [application/octet-stream]
Saving to: ‘/dev/null’

/dev/null                            100%[===================================================================>] 300.00M  1.05GB/s    in 0.3s    

2020-10-31 20:05:39 (1.05 GB/s) - ‘/dev/null’ saved [314572800/314572800]

Test results:

- Command 1: 1.02 GB/s on average
- Command 2: 510 MB/s, exactly half of command 1
- Command 3: almost the same as command 2
- Command 4: 480 MB/s
- Command 5: 110 MB/s, not even a quarter of the TCP fallback speed
- Command 6: 370 MB/s
- Command 7: 120 MB/s; this one is strange: it starts at under 20 MB/s and only speeds up later, on every run
- Command 8: 230 MB/s
- Command 9: 300 MB/s
- Command 10: 170 MB/s
- Command 11: 140 MB/s, somehow faster than command 5, which is absurd

Other information:

1. The maximum loopback speed measured with iperf on my machine is 20 Gbps
2. OS: Ubuntu 21.04
3. Kernel: 5.10.0-051000rc1-generic
4. nginx version: 1.19.4
5. v2ray version: 4.32.0

lucifer9 commented 4 years ago

> @lucifer9
>
> It may be because of this fix 47660bf
>
> Try benchmarking on Linux after rolling it back.

No real change. The v2ray test configuration is below; it mainly compares the performance of TCP fallback versus DS fallback.

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 10004,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": 10001,
                        "xver": 0
                    }
                ]
            },
            "streamSettings": {
                "network": "tcp"
            }
        },
        {
            "port": 10005,
            "protocol": "vless",
            "settings": {
                "decryption": "none",
                "fallbacks": [
                    {
                        "dest": "/tmp/test.sock",
                        "xver": 0
                    }
                ]
            }
        }
    ]
}

iperf3 listens on port 10000

iperf3 -s -p 10000

Use socat to forward TCP and the unix domain socket

socat -b 81920000 TCP-LISTEN:10001,reuseaddr,fork TCP:127.0.0.1:10000
socat -b 81920000 UNIX-LISTEN:/tmp/test.sock,reuseaddr,fork TCP:127.0.0.1:10000

TCP fallback (port 10004)

iperf3 -c 127.0.0.1 -p 10004
Connecting to host 127.0.0.1, port 10004
[  5] local 127.0.0.1 port 6178 connected to 127.0.0.1 port 10004
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  1.07 GBytes  9.21 Gbits/sec    2    767 KBytes
[  5]   1.00-2.00   sec  1.13 GBytes  9.74 Gbits/sec    1    895 KBytes
[  5]   2.00-3.00   sec  1.10 GBytes  9.49 Gbits/sec    1   1.12 MBytes
[  5]   3.00-4.00   sec  1.35 GBytes  11.6 Gbits/sec    0    767 KBytes
[  5]   4.00-5.00   sec  1.04 GBytes  8.92 Gbits/sec    0    767 KBytes
[  5]   5.00-6.00   sec   856 MBytes  7.19 Gbits/sec    0    767 KBytes
[  5]   6.00-7.00   sec  1.20 GBytes  10.3 Gbits/sec    0    767 KBytes
[  5]   7.00-8.00   sec  1.08 GBytes  9.24 Gbits/sec    1    767 KBytes
[  5]   8.00-9.00   sec   891 MBytes  7.48 Gbits/sec    1    895 KBytes
[  5]   9.00-10.00  sec  1.12 GBytes  9.60 Gbits/sec    1    767 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.28 Gbits/sec    7             sender
[  5]   0.00-10.00  sec  10.8 GBytes  9.27 Gbits/sec                  receiver

iperf Done.

DS fallback (port 10005)

iperf3 -c 127.0.0.1 -p 10005
Connecting to host 127.0.0.1, port 10005
[  5] local 127.0.0.1 port 39040 connected to 127.0.0.1 port 10005
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  63.8 MBytes   535 Mbits/sec    1   1023 KBytes
[  5]   1.00-2.00   sec  47.1 MBytes   395 Mbits/sec    5   1023 KBytes
[  5]   2.00-3.00   sec  43.0 MBytes   361 Mbits/sec    7   1023 KBytes
[  5]   3.00-4.00   sec  42.0 MBytes   352 Mbits/sec    6   1023 KBytes
[  5]   4.00-5.00   sec  50.7 MBytes   426 Mbits/sec    8   1.12 MBytes
[  5]   5.00-6.00   sec  56.4 MBytes   473 Mbits/sec    4    895 KBytes
[  5]   6.00-7.00   sec  45.2 MBytes   379 Mbits/sec    5   1023 KBytes
[  5]   7.00-8.00   sec  43.0 MBytes   361 Mbits/sec    4   1023 KBytes
[  5]   8.00-9.00   sec  47.9 MBytes   402 Mbits/sec    4   1023 KBytes
[  5]   9.00-10.00  sec  43.2 MBytes   362 Mbits/sec    4   1023 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   482 MBytes   405 Mbits/sec   48             sender
[  5]   0.00-10.00  sec   475 MBytes   398 Mbits/sec                  receiver

iperf Done.

The above was tested after the rollback. The following was tested before the rollback (on master):

Connecting to host 127.0.0.1, port 10005
[  5] local 127.0.0.1 port 65104 connected to 127.0.0.1 port 10005
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  59.1 MBytes   495 Mbits/sec    4   1023 KBytes       
[  5]   1.00-2.00   sec  43.5 MBytes   365 Mbits/sec    3   1023 KBytes       
[  5]   2.00-3.00   sec  45.3 MBytes   380 Mbits/sec    4   1023 KBytes       
[  5]   3.00-4.00   sec  43.9 MBytes   368 Mbits/sec   12    895 KBytes       
[  5]   4.00-5.00   sec  44.6 MBytes   374 Mbits/sec    5    895 KBytes       
[  5]   5.00-6.00   sec  53.6 MBytes   450 Mbits/sec    4    895 KBytes       
[  5]   6.00-7.00   sec  46.8 MBytes   392 Mbits/sec    3    895 KBytes       
[  5]   7.00-8.00   sec  48.8 MBytes   409 Mbits/sec    4    895 KBytes       
[  5]   8.00-9.00   sec  45.4 MBytes   380 Mbits/sec    4    895 KBytes       
[  5]   9.00-10.00  sec  46.8 MBytes   393 Mbits/sec    5    895 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   478 MBytes   401 Mbits/sec   48             sender
[  5]   0.00-10.00  sec   469 MBytes   394 Mbits/sec                  receiver

iperf Done.

The results are very close.

lucifer9 commented 4 years ago

Tested on macOS 10.15.7 with the same configuration: TCP fallback reaches 12.4 Gb/s and DS fallback 3.66 Gb/s. The gap doesn't look as large as on Linux.

RPRX commented 4 years ago

@xiaokangwang says TCP has a special optimization, and DS originally had it too (maybe the DS used for fallbacks is outside that code path?)

lucifer9 commented 4 years ago

> @xiaokangwang says TCP has a special optimization, and DS originally had it too (maybe the DS used for fallbacks is outside that code path?)

How is this optimization done? I tested with the following configuration

{
    "log": {
        "loglevel": "none"
    },
    "inbounds": [
        {
            "port": 10000,
            "listen": "127.0.0.1",
            "protocol": "dokodemo-door",
            "settings": {
                "address": "127.0.0.1",
                "port": 10001,
                "network": "tcp"
            }
        }
    ],
    "outbounds": [
        {
            "protocol": "freedom",
            "settings": {}
        }
    ]
}

The result isn't even as fast as the fallback case

iperf3 -c 127.0.0.1 -p 10000
Connecting to host 127.0.0.1, port 10000
[  5] local 127.0.0.1 port 43650 connected to 127.0.0.1 port 10000
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   551 MBytes  4.62 Gbits/sec   10    895 KBytes       
[  5]   1.00-2.00   sec   343 MBytes  2.88 Gbits/sec   11    895 KBytes       
[  5]   2.00-3.00   sec   367 MBytes  3.08 Gbits/sec    5    895 KBytes       
[  5]   3.00-4.00   sec   330 MBytes  2.77 Gbits/sec   11   1023 KBytes       
[  5]   4.00-5.00   sec   331 MBytes  2.77 Gbits/sec    9    895 KBytes       
[  5]   5.00-6.00   sec   451 MBytes  3.78 Gbits/sec    3   1.12 MBytes       
[  5]   6.00-7.00   sec   285 MBytes  2.39 Gbits/sec   10   1023 KBytes       
[  5]   7.00-8.00   sec   403 MBytes  3.38 Gbits/sec   15   1023 KBytes       
[  5]   8.00-9.00   sec   372 MBytes  3.12 Gbits/sec   17    767 KBytes       
[  5]   9.00-10.00  sec   467 MBytes  3.92 Gbits/sec   11    895 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.81 GBytes  3.27 Gbits/sec  102             sender
[  5]   0.00-10.00  sec  3.80 GBytes  3.27 Gbits/sec                  receiver

iperf Done.

Direct connection speed

iperf3 -c 127.0.0.1 -p 10001
Connecting to host 127.0.0.1, port 10001
[  5] local 127.0.0.1 port 10144 connected to 127.0.0.1 port 10001
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  4.33 GBytes  37.2 Gbits/sec    0    767 KBytes       
[  5]   1.00-2.00   sec  4.44 GBytes  38.1 Gbits/sec    0    767 KBytes       
[  5]   2.00-3.00   sec  4.41 GBytes  37.8 Gbits/sec    0    895 KBytes       
[  5]   3.00-4.00   sec  4.46 GBytes  38.3 Gbits/sec    0    895 KBytes       
[  5]   4.00-5.00   sec  4.30 GBytes  36.9 Gbits/sec    0    895 KBytes       
[  5]   5.00-6.00   sec  4.21 GBytes  36.2 Gbits/sec    0    767 KBytes       
[  5]   6.00-7.00   sec  4.50 GBytes  38.6 Gbits/sec    0    767 KBytes       
[  5]   7.00-8.00   sec  4.67 GBytes  40.1 Gbits/sec    0    767 KBytes       
[  5]   8.00-9.00   sec  4.26 GBytes  36.6 Gbits/sec    0    767 KBytes       
[  5]   9.00-10.00  sec  4.40 GBytes  37.8 Gbits/sec    0    767 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  44.0 GBytes  37.8 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  44.0 GBytes  37.8 Gbits/sec                  receiver

iperf Done.

With the same configuration as above, tested on macOS, the direct connection is about twice as fast as going through v2ray. Not as extreme as on Linux.

RPRX commented 4 years ago

@lucifer9 The TG group reports that in v4.32.0 the source port in inbound logs is always 0; it was probably introduced by the DS changes.

lucifer9 commented 4 years ago

> @lucifer9 The TG group reports that in v4.32.0 the source port in inbound logs is always 0; it was probably introduced by the DS changes.

If ws or h2 is in use, this behavior is expected. The old code used the IP from X-Forwarded-For as the source IP in the log, but the port of the actual remote_address as the source port. That is actually wrong: if there really were a problem, you couldn't use that IP:port pair to find the matching entry in the upstream proxy's access log. Getting the real port would require support from the upstream proxy, which is a hard requirement to satisfy; most proxies can indeed be configured for it and there is a standard for it, but it is not very user-friendly. Now only X-Forwarded-For is recorded and the port is ignored, so users won't be misled when they go look up records on the upstream proxy. Of course, if someone really insists on seeing ip:port, generating a random 5-digit port would work too 😃
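
A minimal sketch of the behavior described above (illustrative only, not v2ray's actual code; logSource is a made-up helper): take the first X-Forwarded-For entry as the logged source address and always report port 0, since the client's real source port is unknown.

package main

import (
    "fmt"
    "net"
    "net/http"
    "strings"
)

// logSource picks the address to log for a forwarded request: the first
// X-Forwarded-For entry if present, otherwise the transport's remote IP,
// and always port 0 because the real client port is not known.
func logSource(r *http.Request) net.Addr {
    host, _, _ := net.SplitHostPort(r.RemoteAddr)
    if xff := r.Header.Get("X-Forwarded-For"); xff != "" {
        host = strings.TrimSpace(strings.Split(xff, ",")[0])
    }
    return &net.TCPAddr{IP: net.ParseIP(host), Port: 0}
}

func main() {
    r, _ := http.NewRequest("GET", "http://example.com/", nil)
    r.RemoteAddr = "10.0.0.2:54321" // the upstream proxy's address and port
    r.Header.Set("X-Forwarded-For", "203.0.113.7")
    fmt.Println(logSource(r)) // prints 203.0.113.7:0
}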

badO1a5A90 commented 4 years ago

> But note that as soon as there is one extra layer of wrapping, such as HTTP obfuscation, WebSocket, TLS, or the PROXY protocol, this optimization no longer takes effect...

So does that mean that, for example with TLS enabled, DS should outperform TCP? But some earlier tests didn't seem to show that either; DS has always been slower than TCP.

RPRX commented 4 years ago

The VLESS XTLS Direct Mode ReadV Experiment was very successful: removing one more memory copy nearly doubled performance again. If the DS used on fallback had ReadV, it should improve the performance of reading the returned data. But with variables controlled, the comparison against TCP is still strange; could it be related to how v2 internally handles data forwarding? Also, fallbacks go through neither routing nor the pipe mechanism, so the forwarding path is very simple, which is why performance can even exceed Nginx when pushing large amounts of data. This pipe mechanism takes a big lock on both read and write, which may be the core of v2's current performance problems. The long-term stability of fallbacks also shows that the pipe mechanism is not strictly necessary and can be bypassed, i.e. the reader and writer can be connected directly.

https://github.com/v2fly/v2ray-core/blob/master/transport/pipe/impl.go
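
As a rough illustration of "directly connecting the reader and the writer" (a sketch, not v2ray's actual fallback code): one copy loop per direction between the two connections, with no intermediate pipe and therefore no shared lock on the forwarding path.

package forward

import (
    "io"
    "net"
)

// forwardDirect shovels bytes in both directions between the inbound
// connection and the fallback destination. Nothing sits in between,
// so there is no pipe and no read/write lock to contend on.
func forwardDirect(client, dest net.Conn) {
    done := make(chan struct{})
    go func() {
        io.Copy(dest, client) // uplink: client -> destination
        close(done)
    }()
    io.Copy(client, dest) // downlink: destination -> client
    <-done
}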

RPRX commented 4 years ago

A note for the record: the XTLS Direct Mode ReadV Experiment also proved that it works on arm and mips, not just on desktop platforms. Once Friday's new release brings in more samples, and if nothing goes wrong, v2's ReadV will be enabled on all platforms, which should greatly improve the performance of bare VMess and SS.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days