[Feature Request] Discussion on the overall implementation plan for the future of V2ray

ActiveIce commented 4 years ago

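Given the recent string of vulnerabilities, I think V2's implementation needs a round of refactoring in order to resolve the vulnerability problems thoroughly.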

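I believe everyone is here to offer suggestions for a better Project V. People will hold their own views, and some disagreement is normal and even healthy, but the discussion also has to stay efficient. Bug reports have a template for the sake of efficiency; I won't impose one here, but I hope everyone posts along the following lines: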

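**1. There is currently problem xx; a new xx implementation with property xx can solve or avoid it.**
**2. Requirement xx is currently unmet; it has use case xx, and adding feature xx would satisfy it.**
**3. I think point x of proposal x, raised by xx, has problem x; implementing it with xx would make proposal x more complete.**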

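Bringing a possible solution, even an immature idea that still has problems, benefits the discussion. Since everyone's needs differ, no single protocol can satisfy every scenario.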

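So please keep the discussion free of "replace B with A", "replace A with B", "C is beyond saving", "D can never be fixed", or "the parrot is dead"; I don't think that is a problem-solving attitude. We know perfection can never be reached, but we should still pursue it. The discussion only moves forward and produces results when people propose, say, improved variants A1 and A2 in place of A, or B1 and B2 in place of B.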

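I understand that with vulnerabilities at hand everyone is eager to voice their views, but please follow the rules and come with a problem, an analysis, and a proposal. Thanks for your cooperation! (Comments that don't follow the rules will be collapsed depending on how constructive they are.)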

Below are my requirements analysis, some thoughts on and a summary of the existing approaches, and finally an overall design.

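Start with requirements. Project V is a platform of networking tools, so I don't think it should be understood narrowly as a censorship-circumvention tool. It has many different use cases: for transfers within a LAN, plaintext may be more efficient; for transfers between two hosts on the public internet, plain encryption plus a data integrity check may be all that is needed; people who run a web server need transport over ws/h2 and the ability to go through a CDN; some say "I want to build an accelerator that blasts UDP packets with no handshake"; others say "UDP is QoS-throttled here, so give me TCP." Seen this way, the scenarios are rich and the needs are diverse.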

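Next is encryption. No protocol is secure forever or absolutely, since defense has to cover everything while an attack needs only one hole. We therefore need to draw on collective wisdom and adopt widely used protocols that many people study and maintain together, so that when a problem appears we promptly get a fix from upstream. The more widely a protocol is used and the more people study and maintain it, the better it can keep its security up to date.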

Based on these considerations, and drawing on the views raised in earlier discussions, I propose the following design:

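vmess keeps its stateless protocol design; it includes authentication and is plaintext.
Encryption layer: can be toggled in the config file to encrypt plaintext vmess. This would follow WireGuard's authentication flow and encryption implementation, adding support for TCP and for the AES-GCM family. WireGuard is now part of the Linux kernel; building on its principles saves the extensive security analysis and testing that a home-grown protocol requires, avoids potential vulnerabilities, and lets us receive upstream updates in the future. Going further, it could be made compatible with WireGuard's virtual network interface, giving a simple cross-platform, VPN-level transparent proxy.
Transport layer: selected in the config file; carries plaintext vmess or encrypted vmess over TCP/UDP/WebSocket/H2c/the future H3.
TLS layer: can be toggled in the config file to turn ws/h2c into wss/h2, with the ability to pass through a CDN. TLS is likewise widely used and maintained upstream.
To reduce or avoid the potential risk of exposing the server at the outermost layer, the v2 server does not have to sit at the outermost layer the way Trojan does when wss/h2 is used. Keep v2ray's ability to be reverse-proxied by a web server, and continue to leave traffic identification and path-based routing to the web server.
If the config file uses wss/h2, the client sends traffic through the chromium network stack, which is also widely used and maintained upstream.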

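With this, a LAN can use plaintext vmess alone; communication over the public internet can use encrypted vmess + UDP if UDP is not QoS-throttled, or encrypted vmess + TCP if it is; users with a web server can use plaintext vmess + wss/h2; and those worried about the CDN misbehaving can use encrypted vmess + wss/h2. It also gives users who need TLS one solution to the client fingerprint problem.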
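The combinations listed above can be sketched as three independent switches. This is only an illustration under stated assumptions: the `Stack` class, its field names, and the string labels are hypothetical and do not correspond to v2ray's actual config format, and the LAN case assumes TCP because the proposal does not name a transport for it.

```python
from dataclasses import dataclass

# Hypothetical names: none of these exist in v2ray's real config format.
TRANSPORTS = {"tcp", "udp", "ws", "h2c"}
TLS_CAPABLE = {"ws": "wss", "h2c": "h2"}  # the TLS layer turns ws/h2c into wss/h2


@dataclass(frozen=True)
class Stack:
    encrypt: bool       # encryption layer on/off (plaintext vmess when False)
    transport: str      # one of TRANSPORTS
    tls: bool = False   # TLS layer on/off

    def describe(self) -> str:
        if self.transport not in TRANSPORTS:
            raise ValueError(f"unknown transport: {self.transport}")
        if self.tls and self.transport not in TLS_CAPABLE:
            raise ValueError("in this sketch the TLS layer only wraps ws or h2c")
        payload = "vmess-encrypted" if self.encrypt else "vmess-plaintext"
        carrier = TLS_CAPABLE[self.transport] if self.tls else self.transport
        return f"{payload}+{carrier}"


# The scenarios from the proposal, expressed as switch settings.
SCENARIOS = {
    "lan": Stack(encrypt=False, transport="tcp"),  # transport is an assumption
    "public, UDP usable": Stack(encrypt=True, transport="udp"),
    "public, UDP throttled": Stack(encrypt=True, transport="tcp"),
    "behind web server": Stack(encrypt=False, transport="ws", tls=True),
    "distrust the CDN": Stack(encrypt=True, transport="h2c", tls=True),
}
```

Calling `describe()` on each entry yields labels such as `vmess-encrypted+udp` or `vmess-plaintext+wss`, which makes it easy to see that the layers compose without depending on one another.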

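The layers in this design are fairly cleanly separated, so everyone can pick what fits and combine them freely according to their own needs. This is certainly not the optimal design. If some module's implementation has problems or is unreasonable, or a better implementation exists, suggestions for fixing or improving it are welcome.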

rayc345 commented 4 years ago

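From what I know about the firewall, there are fewer and fewer reliable approaches: 1. random-looking data that resembles nothing in particular (like SS and vmess); 2. TLS camouflage such as Trojan/v2ray+tls; 3. wrapping all traffic in GET/POST requests, though not the way ssr does it, whose disguise is unconvincing. Regarding the first approach, the vmess+tcp you mentioned above also needs TCP obfuscation. I have prepared a simple replacement for the current vmess protocol, modeled on TLS. The communication flow is: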

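The server generates an ECDSA key pair in advance, and the public key p has already been given to the client. During key negotiation, p is used as a pre-shared key; once the encrypted session is established, p is used to verify the server's identity.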

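The client connects to the server and negotiates a key using ECDH. When the two sides exchange public keys, the keys are encrypted with chacha20poly1305 and length-obfuscated so that identity can be verified. The encryption key is k=hmac_sha256(p,unix_time/30); although the public key p is fixed, k changes every 30 seconds. To allow for clock skew with the client, the server tries the values of unix_time/30 within 3 above and below its own, one by one; if none of the resulting k can decrypt the chacha20poly1305 ciphertext, the data is considered invalid. The server caches every public key received within the last 30 seconds and, for each new client, checks whether the key has already been used; if so, it treats the connection as a replay attack and drops it after a while.
After successfully decrypting the public key sent by the client, the server encrypts its own public key the same way and sends it to the client, immediately derives the session key from its own private key and the client's public key, and also signs the session key with the private key s and sends the signature to the client. The client verifies the signature with the public key.
This handshake is essentially a simplified TLS 1.3. To reduce network latency, the client can send the proxy target address together with its public key, so the server can start connecting to the target as soon as it has the client's public key. The cost is that the proxy target address loses forward-secrecy protection.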
I have put together a client for this flow on top of clash and written my own server; it has been debugged and works.
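The time-windowed key and replay check described above might look roughly like the sketch below. It is not the code from the clash-based client or its server. It assumes that p is the HMAC key and that the window number is encoded as an 8-byte big-endian integer (the comment gives neither detail), and it leaves out the chacha20poly1305 step, which needs a third-party library. All function and class names are hypothetical.

```python
import hashlib
import hmac

WINDOW_SECONDS = 30    # k changes every 30 seconds
MAX_SKEW_WINDOWS = 3   # the server tries 3 windows on either side of its own


def derive_key(p: bytes, unix_time: int) -> bytes:
    """k = HMAC-SHA256(p, unix_time // 30); key/message roles are an assumption."""
    window = unix_time // WINDOW_SECONDS
    return hmac.new(p, window.to_bytes(8, "big"), hashlib.sha256).digest()


def candidate_keys(p: bytes, server_time: int) -> list:
    """Every key the server would try, nearest window first."""
    offsets = sorted(range(-MAX_SKEW_WINDOWS, MAX_SKEW_WINDOWS + 1), key=abs)
    return [derive_key(p, server_time + o * WINDOW_SECONDS) for o in offsets]


class ReplayCache:
    """Remembers client public keys for ttl seconds and rejects repeats."""

    def __init__(self, ttl: int = WINDOW_SECONDS) -> None:
        self.ttl = ttl
        self._seen = {}  # public key bytes -> time first seen

    def check_and_store(self, client_pubkey: bytes, now: int) -> bool:
        """Return True if the key is fresh, False if it looks like a replay."""
        # Forget entries older than the TTL before checking.
        self._seen = {k: t for k, t in self._seen.items() if now - t < self.ttl}
        if client_pubkey in self._seen:
            return False
        self._seen[client_pubkey] = now
        return True
```

In a full implementation the server would attempt AEAD decryption with each candidate key in turn and only consult the replay cache after one of them succeeds. The cache lifetime is left as a parameter here so it can be compared against the span of windows the server accepts.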

ActiveIce commented 4 years ago

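Many thanks for providing a proposal that already runs. It appears to share a lot with the techniques WireGuard uses: ECDH key exchange, a pre-shared public key, chacha20-poly1305 encryption. Could you explain how this proposal differs from using the original WireGuard implementation directly, and what advantages those differences bring or what drawbacks they avoid? Also: WireGuard is based on the Noise Protocol Framework, which offers many options and is not limited to UDP and chacha20; building on it would also amount to having a maintained upstream.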

rayc345 commented 4 years ago

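I ported some code from the geph2 project. Its author says they are a graduate student in network security researching Tor-related topics, and that the code was designed on the basis of academic schemes such as scramblesuit; it is basically obfs4. I have not read the related papers closely. The implementation is indeed very similar to wireguard: ecdh key exchange, ecdsa authentication, aead encryption. wireguard sends UDP packets, whereas vmess needs TCP and cannot use it directly, so it is hard to compare their merits, because the original wireguard cannot be carried over to vmess. Also, the Noise Protocol Framework is a crypto library, and golang already has basically all of the features it provides. Both are maintained.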

henrypijames commented 4 years ago

I'd like to challenge the matter-of-fact stance on maintaining statelessness. I understand the many benefits of a stateless protocol, but I have yet to see anyone do a cost and benefit analysis (or even an estimate) on it. At this point, I feel like many people's preference for statelessness is just blind faith, or fear of the unknown, or something else not entirely rational.

(I don't have a concrete proposal for how to improve the protocol by making it stateful, and I'm sorry for talking in the abstract. Maybe someone more familiar with the topic can help out with an example. I just want the issue to be checked before the train has left the station.)

ActiveIce commented 4 years ago

@henrypijames Thank you for your comment. Can you tell us some advantages of stateful protocols that stateless protocols don't have, to help us do the cost and benefit analysis? Thanks again.

henrypijames commented 4 years ago

Like I was saying: I don't feel like I'm expert enough on this issue to make a concrete suggestion. Someone else should do it, please. But I find it worrying that we've apparently decided on maintaining statelessness (and this is not a small decision to make) without anyone being able to say why.

The reason I want us to investigate a stateful alternative is because we've learned, over the years, that the protocol header is a major target (if not the main target) of fingerprinting, and a major part (if not the main part) of the header of a stateless protocol is authentication. This can't be avoided or even reduced as long as the protocol remains stateless.

If, on the other hand, we make the protocol stateful, there is at least a theoretical chance to significantly reduce the frequency and amount of authentication data, as well as other header data, thereby lowering the overall fingerprint.

Again, I know that becoming stateful brings a boatload of problems on its own. But given how persistent and disturbing the fingerprinting problem has become - and is expected to get only worse, I think we should at least pause and think it through.

henrypijames commented 4 years ago

I just remembered this sentence from the WireGuard protocol doc:

> Any secure protocol require [sic] some state to be kept

Maybe we can read this doc again and try to understand what Jason thought about stateful vs. stateless when designing WG. (BTW, I personally consider WG to be semi-stateful - the server does keep track of the clients, but it auto-renegotiates if either end reboots. In essence, stateful on the inside, but appearing stateless to the user outside.)

nametoolong commented 4 years ago

> Any secure protocol require [sic] some state to be kept

In WG's case it is probably because

> This handshake occurs every few minutes, in order to provide rotating keys for perfect forward secrecy.

However, what has bothered v2ray and other proxy implementations is behavioral fingerprinting. The problem will mostly be made worse under a stateful protocol, unless someone comes up with a really clever design of state machine.

To the authentication problem, there are several proposals of out-of-band authentication (I remember seeing them but frankly I don't know how to search for them in Chinese). I wonder if they are practical though.

henrypijames commented 4 years ago

> However, what has bothered v2ray and other proxy implementations is behavioral fingerprinting. The problem will mostly be made worse under a stateful protocol, unless someone comes up with a really clever design of state machine.

Why does being stateful automatically (or most likely) lead to higher behavioral recognizability? I don't see that. Could you elaborate?

> To the authentication problem, there are several proposals of out-of-band authentication (I remember seeing them but frankly I don't know how to search for them in Chinese). I wonder if they are practical though.

If you're stateless, out-of-band auth only puts your auth communication in other (less fixed) places; it doesn't significantly reduce the frequency and volume of auth. So it helps a little, but not much. Plus, out-of-band auth has a greater attack surface for MITM.

nametoolong commented 4 years ago

To a certain degree, your points do make sense.

Things are in fact more complicated than the stateful-stateless dichotomy. TLS has a notoriously complex state machine which is proven to be troublesome. WireGuard, ScrambleSuit, obfs4 and I2P's NTCP2 are stateful but authenticate only at the very beginning of a handshake, before the key exchange, which means they are stateless pre-handshake. Lantern's lampshade is stateless though, with always-on multiplexing. Shadowsocks is planning to move to a stateful protocol to thwart MITM attacks due to its trust model.

Let's go back to WG.

> One design goal of WireGuard is to avoid storing any state prior to authentication and to not send any responses to unauthenticated packets.

Hence, WG's design will not significantly reduce the frequency and amount of authentication data, as each connection still requires an authentication step. We can achieve the very same by forcing mux on in v2ray and having UUIDs rotate automatically. The point is, if a protocol or an implementation does something wrong at the first packet, its unobservability is doomed whether it is stateful or stateless.

So I agree that we do not need to stick to statelessness. Stateful protocols are worth investigating, but only if they actually decrease the frequency of authentication. AFAICT no protocols do so without utilizing an out-of-band channel.

By the way, you are right about how stateless out-of-band authentication is useless.

henrypijames commented 4 years ago

Right, but there is a big difference between auth data in every connection, and auth data in every request (over the same connection) - which is what stateless means. Single requests (in fact, single packets) still need to be secured against MITM, but that can be done with a token (either static or dynamic) that's generated during auth at the start of the connection (reusing the token across different connections is probably too unsafe), thus reducing the auth footprint.
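One way to picture the per-connection token idea is the sketch below: a random token is created once after authentication, and each later request carries a short tag derived from that token and a request counter. This is only an illustration of the concept, not a proposed wire format. The `Session` class, the 16-byte tag length, and the counter encoding are all assumptions, and an AEAD cipher would normally provide this protection as part of encryption.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TAG_LEN = 16  # hypothetical tag size


class Session:
    """State held by each side for one connection after authentication."""

    def __init__(self, token: Optional[bytes] = None) -> None:
        # The token is generated once per connection and never reused elsewhere.
        self.token = token if token is not None else secrets.token_bytes(32)
        self.next_counter = 0  # next request number the receiver will accept

    def tag(self, counter: int, payload: bytes) -> bytes:
        """Short MAC binding the payload to this connection and request number."""
        msg = counter.to_bytes(8, "big") + payload
        return hmac.new(self.token, msg, hashlib.sha256).digest()[:TAG_LEN]

    def verify(self, counter: int, payload: bytes, tag: bytes) -> bool:
        """Accept only the next expected request with a matching tag."""
        if counter != self.next_counter:
            return False  # replayed or out-of-order request
        if not hmac.compare_digest(self.tag(counter, payload), tag):
            return False  # payload or tag was modified in transit
        self.next_counter += 1
        return True
```

The full credentials appear only at the start of the connection; everything after that carries 16 bytes per request, which is the reduction in auth footprint the comment above is arguing for.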

nametoolong commented 4 years ago

> Right, but there is a big difference between auth data in every connection, and auth data in every request (over the same connection) - which is what stateless means.

The fact is that VMess does authenticate exactly once per connection. By enabling mux, all subsequent requests will be multiplexed on the established stream. No extra authentication is done.

henrypijames commented 4 years ago

> The fact is that VMess does authenticate exactly once per connection. By enabling mux, all subsequent requests will be multiplexed on the established stream. No extra authentication is done.

First, mux isn't mandatory yet. And whether or not we should make it mandatory in the new protocol needs to be discussed - it too is not an unimportant decision to make.

Second, assuming mux, how do we protect against MITM between requests if we're not using TLS (because obviously, we have yet to make a decision on mandatory TLS as well)?

nametoolong commented 4 years ago

> Second, assuming mux, how do we protect against MITM between requests if we're not using TLS?

We use AEAD ciphers to protect all post-handshake data frames, as if they were real application data. All protocols mentioned in https://github.com/v2ray/v2ray-core/issues/2541 do so.

There is no clear distinction between 'requests' and 'application data'. If we use TLS, then there will eventually be a thin protocol layer that makes requests on top of TLS. Yet from the perspective of TLS, that thin layer is purely application data, and no raw authentication data are exposed.

henrypijames commented 4 years ago

So are we limited to being stateful within a (muxed) connection, but stateless across connections? And the state within a connection is limited to (one-time) encryption keys/tokens? Or is there benefit to be had if the protocol keeps more state?

I think we agree that authentication and encryption come with a minimal amount of state. For this discussion, we should focus on statefulness beyond that: is it useful, and is the benefit worth the cost?

nametoolong commented 4 years ago

> So are we limited to being stateful within a (muxed) connection, but stateless across connections? And the state within a connection is limited to (one-time) encryption keys/tokens?

It appears so, under v2ray's current design.

> Or is there benefit to be had if the protocol keeps more state?

I believe yes, if we implement something like I2P's SessionTag mechanism. But how much? Shadowsocks could be the simplest implementation of the token idea. Each Shadowsocks connection on the wire is just 16 bytes of nonce-like key material followed by AEAD data frames. A minimal implementation of the token mechanism would be something like 16 bytes of token followed by AEAD data frames, essentially a Shadowsocks with v2ray's time-based replay protection. Such a protocol immediately solves our short-term considerations, yet it remains unclear whether we can rely on it in the long term. Whether it is worth implementing is up to the developers.

henrypijames commented 4 years ago

> I believe yes, if we implement something like I2P's SessionTag mechanism. But how much? Shadowsocks could be the simplest implementation of the token idea. Each Shadowsocks connection on wire is just 16 bytes of nonce-like key material followed by AEAD data frames. A minimal implementation of the token mechanism will be, like, 16 bytes of token followed by AEAD data frames, essentially a Shadowsocks with v2ray's time-based replay protection.

But this is still authentication (in a wider sense); it's not what I meant by "more state".

V2's latest countermeasure against connection-close fingerprinting is an example of statefulness: The server recognizes a suspect client, then assigns a property to it (in this case, a pseudorandom length of additional bytes to drain) and keeps track of it until the client is disconnected (which is only milliseconds thereafter, but still). Can we make use of something like that, but in positive ways for legitimate clients?
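The kind of per-client state described in that example can be pictured with the sketch below. It is not v2ray's actual implementation: the `DrainState` class is hypothetical, and the bounds on the drain length are invented for illustration because the comment does not say what range is used.

```python
import random

# Hypothetical bounds; the thread does not state the range v2ray uses.
MIN_DRAIN = 16
MAX_DRAIN = 4096


class DrainState:
    """Per-connection state kept for a client the server considers suspect."""

    def __init__(self, rng: random.Random) -> None:
        # Pseudorandom number of further bytes to read before closing.
        self.remaining = rng.randint(MIN_DRAIN, MAX_DRAIN)

    def consume(self, received: int) -> bool:
        """Account for received bytes; return True once it is time to close."""
        self.remaining -= received
        return self.remaining <= 0
```

The server would create one `DrainState` when it flags a connection, call `consume` for every chunk it reads, and close the connection when `consume` returns True, so the close no longer happens at a fixed byte offset.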

nametoolong commented 4 years ago

One example I can think of is ShadowsocksR: one of its protocols determines packet length using a PRNG, completely getting rid of length fields and thus reducing the attack surface. There are a handful of other possibilities, but I have no idea how to build a sound protocol.
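The general idea of deriving packet lengths from shared state instead of sending them can be sketched as follows. This is not ShadowsocksR's actual algorithm: it assumes both sides already share a secret key, uses HMAC-SHA256 over a packet counter as the pseudorandom generator, and picks arbitrary length bounds. The function names are hypothetical.

```python
import hashlib
import hmac


def packet_lengths(shared_key: bytes, count: int, lo: int = 64, hi: int = 1400) -> list:
    """Derive the lengths of the next `count` packets from the shared key alone."""
    lengths = []
    for i in range(count):
        # Each packet index yields a fresh pseudorandom block on both sides.
        block = hmac.new(shared_key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        # Map the block into [lo, hi]; the modulo introduces a slight bias.
        lengths.append(lo + int.from_bytes(block[:4], "big") % (hi - lo + 1))
    return lengths


def split_stream(shared_key: bytes, stream: bytes, count: int) -> list:
    """Cut a byte stream into packets without reading any length field."""
    packets, offset = [], 0
    for length in packet_lengths(shared_key, count):
        packets.append(stream[offset:offset + length])
        offset += length
    return packets
```

Because sender and receiver compute the same sequence, no length ever appears on the wire, which removes one field an observer could otherwise probe or tamper with.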

genufish commented 4 years ago

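When using a CDN, the server can be reached through any CDN node, but the config only accepts a single IP. I hope the config file can support specifying multiple IP ranges for the server address.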
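As a rough illustration of what accepting several ranges could involve, the sketch below expands a list of addresses and CIDR ranges into candidate server addresses. The function name, the input format, and the cap on the number of candidates are all hypothetical; v2ray's config has no such option in this thread.

```python
import ipaddress
from itertools import islice


def candidate_addresses(entries: list, limit: int = 256) -> list:
    """Expand single IPs and CIDR ranges into at most `limit` candidate addresses."""
    def walk():
        for entry in entries:
            # strict=False lets an entry such as 192.0.2.5/24 name its whole network.
            network = ipaddress.ip_network(entry, strict=False)
            for address in network:
                yield str(address)
    return list(islice(walk(), limit))
```

A client could then try the candidates in order or at random until one CDN node responds; how to choose among them is a separate design question.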

github-actions[bot] commented 3 years ago

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 5 days

v2ray / v2ray-core

[Feature Request] Discussion on the overall implementation plan for the future of V2ray #2541