shadowsocks / shadowsocks-org

www.shadowsocks.org
MIT License

[SIP] Shadowsocks v2 #157

Open riobard opened 4 years ago

riobard commented 4 years ago

This issue is to discuss the changes we want in the next major revision of Shadowsocks protocol. Right now I've done some preliminary research based on the SOCKS6 RFC draft and I have a prototype security layer that provides forward secrecy (except for early data) and 1-RTT latency (or 0-RTT if used with TCP Fast Open).

So here're the things I have in mind (no particular order of importance, and most are optional):

  1. v2 protocol roughly based on SOCKS6 (which is still a moving target).
  2. New security layer with PFS and 0/1-RTT (with/without TFO). (related issue https://github.com/shadowsocks/shadowsocks-org/issues/54)
  3. Basic auth so we can officially support single-port multi-users without hacks.
  4. Native solution for DNS. (related issue https://github.com/shadowsocks/shadowsocks-org/issues/156)
  5. Better-defined semantics of proxy and VPN regarding errors and ICMP packets (related issue https://github.com/shadowsocks/shadowsocks-org/issues/144)
  6. Multiplexing over single TCP connection (similar to HTTP/2) to reduce latency when TFO is not possible.

Please feel free to discuss the changes.

Mygod commented 4 years ago

It could be good to add links to related issues in this repo.

ghost commented 4 years ago

Can a v2 server also provide v1 service?

Multiplexing over single TCP connection

HTTP/3 multiplexes over a single UDP "connection" to avoid TCP flow control stalling all substreams in a single TCP connection. SCTP is an option too; it's designed for multiplexing, but it's too rare...

And just to mention here: the first proposal for Shadowsocks (in clowwindy's original post) was public key encryption. ("If others are interested in joining, perhaps it could be taken further and made into public key encryption.")

riobard commented 4 years ago

@studentmain For co-existence of v2 and v1: this is up to the implementation to decide, as long as a downgrade attack is not possible.

Multiplexing over TCP (and HTTP/2 in particular) is a compromise because of UDP throttling. If HTTP/3 becomes popular and works well enough in practice we could switch to UDP as well. Meanwhile I have to deal with TCP and broken middle boxes killing TFO packets.

I've almost finished the public key encryption part (without RTT penalty).

ghost commented 4 years ago

Then the problem becomes how the client chooses the correct protocol. A change in the URL format may be required.

HTTP/3 is a modified QUIC, so I think it's at least good enough in Google's data centers. But I'm not sure it's also good enough under the Pacific Ocean. v2ray has QUIC support, so maybe we can take a look at it. https://blog.apnic.net/2018/05/15/how-much-of-the-internet-is-using-quic/ https://w3techs.com/technologies/details/ce-quic

We should use a quantum-safe cipher for public key encryption. https://github.com/open-quantum-safe/liboqs/tree/master#supported-algorithms

riobard commented 4 years ago

A ss2:// scheme should work fine.

I need to see hard evidence that UDP works without significant throttling before investing time on it.

Quantum security is beyond the scope. Public key encryption is mostly to support multi-users without too much security downsides.
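
A minimal sketch of how a client could pick the protocol version from the URI scheme, assuming the `ss2://` scheme suggested above. The URI layout (`scheme://token@host:port`), the `PROTOCOL_BY_SCHEME` table, and the `pick_protocol` helper are hypothetical; no v2 URI format has been specified in this thread.

```python
from urllib.parse import urlsplit

# Hypothetical mapping: "ss" is the existing scheme, "ss2" is the one
# proposed in this thread. Nothing here is a finalized format.
PROTOCOL_BY_SCHEME = {"ss": 1, "ss2": 2}


def pick_protocol(uri: str) -> tuple[int, str, int]:
    """Return (protocol_version, host, port) for a server URI."""
    parts = urlsplit(uri)
    if parts.scheme not in PROTOCOL_BY_SCHEME:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError("URI must include a host and a port")
    return PROTOCOL_BY_SCHEME[parts.scheme], parts.hostname, parts.port
```

Because the version is decided by the scheme alone, a client never has to guess or negotiate it on the wire, which sidesteps one possible downgrade path.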

Mygod commented 4 years ago

Regarding the proposal, this is definitely too much. I prefer a minimalistic approach and offload features to plugins whenever possible. In fact, we could even make the default AEAD encryption as a (default) plugin and always run in plain (I guess that reduces v2 to simply socks6 over *, but KISS).

Detailed comments:

riobard commented 4 years ago

@Mygod At minimum I'd like ss2-server to work like a regular SOCKS6 server with some special behaviors regarding authentication so as not to leak its existence. But right now there are no other SOCKS6 clients to test with.

It's 2020 and public key crypto is easy and efficient with modern primitives. I've considered using just TLS but there are several major blocking issues, namely only TLS 1.3 technically supports 0/1-RTT mode, but many implementations (like the one in Go's stdlib) do not support 0-RTT at all, and there's no plan to add it any time soon. Additionally, TLS brings in a host of other issues regarding certificate management and domain verification that I don't want to force on people. And the complexity of TLS is… well just read the RFC and judge it yourself. The new security layer aims to be very simple, efficient, and secure. If you do not care about any of those nice things, you can always run SOCKS-over-TLS (many existing clients support it) and no need to bother with Shadowsocks at all.

I'm still considering multiplexing. It has significant benefits in Shadowsocks use case, namely 1) reliable 0-RTT connection establishment even when TFO does not work, 2) better utilization of network bandwidth due to TCP congestion control, and 3) it cuts the number of connections/open files in half on the server side. However it does come with obvious drawbacks as well, like complexity and head-of-line blocking, both of which cannot be avoided. I'm experimenting with HTTP/2 CONNECT proxy now, and it does work better than I expected. But it's difficult to integrate and provide reasonable proxy semantics (mostly communicating errors between local and remote side).
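
To make the multiplexing trade-off concrete, here is a minimal framing sketch. The frame layout (4-byte stream ID, 2-byte length, payload) and the function names are assumptions for illustration only; this is neither HTTP/2 framing nor anything specified for Shadowsocks.

```python
import struct

# Hypothetical frame header: 4-byte stream ID + 2-byte payload length,
# both big-endian. Chosen for illustration only.
HEADER = struct.Struct("!IH")


def encode_frame(stream_id: int, payload: bytes) -> bytes:
    """Wrap one chunk of a logical stream into a frame."""
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a single frame")
    return HEADER.pack(stream_id, len(payload)) + payload


def decode_frames(buf: bytes) -> tuple[list[tuple[int, bytes]], bytes]:
    """Split a receive buffer into complete frames plus leftover bytes."""
    frames = []
    offset = 0
    while len(buf) - offset >= HEADER.size:
        stream_id, length = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if end > len(buf):
            break  # incomplete frame: wait for more bytes from the socket
        frames.append((stream_id, buf[offset + HEADER.size:end]))
        offset = end
    return frames, buf[offset:]
```

All streams share one ordered byte stream, so a frame for stream 2 cannot be decoded until every byte of the frames in front of it has arrived. That is the head-of-line blocking mentioned above.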

Mygod commented 4 years ago

Authentication without leaking existence can be achieved simply via socks6 over blank. It seems like socks6 draft specifies that the client can send payload immediately after authentication header so I don't see why this is an issue.

0-RTT cannot work. TLS 1.3 does 0-RTT by using a session ID. You need a handshake to establish a key exchange/session ID. You are not going to want to encrypt each packet using public-key encryption.

TLS has the added advantages for traffic hiding that a new protocol cannot provide. The advantage of TLS is that it makes traffic indistinguishable from other TLS traffic, say HTTPS, except for inspecting packet length distribution (see sssniff, etc), which I believe is somewhat reliable at best.

Also, the reason I oppose you building protocols from public-key crypto directly is exactly the reason I opposed OTA. We are living in a sad world where security is not as composable as you would like it to be, and you are not going to get the world's best security experts to audit your protocol (despite how complicated TLS is, it is of popular interest and gets audited by everyone).

Multiplexing is useless, except for connection reuse, which should be implemented by plugins. I agree connection reuse could be useful but again this should be implemented by a plugin where the mimicked traffic does use such feature, say HTTP/2. We should make hiding traffic our priority instead of performance (especially when it's as minor as number of RTTs), etc. The plain protocol should not have long idle connections.

Mygod commented 4 years ago

In conclusion, we should just do socks6 over [blank]. @madeye What do you think?

riobard commented 4 years ago

@Mygod Unfortunately you are wrong on many levels…

  1. Multi-user authentication won't work securely without public key crypto, see #54
  2. 0-RTT works fine, except for early data (TLS 1.3 also shares this caveat), and client can choose how much early data to send. The trick is to pre-share server public key. We need this for server authentication anyway so no extra problem either. Also see issue #54 (I need to update it with forward secrecy tho).
  3. Sure, I completely understand the benefit of TLS. But for obfuscation we all agree it should be done by plugin so there's no disagreement. We just need to provide a default when people don't want to bother with TLS. Current default is insufficient.
  4. The new security layer is basically vastly simplified TLS 1.3 so I'm confident. You might not agree and it's perfectly fine to use TLS instead (and accept your chosen TLS lib's limitations).
  5. Having fewer RTTs is important for user experience. Long idle connections are the norm. Just check how many connections your phone keeps to various clouds. And it's strange for a client to keep dozens of TCP connections to a single server in an increasingly HTTP/2 world.

Mygod commented 4 years ago

I am not going to argue with your opinions so just some technical comments.

  1. "securely" depends on your security model. I mentioned in https://github.com/shadowsocks/shadowsocks-org/issues/54#issuecomment-589523707 that your proposal actually does not achieve what you want (in particular I constructed an attacker in your model). However, I would argue that TLS does it pretty decently.
  2. You can make a plugin to do what you describe but I do not feel comfortable making an ad-hoc protocol a default choice.

My opinion: TLS isn't too hard to set up actually.

riobard commented 4 years ago

We can discuss the technical issues separately in #54.

I'm not against TLS. It's just that the TLS in Go stdlib does not provide what I want (0-RTT) and I don't want to bother with certificates and domain verification. Like I said before, you can always run SOCKS-over-TLS so there's no disagreement here.

madeye commented 4 years ago

Recently, I'm thinking about a side channel key exchange approach.

For example, do a WireGuard-like key exchange (https://www.wireguard.com/protocol/) in a side channel (a standard 443 port, a random port, or even a different host server), then communicate using the current Shadowsocks protocol.

riobard commented 4 years ago

@madeye The benefit is?

I think some of the commercial operators offer HTTPS-based subscription to do similar things. But I don't fully understand the reasoning behind that.

ghost commented 4 years ago

@riobard So the ss server itself can know which user will connect before any packet is received. That will make active probing useless.

riobard commented 4 years ago

@studentmain Could you please explain in a bit more detail in what scenario this would make it immune to probing?

ghost commented 4 years ago

The client connects to the side channel and finishes the handshake there. The server then gets the client's IP address (maybe along with the IV the user will use) before the client connects to it. An attacker can't pass the side-channel handshake, so when one of their packets comes in, the server has no information about it and can reset the connection or do whatever it likes.

riobard commented 4 years ago

If it's IP-based firewalling, it seems very fragile given the mass deployment of Carrier-Grade NAT (CGNAT), in which you cannot guarantee the client's public IP when connecting to the authorization server is the same one when connecting to the relay server.

So the safest bet is for the client to get some kind of auth token from the authorization server and use that token to connect to the relay server. At this point I'm confused as to how it will be different than sending just a PSK?

ghost commented 4 years ago

So the safest bet is for the client to get some kind of auth token from the authorization server and use that token to connect to the relay server.

Yes, a token obtained over a safe side channel is OK.

riobard commented 4 years ago

How is it different from the current approach with respect to active probing? I still don't understand the advantage of the split approach.

ghost commented 4 years ago

In the split approach, the server can detect probes more easily and more accurately. It works similarly to port knocking for SSH.

riobard commented 4 years ago

So in the normal setup, we have to detect replay attack on the server in a black list style (previously used nonce will be rejected). But in the split approach, at least on the relay server, I assume it's more like a white list style (only authorized tokens will be accepted).

Then we're moving the attack surface from the relay server to the auth server. Also, because the auth server and relay server are now separate, we have to consider the additional synchronization issue (the client gets an auth token from the auth server, but the auth server has not yet delivered that token to the relay server when the client connects to the relay server).

It does not look too promising either. Or am I missing something here?
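
The black-list versus white-list distinction above can be sketched as two small classes. The names `ReplayFilter` and `TokenAllowList` are hypothetical, and a real server would still have to authenticate the payload (for example, verify the AEAD tag) on top of either check.

```python
class ReplayFilter:
    """Black-list style: accept any nonce that has not been seen before."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def accept(self, nonce: bytes) -> bool:
        if nonce in self._seen:
            return False  # replayed connection attempt
        self._seen.add(nonce)
        return True


class TokenAllowList:
    """White-list style: accept only pre-registered tokens, once each."""

    def __init__(self) -> None:
        self._valid: set[bytes] = set()

    def register(self, token: bytes) -> None:
        self._valid.add(token)

    def accept(self, token: bytes) -> bool:
        if token not in self._valid:
            return False  # unknown token, e.g. an active probe
        self._valid.remove(token)  # one-time use also blocks replays
        return True
```

In this sketch the filter's state grows with every accepted connection, while the allow list only holds tokens that have been issued but not yet used.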

ghost commented 4 years ago

The auth server can hide behind a normal website (that's why it uses 443) and is used less frequently than the relay server. So I think its attack surface is much smaller; you need to find it among many TLS websites first.

additional synchronization issue

That's the problem that needs to be resolved. My solution is to send keys to the client only after receiving the relay server's confirmation. That will introduce more RTTs for the first packet. The auth server can send a few dozen keys to the client for use in other connections, so it only affects the first connection.
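
A rough sketch of that confirm-then-release ordering, with both servers modeled as in-process objects. The class names, the 16-byte token size, and the batch size are assumptions; the network hop between the two servers is reduced to a method call.

```python
import secrets


class RelayServer:
    """Holds tokens pushed by the auth server; each token works once."""

    def __init__(self) -> None:
        self._tokens: set[bytes] = set()

    def register(self, tokens: list[bytes]) -> bool:
        self._tokens.update(tokens)
        return True  # stands in for the relay's confirmation message

    def accept(self, token: bytes) -> bool:
        if token in self._tokens:
            self._tokens.remove(token)
            return True
        return False


class AuthServer:
    """Releases tokens to the client only after the relay confirms them."""

    def __init__(self, relay: RelayServer) -> None:
        self._relay = relay

    def issue(self, count: int = 32) -> list[bytes]:
        tokens = [secrets.token_bytes(16) for _ in range(count)]
        if not self._relay.register(tokens):
            raise RuntimeError("relay did not confirm the tokens")
        return tokens  # the client can spend these on later connections
```

Because `issue` returns only after `register` succeeds, a client can never hold a token the relay does not know about. The cost is the extra round trip between the two servers on the first request.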

riobard commented 4 years ago

I see. My concern is that the split approach introduces many moving parts (it's officially a distributed system now) and the benefits are not very clear cut.

Mygod commented 4 years ago

This is too complicated a solution for a problem that TLS can solve.

EDIT: Also you are making it easier to fingerprint the server.

ghost commented 4 years ago

So there are three solutions, right?

ghost commented 4 years ago

Modified SOCKSv5 + TLS (via simple-obfs and v2ray-plugin) has already been tested for a long time.

As for the side-channel handshake, since it doesn't require redesigning the packet format, we can test it on the current code base.

riobard commented 4 years ago

More likely modified/simplified SOCKS6 + interchangeable security layer (custom/tls/plugin)?

Only issue is that SOCKS6 is still a draft and it's not clear if it will be widely adopted.

ghost commented 4 years ago

The only problem with TLS is that it needs a domain name.

riobard commented 4 years ago

@studentmain And certificates and renewal handling (acme most likely), which is a hassle for many.

ghost commented 4 years ago

Let's Encrypt ACME renewal can be automatic, so the domain name is the only problem. Otherwise they can only use a self-signed cert, and that's another fingerprint...

riobard commented 4 years ago

Yeah that’s what ACME is for. Still something to set up. Also TLS doesn’t work well in some corporate networks with MitM decryption and a company-issued CA. But it does come with the benefits that it will pass most firewalls and looks pretty innocent. There’s no one-size-fits-all solution here.

ghost commented 4 years ago

So we may have multiple security layers and let the user choose one. Is a choice of multiple ciphers still necessary here? How would an external plugin operate in this model (I don't want to see SOCKSv6 over TLS over WebSocket over TLS)?

riobard commented 4 years ago

I’m afraid SOCKS-over-WebSocket-over-TLS will still be necessary to work with CDN. Anyway the choice is relatively simple:

We should definitely make a flowchart to pick the right combo. 😂

ghost commented 4 years ago

My suggestion:

Make the new security layer as tiny as possible and have it provide forward secrecy. An optional built-in TLS layer (maybe with WebSocket) can be enabled by the user when there's no plugin.

Why I think FS is necessary: https://github.com/shadowsocks/shadowsocks-windows/issues/2162#issuecomment-455615758

riobard commented 4 years ago

Check #54 for the proposal. It’s as minimal as possible now.

Mygod commented 4 years ago

@studentmain Not sure if you have heard of Tor.

Dreamacro commented 4 years ago

Why not use https://noiseprotocol.org/noise.html ?

riobard commented 4 years ago

@Dreamacro A few reasons:

  1. Complexity is through the roof (unless using third-party libs).
  2. Handshakes mean either additional RTTs or keeping state on both ends (we want neither).
  3. Might as well use TLS (more common and innocent-looking at least).

Dreamacro commented 4 years ago

@riobard The Noise protocol is simpler and more lightweight than TLS, and provides a lot of flexible handshake patterns. In the real world, WireGuard and WhatsApp use it.

As for RTT, the Noise protocol will need fewer round trips than TLS.

riobard commented 4 years ago

@Dreamacro So are you suggesting we use a specific Noise key exchange, or the whole suite? AFAIC we only need ECDH to set up ephemeral session keys. The rest of Noise does not bring much benefit.

My worry is that both WireGuard and WhatsApp are assumed to be blocked in the future (if not already), and there's nothing stopping censors from blocking all Noise-like protocols (if not disguised). Compared to the case of TLS, I guess most censors cannot afford to kill TLS due to its importance.

made-by-love commented 4 years ago

Can a v2 server also provide v1 service?

Multiplexing over single TCP connection

HTTP/3 multiplexes over a single UDP "connection" to avoid TCP flow control stalling all substreams in a single TCP connection. SCTP is an option too; it's designed for multiplexing, but it's too rare...

And just to mention here: the first proposal for Shadowsocks (in clowwindy's original post) was public key encryption. ("If others are interested in joining, perhaps it could be taken further and made into public key encryption.")

I implemented encryption with a pre-shared server public key (X25519 key exchange in libsodium) in 2017, with zero overhead except for the client's 32-byte public key placed before the nonce at the beginning of the first TCP packet. I implemented it in shadowsocks-python and the libev version when I ported AEAD to the Python version in 2017.

As it's not compatible with the original Shadowsocks, I didn't push the code.

Key exchange details in libsodium: https://download.libsodium.org/doc/key_exchange

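Server side:
pk: public key, generated in advance; the client obtains the server's pk by fetching the server info through the API
sk: private key, paired with pk and generated in advance; the <pk, sk> pair can be generated with shadowsocks/psk.py
rpk: remote (client) public key, sent together with the nonce when the client opens a new connection
rx: the key the remote side (client) encrypts with, i.e. the receive key, used for decryption
tx: the send key, used to encrypt data sent to the client

Client side:
pk: public key
sk: private key, paired with pk; a one-time <pk, sk> pair is generated when a TCP connection is initiated
rpk: remote (server) public key, obtained by fetching the server info through the API
rx: the key the remote side (server) encrypts with, i.e. the receive key, used for decryption
tx: the send key, used to encrypt data sent to the server

rx and tx keys: rx and tx are computed from pk, sk, and rpk: <rx, tx> = session_keys(pk, sk, rpk)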

rx || tx = BLAKE2B-512(X25519(p.n))
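
The derivation above can be sketched in pure Python as follows. This is illustrative only: the X25519 ladder follows RFC 7748 but is not constant-time, and the hash input (shared secret followed by the client and server public keys) is an assumption modeled on libsodium's `crypto_kx` documentation rather than on the unpublished code described here. The `session_keys` signature with an `is_client` flag is hypothetical; real code should call libsodium.

```python
import hashlib

P = 2**255 - 19                          # Curve25519 field prime
A24 = 121665                             # curve constant from RFC 7748
BASE_POINT = (9).to_bytes(32, "little")  # standard base point u = 9


def x25519(scalar: bytes, u_bytes: bytes) -> bytes:
    """RFC 7748 Montgomery ladder (illustrative, not constant-time)."""
    k = int.from_bytes(scalar, "little")
    k = (k & ((1 << 255) - 8)) | (1 << 254)  # clamp the scalar
    x1 = int.from_bytes(u_bytes, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in range(254, -1, -1):
        bit = (k >> t) & 1
        if swap ^ bit:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da, cb = (x3 - z3) * a % P, (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def session_keys(is_client: bool, pk: bytes, sk: bytes, rpk: bytes):
    """Return (rx, tx) for one side, given its key pair and the peer's pk."""
    shared = x25519(sk, rpk)
    client_pk, server_pk = (pk, rpk) if is_client else (rpk, pk)
    digest = hashlib.blake2b(shared + client_pk + server_pk,
                             digest_size=64).digest()
    first, second = digest[:32], digest[32:]
    # The two halves are assigned in opposite order on each side, so the
    # client's tx equals the server's rx and vice versa.
    return (first, second) if is_client else (second, first)
```

A client would generate a fresh `sk` per TCP connection (for example with `os.urandom(32)`), compute `pk = x25519(sk, BASE_POINT)`, and send `pk` ahead of the nonce, matching the 32-byte overhead described above.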

ohsorry commented 4 years ago

So I guess this is the place to discuss the ss v2 protocol mentioned by @studentmain? I haven't read the contents of SOCKS v6 yet, so I may not be able to share my thoughts on v2 right now. However, check out my efforts on refactoring shadowsocks-windows. I've tried several times to figure out how Shadowsocks works by reading the source code of shadowsocks-windows, but failed each time. Therefore I decided to refactor it and give it a redesign. The key work of refactoring has been done, and both the server and client work properly on my computer. Have a look at it: https://github.com/shadowsocks/Shadowsocks-Net

Thanks to @celeron533 for creating a repository for me. My English really sucks and I shouldn't talk so much.

EkkoG commented 4 years ago

The only problem with TLS is that it needs a domain name.

Before TLS 1.3, TLS had an extension named TLS-PSK; TLS 1.3 includes it as part of the protocol itself rather than as an extension: https://tools.ietf.org/html/rfc8446#section-2.2 (image from https://www.wikiwand.com/en/Transport_Layer_Security)

OpenSSL officially supports TLS-PSK, and there is a Python wrapper for it; no domain is needed at all. For the user, the config can be the same as over TCP: no domain, no certificates.

The problem here is that a TLS-PSK API is almost always missing from language standard libraries, so it needs a third-party lib or has to be developed from scratch.

@riobard @studentmain @Mygod

riobard commented 4 years ago

@cielpy Presumably if we use TLS, we'd like to look as innocent as possible to blend in normal TLS traffic. The problem with TLS-PSK is that it has an easily-detectable & unique feature, and it is extremely rare in normal TLS traffic.

We could in theory just adopt TLS-PSK and call it a day, but if enough people are using it to evade GFW, it will be investigated by GFW admins and blocked.

EkkoG commented 4 years ago

@riobard Yes but no. I know the situation you mentioned is much like the GFW blocking ESNI recently, but it's a little different: TLS-PSK has existed for years and has its own use cases, mostly IoT devices, whereas ESNI is very new, so the GFW can simply block it. For TLS-PSK, I think it will be harder for the admins to make the blocking decision.

EkkoG commented 4 years ago

If we knew how much of this traffic there currently is on the Internet, it would be easier to make a choice, but we don't...

riobard commented 4 years ago

There's no need to know global traffic pattern. Just capture traffic for a day at your home router and calculate the percentage of TLS-PSK in all TLS connections. I'd be impressed if it is more than 0.01% for an ordinary household.

EkkoG commented 4 years ago

Maybe, I will if it's possible.

ghost commented 4 years ago

The only problem with TLS is that it needs a domain name.

Before TLS 1.3, TLS had an extension named TLS-PSK; TLS 1.3 includes it as part of the protocol itself rather than as an extension: https://tools.ietf.org/html/rfc8446#section-2.2 (image from https://www.wikiwand.com/en/Transport_Layer_Security)

OpenSSL officially supports TLS-PSK, and there is a Python wrapper for it; no domain is needed at all. For the user, the config can be the same as over TCP: no domain, no certificates.

The problem here is that a TLS-PSK API is almost always missing from language standard libraries, so it needs a third-party lib or has to be developed from scratch.

@riobard @studentmain @Mygod

Can TLS operate without a domain name? Technically yes, in practice no. Here's the thing: not widely supported = not widely used (= if we use it, we look strange).