Shadowsocks AEAD 加密方式设计存在严重漏洞，无法保证通信内容的可靠性

RPRX commented 3 years ago

这个问题是昨天发现的，本来我只是想先给 Xray-core 的 Shadowsocks 明文结构加个时间戳以彻底解决对服务端的重放问题。

顺便研究了一下 SS AEAD 的安全性，我脑洞比较大，考虑得比较多，比如是否可以造成 对客户端的攻击，简述如下：

对于 TCP，Shadowsocks AEAD 往返是完全一样的加密结构，HKDF 的 info 也相同，且响应的加密完全独立于请求
对于 UDP，同样存在上面的问题（甚至可以与 TCP 杂交），更糟糕的是，UDP 往返连明文结构都是完全一样的

以上“特性”的利用方式太多了，真的很难排列组合完，还是主要说对客户端的攻击：

在不需要密码的情况下，可以随意对客户端接收的数据进行移花接木或重放等操作，使得被代理程序收到的数据没有任何可靠性。

比如将 A、B 两条 TCP 连接返回的数据对调，Shadowsocks AEAD 客户端是无法发现异常的，只会成功解密并传给被代理程序。

一些 SS 实现是全局共用 IV 过滤器，只能在单一客户端且不重启（which is impossible）的前提下防御重放，防不了移花接木。 Shadowsocks 流加密也存在此问题，有兴趣的可以研究一下 SSR 的 auth_chain_* 系列是否也存在此问题。

建议先用 VMess AEAD（实际上它有其它问题，但不是这种严重漏洞），最后感谢某个开发者群成员参与讨论与验证。

需要说明，这里主要是描述存在的漏洞，而对漏洞的利用有很多方式，比如 移花接木、重放、自交、杂交 等，欢迎补充测试结果。

ghost commented 3 years ago

开发者群成员进行了 本地 echo 的 PoC，并根据测试结果得出 TCP / UDP 移花接木猜想，后者尚未进行 PoC

注意⚠️：下文提及的 “问题” 仅限于本地 echo 自交，不包含其他问题的受影响程度

注意⚠️：没发出 PoC 不代表没有问题，不受 echo 攻击影响的不代表没有问题

什么是 `本地 echo`

本地 echo：将 Client 原本要发给 Server 的流量重新发回给 Client，但 Client 解密成功发回给上层应用 Client 收到了返回来的包，成功解密后发给上层应用 但他没有意识到，这包其实是不久前自己发出的，原封不动地又被发了回来

经过实践验证，echo 方式影响的软件 / 实现：

V2Ray / Xray 内 Shadowsocks 任意 method （AEAD / 非 AEAD）
SSR 任意 method + origin + plain ~~（怕不是没人用了）~~ 对不起我高估了，还有好多人在用
ss-libev 非 AEAD 加密受影响（AEAD 仅仅不受 echo 影响，其他攻击向量仍然可行）

复现过程（仅本地 echo 方式，其他攻击方式后续发布）

搭建本地 echo 环境 socat UDP4-LISTEN:2000,fork EXEC:'cat' 和 socat TCP4-LISTEN:2000,fork EXEC:'cat'
启动任意上述受影响的实现
使用 proxychains 或 cgproxy （因为 proxychains 不能处理 UDP 流量）代理 nc 进行连接 proxychains nc 任意IP地址任意端口号 cgproxy nc -u 任意IP地址任意端口号
输入任意内容，即可发现内容被 echo 返回

参考复现结果（仅参考）

TCP

UDP

需要说明的是，这种情况下 nc 收到的回包内包含目标地址块（即图一中的 y 和图二中的 ‘）：

zonyitoo commented 3 years ago

This is vulnerable because somebody can "fake" the response stream / packets from a known shadowsocks server.

If the attacker doesn't know the correct password, then he can only replay some known respond streams or packets.
If the attacker knows the correct password, then he doesn't need to do this.

So the key is to prevent replay on the client side.

Here are some of my stupid guess about the applications of this vulnerability:

Most users uses shadowsocks for accessing HTTP services, so if he is accessing a HTTP (without the S) service, then the attacker can respond to him a "fake" website. If possible.
The attacker just want to destroy your shadowsocks connection, and he just makes some random respond to you.

But these vulnerability doesn't seem to affect personal and private users. :P

RPRX commented 3 years ago

@zonyitoo 想象一下引发 TLS 错误从而根据报错长度来判断是代理。以及，SS AEAD 作为一个加密传输层，本应保证回应可靠。

Mygod commented 3 years ago

What happened to random IV?

bdbai commented 3 years ago

@Mygod Client IV, even random, does not reflect in a server response by any means. While Server IV may be random because it is out of control. Therefore, an arbitrary Shadowsocks stream, either uploading or downloading, could be reused by a MITM as a response stream to another request as long as both of them involve the same Shadowsocks service.

zonyitoo commented 3 years ago

What happened to random IV?

This issue indicates that the respond stream / packet could be faked.

Consider a senario, a client makes 2 independent TCP connections to a server, then a attackers sliently swaps these 2 connections' respond packets to the client. The client won't notice but only find errors from the applications, because the shadowsocks' client can always decrypt the two respond streams successfully.

Further application still unknown, but it should be a problem because sslocal doesn't know it is being attacked.

Mygod commented 3 years ago

Random IV should prevent cross session reordering. You should somehow ensure the IV is random.

~~As for reordering within the same TCP stream, I don't remember how it is implemented but we could presumably authenticate the previous tag as well, similar to Merkle-Damgard.~~

zonyitoo commented 3 years ago

Random IV should prevent cross session reordering. You should somehow ensure the IV is random.

The swapping happens before the client receives the first respond packet from the server. So it is unrelated that whether the IV is random. A correct implementation of server returns a random IV for every independent connection, but the situation described by this issue can still happen.

Mygod commented 3 years ago

Sure. I guess in this case, you could resolve it by ~~having server's real IV to be random xor client IV for TCP~~. I don't think this is an issue for UDP as UDP is designed to be unreliable.

Anyway this is a very active adversary, which is not very realistic in most cases. Although we could implement these mitigations ~~(xored IVs and Merkle-Damgard authentication)~~ in the future, I do not see the urgency at the moment.

zonyitoo commented 3 years ago

you could resolve it by having server's real IV to be random xor client IV for TCP.

And then client xor it back? Hmm, that is a rather good idea.

But the attacker can still have your client's IV, so he can also make it happens.

Mygod commented 3 years ago

Yeah never mind 😜. We should maybe authenticate client's IV as additional data instead.

P.S. Also now that I think about it the IV autoincrements so nothing needs to be done there. Oops.

viRikaRe commented 3 years ago

I just wonder what kind of MITM attacker will ever use this vulnerability.

When ISP/GFW doesn't suspect you, it shall not mess-up the connection. Otherwise it degrades the service and may impact other software.
When ISP/GFW is suspecting you, does this method reveal a distinctive pattern to confirm the suspicion?
When ISP/GFW confirms you are using proxy, why not just cut off your network?
If other MITM attacker wants to destroy your network, why not just drop the packets?

Anyway, a fix would be appreciated, but I don't see much priority.

RPRX commented 3 years ago

@viRikaRe

如果你正在使用 SS 浏览 HTTPS 网页，那么理论上，对调两条 TCP 返回的数据会导致服务端 TLS alert + 断连，这一行为是固定的。

没错，实际上不光是 AEAD，Shadowsocks 协议的 ciphers 都可以被这样操作，所以理论上是一个通用的识别 SS 的方式。

~~甚至可以根据 alert 长度推测出你大概在用什么 cipher~~

zonyitoo commented 3 years ago

TBH, I believe that we have already come to an agreement that the vulnerability this issue described do exists. So all of subsequence discussion should be based on this agreement.

It doesn't mean it is not vulnerable while the application of it is still unknown. It may not be fixed quickly without changing the design of shadowsocks' protocol IF we found an effective POC.

madeye commented 3 years ago

To avoid the short-term replay attacks to the clients, we'd better keep the bloom-filter persistent on the disk.

@zonyitoo Can you try implement this in shaowsocks-rust first? The implementation should be straightforward, just make sure we can restore the previous bloom-filter after restarting the client.

zonyitoo commented 3 years ago

Can you try implement this in shaowsocks-rust first?

Ok. I will give a try. But the current implementation about "Ping Pong Bloom Filter" can only filter some of the recent IVs / Salts. If we decide to persist this filter, should we use another data structure?

On the other hand, persisting the filter couldn't resolve the issue about stream swapping.

madeye commented 3 years ago

For a client, ping-pong bloom filter should be good enough. Although we call it short-term replay avoidance, it usually takes weeks to flush the filter for a client.

Right, swapping cannot be avoided by bloom-filter. As discussed above, swapping avoidance should be low priority issue.

Mygod commented 3 years ago

@madeye I don't think a bloom filter is useful. The attacker could replay some server messages sent to other clients.

madeye commented 3 years ago

@Mygod You're assuming different clients share the same key?

Mygod commented 3 years ago

Isn't that what shadowsocks supposed to do?

madeye commented 3 years ago

Okay, you mean the clients connected to the same server port. Yes, you're right.

madeye commented 3 years ago

I think we can draft a protocol upgrade for this, like what we did before in SIP004 and SIP007.

Several guidelines:

Minimize the protocol change and packet overhead.
Solve the replay issue totally in this upgrade.
Like SIP004, introduce this upgrade as new ciphers, and keep backward compatibility with old ciphers.

@database64128 @zonyitoo @Mygod What do you think?

DuckSoft commented 3 years ago

@madeye We are already working on it. (with @database64128 @zonyitoo et al) A draft version would be soon available as long as it's somewhat complete.

zonyitoo commented 3 years ago

But that proposal is still based on the current protocol. Should we move forward to shadowsocks v2 ? Making patches into the current protocol still cannot resolve #184 .

madeye commented 3 years ago

@zonyitoo I think a protocol upgrade should solve the replay issue entirely. Something like session ID can be introduced.

Moving to TLS based SOCKS6 protocol can solve everything, but that's not what we expected for shadowsocks.

Mygod commented 3 years ago

@madeye I propose allowing replay because why not. For this issue in particular, I do not see the urgency but you could presumably consider the fix here: https://github.com/shadowsocks/shadowsocks-org/issues/183#issuecomment-787404696

Basically you are going to authenticate the client TCP header (before decryption) or client IV/nonce as your additional authentication data (AAD). This way it shall prevent connection mangling.

The bottom line is that I suggest do nothing.

madeye commented 3 years ago

@Mygod Yes, I think your solution is straightforward for TCP. But for UDP, we have multiple and out-of-order IV/nonce from client, which makes the implementation complicated.

I agree that the issue described here is kind of low priority, and I don't think we should change anything immediately. In the long term, we should keep the current AEAD ciphers as they actually works pretty well.

However, if we can have some SIP to solve the replay issue once and for all, we'd better adopt it. In the past few years, I have received many reports and PoCs for attacking shadowsocks protocol based on replay. Given we already finished the refactoring of shadowsocks-rust, I think it's time to move on and do some protocol upgrading.

BTW, what about dropping all the ciphers and recommend users to use plain shadowsocks protocol over TLSv1.3 plugins. Then I don't need to write so many words here to response a protocol design flaw. 😅

Mygod commented 3 years ago

@madeye As I mentioned UDP is designed to be unreliable and stateless so that is an issue that the UDP client should handle instead. Protecting it against TCP is sufficient imo.

As for replay, I still think it is best to ignore the issue for the vanilla protocol. (If you do not like them, the way I see it you need to do timing/statefulness/handshakes. TLS handshake is the best option here.) After all, what is the point of solving this replay issue once and for all, if you could just use plugins like TLS instead?

database64128 commented 3 years ago

I propose allowing replay because why not.

@Mygod https://gfw.report/blog/gfw_shadowsocks/

madeye commented 3 years ago

IMO, adding something like session ID should be acceptable, as the overhead is negligible.

Let's see what's the proposal from @DuckSoft and others. Hope it can make most of people happy here, then they can stop spamming my inbox...

riobard commented 3 years ago

To prevent replay attacks on the client, I have a proposal that's fairly minimal with no change in protocol and entirely backwards compatible: the server calculates its nonce as the hash of the client's nonce plus the shared secret.

The client then can verify the server nonce and be sure it's the one tied to its original request. Replayed or swapped nonces will fail the check. (Assuming the server did its job to filter replayed client requests, which is the case for now?)

Older clients not knowing this trick will operate just fine. Additionally, the server would not need any random entropy to operate (useful for some virtualized environments).

On client this could be called something like Strict Mode, optional for compliant servers, and default to off for now.

DuckSoft commented 3 years ago

@riobard Good idea. Here's an enhancement plan originally brought out by @DuckSoft, with contributions from @database64128, @xiaokangwang and enlightened by @rprx:

Add 6 bytes after the original LEN field, which will turn LEN field into:

| LEN  | RESERVE | TIMESTAMP | LENGTH & TIMESTAMP TAG |
| ---- | ------- | --------- | ---------------------- |
| 2    | 2       | 4         | 16                     |

TIMESTAMP is 32-bit unix timestamp (yes i know year 2038 and it's done on purpose). RESERVE is reserved for future use, for now just pad zeroes. The server should have a check if the TIMESTAMP is in range, otherwise the connection is marked as malicious.

LEN+RESERVE = 4 bytes and TIMESTAMP = 4bytes, which fits into both 8 byte and 4 byte word length machines.

This plan can be used along with @riobard's plan (but it may need some tricks to be backward compatible)

ps: @xiaokangwang also proposed another method, which is both timestamp-ful and fully backward compatible. I'll leave it to him to state his own thoughts.

DuckSoft commented 3 years ago

The following method is originally proposed by @xiaokangwang, with credits to @DuckSoft, @rprx and @database64128 for discussing, reviewing and some patches.

The method requires a 32-byte IV, which means you must use either aes-256-gcm or chacha20-ietf-poly1305. The modification is only done inside IV:

The first 16 bytes are encrypted with AES-128 (block cipher), with key derived from PSK. The very first bit is used to indicate the direction of the traffic. For client->server packets this bit is marked as 0, and 1 for server->client. The latter 55bit is filled with Epoch timestamp, using 63-8 bits. The rest 72 bit is then filled with random stream.

For the 16 bytes remained, it is originally designed to be also filled with random stream, but we soon noticed that this method could be combined with @riobard's ones. Let's say for all server->client traffic the latter 16 bytes derives somehow from client->server IV and PSK...

Still this is like semi-finished products and need more polishing. We'd be glad to see improvements on this.

original picture:

riobard commented 3 years ago

@DuckSoft If we add the requirement that the client and the server clock to be coarsely in sync (e.g. allow drift within 15 minutes), there is no need to explicitly transmit timestamps, therefore no need to change wire protocol. We can also optionally provide complete backward-compatibility.

It's easier to explain in pseudo code:

T = unix_time_in_seconds() / 300 // current time in 5 minutes or 300 seconds interval
k0 = HKDF_SHA1(key, salt, "ss-subkey") // original session key
k1 = HKDF_SHA1(key, salt, "ss-subkey " + string(T-1)) // new session key if the client clock is 5 minutes behind the server
k2 = HKDF_SHA1(key, salt, "ss-subkey " + string(T)) // new session key if the client clock is within 5 minutes with the server
k3 = HKDF_SHA1(key, salt, "ss-subkey " + string(T+1)) // new session key if the client clock is 5 minutes ahead of the server

The server can simply decrypt using k1, k2, and k3 first, if any of the three works, we're good! Add the client nonce to a cache keyed by T-1, T, and T+1 correspondingly. We only need 3 slots in the cache, and it gradually forgets older keys as time goes on.

To be entirely backward-compatible, if all k1, k2 and k3 fail, we can optionally try k0, the original session key, and if it works, we know it's from an older client without this patch, and use the current Bloom Filter approach to provide resistance against replay attack as best as we can. This is optional, and can be disabled by a toggle named Strict Mode on the server.

DuckSoft commented 3 years ago

@riobard That makes sense and it seems like another VMess protocol with timestamps and AEAD :) So far so good. Shall we assign a SIP number to this?

DuckSoft commented 3 years ago

BTW, none of our designs seem to be resilient to what is stated in https://github.com/shadowsocks/shadowsocks-org/issues/184 by @rprx, which just delays your connection and replay the data before your connection is made, where we would got an IV failure if we turned on IV reuse filter :(

Anyway this is already the optimal solution I've ever seen. Maybe fixing problems one by one is also okay. But it's really sad that it seemed all 0-RTT Stateless proxies are dead...

riobard commented 3 years ago

@DuckSoft Don't worry, we're still trying! 😄

I wouldn't mind the theoretical attack in #184. The collateral damage would be far too great.

madeye commented 3 years ago

Like sssniff, #184 is just a theoretical attack and cannot work in the real world. Let's don't worry about it.

I think we can put @riobard's and @Mygod 's proposals together.

Generate sub-key with timestamp, +/- 5 minutes. It solves the long-term replay issue.
Use client's salt as the AD. It avoids swapping packets between TCP connections.

Let's make this proposal as new ciphers, e.g. aes-256-gcm-v2 and chacha20-ietf-poly1305-v2.

What do you think?

riobard commented 3 years ago

@madeye The current cipher choice list is way too long and confusing. Please do not put stuff like this into cipher names because it will make the cipher list even longer and more confusing. We should gradually make the list shorter and eventually phase it out so users do not make bad decisions.

There are still people sticking to old stream ciphers because it's entirely not obvious if you just look at the cipher names. Existing client implementations have mostly failed to enforce or suggest better choices.

madeye commented 3 years ago

@riobard There's only three AEAD ciphers on our list now. So don't worry about it.

riobard commented 3 years ago

If we aim for seamless backward-compatible upgrade, I suggest we go in two steps instead of one.

My first proposal to provide resistance against Replay Attack on the client is the least intrusive as it can be done on server alone without affecting older clients.

Resistance against long-term replay attack on the server is more intrusive and requires somewhat accurate clock on both clients and servers. We should think a bit more about the implications.

riobard commented 3 years ago

There's only three AEAD ciphers on our list now. So don't worry about it.

@madeye I'm afraid that's not the case in reality. For example, the most popular client on iOS, Shadowrocket, offers every cipher we've ever seen (that's over 30 entries!) and defaults to aes-256-cfb. The second most popular client, Surge, offers over 20 ciphers without suggesting a default.

Guess what they'll do if we add *-v2 cipher? A few more entries! 😂

madeye commented 3 years ago

For all the official servers and clients of shadowsocks, now we only have three AEAD ciphers supported. So, I don't think it's a problem for us.

Also, fragmentation of protocols and ciphers is never a problem for circumvention tools. In fact, it's kind of feature...

Actually very few users are manually configuring their profiles. Usually they subscribe a remote config or scan a QR code.

BTW, I don't think seamless backward-compatible upgrade is doable. Instead, it will bring more confusion if we don't define the new behavior explicitly.

riobard commented 3 years ago

For all the official servers and clients of shadowsocks, now we only have three AEAD ciphers supported. So, I don't think it's a problem for us.

Still, adding *-v2 doubles that list to six entries. That's the slippery slope I'm worried about…

Also, fragmentation of protocols and ciphers is never a problem for circumvention tools. In fact, it's kind of feature...

Only if they actually make a difference. Has there ever been any evidence that GFW can detect some stream ciphers but not the others? We paid the cost of confusion but gained nothing.

BTW, I don't think seamless backward-compatible upgrade is doable. Instead, it will bring more confusion if we don't define the new behavior explicitly.

Check my first proposal above. It is seamless backward-compatible without affecting older clients. New clients can just flip a toggle to enforce verification of replay attack.

madeye commented 3 years ago

That's the slippery slope I'm worried about…

Maybe we can call them v2-* 😅, as they're close to vmess now.

We paid the cost of confusion but gained nothing.

I don't think so. At least for years, I didn't receive random emails about attacking the stream ciphers any more. Also, those users who love stream ciphers (me) can keep using them happily...

It is seamless backward-compatible without affecting older clients.

Given the behavior changes for both client and server, I expect it can solve the long-term replay issue for both of them. The concern of "accurate clock" should not be a big problem, as we can try to sync with ntp in our implementations.

riobard commented 3 years ago

I don't think so. At least for years, I didn't receive random emails about attacking the stream ciphers any more. Also, those users who love stream ciphers (me) can keep using them happily...

@madeye You're still using stream ciphers???!!! 😂 Anyway what I meant was that there's no difference between various stream ciphers. I'd expect most users have migrated to AEAD ciphers so nobody is bothering you about deprecated stream ciphers.

Given the behavior changes for both client and server, I expect it can solve the long-term replay issue for both of them. The concern of "accurate clock" should not be a big problem, as we can try to sync with ntp in our implementations.

So you're suggesting we do it in one step to eliminate replay attack altogether? What kind of backward-compatibility do you have in mind then?

madeye commented 3 years ago

So you're suggesting we do it in one step to eliminate replay attack altogether? What kind of backward-compatibility do you have in mind then?

Yes, one step to solve all the replay issues.

The backward compatibility is only supported by old ciphers. I expect a hard fork of the currentl ciphers.

Maybe we can let @zonyitoo and @database64128 to make the decision, as they are the current active developers.

DuckSoft commented 3 years ago

IMO, stream ciphers of Shadowsocks should never be used, but it's okay if someone really wanted to play with them, as long as they know what's happening there (or not). We've done our responsibilities, but the sad thing is not everyone in the world will listen to us. Let's just move forward.

Personally I second the idea of aes-256-gcm-v2 or chacha20-ietf-poly1305-v2.

Mygod commented 3 years ago

Veto. Introducing timestamps does not fix the issue entirely and only gives you more trouble in the future. Fixing this issue is not worth the breaking update.

EDIT: Also that name is terrible.

madeye commented 3 years ago

Also that name is terrible.

😭 Hope someone can come up with some better names.

Introducing timestamps does not fix the issue entirely

IMO, bloom filter can still be used for short-term replay resistance, then we should fix all the known short-term, long-term and swapping replay attacks.

shadowsocks / shadowsocks-org