net4people / bbs

Forum for discussing Internet censorship circumvention

Comments on certain past cryptographic flaws affecting fully encrypted censorship circumvention protocols #287

Open wkrp opened 9 months ago

wkrp commented 9 months ago

I've posted an article that takes a historical look at some crypto bugs that have affected circumvention protocols.

This article presents three retrospective case studies of cryptography-related flaws in censorship circumvention protocols: a decryption oracle in Shadowsocks “stream cipher” methods, non-uniform Elligator public key representatives in obfs4, and a replay-based active distinguishing attack exploiting malleability in VMess. These three protocols come from the family of “fully encrypted” circumvention protocols: their traffic in both directions is indistinguishable from a uniformly random stream of bytes (or at least, is supposed to be). Some of the flaws are fixable implementation errors; others are rooted in more fundamental design errors. Their consequences range from enabling passive probabilistic detection to complete loss of confidentiality. All have been fixed, mitigated, or superseded since their discovery.

My primary purpose is to provide an introduction of circumvention threat models to specialists in cryptography, and to make the point that while cryptography is a necessary tool in circumvention, it is not the sole or even most important consideration. Secondarily, I want to furnish a few instructive examples of cryptographic design and implementation errors in uncontrived, deployed protocols. While the flaws I discuss affected systems of significant social importance with millions of collective users, they are not well-known outside a small circle of specialists in circumvention.

The article was originally a talk proposal for the Real World Crypto symposium. As such, it's targeted more at an audience of cryptographers than circumvention developers. Some of the concepts and motivation will already be well-known to people here. The crypto bugs in the article have already been discussed here or on related forums:

The term "fully encrypted protocol (FEP)" comes from a paper at this year's FOCI workshop, "Security Notions for Fully Encrypted Protocols". The same concept has formerly been called "look-like-nothing" or "randomized". It's good to have a clear and unambiguous name for these kinds of protocols, so I've started using the "fully encrypted" label as well. You can also see it used in the title of the recent "How the Great Firewall of China Detects and Blocks Fully Encrypted Traffic".

RPRX commented 9 months ago

> * obfs4 Elligator
>
>   * [Non-canonical public key representatives](https://bugs.torproject.org/tpo/applications/tor-browser/40804#note_2834533)
>   * [One bit always 0 in public key representatives](https://bugs.torproject.org/tpo/anti-censorship/team/91)
>   * [Public key representatives restricted to a subgroup](https://bugs.torproject.org/tpo/anti-censorship/pluggable-transports/lyrebird/40007)

Thanks for the post, @wkrp. This section reminds me of Cloak, the Shadowsocks plugin, whose Client Hello is built as follows:

  1. the random field holds the client-generated x25519 public key
  2. the session id field holds the first 32 bytes of the ciphertext
  3. the x25519 key share holds the last 16 bytes of the ciphertext plus the 16-byte authentication tag

This design seems to have a similar problem (the Server Hello included). I'd like to invite the author to comment here: @cbeuw
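
To make the concern concrete, here is a minimal sketch of one way a passive observer could test whether a 32-byte field behaves like a raw x25519 public key rather than like uniformly random bytes. It is not taken from Cloak or from any tool mentioned in this thread; the function names are hypothetical, and the only assumption is the standard Curve25519 equation and the RFC 7748 encoding. An honestly generated public key always has its top bit clear and always decodes to a u-coordinate on the curve itself, while a uniformly random string passes both checks only about a quarter of the time.

```python
import os

P = 2**255 - 19  # field prime of Curve25519
A = 486662       # Montgomery coefficient in v^2 = u^3 + A*u^2 + u


def decode_u(field32: bytes) -> int:
    """Decode a 32-byte little-endian u-coordinate, masking the top bit (RFC 7748)."""
    if len(field32) != 32:
        raise ValueError("expected exactly 32 bytes")
    return int.from_bytes(field32, "little") & ((1 << 255) - 1)


def is_on_curve(field32: bytes) -> bool:
    """True if the decoded u lies on Curve25519 itself rather than on its twist."""
    u = decode_u(field32) % P
    rhs = (u * u * u + A * u * u + u) % P
    # Euler's criterion: rhs is a square mod P iff rhs^((P-1)/2) is 0 or 1.
    return pow(rhs, (P - 1) // 2, P) in (0, 1)


def high_bit_clear(field32: bytes) -> bool:
    """Honest encodings are below 2^255, so the top bit of the last byte is 0."""
    return field32[31] & 0x80 == 0


def fraction_key_like(samples) -> float:
    """Fraction of observed 32-byte fields that pass both public-key checks."""
    hits = sum(1 for s in samples if is_on_curve(s) and high_bit_clear(s))
    return hits / len(samples)


if __name__ == "__main__":
    uniform = [os.urandom(32) for _ in range(2000)]
    print("uniform random fields passing:", fraction_key_like(uniform))
```

If every one of n fields observed from the same endpoint passes, a uniformly random source would do that with probability of roughly 4^-n, so a handful of connections is already enough for a confident guess. This is the same kind of distinguisher that Elligator is meant to rule out.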


Of course, someone will ask whether REALITY has a similar problem. So that nobody confuses the two, here are the main differences between REALITY and Cloak:

  1. By design, it follows TLSv1.3 semantics as far as possible: fields such as random and the x25519 key share keep their defined meaning, and only the deprecated session id is used for authentication.
  2. For the client authentication information, in addition to x25519, it also runs hkdf, and the aead encryption authenticates the entire Client Hello along the way.
  3. It uses the target site's Server Hello, replacing only the key share, and pads the subsequent messages to the same lengths as the target site's.
  4. It encrypts with real TLSv1.3, including a full ECDHE, which gives forward secrecy even if the server's private key is disclosed.
  5. Other things, such as the client-side spider and the way the server side is implemented. The latter mainly deals with inconsistencies in small details; see the code here.

Note that I have only listed the differences briefly. I had not written these down before, so this is a reference for those who need it. Please don't read it as an ad for REALITY.

cbeuw commented 9 months ago

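> Thanks for the post, @wkrp. This section reminds me of Cloak, the Shadowsocks plugin, whose Client Hello is built as follows:
>
>   1. the random field holds the client-generated x25519 public key
>   2. the session id field holds the first 32 bytes of the ciphertext
>   3. the x25519 key share holds the last 16 bytes of the ciphertext plus the 16-byte authentication tag
>
> This design seems to have a similar problem (the Server Hello included). I'd like to invite the author to comment here: @cbeuw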

You're right. I realised this a while back, and looking at it now, it's a rather silly oversight: the data Cloak puts in Random is an X25519 public key, so why not just put it in a field originally meant for an X25519 public key?

It's fixable, but requires some backwards compatibility measures on the server side.

RPRX commented 9 months ago

On the design of fully encrypted protocols, I'd like to add some of my research from the AEAD era:

The encryption designs of Shadowsocks AEAD and Brook suffer from problems such as "responses are not bound to requests" (these problems also exist in Shadowsocks's stream ciphers). https://github.com/shadowsocks/shadowsocks-org/issues/183 https://github.com/txthinking/brook/discussions/1164

To keep the handshake from becoming a fingerprint, Shadowsocks, VMess, and similar protocols do no handshake in advance. The replay protection they need as a result leaves them open to denial-of-service attacks. https://github.com/shadowsocks/shadowsocks-org/issues/184

VMess AEAD has three problems: the packet length is not authenticated (not fixed by default), an empty packet is sent at the end (fixed), and the client does not drain (fixed). https://github.com/v2fly/v2ray-core/pull/940 The second and third points also extend to the protocol boundary probing problem and the recently discussed TLS-in-whatever fingerprints.

The Shadowsocks family has also always had one distinctive fingerprint: TCP and UDP share the same port. I'm not sure whether anyone else has mentioned it. https://github.com/shadowsocks/shadowsocks-org/issues/177

These protocols also generally lack advanced security properties such as forward secrecy. That said, although design flaws have been found in them many times, I think most of those flaws can be solved. Their biggest problem is still that they look like fully random data: no legitimate traffic currently looks like that, so censors can block it with their eyes closed. Even without blocking it outright, a censor that sets up a protocol whitelist allowing only common protocols through can easily filter out fully encrypted protocols.
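
As a rough illustration of why "looks like fully random data" is itself a fingerprint, the sketch below flags payloads whose bytes are statistically consistent with a uniform source. It is not a description of any real censor's rules: the function names and all three thresholds (`low`, `high`, `max_printable`) are assumptions chosen for the example. The only facts relied on are that uniformly random bytes average 4 set bits per byte and that about 37% of them (95 of 256 values) are printable ASCII.

```python
import os


def average_popcount(payload: bytes) -> float:
    """Mean number of set bits per byte; about 4.0 for uniformly random data."""
    if not payload:
        raise ValueError("empty payload")
    return sum(bin(b).count("1") for b in payload) / len(payload)


def printable_fraction(payload: bytes) -> float:
    """Share of bytes in the printable ASCII range 0x20..0x7e."""
    if not payload:
        raise ValueError("empty payload")
    return sum(1 for b in payload if 0x20 <= b <= 0x7E) / len(payload)


def looks_fully_random(payload: bytes, low: float = 3.4, high: float = 4.6,
                       max_printable: float = 0.5) -> bool:
    """Heuristic only: True if the payload is consistent with uniform bytes.

    The thresholds are illustrative assumptions, not measured values.
    """
    return (low < average_popcount(payload) < high
            and printable_fraction(payload) < max_printable)


if __name__ == "__main__":
    http = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    print("HTTP request flagged:", looks_fully_random(http))
    print("random payload flagged:", looks_fully_random(os.urandom(4096)))
```

A heuristic like this is unreliable on short payloads, and a whitelist of known protocol headers reaches the same result from the other direction: whatever fails to parse as an allowed protocol gets dropped.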

xiaokangwang commented 9 months ago

There was a previous attempt in Shadowsocks to address the decryption oracle, known as OTA mode. It did not work, and we could try to learn from it: https://prinsss.github.io/why-do-shadowsocks-deprecate-ota/.

Shadowsocks AEAD also still has the issue that its anti-replay protection for IVs has no time window or other design to limit the number of IVs that need to be remembered.
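
For illustration, here is a minimal sketch of what a time-windowed replay filter could look like. It assumes something that Shadowsocks AEAD, as described above, does not have: a timestamp sent by the client and covered by the authentication tag. The class name, method name, and 30-second default are all hypothetical.

```python
import time


class ReplayFilter:
    """Remembers IVs only for as long as their timestamp could still be accepted."""

    def __init__(self, window_seconds: float = 30.0):
        self.window = window_seconds
        self._seen = {}  # iv -> authenticated client timestamp

    def accept(self, iv: bytes, timestamp: float, now: float = None) -> bool:
        """Return True if the handshake is fresh and the IV has not been seen.

        `timestamp` is assumed to have been authenticated already.
        """
        now = time.time() if now is None else now
        # Forget IVs whose timestamps can no longer pass the window check.
        self._seen = {i: t for i, t in self._seen.items()
                      if t + self.window >= now}
        if abs(now - timestamp) > self.window:
            return False  # too old or too far in the future
        if iv in self._seen:
            return False  # replay inside the window
        self._seen[iv] = timestamp
        return True


if __name__ == "__main__":
    f = ReplayFilter(window_seconds=30)
    print(f.accept(b"iv-1", timestamp=1000, now=1000))  # first use
    print(f.accept(b"iv-1", timestamp=1000, now=1010))  # replayed
```

Because anything outside the window is rejected on the timestamp alone, memory is bounded by the number of connections made within roughly two window lengths. The cost is that clients and server need loosely synchronised clocks.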

wkrp commented 9 months ago

> There was a previous attempt in Shadowsocks to address the decryption oracle, known as OTA mode. It did not work, and we could try to learn from it: https://prinsss.github.io/why-do-shadowsocks-deprecate-ota/.

I wrote an English summary of that post and the OTA active probing vulnerability:

https://groups.google.com/d/msg/traffic-obf/CWO0peBJLGc/Py-clLSTBwAJ

> A further mitigation was the introduction of OTA (One-Time Auth). OTA adds a MAC to the handshake message, and breaks the formerly unstructured client data stream into chunks, each with a 2-byte DATA.LEN field and a MAC per-chunk. The MACs are computed before encryption (MAC-then-encrypt). One problem with OTA was that it was optional: clients could still connect without it, so the old probing attack still worked. But OTA also has an active-probing vulnerability of its own, because the per-chunk MACs protect the data but not DATA.LEN--you need to decrypt DATA.LEN and read that many bytes before you can verify the MAC. The censor can record a legitimate Shadowsocks session, then replay the client→server packets. When replaying, the censor tweaks the byte corresponding to the most significant byte of DATA.LEN, which will tend to make the decrypted DATA.LEN large, which will cause the server to continue trying to read from the client, until timeout. (I'm not sure I have this attack exactly, but that's the gist of it.) I think this flaw was also discovered by @breakwa11.

The OTA vuln is fairly reminiscent of the VMess one in the paper. In both cases, the data encoding required parsing untrusted data in order to locate an authenticator for that data.
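
The shape of the OTA problem can be reproduced in a few lines. The sketch below is a toy model, not Shadowsocks code: the "stream cipher" is a SHA-256 counter keystream standing in for the real ciphers, the chunk layout (2-byte length, 10-byte truncated HMAC-SHA1, data) and the MAC key are simplifying assumptions, and every name is hypothetical. What it preserves is the property described above: the MAC covers the data but not DATA.LEN, and the ciphertext is malleable.

```python
import hashlib
import hmac
import struct

MAC_LEN = 10  # truncated MAC length, an assumption for this toy layout


def keystream(key: bytes, iv: bytes, n: int) -> bytes:
    """Toy keystream (SHA-256 in counter mode), a stand-in for a real stream cipher."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def xor(data: bytes, ks: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, ks))


def seal_chunk(key: bytes, iv: bytes, data: bytes) -> bytes:
    """MAC-then-encrypt one chunk; the MAC covers the data but not the length."""
    mac = hmac.new(key, data, hashlib.sha1).digest()[:MAC_LEN]
    plain = struct.pack(">H", len(data)) + mac + data
    return xor(plain, keystream(key, iv, len(plain)))


def server_read_chunk(key: bytes, iv: bytes, wire: bytes):
    """Return ('ok', data), ('bad-mac', None), or ('waiting', bytes_still_needed)."""
    plain = xor(wire, keystream(key, iv, len(wire)))
    if len(plain) < 2:
        return ("waiting", 2 - len(plain))
    (length,) = struct.unpack(">H", plain[:2])  # trusted before any check
    needed = 2 + MAC_LEN + length
    if len(plain) < needed:
        return ("waiting", needed - len(plain))
    mac = plain[2:2 + MAC_LEN]
    data = plain[2 + MAC_LEN:needed]
    expected = hmac.new(key, data, hashlib.sha1).digest()[:MAC_LEN]
    return ("ok", data) if hmac.compare_digest(mac, expected) else ("bad-mac", None)


if __name__ == "__main__":
    key, iv = b"k" * 16, b"i" * 16
    wire = seal_chunk(key, iv, b"hello")
    # The prober flips the top bit of the encrypted DATA.LEN without knowing the key.
    tampered = bytes([wire[0] ^ 0x80]) + wire[1:]
    print(server_read_chunk(key, iv, wire))
    print(server_read_chunk(key, iv, tampered))
```

With the tampered chunk the server never reaches the MAC check. It sits waiting for tens of kilobytes that will not arrive, and that distinctive wait is what the prober observes. Authenticating the length before acting on it removes this particular signal.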