net4people / bbs

Forum for discussing Internet censorship circumvention

GFW Archaeology: gfw-looking-glass.sh #25


gfw-report commented 4 years ago

Author: Anonymous

Date: Sunday, March 08, 2020

Credits: GFW Report did not contribute to any step of this work. All credit goes to @gfwrev.

Chinese version: GFW考古:gfw-looking-glass.sh

This report first appeared on GFW Report. We also maintain an up-to-date copy of the report on both net4people and ntc.party.


I came across a one-liner script by @gfwrev and was seriously impressed by it. Although it no longer works, I would still like to write it up, for its beauty and for the author's creativity.

The one-liner named gfw-looking-glass.sh is as follows:

while true; do printf "\0\0\1\0\0\1\0\0\0\0\0\0\6wux.ru\300" | nc -uq1 $SOME_IP 53 | hd -s20; done

As shown in the figure below, it was able to print out part of the GFW's memory. But how?

[Screenshot: two hexdumps of forged DNS responses printed by gfw-looking-glass.sh]

nc

nc -uq1 $SOME_IP 53 sends its standard input as a UDP packet to port 53 of $SOME_IP. The -u option selects UDP, and -q1 makes nc quit one second after reaching EOF on stdin, so that each iteration of the loop terminates. As explained by @gfwrev, $SOME_IP can be any host that meets two requirements:

1) it does not respond to DNS queries on port 53;
2) it is on the other side of the GFW. For example, if the query is sent from China, $SOME_IP should be outside China.

Requirement 1 ensures that any response comes from the GFW rather than from the destination host. Requirement 2 ensures that the crafted DNS query is seen by the GFW.
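For readers who prefer a script to netcat, the sending half of the one-liner can be sketched with Python's standard socket module. This is a minimal sketch, not @gfwrev's code. send_probe is a hypothetical helper name. Its wait (stop once no reply has arrived for one second) only approximates nc's -q1.

```python
import socket

# The same 20 bytes that printf emits in gfw-looking-glass.sh
# (\300 is octal for 0xC0).
PAYLOAD = b"\x00\x00\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06wux.ru\xc0"


def send_probe(host: str, port: int = 53, timeout: float = 1.0) -> list:
    """Send PAYLOAD as one UDP datagram and collect every reply that
    arrives before `timeout` seconds pass without a new datagram."""
    replies = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(PAYLOAD, (host, port))
        try:
            while True:
                data, _addr = sock.recvfrom(4096)
                replies.append(data)
        except socket.timeout:
            pass  # no further replies within the timeout
    return replies
```

If $SOME_IP meets both requirements above, any datagram that send_probe returns would have been injected by the GFW.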

Background

A little background on the DNS message format and on DNS compression pointers helps in understanding this exploit.

General DNS Format

Below is the general format of DNS queries and responses:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |              flags            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|      number of questions      |      number of answer RRs     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     number of authority RRs   |    number of additional RRs   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            questions                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 answers (variable number of RRs)              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               authority (variable number of RRs)              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        additional information (variable number of RRs)        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
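The fixed part of this layout is six big-endian 16-bit fields, and Python's struct module can read it. This is a minimal sketch. DNSHeader and parse_header are hypothetical names, not taken from any DNS library.

```python
import struct
from typing import NamedTuple


class DNSHeader(NamedTuple):
    ident: int     # identification
    flags: int
    qdcount: int   # number of questions
    ancount: int   # number of answer RRs
    nscount: int   # number of authority RRs
    arcount: int   # number of additional RRs


def parse_header(msg: bytes) -> DNSHeader:
    """Read the fixed 12-byte header: six big-endian 16-bit fields."""
    if len(msg) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    return DNSHeader(*struct.unpack("!6H", msg[:12]))
```

Everything after these 12 bytes has variable length. A parser therefore depends on the four counts, and on the name encoding described below, to find where each section ends.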

Questions Field Format

The format of the questions field is as follows:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           query name          |
\                               \
|                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           query type          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           query class         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Query Name Field Format

The query name of www.google.com can be represented as follows:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|3| www |6|   google  |3| com |0|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
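The label encoding above can be sketched in Python, together with the 16-bit query type and class that follow the name in the questions field. This is a minimal illustration that enforces RFC 1035's length limits. encode_qname and encode_question are hypothetical helpers, and the defaults of 1 (A) and 1 (IN) are only example values.

```python
import struct


def encode_qname(name: str) -> bytes:
    """Encode a dotted name as length-prefixed labels ending in a 0 byte."""
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label length: {label!r}")
        out.append(len(raw))  # one length byte per label
        out += raw            # followed by the label's characters
    out.append(0)             # the empty root label terminates the name
    if len(out) > 255:
        raise ValueError("encoded name exceeds 255 bytes")
    return bytes(out)


def encode_question(name: str, qtype: int = 1, qclass: int = 1) -> bytes:
    """Query name followed by 16-bit query type and query class."""
    return encode_qname(name) + struct.pack("!HH", qtype, qclass)
```

encode_qname("www.google.com") yields exactly the 16 bytes drawn in the diagram: b"\x03www\x06google\x03com\x00".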

When a compression pointer is used, a name can look like this:

 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|3|  www|1|1|           offset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

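Here, www is followed by a two-byte pointer whose two high-order bits are set. The columns in this diagram mix units: the first four are bytes (3, w, w, w), while the last sixteen are the bits of the pointer. The remaining 14 bits of the pointer form the offset. An offset of n points to the nth byte of the DNS message, counting from 0 at the first byte of the header, and the name continues from there.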
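A decoder that follows such pointers, with every read bounds-checked, might look like the sketch below. It assumes the RFC 1035 rules described above. decode_name is a hypothetical helper. The limit of 127 pointer hops is our own arbitrary guard against pointer loops.

```python
def decode_name(msg: bytes, offset: int) -> tuple:
    """Decode a possibly compressed name starting at `offset`.

    Returns (labels, offset just past the name in the original position).
    Every read is checked against len(msg).
    """
    labels = []
    end = None   # where parsing resumes after the first pointer
    hops = 0
    while True:
        if offset >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[offset]
        if length & 0xC0 == 0xC0:                # two high bits set: pointer
            if offset + 1 >= len(msg):
                raise ValueError("truncated pointer")
            target = ((length & 0x3F) << 8) | msg[offset + 1]  # 14-bit offset
            if end is None:
                end = offset + 2
            hops += 1
            if hops > 127:
                raise ValueError("too many pointers (loop?)")
            offset = target
        elif length == 0:                        # root label: name complete
            return labels, (end if end is not None else offset + 1)
        elif length & 0xC0:
            raise ValueError("reserved label type")
        else:                                    # ordinary label
            if offset + 1 + length > len(msg):
                raise ValueError("label runs past end of message")
            labels.append(msg[offset + 1 : offset + 1 + length])
            offset += 1 + length
```

If fed the 20-byte query from gfw-looking-glass.sh, this decoder stops with "truncated pointer" at the final \300 byte. The explanation of the crafted query that follows describes what the GFW appears to have done instead.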

Explanations of the crafted DNS query

We now take a closer look at this well-crafted DNS query:

printf "\0\0\1\0\0\1\0\0\0\0\0\0\6wux.ru\300" | xxd -b -c 4
00000000: 00000000 00000000 00000001 00000000  ....
00000004: 00000000 00000001 00000000 00000000  ....
00000008: 00000000 00000000 00000000 00000000  ....
0000000c: 00000110 01110111 01110101 01111000  .wux
00000010: 00101110 01110010 01110101 11000000  .ru.
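The same bit-level view can be produced without xxd by a short Python function. xxd_bits is a hypothetical helper, and it only reproduces xxd's layout for full 4-byte rows. The row to notice is the last one: its final byte, 11000000, has both high-order bits set. That makes it the first byte of a compression pointer, with nothing after it.

```python
def xxd_bits(data: bytes, cols: int = 4) -> str:
    """Mimic `xxd -b -c 4` for full rows: offset, bits, printable ASCII."""
    lines = []
    for off in range(0, len(data), cols):
        chunk = data[off : off + cols]
        bits = " ".join(f"{b:08b}" for b in chunk)
        # printable ASCII is shown as-is, everything else as '.'
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:08x}: {bits}  {text}")
    return "\n".join(lines)
```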

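The first 12 bytes are just a typical DNS query header:

- the identification is 0;
- the flags are 0x0100, meaning only the RD ("recursion desired") bit is set;
- the number of questions is 1;
- the numbers of answer, authority, and additional RRs are all 0.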

The most interesting part is in the questions field from byte 12 to 19.

At first I thought \6wux.ru was a typo for \3wux\2ru. Then I realized that \6wux.ru was used intentionally, to demonstrate how the GFW parses the query name. \6wux.ru does not follow the query name format: it is a single 6-byte label that happens to contain a dot. Yet it triggered the GFW just as \3wux\2ru would, which suggests the GFW "converted the query name to a string before pattern matching".
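The conjecture can be illustrated with a deliberately naive conversion that joins label contents with dots. This models the hypothesis only. labels_to_string is a hypothetical function, not the GFW's code, and it ignores compression pointers entirely.

```python
def labels_to_string(qname: bytes) -> str:
    """Naive conversion: join label contents with '.' without checking
    whether a label itself contains a dot."""
    labels = []
    i = 0
    while i < len(qname) and qname[i] != 0:
        n = qname[i]                                   # label length byte
        labels.append(qname[i + 1 : i + 1 + n].decode("ascii", "replace"))
        i += 1 + n
    return ".".join(labels)
```

Both b"\x06wux.ru\x00" and b"\x03wux\x02ru\x00" produce the string "wux.ru". A string-based keyword match therefore cannot tell them apart, whereas a strict parser sees one label in the first case and two in the second.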

As introduced in the background section, a pointer takes 2 bytes. The crafted query, however, contains only the first byte of a pointer, \300 (0xC0), as its very last byte. This incomplete pointer caused the GFW to treat the next byte in its buffer, which lies beyond the end of the query, as the low-order part of the offset. The six low-order bits of 0xC0 are all 0, so the offset in this query ranges from 0 to 2^8-1 = 255. Whenever the offset pointed at or past the end of the 20-byte query, the GFW would jump out of the DNS query and treat some bytes in its memory as part of the domain name. The GFW apparently validated neither that the pointer was complete nor that the offset was smaller than the DNS query length.
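The two missing checks can be modeled in a few lines. This is only a model of the inferred behavior, not the GFW's implementation, and follow_truncated_pointer is a hypothetical name. In it, buffer holds the 20-byte query followed by made-up leftover bytes that stand in for the GFW's memory.

```python
def follow_truncated_pointer(buffer: bytes, msg_len: int) -> tuple:
    """Model of the flawed parsing: the pointer's first byte is the query's
    last byte, its second byte is borrowed from whatever follows the query,
    and the resulting offset is never compared with msg_len.

    Returns (offset, True if the offset lands outside the query).
    """
    first = buffer[msg_len - 1]
    if first & 0xC0 != 0xC0:
        raise ValueError("query does not end in a pointer byte")
    second = buffer[msg_len]                   # one byte past the query
    offset = ((first & 0x3F) << 8) | second    # 14-bit pointer offset
    return offset, offset >= msg_len
```

Because 0xC0 contributes only zeros to the offset, the borrowed byte alone decides where the parser jumps. That is why the offset ranges from 0 to 255.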

Now that the GFW has included part of its memory in the query name, all that remains is to make the GFW send a forged DNS response. @gfwrev used wux.ru as a kw{rnd}-style keyword in this query. The different keyword patterns are summarized in Table 2(b) of this paper.

Explanations of the forged response

After the forged DNS response is received, hd -s20 skips its first 20 bytes. These 20 bytes are the 12 bytes of fields before the questions field, plus the first 8 bytes of the questions field: \6wux.ru\300.

What remains after the skipped bytes is:

1) bytes from the GFW's memory;
2) a forged answers field, which follows them.

Taking the first hexdump in the screenshot above as an example, the forged answers field (2) is:

c0 0c 00 01 00 01 00 00 01 2c 00 04 cb 62 07 41

Everything between the skipped 20 bytes and this answers field is therefore (1), bytes from the GFW's memory.
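A short Python sketch can decode the quoted answer field by field and cut out the leaked bytes. decode_a_record and split_forged_response are hypothetical helpers. The split assumes, as in the example above, that the response ends with exactly one 16-byte answer whose name is a 2-byte pointer.

```python
import ipaddress
import struct

# The forged answers field quoted above.
ANSWER = bytes.fromhex("c0 0c 00 01 00 01 00 00 01 2c 00 04 cb 62 07 41")


def decode_a_record(rr: bytes) -> dict:
    """Decode an A record whose name is a 2-byte compression pointer."""
    ptr, rtype, rclass, ttl, rdlength = struct.unpack("!HHHIH", rr[:12])
    return {
        "name_offset": ptr & 0x3FFF,  # low 14 bits of the pointer
        "type": rtype,                # 1 = A
        "class": rclass,              # 1 = IN
        "ttl": ttl,                   # seconds
        "address": str(ipaddress.IPv4Address(rr[12 : 12 + rdlength])),
    }


def split_forged_response(resp: bytes, answer_len: int = 16) -> tuple:
    """Skip the first 20 bytes (as hd -s20 does) and split the rest into
    (leaked memory, trailing answer of answer_len bytes)."""
    body = resp[20:]
    return body[:-answer_len], body[-answer_len:]
```

The quoted bytes decode as follows: a name pointer to offset 12 (the query name), type 1 (A), class 1 (IN), a TTL of 300 seconds, and the address 203.98.7.65.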

One interesting detail is the length of the questions field in these forged responses. It starts with the 8 bytes \6wux.ru\300, followed by 122 bytes of GFW memory: cb 9e ... 65 61. In fact, the hexdumps of both exploits in the screenshot have a questions field of exactly 130 bytes. The maximum length of a domain name is 253 characters (255 bytes in wire format), and the maximum length of a single label is 63 bytes. Since 130 matches neither limit, I conjectured that 130 bytes was an artificial limit the GFW set for each query name.

Sidenote

dig $(python -c "print( 'a.'*121 + 'twitter.com')") @"$SOME_IP"
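The sidenote's query name has a notable length. The report does not say what the command was meant to show or what it printed. The arithmetic below only confirms that this name sits exactly at the protocol maximum, 253 characters as text and 255 bytes on the wire, far above the 130 bytes observed above. name_lengths is a hypothetical helper.

```python
def name_lengths(name: str) -> tuple:
    """Return (text length, wire-format length) of a dotted domain name."""
    labels = name.rstrip(".").split(".")
    # each label costs one length byte plus its content; +1 for the root byte
    wire = sum(1 + len(label) for label in labels) + 1
    return len(name), wire


# The name built by the sidenote's python -c expression.
SIDENOTE_NAME = "a." * 121 + "twitter.com"
```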

Credits

GFW Report did not contribute to any step of this work. All credit goes to @gfwrev.