v2fly / v2ray-core

A platform for building proxies to bypass network restrictions.
https://v2fly.org
MIT License

Deep packet inspection to classify V2Ray traffic in Dec, 2020 #557

Closed rickyzhang82 closed 2 years ago

rickyzhang82 commented 3 years ago

I collected V2Ray traffic data and reran my deep packet inspection test, as described in the issue here.

I compiled V2Ray (commit hash 5dffca842) with Go 1.15 and used TLS + WebSocket.

Over 10 days, I collected 17,998 V2Ray connections and 136,981 non-V2Ray TLS connections. I trained a new CNN model, and it still reached a traffic classification accuracy of 0.9959. It shows a perfect ROC curve.

[Figure: ROC curve]

Does the V2Ray dev team have any roadmap for making V2Ray blend in with other non-V2Ray TLS traffic?

kslr commented 3 years ago

go tls is the same

dyhkwong commented 3 years ago

> go tls is the same

Not exactly. See https://github.com/v2ray/v2ray-core/issues/2522 and https://github.com/v2ray/v2ray-core/pull/2521. According to the commit author, v2ray has the same TLS fingerprint as the Golang default only when transport = h2 and disableSessionResumption = true.

DuckSoft commented 3 years ago

TL;DR

#### Problem

* Want to use a CDN -> need WebSocket;
* Golang's HTTP stack offers ALPN `h2` and `http/1.1` by default,
* but CDN edge nodes cannot upgrade to WebSocket from HTTP/2:
  * -> avoid handshaking with h2
  * -> remove h2 from ALPN
  * -> get told apart from normal Golang traffic.

#### Solution

1. ~~WebSocket over HTTP/2~~ (nope, needs CDN support)
2. gRPC (Protobuf over HTTP/2): supported by CDNs like CloudFlare

Well, we know that WebSocket is poorly supported on HTTP/2. The Go standard library uses HTTP/2 by default, so it's difficult to be 100% indistinguishable in VMess / WebSocket / TLS mode (you have to avoid HTTP/2).

The main reason to use VMess / WebSocket / TLS instead of VMess / TLS is that the former can easily be transported over a CDN (like CloudFlare). Without a CDN, a more direct transport such as VMess / TLS or even VMess / TCP is preferred. It's the CDN that dictates the use of WebSocket.

However, Golang programs are known to send h2,http/1.1 as their ALPN list. If we don't change this default, we will likely get an HTTP/2 connection instead of an HTTP/1.1 connection when we handshake with CDN edge nodes. WebSocket can only be upgraded from HTTP/1.1, so getting an HTTP/2 connection means no WebSocket is available. That's why we need to hardcode http/1.1 into the ALPN list when we use WebSocket as the transport. And once http/1.1 is hardcoded, the traffic becomes distinguishable from normal Golang network traffic.
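To make the difference concrete, here is a minimal Python sketch of how an ALPN extension is serialized inside a ClientHello (the RFC 7301 wire format). It is not V2Ray or Go code, and `encode_alpn_extension` is a hypothetical helper name. It only illustrates that an `http/1.1`-only list produces different bytes than the `h2, http/1.1` list, and that these bytes travel in cleartext.

```python
import struct

ALPN_EXTENSION_TYPE = 16  # application_layer_protocol_negotiation (RFC 7301)


def encode_alpn_extension(protocols):
    """Serialize an ALPN extension as it appears in a TLS ClientHello."""
    # Each protocol name is prefixed with a 1-byte length.
    encoded_names = [name.encode("ascii") for name in protocols]
    name_list = b"".join(struct.pack("!B", len(n)) + n for n in encoded_names)
    # The whole list is prefixed with a 2-byte length.
    body = struct.pack("!H", len(name_list)) + name_list
    # Extension header: 2-byte type, then 2-byte length of the body.
    return struct.pack("!HH", ALPN_EXTENSION_TYPE, len(body)) + body


# ALPN list described in this thread as typical for Golang programs.
go_default = encode_alpn_extension(["h2", "http/1.1"])
# ALPN list hardcoded for the WebSocket transport.
wss_only = encode_alpn_extension(["http/1.1"])

print(go_default.hex())
print(wss_only.hex())
print("distinguishable:", go_default != wss_only)
```

A passive observer does not need machine learning to see this difference; a byte comparison on the extension is enough.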

And that's why I started https://github.com/Qv2ray/gun, an attempt to use gRPC (Protobuf over HTTP/2) as the transport layer. This will make it a lot easier to blend into normal Golang program traffic.

Maybe it's time to let WebSocket die. Let's hug gRPC and forget about HTTP/1.1.

RPRX commented 3 years ago

@dyhkwong

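Currently, apart from WSS, the request ALPN is not deliberately set anywhere.

As for SessionTicketsDisabled, false is currently passed by default; I need to check what Go's default value is.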

DuckSoft commented 3 years ago

@rickyzhang82 would you mind doing an activation visualization or something similar, to help us locate the ROI? Thanks in advance.

ghost commented 3 years ago

Then I'd like to ask: which protocol is best for setting up a proxy?

darhwa commented 3 years ago

The fact that v2ray's ClientHello is unique when using the wss transport is not news. And it will remain so even after adopting uTLS, because no common browser still uses an http/1.1-only ALPN.

~Two choices: whether to give up wss transport in v2ray, or to keep using it until another boot (unlikely) falls. Which one will cause less pain?~ gRPC looks promising.

RPRX commented 3 years ago

    // SessionTicketsDisabled may be set to true to disable session ticket and
    // PSK (resumption) support. Note that on clients, session ticket support is
    // also disabled if ClientSessionCache is nil.
    SessionTicketsDisabled bool
    // ClientSessionCache is a cache of ClientSessionState entries for TLS
    // session resumption. It is only used by clients.
    ClientSessionCache ClientSessionCache

Currently, although SessionTicketsDisabled is false by default, as in other Golang programs, ClientSessionCache is set on the client side, which may produce differences from other Golang programs that use the default configuration.

Note that TLSv1.3 handshake is always 1-RTT in Golang, as Golang doesn't support early data yet. So maybe it's time to disable session resumption by default.

darsvador commented 3 years ago

It seems you are using a very imbalanced dataset. What about your PR-curves?

Ref: https://en.wikipedia.org/wiki/Accuracy_paradox
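A minimal sketch of the accuracy paradox, using the connection counts from the opening post. It assumes, purely for illustration, that a classifier is scored on the full imbalanced capture; `confusion_metrics` is a hypothetical helper, not part of the published test code.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


V2RAY = 17_998       # positive class, from the opening post
NON_V2RAY = 136_981  # negative class, from the opening post

# A useless classifier that labels every connection as non-V2Ray.
acc, prec, rec = confusion_metrics(tp=0, fp=0, tn=NON_V2RAY, fn=V2RAY)
print(f"always-negative: accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f}")
```

On this class ratio, the always-negative classifier scores roughly 88% accuracy while detecting nothing, which is why precision and recall (or a PR curve) are more informative than accuracy alone.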

rickyzhang82 commented 3 years ago

I repeated what I did last year. See the steps here.

The only difference this year is that I added a filter on the non-V2Ray traffic so that only TLS traffic was retained.

But I still kept V2Ray and non-V2Ray traffic equal in size in the data generator. @DuckSoft, the two binary classification categories are equal in size in both the training data set and the inference data set.

The change is too trivial to publish the whole thing again, but anyone can repeat it at home. @xiaokangwang confirmed he could replicate it.

See:

def generate_train_validation_packet_path_list(data_root=DATA_ROOT, training_pct=TRAINING_DATA_PERCENTAGE, eqaul_size=True,
                                               non_v2ray_file_filter_func=None):
    # All file list
    file_list = rglob(data_root, PACKET_FILE_EXT)
    # V2ray file list
    v2ray_file_list = [file_path for file_path in file_list if binary_classification(file_path) == 1]
    # None V2ray file list
    if non_v2ray_file_filter_func is None:
        non_v2ray_file_list = [file_path for file_path in file_list if binary_classification(file_path) == 0]
    else:
        non_v2ray_file_list = [file_path for file_path in file_list if binary_classification(file_path) == 0 and non_v2ray_file_filter_func(file_path)]

    v2ray_file_list.sort()
    non_v2ray_file_list.sort()

    if eqaul_size:
        cut_off_count = min(len(v2ray_file_list), len(non_v2ray_file_list))
        v2ray_file_size = cut_off_count
        non_v2ray_file_size = cut_off_count
    else:
        v2ray_file_size = len(v2ray_file_list)
        non_v2ray_file_size = len(non_v2ray_file_list)

    v2ray_indexes = np.arange(len(v2ray_file_list))
    np.random.shuffle(v2ray_indexes)
    non_v2ray_indexes = np.arange(len(non_v2ray_file_list))
    np.random.shuffle(non_v2ray_indexes)

    training_file_list = [v2ray_file_list[index]
                          for index in v2ray_indexes[:math.ceil(v2ray_file_size * training_pct)]] + \
                         [non_v2ray_file_list[index]
                          for index in non_v2ray_indexes[:math.ceil(non_v2ray_file_size * training_pct)]]

    validation_file_list = [v2ray_file_list[index]
                            for index in v2ray_indexes[math.ceil(v2ray_file_size * training_pct): v2ray_file_size]] + \
                           [non_v2ray_file_list[index]
                            for index in non_v2ray_indexes[math.ceil(non_v2ray_file_size * training_pct): non_v2ray_file_size]]

    print("Statistics: ")
    print("Total V2ray traffic %d, Total non-V2ray traffic %d" % (len(v2ray_file_list), len(non_v2ray_file_list)))
    print("Output train traffic %d, Total validation traffic %d" % (len(training_file_list), len(validation_file_list)))

    return training_file_list, validation_file_list

DuckSoft commented 3 years ago

Golang programs are rare and I'm not surprised that it's accurately picked out. I thought V2Ray should at least blend with normal Golang traffic.

Currently WS + TLS is easily distinguished due to the ALPN and disableSessionResumption settings mentioned above. Again, a visualization of the neural network layers would help us locate the ROI and craft adversarial examples quickly.

@rickyzhang82

darsvador commented 3 years ago

@rickyzhang82 The above code undersamples both the training set and the validation set. Undersampling the training set is fine, but undersampling the validation/test set is not a good choice. Dividing the data this way lets the classifier cheat, because the distribution of positive and negative examples in the test set has been artificially adjusted. In machine learning, what matters is that the test set is independent of the training set.

Ref: https://datascience.stackexchange.com/questions/61858/oversampling-undersampling-only-train-set-only-or-both-train-and-validation-set
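One way to follow this advice is to split each class first and balance only the training portion. The sketch below is an assumption about how that could look, not the code from the published repository; `split_then_undersample` and the toy file names are hypothetical.

```python
import random


def split_then_undersample(positives, negatives, training_pct=0.8, seed=0):
    """Split each class first, then balance only the training portion."""
    rng = random.Random(seed)
    pos, neg = list(positives), list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    pos_cut = int(len(pos) * training_pct)
    neg_cut = int(len(neg) * training_pct)
    train_pos, val_pos = pos[:pos_cut], pos[pos_cut:]
    train_neg, val_neg = neg[:neg_cut], neg[neg_cut:]

    # Undersample the majority class in the training set only.
    keep = min(len(train_pos), len(train_neg))
    train = train_pos[:keep] + train_neg[:keep]
    # The validation set keeps the class ratio of the original capture.
    validation = val_pos + val_neg
    return train, validation


# Toy file names standing in for captured packet files.
v2ray_files = [f"v2ray_{i}.pkt" for i in range(100)]
other_files = [f"other_{i}.pkt" for i in range(900)]
train, validation = split_then_undersample(v2ray_files, other_files)
print(len(train), len(validation))
```

With this split the training set is balanced, while the validation set still has the 1:9 ratio of the toy data, so metrics computed on it reflect the imbalance a real deployment would face.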

darhwa commented 3 years ago

@rickyzhang82 Can you please try the iptables rules I posted at https://github.com/v2ray/discussion/issues/704#issuecomment-636470971 on the same datasets? They should still work for wss mode for now. I wonder what their accuracy is.

HirbodBehnam commented 3 years ago

Hello. Since yesterday, Iran has been actively blocking VLESS + TLS traffic from v2ray v4.32.1. However, v2ray v4.34.0 seems to work fine (I haven't tested v4.33.0). I'm not sure whether Iran is using the same technique described here, but any VLESS + TLS traffic to port 443, even with a fallback web server and a legitimate certificate, is being ACTIVELY blocked: the client's hello message is never delivered to the server. However, I have found two easy ways to bypass this blocking:

  1. Change the port to anything but 443
  2. Use v2ray v4.34 (It looks like that version also sometimes fails but I'm not sure)

The biggest problem right now is v2rayNG and v2ray itself, because they are using the old version of v2ray, which does not work at all in Iran. If you know Persian, you can read my tweets about this. Finally, I'm not writing this comment to request anything; I'm just writing it to inform everyone who searches for Iran and DPI in this repository.

DuckSoft commented 3 years ago

@HirbodBehnam That's no surprise. In 4.34 Release Notes

Also: https://github.com/v2fly/v2ray-core/issues/557#issuecomment-751962569


You may validate my conclusion by setting disableSessionResumption to true in 4.32.1.

RPRX commented 3 years ago

@HirbodBehnam Please test (at client side):

v2ray-core v4.32.1/v4.33.0 with "disableSessionResumption": true in "tlsSettings"

v2ray-core v4.34.0 with "enableSessionResumption": true in "tlsSettings"

HirbodBehnam commented 3 years ago

@DuckSoft @rprx I might have made a mistake. I tested both settings on a new client on my laptop, which is working, and on v2rayNG on my phone with v2ray 4.32.1, and apparently none of the changes made any difference on either device. My laptop still had access to my server while my phone didn't. So maybe it's not about session resumption?

What is stranger is that I upgraded the v2ray client on a family member's laptop to 4.34 and it was still broken, even though both of us had the same config, same OS (Windows 10), same server, and even the same network and ISP. My laptop was working fine but that laptop was not working at all. (Note that I had manually resolved the domain name in the config files, so it wasn't a DNS issue.) I do not have any explanation for why these computers act differently under exactly the same conditions (except the hardware, of course).

HirbodBehnam commented 3 years ago

Hello again. So apparently, changing ports doesn't help much, because my new port got blocked after 2 days...

RPRX commented 3 years ago

@HirbodBehnam try v2ray-core v4.24.2 (as client)

RPRX commented 3 years ago

@HirbodBehnam and v2ray-core v4.23.1

HirbodBehnam commented 3 years ago

@rprx Still nothing... I am now testing VMess + Raw WS. There is probably something in the TLS connections (Go's fingerprint, perhaps?) that triggers Iran's firewall. I might also try trojan (the C++ server and client) to see whether it triggers Iran's firewall too.

rickyzhang82 commented 3 years ago

@HirbodBehnam The VMess protocol was the first one I ran my test on. It is subject to the same vulnerability: it can be identified by a DNN classifier.

NathanIceSea commented 3 years ago

Test with a clean Linux OS and client v4.34 and see. A native SOCKS5 or HTTP proxy port could possibly be detected by spyware; the same goes for smartphones.

HirbodBehnam commented 3 years ago

@NathanIceSea Hello. I don't think this is a spyware problem. With Wireshark, I captured the client hello of one of my devices which was not working, moved it to a computer which was working, and sent it to my server. The packet never reached the server. Also, Iran injects packets toward the client for some reason. Have a look at this. Translation:

PSH,ACK :) Probably the IP.TTL is different from that of the SYN,ACK. Except for Irancell (an ISP in Iran), the ISPs copy the last IP.ID into the injected packet. This is what I meant when I talked about packet injection. It usually happens randomly and then sticks to that computer (meaning you will get packet injection every time). I haven't found an explanation for this yet.

And you can see the Wireshark screenshot which I tweeted: same ID, but different TTL.
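The "same ID, different TTL" observation can be turned into a rough filter over the server-to-client packets of one flow. This is only a sketch of that heuristic under the assumptions stated in the translation above; the `Packet` tuple, the `suspect_injected` name, and the sample values are made up for illustration and are not taken from the actual capture.

```python
from collections import namedtuple

# Minimal view of a server-to-client packet: TCP flags, IP.ID and IP.TTL.
Packet = namedtuple("Packet", ["flags", "ip_id", "ttl"])


def suspect_injected(server_packets):
    """Flag packets whose TTL differs from the SYN,ACK and whose IP.ID
    repeats the IP.ID of the packet right before them."""
    baseline_ttl = next(
        (p.ttl for p in server_packets if p.flags == "SYN,ACK"), None
    )
    if baseline_ttl is None:
        return []  # no handshake seen, nothing to compare against

    suspects = []
    previous = None
    for packet in server_packets:
        ttl_differs = packet.ttl != baseline_ttl
        id_copied = previous is not None and packet.ip_id == previous.ip_id
        if ttl_differs and id_copied:
            suspects.append(packet)
        previous = packet
    return suspects


# Made-up flow: the last packet reuses IP.ID 101 but arrives with another TTL.
flow = [
    Packet("SYN,ACK", 100, 52),
    Packet("ACK", 101, 52),
    Packet("PSH,ACK", 101, 61),
]
print(suspect_injected(flow))
```

A TTL change alone can also come from route changes, so a match here is a hint to inspect the packet, not proof of injection.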

Also, it's worth noting that on my phone, using v2ray with Termux or using HTTP Injector, I can connect to my server without any problems (with port 10808, which is the default port in my config).

Also a small status report: VMess + Raw WS and Trojan are working fine for now. Probably Iran doesn't apply DPI to WS connections, or doesn't have the tools/knowledge to identify VMess.

klzgrad commented 3 years ago

Hi, this adversarial work is important, and I know it takes a lot of effort, so keep doing it, if you can.

It would be more helpful if you can provide more microscopic or ablation analysis of the most informative features, which will then be actionable, otherwise it's just one big conclusion.

malikshi commented 3 years ago

> VMess + Raw WS and Trojan are working fine for now

What is your configuration for VMess?

HirbodBehnam commented 3 years ago

@malikshi I used this

malikshi commented 3 years ago

> @malikshi I used this

And in the Cloudflare settings, is the orange-cloud proxy disabled?

HirbodBehnam commented 3 years ago

Yeah I'm not using cloudflare at all.

malikshi commented 3 years ago

Thank you.

HirbodBehnam commented 3 years ago

Hello again. After some small tests and breaks, I found something that could be the cause of the issue we had. I realized that if I save the client hello packet on a computer where v2ray is working, move it to a computer where v2ray is not working, and re-send the exact packet to the server, it reaches the server and the server responds to it. And vice versa: a client hello from a non-working computer still does not work when sent from a working computer.

In Wireshark, I found one small difference between the client hello packets. Here are the cipher suites of a working computer:

[Image: Working-Ciphers]

And here are the cipher suites of a computer that was not working:

[Image: Not-Working-Ciphers]

As you can see, on all the computers where v2ray was fine, the AES cipher was at the top; on all the devices where v2ray was not working, the ChaCha cipher was at the top. This made me think that AES-NI is probably playing a role here. After some digging in Golang's code I found these lines. They mean that if a computer doesn't support AES-NI, Golang will prefer the ChaCha ciphers over the AES ciphers, and vice versa.

So on my own computer I changed that if statement to false, forcing my computer to put the ChaCha ciphers at the top, and used tls.Dial to make a simple TLS connection to my server. As I guessed, it didn't work. For the final test, I forced Go to use the AES ciphers and moved my application to another device that was not working. As expected, dialing with the default config resulted in a timeout, and forcing the AES ciphers resulted in a successful connection. So Iran's firewall probably blocks all TLS connections that have a ChaCha20 cipher at the top? If that is the case, what kind of fucking bullshit method is this :D. Why block connections with ChaCha20 at the top? I have no clue.
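For anyone who wants to check their own captures, here is a small sketch that reports which AEAD family leads the cipher suite list of a ClientHello, which is the property that differed between the working and non-working machines. The code points are the standard IANA values; the function name and the two example lists are illustrative assumptions, not the exact lists Go emits.

```python
# IANA code points for the AEAD suites discussed here.
AES_GCM_SUITES = {
    0x1301,  # TLS_AES_128_GCM_SHA256 (TLS 1.3)
    0x1302,  # TLS_AES_256_GCM_SHA384 (TLS 1.3)
    0xC02B,  # TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    0xC02F,  # TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    0xC02C,  # TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
    0xC030,  # TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
}
CHACHA20_SUITES = {
    0x1303,  # TLS_CHACHA20_POLY1305_SHA256 (TLS 1.3)
    0xCCA8,  # TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
    0xCCA9,  # TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256
}


def leading_aead_family(cipher_suites):
    """Return which AEAD family appears first in a ClientHello suite list."""
    for suite in cipher_suites:
        if suite in AES_GCM_SUITES:
            return "aes-gcm"
        if suite in CHACHA20_SUITES:
            return "chacha20"
    return "unknown"


# Illustrative orderings only: AES first (as on the working machines)
# versus ChaCha20 first (as on the non-working machines).
aes_first = [0xC02B, 0xC02F, 0xC02C, 0xC030, 0xCCA9, 0xCCA8]
chacha_first = [0xCCA9, 0xCCA8, 0xC02B, 0xC02F, 0xC02C, 0xC030]

print(leading_aead_family(aes_first))
print(leading_aead_family(chacha_first))
```

The suite list can be read from the Cipher Suites field of the Client Hello in Wireshark and pasted in as integers.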

But anyway; I really want to appreciate everyone who has contributed in this project and made v2ray. Thanks everyone :)

PS 1: It's worth noting that v2rayNG had ChaCha as its top cipher suite, while HTTP Injector had AES.

PS 2: I'm a real noob, but I will try to recompile v2ray with forced AES and post the results here. As I said, it looks like the cipher suites are the problem, but I can't be sure yet.

Thanks again every one :)

Edit: I also forgot to apologize to everyone: at first I thought this had something to do with DPI, but from what it looks like, it's just a simple matter of fingerprint + cipher suite combinations that trigger Iran's firewall.

DuckSoft commented 3 years ago

@HirbodBehnam That's a very fruitful discovery! Thank you!

HirbodBehnam commented 3 years ago

So apparently that was the case! I have successfully connected to my server by forcing v2ray to prioritize AES ciphers. Here is the binary file I compiled and used: v2ray-aes.zip

Here is what I did to compile this binary:

  1. Install Go (I used go 1.15.7).
  2. Go to where you have installed Go, then open `src/crypto/tls/common.go`.
  3. In the function `initDefaultCipherSuites`, find the line `hasGCMAsm = hasGCMAsmAMD64 || hasGCMAsmARM64 || hasGCMAsmS390X`.
  4. Append `|| true` to the end of it.
  5. Download the latest source code of v2ray from the releases (I used 4.34.0).
  6. Compile it with Go (I used `go build -o D:/v2ray.exe -trimpath -ldflags "-s -w -buildid=" ./main`).

This client should force all TLS connections to prefer AES over ChaCha. It's also worth noting that I tried forcing ChaCha as well, and then I couldn't connect from devices that had been working fine.

I really doubt there is anything v2ray can do to fix this, short of adding custom cipher configuration, which there seems to be no plan for. But anyway, thanks everyone, and once again I apologize for posting this in a DPI-related issue!

kslr commented 3 years ago

> This client should force all TLS connections to prefer AES to Chacha. It's also worth noting that I also tried forcing Chacha and I couldn't connect from devices which was working fine.

This was removed in a previous security update; you can patch it back manually: https://github.com/v2fly/v2ray-core/commit/9321210bcfbd92243c13f49d0a3e558e800e1097#diff-be836badf579ea512702a700ff7bb7f654b6f6ccced8a38c031fb331a1b491cdR144

klzgrad commented 3 years ago

While checking naiveproxy's fingerprints I started to question the decision to disable session resumption by default. The reason given was simply that with session resumption the TLS fingerprint (manifested as the pre_shared_key extension in TLS 1.3) differs from the top ranking one and in order to minimize exposure let's just use the top ranking fingerprint all the time.

Now I believe this reasoning is flawed, and it's fairly easy to see why by looking into usage of session resumption in the real world. Session resumption does happen organically in the wild all the time, though less frequently than not. Per tlsfingerprint.io, the top fingerprint 9c673fd64a32c8dc has two neighbors 5408690af1e08199 (past week 5.88%, large session tickets that preclude padding), e360886acbf4f415 (past week 0.53%, smaller session tickets with padding). Do we look at this number of 0.53% and say it makes this fingerprint more classifiable as a circumvention technology? No. The fact that it appears less frequently than not does not imply it is not an organic fingerprint factor.

Session resumption being less likely to happen is a result of its requirements of multiple conditions being present:

I took a real tcpdump capture of Chrome browsing and found pre_shared_key with these SNIs: www.google-analytics.com, adservice.google.com, fonts.googleapis.com, update.googleapis.com, ajax.cloudflare.com. These examples can explain the unlikeliness of the third condition: If the usage pattern is of frequent requests to the same host, a carefully designed network stack will reuse the same TLS session without needing to restart and resume it. If the usage pattern is too infrequent, or the browser itself is restarted or drops the session tickets, it will also not resume previous TLS sessions. And these domains are exactly the type of hosts that will serve infrequent but not too infrequent requests over time.

The unlikeliness of session resumption also once again reveals it has fallen out of favor in the task of reducing connection setup overhead (TLS RTT) compared to the strategy of connection pooling and reuse, but nevertheless it remains an organic usage pattern emitted by the most common browsers and servers.

Now back to the decision to disable session resumption by default, I believe it is not only based on flawed logic, but it is actively harmful to the goal of minimizing classification exposure. The act of disabling a commonly supported and less commonly used feature altogether itself creates a less common/more unique configuration, and the total lack of any session resumption also constitutes a passive feature.
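The "passive feature" argument can be illustrated with a toy measurement: the share of a client's TLS 1.3 ClientHellos that carry the `pre_shared_key` extension. The sketch below uses standard extension code points, but the function name and both sample populations are invented for illustration and are not measurements.

```python
PRE_SHARED_KEY = 41  # TLS 1.3 pre_shared_key extension (RFC 8446)


def resumption_rate(client_hellos):
    """Fraction of ClientHellos that carry the pre_shared_key extension.

    Each ClientHello is represented only by its list of extension types.
    """
    if not client_hellos:
        return 0.0
    resumed = sum(1 for extensions in client_hellos if PRE_SHARED_KEY in extensions)
    return resumed / len(client_hellos)


# Extension types: server_name=0, supported_groups=10, ALPN=16,
# supported_versions=43, psk_key_exchange_modes=45, key_share=51.
fresh = [0, 10, 16, 43, 45, 51]
resumed = fresh + [PRE_SHARED_KEY]

# Invented populations: a client that occasionally resumes sessions,
# and a client with session resumption disabled altogether.
occasional_resumer = [fresh] * 47 + [resumed] * 3
never_resumes = [fresh] * 50

print(resumption_rate(occasional_resumer))
print(resumption_rate(never_resumes))
```

A rate that stays at exactly zero over many connections to the same host is itself something an observer can measure, which is the point made above about disabling the feature entirely.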

ghost commented 2 years ago

> As you can see, on all the computers which v2ray was fine, the AES cipher was sitting on the top, however, on all the devices that v2ray was not working ChaCha cipher was sitting on top. This made me think that probably the AES-NI is playing a role here.
>
> So probably, the Iran's firewall, blocks all TLS connections which has ChaCha20 cipher on top? If that is the case, what kind of fucking bullshit method is this :D. Why block connections with ChaCha20 on top? I have no clue.

I believe Iran's firewall is trying to block everything it can while causing the least damage to ordinary web browsing traffic (TLS on port 443 and cleartext on port 80) that affects social or economic life, because, as we all know, banks and government/social services rely on these technologies to provide basic services to citizens.

And here's the interesting part: most websites tend to use TLS v1.3 with the AES 128 GCM cipher in a web browser (check out twitter.com in Firefox as an example)! That may solve the mystery above. It seems like they have trained an AI model on massive amounts of captured traffic to create a profile of normal web browsing, and automatically block anything that doesn't resemble it!