projectdiscovery / nuclei

Nuclei is a fast, customizable vulnerability scanner powered by the global security community and built on a simple YAML-based DSL, enabling collaboration to tackle trending vulnerabilities on the internet. It helps you find vulnerabilities in your applications, APIs, networks, DNS, and cloud configurations.
https://docs.projectdiscovery.io/tools/nuclei
MIT License

Preserve header order to defeat JA4H #5233

Open BraveLittleRoaster opened 6 months ago

BraveLittleRoaster commented 6 months ago

Nuclei version:

latest:v3.2.8

Current Behavior:

A while back I opened a discussion on JA3 randomization, and you guys were awesome and implemented it. It worked very well at keeping my scans from being blocked by certain cloud providers. (Minor note: -tlsi doesn't work if you use --proxy; it reverts to the default Go JA3.)

Now more advanced fingerprinting is starting to take hold. I looked into how nuclei behaves against these newer techniques, and it is possible to defeat them. Using -tlsi also randomizes your JA4, which avoids presenting nuclei's static JA4:

           Unique JA4 Hashes: nuclei-noevasion |           
              nuclei-noevasion_https_ja4.json              
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    Test Case     ┃               JA4 Hash               ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ nuclei-noevasion │ t13d191000_9dc949149365_e7c285222651 │
└──────────────────┴──────────────────────────────────────┘

           Unique JA4 Hashes: nuclei-headless |           
              nuclei-headless_https_ja4.json              
┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    Test Case    ┃               JA4 Hash               ┃
┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ nuclei-headless │ t13d1515h2_8daaf6152771_f37e75b10bcc │
└─────────────────┴──────────────────────────────────────┘

But that is not necessarily a good thing. Chrome has a randomized JA3 hash that is created per session, but a static JA4 - meaning it is possible to whitelist certain browsers based on the JA4 if you're a WAF. Yes, we could use headless mode or a raw request, but I want to kitchen sink scan all the things with existing templates without having Cloud-whoever block me or having to modify each template to be headless.

Header order is not preserved. If I want to pass in a custom header file, like so:

sec-ch-ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: en-US,en;q=0.9

When the request is formed, the order is not preserved: header names are canonicalized and shifted around, so the request ends up looking like this:

GET /a?param=FUZZ HTTP/1.0
Host: local.example.com:8000
Connection: close
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
Accept-Encoding: gzip, deflate, br, zstd
Accept-Language: en-US,en;q=0.9
Sec-Ch-Ua: "Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Linux"
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: none
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
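
The reordering can be checked mechanically. The following is a minimal sketch, assuming the header file and the captured request are both available as plain text. `header_names` and `order_preserved` are hypothetical helper names (not part of nuclei), and the sample data is a shortened copy of the headers above.

```python
def header_names(raw):
    """Return header names in the order they appear in a block of text."""
    names = []
    for line in raw.splitlines():
        name, sep, _ = line.partition(":")
        # Lines without a colon (blank lines, the request line) are not headers.
        if sep and name and " " not in name:
            names.append(name.strip())
    return names


def order_preserved(sent, received):
    """True if the sent headers keep their relative order in the received list."""
    wanted = [n.lower() for n in sent]
    seen = [n.lower() for n in received if n.lower() in wanted]
    return seen == wanted


# Shortened copy of the header file from this issue.
HEADER_FILE = """\
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Linux"
Upgrade-Insecure-Requests: 1
Sec-Fetch-Site: none
Accept-Language: en-US,en;q=0.9
"""

# Shortened copy of the request as it was actually sent.
CAPTURED = """\
GET /a?param=FUZZ HTTP/1.0
Host: local.example.com:8000
Connection: close
Accept-Language: en-US,en;q=0.9
Sec-Ch-Ua-Mobile: ?0
Sec-Ch-Ua-Platform: "Linux"
Sec-Fetch-Site: none
Upgrade-Insecure-Requests: 1
"""

preserved = order_preserved(header_names(HEADER_FILE), header_names(CAPTURED))
print(preserved)  # False: the captured request does not follow the file order
```

The comparison is case-insensitive on purpose, so it isolates the ordering problem from the separate change in header name casing.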

This matters because of the advanced fingerprinting done by JA4+, such as JA4H:

image

Here is what the fingerprints of Chrome v125 look like.

image

Browsers send specific headers in a specific order with every request, so preserving order would make it possible to mimic a browser's JA4H and avoid nuclei being blocked on the grounds of a JA4, JA4H, or JA3 fingerprint. Having the headers appear exactly as I pass them in with -H should also be the expected behavior. Consider headers like these:

GET /a?param=FUZZ HTTP/1.0
Host: local.example.com:8000
Connection: close
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36 Edg/124.0.0.0
Accept: */*
Accept-Language: en
Accept-Encoding: gzip

That is a far cry from what legitimate browser traffic looks like.
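
To illustrate why order matters to a JA4H-style fingerprint, here is a simplified sketch. It assumes that the order-sensitive part of the fingerprint is a truncated SHA-256 over the header names joined in wire order, with Cookie and Referer skipped. The real JA4H specification defines additional fields and its own normalization rules, which are not reproduced here. `header_order_hash` is a hypothetical helper, and the two lists are the header names from the examples above.

```python
import hashlib


def header_order_hash(names, skip=("cookie", "referer")):
    """Truncated SHA-256 over header names in wire order (simplified, JA4H-like)."""
    kept = [n for n in names if n.lower() not in skip]
    return hashlib.sha256(",".join(kept).encode("ascii")).hexdigest()[:12]


# Order from the custom header file (what Chrome 125 sends, per this issue).
browser_order = [
    "sec-ch-ua", "sec-ch-ua-mobile", "sec-ch-ua-platform",
    "Upgrade-Insecure-Requests", "User-Agent", "Accept",
    "Sec-Fetch-Site", "Sec-Fetch-Mode", "Sec-Fetch-User", "Sec-Fetch-Dest",
    "Accept-Encoding", "Accept-Language",
]

# Order observed in the request that was actually sent.
rewritten_order = [
    "User-Agent", "Accept", "Accept-Encoding", "Accept-Language",
    "Sec-Ch-Ua", "Sec-Ch-Ua-Mobile", "Sec-Ch-Ua-Platform",
    "Sec-Fetch-Dest", "Sec-Fetch-Mode", "Sec-Fetch-Site", "Sec-Fetch-User",
    "Upgrade-Insecure-Requests",
]

# Same set of headers, different order, so the hashes differ.
print(header_order_hash(browser_order))
print(header_order_hash(rewritten_order))

# Lowercasing both lists shows the mismatch is not only a casing issue.
lower_browser = header_order_hash([n.lower() for n in browser_order])
lower_rewritten = header_order_hash([n.lower() for n in rewritten_order])
print(lower_browser != lower_rewritten)
```

Any fingerprint built this way treats the reordered request as a different client, even though the header set and values are identical.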

Expected Behavior:

Headers go in a file in a certain order, and are preserved through to the request.
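
As a rough illustration of that expected behavior (not a description of how nuclei builds requests), the sketch below serializes a request from an ordered list of header pairs, so the bytes on the wire follow the file order exactly. `parse_header_file` and `build_raw_request` are hypothetical names, and writing `Host` first and using HTTP/1.1 are assumptions.

```python
def parse_header_file(text):
    """Parse 'Name: value' lines into an ordered list of (name, value) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        pairs.append((name.strip(), value.strip()))
    return pairs


def build_raw_request(method, path, host, headers):
    """Serialize a request, keeping header names, casing, and order untouched."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")


# Shortened copy of the header file from this issue.
HEADER_FILE = """\
sec-ch-ua-mobile: ?0
Upgrade-Insecure-Requests: 1
Accept-Language: en-US,en;q=0.9
"""

raw = build_raw_request(
    "GET", "/a?param=FUZZ", "local.example.com:8000",
    parse_header_file(HEADER_FILE),
)
print(raw.decode("latin-1"))
```

Keeping the headers as a list of pairs rather than a map is the key design choice here: a map keyed by canonical name loses both the original casing and the original position.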

Steps To Reproduce:

Anything else:

BraveLittleRoaster commented 6 months ago

Proof of concept for evading JA3 and JA4H in Python, if anyone is curious how to do it.

import urllib3
import ssl
from collections import OrderedDict
import random

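# NOTE: this connection class is not used by main() below; the ciphers are
# applied there through the ssl_context passed to the PoolManager instead.
class RandomizedCiphersHTTPSConnection(urllib3.connection.HTTPSConnection):
    def __init__(self, *args, **kwargs):
        self.ciphers = kwargs.pop('ciphers', None)
        super().__init__(*args, **kwargs)

    def connect(self):
        self.ssl_context = ssl.create_default_context()
        if self.ciphers:
            self.ssl_context.set_ciphers(self.ciphers)
        self.ssl_context.check_hostname = False
        self.ssl_context.verify_mode = ssl.CERT_NONE
        self.sock = self._new_conn()
        self.sock.settimeout(self.timeout)
        # ssl_wrap_socket is a urllib3.util helper, not a method on the
        # connection, so wrap the socket through the SSLContext instead.
        self.sock = self.ssl_context.wrap_socket(self.sock, server_hostname=self.host)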

def get_randomized_ciphers():
    ciphers = [
        "ECDH+AESGCM",
        "DH+AESGCM",
        "ECDH+AES256",
        "DH+AES256",
        "ECDH+AES128",
        "DH+AES",
        "ECDH+HIGH",
        "DH+HIGH",
        "ECDH+3DES",
        "DH+3DES",
        "RSA+AESGCM",
        "RSA+AES",
        "RSA+HIGH",
        "RSA+3DES",
        "!aNULL",
        "!eNULL",
        "!MD5"
    ]
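    # Shuffle the preference order and drop one entry so the offered cipher
    # list (and therefore the JA3) differs between runs. Note that
    # set_ciphers() does not change the TLS 1.3 cipher suites.
    random.shuffle(ciphers)
    ciphers = ciphers[:-1]
    ciphers = ':'.join(ciphers)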
    print(f"[-] Using CIPHERS: {ciphers}")
    return ciphers

def main():
    headers = OrderedDict([
        ('sec-ch-ua', '"Google Chrome";v="125", "Chromium";v="125", "Not.A/Brand";v="24"'),
        ('sec-ch-ua-mobile', '?0'),
        ('sec-ch-ua-platform', '"Linux"'),
        ('Upgrade-Insecure-Requests', '1'),
        ('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'),
        ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7'),
        ('Sec-Fetch-Site', 'none'),
        ('Sec-Fetch-Mode', 'navigate'),
        ('Sec-Fetch-User', '?1'),
        ('Sec-Fetch-Dest', 'document'),
        ('Accept-Encoding', 'gzip, deflate, br, zstd'),
        ('Accept-Language', 'en-US,en;q=0.9')
    ])

    ciphers = get_randomized_ciphers()
    ssl_context = ssl.create_default_context()
    ssl_context.set_ciphers(ciphers)
    ssl_context.check_hostname = False
    ssl_context.verify_mode = ssl.CERT_NONE

    # Suppress only the single InsecureRequestWarning from urllib3 needed.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

    http = urllib3.PoolManager(
        num_pools=1,
        headers=headers,
        ssl_context=ssl_context
    )

    url = "https://www.example.com:8443/"
    response = http.request('GET', url, headers=headers, retries=False, timeout=10.0)

    print(response.data)

if __name__ == "__main__":
    main()

I suspect the JA4 is static on Chrome while the JA3 is random because of QUIC.
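
Another possible contributor, independent of QUIC, is ordering. JA3 hashes the TLS extension list in the order it appears in the ClientHello, whereas JA4 sorts ciphers and extensions before hashing, so a client that shuffles its extension order per connection would get a changing JA3 but a stable JA4. The sketch below demonstrates only that ordering property with simplified stand-in hashes. The extension IDs are illustrative, both function names are hypothetical, and neither function reproduces the full JA3 or JA4 format (real JA3 uses MD5 over several fields, for example).

```python
import hashlib


def order_sensitive_hash(extensions):
    """JA3-like stand-in: hash the values in the order seen on the wire."""
    joined = "-".join(str(e) for e in extensions)
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]


def order_insensitive_hash(extensions):
    """JA4-like stand-in: sort first, so every permutation gives one value."""
    joined = ",".join(f"{e:04x}" for e in sorted(extensions))
    return hashlib.sha256(joined.encode("ascii")).hexdigest()[:12]


# Illustrative extension IDs; a reversed copy stands in for a per-session shuffle.
extensions = [0, 23, 65281, 10, 11, 35, 16, 5, 13, 18, 51, 45, 43, 27]
shuffled = list(reversed(extensions))

ja3_like_changes = order_sensitive_hash(extensions) != order_sensitive_hash(shuffled)
ja4_like_stable = order_insensitive_hash(extensions) == order_insensitive_hash(shuffled)
print(ja3_like_changes, ja4_like_stable)  # True True
```

If this is what is happening, it would also explain why a WAF can allowlist a browser by JA4 even though that browser's JA3 changes from session to session.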