projectdiscovery / tlsx

Fast and configurable TLS grabber focused on TLS based data collection.
MIT License

JA3S #542

Open hexagr opened 1 month ago

hexagr commented 1 month ago

Modify ztls.go and ja3.go to fix the JA3 issue. This should close bug report #537 and produce coherent JA3 fingerprints. On my test machine:

./tlsx -u https://microsoft.com -ja3

  _____ _    _____  __
 |_   _| |  / __\ \/ /
   | | | |__\__ \>  < 
   |_| |____|___/_/\_\  v1.1.6

        projectdiscovery.io

[INF] Current tlsx version v1.1.6 (latest)
microsoft.com:443 [364ff14b04ef93c3b4cfa429d729c0d9]

This appears to be stable. You can find these fingerprints on Shodan, too. I think this is the correct way to do JA3S. See here: https://beta.shodan.io/search/facet?query=http&facet=ssl.ja3s

These changes follow the JA3S specification as stated in the Salesforce blog post:

"The JA3S method is to gather the decimal values of the bytes for the following fields in the Server Hello packet: Version, Accepted Cipher, and List of Extensions. It then concatenates those values together in order, using a “,” to delimit each field and a “-” to delimit each value in each field.

The field order is as follows: TLSVersion,Cipher,Extensions"

Example: 769,47,65281-0-11-35-5-16

If there are no TLS Extensions in the Server Hello, the fields are left empty.

Example: 769,47,

These strings are then MD5 hashed to produce an easily consumable and shareable 32 character fingerprint. This is the JA3S Fingerprint.

769,47,65281-0-11-35-5-16 → 4835b19f14997673071435cb321f5445
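As a rough illustration of the quoted recipe, here is a minimal Go sketch that builds the "TLSVersion,Cipher,Extensions" string and MD5-hashes it. The helper names BuildJA3S and HashJA3S are hypothetical and are not taken from tlsx or from this pull request; the input values are the ones from the blog post's example.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// BuildJA3S joins the ServerHello version, the accepted cipher, and the
// extension IDs into "Version,Cipher,Extensions". Extension IDs are joined
// with "-" in the order given. Hypothetical helper, not part of tlsx.
func BuildJA3S(version, cipher uint16, extensions []uint16) string {
	ids := make([]string, len(extensions))
	for i, ext := range extensions {
		ids[i] = strconv.Itoa(int(ext))
	}
	// With no extensions the last field is empty, e.g. "769,47,".
	return fmt.Sprintf("%d,%d,%s", version, cipher, strings.Join(ids, "-"))
}

// HashJA3S returns the 32-character hex MD5 digest of a JA3S string.
func HashJA3S(ja3s string) string {
	sum := md5.Sum([]byte(ja3s))
	return hex.EncodeToString(sum[:])
}

func main() {
	s := BuildJA3S(769, 47, []uint16{65281, 0, 11, 35, 5, 16})
	fmt.Println(s)           // the string before hashing
	fmt.Println(HashJA3S(s)) // the JA3S fingerprint
}
```

The caller is responsible for passing the extensions in the order they should appear in the string; the sketch does not sort or deduplicate them.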

The comment on line 95 of ja3.go should say that it removes the last dashByte; I worded it incorrectly. Other than that, this is a first stab at fixing bug #537. I left some commented-out fmt.Print statements in the code, which anyone can uncomment for testing. For them to run, add the 'fmt' package to the imports before uncommenting them.

./tlsx -u https://microsoft.com -ja3

  _____ _    _____  __
 |_   _| |  / __\ \/ /
   | | | |__\__ \>  < 
   |_| |____|___/_/\_\  v1.1.6

        projectdiscovery.io

[INF] Current tlsx version v1.1.6 (latest)
Fingerprint before hashing: 771,49200,65281
microsoft.com:443 [364ff14b04ef93c3b4cfa429d729c0d9]

The fingerprint now follows the format in the Salesforce blog post. The ServerHello reply gives us the TLS version, the chosen cipher, and a few extensions. The only things that need to be ordered are the extensions.

I've used as many extension-related fields from the ServerHello as possible. There are a few others, like Random and SessionID, but I think using them would make the fingerprints unstable again. Here are the fields available on the ServerHello struct, e.g. ServerHello.Version:

Field: Version, Type: tls.TLSVersion
Field: Random, Type: []uint8
Field: SessionID, Type: []uint8
Field: CipherSuite, Type: tls.CipherSuite
Field: CompressionMethod, Type: uint8
Field: OcspStapling, Type: bool
Field: TicketSupported, Type: bool
Field: SecureRenegotiation, Type: bool
Field: HeartbeatSupported, Type: bool
Field: ExtendedRandom, Type: []uint8
Field: ExtendedMasterSecret, Type: bool
Field: SignedCertificateTimestamps, Type: []tls.ParsedAndRawSCT
Field: AlpnProtocol, Type: string
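To make the idea concrete, here is a hedged Go sketch of how boolean fields like the ones above could be turned into extension IDs. The helloFlags struct is a hypothetical stand-in for the parsed ServerHello, not the library's type, and the output order is fixed by the function rather than taken from the wire, so it can differ from the order the server actually sent. The numbers are the IANA TLS ExtensionType values.

```go
package main

import "fmt"

// helloFlags is a hypothetical stand-in for the parsed ServerHello fields
// listed above. It is NOT the type used by tlsx or its TLS library.
type helloFlags struct {
	OcspStapling         bool
	TicketSupported      bool
	SecureRenegotiation  bool
	HeartbeatSupported   bool
	ExtendedMasterSecret bool
	HasSCTs              bool   // true when SignedCertificateTimestamps is non-empty
	AlpnProtocol         string // empty when ALPN was not negotiated
}

// ExtensionIDs infers extension numbers from the flags. The order is an
// assumption made by this sketch; it does not reflect the wire order.
func ExtensionIDs(h helloFlags) []uint16 {
	var ids []uint16
	if h.SecureRenegotiation {
		ids = append(ids, 65281) // renegotiation_info
	}
	if h.OcspStapling {
		ids = append(ids, 5) // status_request
	}
	if h.HeartbeatSupported {
		ids = append(ids, 15) // heartbeat
	}
	if h.AlpnProtocol != "" {
		ids = append(ids, 16) // application_layer_protocol_negotiation
	}
	if h.HasSCTs {
		ids = append(ids, 18) // signed_certificate_timestamp
	}
	if h.ExtendedMasterSecret {
		ids = append(ids, 23) // extended_master_secret
	}
	if h.TicketSupported {
		ids = append(ids, 35) // session_ticket
	}
	return ids
}

func main() {
	// Only secure renegotiation set, as in a "771,49200,65281" pre-hash string.
	fmt.Println(ExtensionIDs(helloFlags{SecureRenegotiation: true}))
}
```

Because the flags only say whether an extension was present, any extension without a corresponding field is invisible to this approach.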

One problem is that few TLS implementations have built-in support for exposing all of the TLS extension fields. Even the maintainers of Go's standard library have declined to add them upstream because they're used mostly for fingerprinting. One solution would be to clone the standard library and patch it. See the discussion here: https://github.com/golang/go/issues/32936#issuecomment-1781229365

Another solution would be to parse them raw and handle the bytes. But this pull request uses the functionality available through the APIs and imports tlsx is already using. And it seems to work.
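For comparison, the raw-parsing alternative could look roughly like the Go sketch below. It assumes the caller already has the bytes of the ServerHello extensions block (a 2-byte total length followed by type/length/data records); ParseExtensionTypes is a hypothetical name, and locating that block inside the handshake message is left out.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// ParseExtensionTypes walks a TLS extensions block and returns the
// extension types in the order they appear on the wire.
// Hypothetical helper for illustration; not part of tlsx.
func ParseExtensionTypes(block []byte) ([]uint16, error) {
	if len(block) == 0 {
		return nil, nil // ServerHello without extensions
	}
	if len(block) < 2 {
		return nil, errors.New("short extensions block")
	}
	total := int(binary.BigEndian.Uint16(block))
	body := block[2:]
	if total != len(body) {
		return nil, errors.New("extensions length mismatch")
	}
	var types []uint16
	for len(body) > 0 {
		if len(body) < 4 {
			return nil, errors.New("truncated extension header")
		}
		typ := binary.BigEndian.Uint16(body)        // 2-byte extension type
		n := int(binary.BigEndian.Uint16(body[2:])) // 2-byte data length
		if len(body) < 4+n {
			return nil, errors.New("truncated extension data")
		}
		types = append(types, typ)
		body = body[4+n:] // skip header and data
	}
	return types, nil
}

func main() {
	// renegotiation_info (0xff01) with one data byte, then session_ticket (0x0023) with none.
	block := []byte{0x00, 0x09, 0xff, 0x01, 0x00, 0x01, 0x00, 0x00, 0x23, 0x00, 0x00}
	types, err := ParseExtensionTypes(block)
	fmt.Println(types, err)
}
```

Unlike the flag-based approach, this keeps the server's own extension order and includes extensions the library has no field for.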

GeorginaReeder commented 1 month ago

Hey @hexagr , thanks so much for your contribution!

We also have a Discord server, which you’re more than welcome to join. It's a great place to connect with fellow contributors and stay updated with the latest developments!

dogancanbakir commented 1 month ago

duplicate of https://github.com/projectdiscovery/tlsx/pull/395

hexagr commented 1 month ago

@dogancanbakir I didn't see your pull request before. My bad. I was just attempting to fix the issue with JA3 after seeing the problem mentioned in the bug tracker.

Upon further investigation, scanning the internet a bit, and logging into Shodan to compare hashes, I don't think this pull request I've made fully solves the issue, despite lining up with a few hashes on Shodan. It needs some work.

The bottom line here should be to figure out how popular services generally compute JA3, so that there's a baseline to compare hashes against: to be able to say "tlsx shows MD5 [x] for google.com" and have it generally line up with what other internet mapping services report.

Your code looks cleaner than my attempt here. But I might dig around some more and try to develop a stable, comprehensive solution, even if it takes me a month and a thousand lines of code.