General-purpose JWT detector

dinvlad commented 1 year ago

Description

It would be great if Trufflehog caught JWTs in the general case.

Problem to be Addressed

Looking through the list of detectors, it doesn't seem like Trufflehog has a pattern for JWTs in the general case (regardless of the identity provider). Detecting arbitrary JWTs would allow for a much greater breadth of detections, without having to add a new detector each time.

For example, Google IAP tokens are not currently captured by TH, but they're just regular JWTs.

Description of the Preferred Solution

I think this could be a simple detector that scans for the typical JWT pattern and, where possible, validates matches against the issuer's JWKS endpoint.
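A minimal Python sketch of the scanning half, assuming a token looks like three dot-separated base64url segments whose header and payload decode to JSON objects. `JWT_RE`, `b64url_decode` and `find_jwts` are hypothetical names, not TruffleHog's API; the pattern is a common heuristic, not a complete JWS grammar.

```python
import base64
import json
import re

# Hypothetical pattern: three dot-separated base64url segments. The first two
# start with "eyJ", which is how base64url-encoded JSON beginning with '{"'
# followed by a letter starts. The signature may be empty (alg "none").
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def find_jwts(text: str):
    """Yield (token, header, payload) for every match that decodes to JSON objects."""
    for match in JWT_RE.finditer(text):
        token = match.group(0)
        header_b64, payload_b64, _signature_b64 = token.split(".")
        try:
            header = json.loads(b64url_decode(header_b64))
            payload = json.loads(b64url_decode(payload_b64))
        except ValueError:
            # Covers bad base64 (binascii.Error), bad UTF-8 and bad JSON.
            continue
        if isinstance(header, dict) and isinstance(payload, dict):
            yield token, header, payload
```

Decoding the segments, rather than relying on the regex alone, is what gives a verifier the `alg`, `kid` and `iss` values it would need for any JWKS lookup.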

dxa4481 commented 1 year ago

Hi @dinvlad ,

Unfortunately, some basic testing of JWT detection on GitHub found that more than 90% of the JWTs uncovered were test/non-sensitive tokens.

This signal-to-noise ratio isn't really palatable for most developers, but we may add support for specific JWTs in a similar way to how we added private key verification: https://trufflesecurity.com/blog/driftwood

It's still very much in R&D, but we have a few ideas we're stewing on. We will not be adding support for generic JWTs without some way to improve the signal-to-noise ratio.

dxa4481 commented 1 year ago

You can also find more information about our detection philosophy here https://trufflesecurity.com/blog/its-impossible-to-find-every-vulnerability-so-we-dont-try-to/

dinvlad commented 1 year ago

Thanks for the quick response! I wonder if it's possible to be more specific in detecting them, e.g. only detect the ones that are signed by a public issuer (and verifiable against its JWKS endpoint). Or do you mean that even those tokens are usually test/non-sensitive?

dxa4481 commented 1 year ago

Yeah, verifying against JWKS endpoints is something we'd be open to exploring; that's actually simpler than the idea I originally had in mind.

dinvlad commented 1 year ago

Awesome, looking forward to it! I'd also be glad to submit a PR, if you'd like.

dxa4481 commented 1 year ago

A PR would be great, but it might be worth chatting through a little more first. Is the JWKS endpoint usually referenced in the JWT body? Or is the idea that you'd extract domain information from the file being scanned and try the standard /.well-known/jwks.json path?
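To make the first option concrete: RFC 7519 makes the `iss` claim optional, but when it is an https URL, a scanner could derive candidate key locations from it. The sketch below assumes the `/.well-known/jwks.json` convention mentioned above (a common convention, not a requirement) and also tries the host root; `jwks_candidates_from_iss` is a hypothetical helper that only builds URLs and fetches nothing.

```python
from urllib.parse import urlparse


def jwks_candidates_from_iss(payload: dict) -> list:
    """Hypothetical helper: guess JWKS URLs from a decoded payload's "iss" claim."""
    iss = payload.get("iss")
    if not isinstance(iss, str):
        return []  # the issuer claim is optional, so many tokens lack it
    parsed = urlparse(iss)
    if parsed.scheme != "https" or not parsed.netloc:
        return []  # not an https URL we could plausibly fetch keys from
    # "/.well-known/jwks.json" is a common convention, not a requirement.
    candidates = [iss.rstrip("/") + "/.well-known/jwks.json"]
    root = f"https://{parsed.netloc}/.well-known/jwks.json"
    if root not in candidates:
        candidates.append(root)
    return candidates
```

Tokens without a usable `iss` would fall through to the second option (domains found in the scanned file).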

Related (but separate): it would be great if we crawled the entire internet looking for JWK Sets and uploaded them to Driftwood https://trufflesecurity.com/blog/driftwood

mac2000 commented 1 year ago

A JWT may not contain any information about whom it was created for, and it may also be signed with much simpler algorithms that don't involve a JWKS at all.

And most importantly: tokens are usually short-lived, so if one has expired, does it matter?
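Expiry is cheap to check offline. RFC 7519 defines `exp` as a NumericDate (seconds since the Unix epoch) on or after which the token must not be accepted. A minimal sketch, assuming a scanner would use this to drop or down-rank expired tokens; `is_expired` and its `leeway` parameter are hypothetical.

```python
import time
from typing import Optional


def is_expired(payload: dict, now: Optional[float] = None, leeway: int = 0) -> bool:
    """Return True if the "exp" claim is at or before the current time.

    Tokens without a numeric "exp" are treated as non-expiring, which is
    exactly the case where a leaked token is most likely to still matter.
    """
    exp = payload.get("exp")
    if isinstance(exp, bool) or not isinstance(exp, (int, float)):
        return False
    current = time.time() if now is None else now
    return current >= exp + leeway
```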

As for verification: sooner or later TH may allow choosing specific detectors to speed up checks.

If a single JWT detector checks everything, we won't be able to turn it off partially (e.g. check only Google but not Amazon).

That's why there is a chance that having multiple small detectors may be a better approach (once the first one exists, adding the second will be easy peasy).

Also consider that the majority of providers use so-called "reference tokens", which are not actually valid JWTs. In other words, if we want to catch Google accounts, we will still need a dedicated detector to do so.

PS: the standard path is /.well-known/openid-configuration. It is usually used for capability discovery, and if JWKS is supported, the document links to it (via its jwks_uri field).

But once again, Microsoft is a good example here, because a token may be valid only for a certain tenant 🤷‍♂️
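The discovery route can be sketched as follows. Per OpenID Connect Discovery 1.0, the URL is the issuer with any trailing `/` removed plus `/.well-known/openid-configuration`, and the returned document's `issuer` must match the issuer used to build it. Both functions are hypothetical and network fetching is left out. Building the URL from the token's own `iss` keeps any tenant-specific path segment, which may matter for multi-tenant providers like the one mentioned above.

```python
import json
from typing import Optional


def discovery_url(issuer: str) -> str:
    # OpenID Connect Discovery: strip any trailing "/" from the issuer, then
    # append "/.well-known/openid-configuration".
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def jwks_uri_from_discovery(document: str, expected_issuer: str) -> Optional[str]:
    """Return the "jwks_uri" from a fetched discovery document, or None."""
    try:
        metadata = json.loads(document)
    except ValueError:
        return None
    if not isinstance(metadata, dict):
        return None
    # The document's "issuer" must match the issuer used to build the URL;
    # otherwise the keys do not belong to the token's issuer.
    if metadata.get("issuer") != expected_issuer:
        return None
    uri = metadata.get("jwks_uri")
    return uri if isinstance(uri, str) and uri.startswith("https://") else None
```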

dinvlad commented 1 year ago

@mac2000 Sorry for the delayed response. I think we can be flexible here. However, if a JWT is signed with a shared secret rather than a private key, we may not be able to verify it (since the secret is unknown), although we could try an issuer endpoint if the token provides one. I would say we should be opportunistic here: JWKS just seems like low-hanging fruit, since it's standardized, so we can always check it when a JWT has the relevant claims.
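One way to be opportunistic is to triage on the header's `alg` before attempting anything. A minimal sketch, assuming the JWS algorithm names from RFC 7518 (plus `EdDSA` from RFC 8037) and the thread's later agreement to skip shared-secret tokens; `verification_strategy` and its return labels are hypothetical.

```python
# JWS algorithms from RFC 7518 (plus EdDSA from RFC 8037).
SHARED_SECRET_ALGS = {"HS256", "HS384", "HS512"}
PUBLIC_KEY_ALGS = {
    "RS256", "RS384", "RS512",  # RSASSA-PKCS1-v1_5
    "PS256", "PS384", "PS512",  # RSASSA-PSS
    "ES256", "ES384", "ES512",  # ECDSA
    "EdDSA",
}


def verification_strategy(header: dict, payload: dict) -> str:
    """Hypothetical triage: decide how (or whether) to try verifying a token."""
    alg = header.get("alg")
    if not isinstance(alg, str):
        return "unsupported"
    if alg in SHARED_SECRET_ALGS:
        return "skip-shared-secret"  # the secret is unknown to the scanner
    if alg in PUBLIC_KEY_ALGS:
        if isinstance(payload.get("iss"), str):
            return "try-jwks"  # public keys may be discoverable from the issuer
        return "public-key-no-issuer"  # would need URLs from the scanned file
    return "unsupported"  # "none", JWE algorithms, or anything unrecognized
```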

You also make a good point re multi-detector verification. I think we just need to be careful when adding such detectors, and then explicitly exclude their issuer endpoints from the generic JWT detector.
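The exclusion could be as simple as a host check on the `iss` claim. A sketch, assuming dedicated detectors would be keyed by issuer hostname and that subdomains should also be excluded (a design choice, not something settled here); `dedicated_hosts` and both functions are hypothetical, and the example hosts are placeholders.

```python
from urllib.parse import urlparse


def issuer_host(iss: str) -> str:
    # Some issuers are bare hostnames without a scheme, so parse both forms.
    try:
        parsed = urlparse(iss if "://" in iss else "//" + iss)
    except ValueError:  # e.g. a malformed bracketed IPv6 host
        return ""
    return parsed.hostname or ""


def handled_by_dedicated_detector(payload: dict, dedicated_hosts: set) -> bool:
    """True if the token's issuer host (or a subdomain of it) has its own detector."""
    iss = payload.get("iss")
    if not isinstance(iss, str):
        return False
    host = issuer_host(iss)
    return host in dedicated_hosts or any(host.endswith("." + h) for h in dedicated_hosts)
```

The generic detector would skip any token for which this returns True, leaving it to the dedicated one.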

Not sure about reference tokens - could you elaborate (possibly with an example)?

dxa4481 commented 1 year ago

I think we should leave off pre-shared key / secret JWT detection for now. The reason I know those have such high false positive rates is that I guessed the secret for a sample of a few thousand JWTs using hashcat: I was able to correctly guess the secret for about 80% of the JWTs on GitHub, but for JWTs issued by web apps on live websites, I could only guess about 5%. That tells me the vast majority of keys checked into GitHub are just throwaway test keys with dummy passwords. That's a lot of noise.
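For readers unfamiliar with the experiment: an HS256/384/512 signature is an HMAC over `header.payload`, so a candidate secret can be tested by recomputing it. The sketch below is a slow pure-Python dictionary check that illustrates the idea; it is not hashcat and not how that measurement was actually run. `guess_hs_secret` and the wordlist are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Iterable, Optional

HASH_FOR_ALG = {"HS256": hashlib.sha256, "HS384": hashlib.sha384, "HS512": hashlib.sha512}


def b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def guess_hs_secret(token: str, wordlist: Iterable[str]) -> Optional[str]:
    """Return the first candidate secret whose HMAC matches the token's signature."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    alg = json.loads(b64url_decode(header_b64)).get("alg")
    digestmod = HASH_FOR_ALG.get(alg) if isinstance(alg, str) else None
    if digestmod is None:
        return None  # not an HMAC-signed token
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    signature = b64url_decode(signature_b64)
    for secret in wordlist:
        mac = hmac.new(secret.encode("utf-8"), signing_input, digestmod).digest()
        if hmac.compare_digest(mac, signature):
            return secret
    return None
```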

Asymmetric JWTs that verify against JWKS endpoints at URLs found nearby are a good starting point.
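"URLs found nearby" could mean https URLs within some character window of the token's match span, tried nearest first. A sketch under that assumption; the window size, the deliberately simple URL pattern, and `nearby_urls` itself are hypothetical. The results could then feed a JWKS or discovery lookup.

```python
import re

# Deliberately simple https URL pattern; real-world URL matching is messier.
URL_RE = re.compile(r"https://[A-Za-z0-9.-]+(?::\d+)?(?:/[^\s\"'<>]*)?")


def nearby_urls(text: str, start: int, end: int, window: int = 500) -> list:
    """Return https URLs within `window` characters of text[start:end], nearest first."""
    scored = []
    for match in URL_RE.finditer(text):
        if match.end() < start - window or match.start() > end + window:
            continue
        # Zero if the URL overlaps the token, else the gap in characters.
        distance = max(0, start - match.end(), match.start() - end)
        scored.append((distance, match.group(0).rstrip(".,;)")))
    scored.sort(key=lambda item: item[0])  # stable, so ties keep document order
    # dict.fromkeys drops duplicates while preserving order.
    return list(dict.fromkeys(url for _, url in scored))
```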

dinvlad commented 1 year ago

Agreed, totally!

mac2000 commented 1 year ago

I didn't find anything specific in the OpenID Connect spec, but here is a good description from the IdentityServer docs:

https://docs.identityserver.io/en/latest/topics/reference_tokens.html

Long story short: to be able to invalidate tokens immediately, instead of sending the tokens themselves to clients, the server stores them somewhere on the backend and sends clients only an identifier, which is later used to retrieve the actual token from storage.
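A toy illustration of why a JWT pattern never sees reference tokens: the client only ever holds an opaque handle, and revocation is just deleting the stored entry. `ReferenceTokenStore` is a hypothetical in-memory sketch, not IdentityServer's implementation.

```python
import secrets
from typing import Dict, Optional


class ReferenceTokenStore:
    """Hypothetical in-memory store mapping opaque handles to real tokens."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def issue(self, jwt: str) -> str:
        handle = secrets.token_urlsafe(32)  # random, carries no claims
        self._tokens[handle] = jwt
        return handle  # only this handle is sent to the client

    def resolve(self, handle: str) -> Optional[str]:
        # The resource server looks the real token up on each request.
        return self._tokens.get(handle)

    def revoke(self, handle: str) -> None:
        # Immediate invalidation: the handle simply stops resolving.
        self._tokens.pop(handle, None)
```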

PS: yes, yes, I know, it sounds strange (why is a JWT needed at all, then?), but as I mentioned before, from what I see that is exactly what is used by, for example, Google and Microsoft.

That's why, depending on the implementation, even a string like "123" may be a valid reference token 🤷‍♂️

But indeed, if we are talking about something that looks like a JWT, reference tokens are not relevant to this topic.

trufflesecurity / trufflehog