Open rgmz opened 4 months ago
FYI @dustin-decker @ahrav
Let me take a closer look. My initial test finds both secrets 🤔
Try testing with #2948. That changes the keywords to _mmk
, which appears at the end of the key.
I suspect this will occur for any secret where the keywords aren't at the beginning (e.g., OpenAI.)
I'm in New York at the moment, I'll do some more troubleshooting when I get home.
I suspect this will occur for any secret where the keywords aren't at the beginning (e.g., OpenAI.)
I'm in New York at the moment, I'll do some more troubleshooting when I get home.
Yes, that's exactly why. I have a couple of solutions to work around this limitation:
Proposal 1: Extend the span to include data both before and after the keyword, creating a radius around it instead of only looking forward. This assumes we have many detectors where the keyword follows the credential.
Proposal 2: Create a new interface that detectors can optionally implement, instructing the spanCalculator to look backwards instead of forwards.
I'm waiting for input from others before deciding. There might even be a better approach that I'm missing.
Proposal 1: Extend the span to include data both before and after the keyword, creating a radius around it instead of only looking forward. This assumes we have many detectors where the keyword follows the credential.
This is probably the best solution in the interim. There are a number of detectors that rely on keywords that aren't at the exact beginning of the secret.
Examples
GCP relies on provider_x509
which is found near the end of the JSON. Preceding attributes like private_key_id
, private_key
, client_email
, and client_id
can all be of arbitrary length. Right now, the detector only receives the final few lines of the JSON.
provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/seafood%40seafood-204620.iam.gserviceaccount.com"
}
auths
to match a JSON structure. It's typically only a couple characters away from the starting {
, but even those few missing characters will cause the detector to break.Proposal 2: Create a new interface that detectors can optionally implement, instructing the spanCalculator to look backwards instead of forwards.
This is a good solution in the long-term. It could dovetail nicely into a solution for matching secrets against a known prefix or suffix.
Right now the prefixes are hard-coded in the regex, meaning that this gets detected:
curl -X GET https://api.snyk.io/ -u "api:7ec9d869-25ae-4347-a21d-c24556f06d67"
but this does not:
curl -u "api:7ec9d869-25ae-4347-a21d-c24556f06d67" https://api.snyk.io/
Proposal 2 seems ideal because it seems pretty rare to need to look backward
Proposal 2 seems ideal because it seems pretty rare to need to look backward
It's quite common; I've encountered some variation of the example posted above countless times.
Secrets that have a distinct prefix are the only case where the possibility is entirely eliminated.
I think https://github.com/trufflesecurity/trufflehog/pull/2957 should hopefully catch most of these cases.
Please review the Community Note before submitting
TruffleHog Version
https://github.com/trufflesecurity/trufflehog/commit/4e21590cbe895b0796acec8d3204f9f9013d9d5e
Description
The optimizations introduced in #2812 don't work as expected when multiple secrets are in the same chunk.
Given the following file you'd expect to receive two matches:
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
andTkkITc_kwJzmRgzmC1zeVGrhDq7YWaCe1jxH_mmk
. However, onlyTkkITc_kwJzmRgzmC1zeVGrhDq7YWaCe1jxH_mmk
is detected.Steps to Reproduce
This includes the following change to demonstrate what the detector is receiving.
Scan using the new matching optimization
Scan the entire chunk
Environment
N/A
Additional Context
N/A
References
N/A