Avoid verifying the same key multiple times in a session

trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials

https://trufflesecurity.com

GNU Affero General Public License v3.0


Avoid verifying the same key multiple times in a session #2262

Open bugbaba opened 9 months ago

bugbaba commented 9 months ago

Hello Team,

Description

Verify a key only once even if it is found multiple times in the same or different files.

For example, in the screenshot below, the same GitLab key (revoked) is verified twice because it appears twice in the file.

Preferred Solution

To avoid wasting resources on re-verification and hitting detectors with the same keys multiple times, it would be ideal to check whether a given key has already been verified.

Maybe add a check before the if verify block to confirm that the key has not been verified previously?

-- Best Regards, @bugbaba

rgmz commented 9 months ago

This would benefit both trufflehog and any endpoints it calls.

I was actually playing around with adding caches to detectors so that known verifications only happen once. Work smarter, not harder. :)

rgmz commented 8 months ago

@ahrav has an experimental implementation of this in #2276.

ahrav commented 8 months ago

I agree. Caching verification results reduces duplicate external API calls, which improves performance and API stewardship. However, storing plaintext credentials in the cache is a potential security risk if the cache is ever exposed, even though the chance of an in-memory compromise is low. One way to reduce that exposure while keeping lookups fast is to hash credentials with a high-speed algorithm such as xxHash before caching them. The cache then holds only hashes, and a cache hit is simply a matching hash. Overall, this balances security, performance, and responsible API usage by preventing duplicate verification calls for the same credentials.
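To make the hashing idea concrete, here is a minimal sketch of deriving a cache key from a credential so that the plaintext never sits in the cache. The helper name cacheKey and the detector-name prefix are assumptions for illustration, not part of trufflehog. The sketch also swaps in SHA-256 from the Go standard library instead of xxHash: xxHash is a non-cryptographic hash and needs a third-party package in Go, whereas SHA-256 is built in and one-way, at some cost in speed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey is a hypothetical helper: it derives a fixed-length cache key from
// a detector name and a raw credential so the plaintext is never stored.
func cacheKey(detector, secret string) string {
	h := sha256.New()
	h.Write([]byte(detector))
	h.Write([]byte{0}) // separator so ("ab", "c") and ("a", "bc") hash differently
	h.Write([]byte(secret))
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	a := cacheKey("gitlab", "glpat-example-token")
	b := cacheKey("gitlab", "glpat-example-token")
	c := cacheKey("github", "glpat-example-token")

	fmt.Println(a == b) // true: same detector and secret give the same key
	fmt.Println(a == c) // false: a different detector gives a different key
	fmt.Println(len(a)) // 64: length of a hex-encoded SHA-256 digest
}
```

Including the detector name in the key is a design choice in this sketch: the same string could be a valid credential for one service and not another, so results are kept separate per detector.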

bugbaba commented 8 months ago

Also, I think we shouldn't store ExtraData as metadata in the cache, since it mostly contains sensitive info extracted from valid responses.

We only need to check whether a given match already exists in the cache. So maybe we can skip all metadata if there are no plans to read it back from the cache.

if match in cachelist {
  skip verification
} else {
  verification logic
  add match to cachelist
}
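The pseudocode above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: verificationCache, newVerificationCache, and verifyOnce are hypothetical names, and the verify callback stands in for a detector's real network check. In line with the suggestion to ignore metadata, the cache stores nothing but a hash of the match and the boolean outcome.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// verificationCache is a hypothetical session-scoped cache. It stores only a
// hash of each match and the boolean outcome, never ExtraData or the secret.
type verificationCache struct {
	mu      sync.Mutex
	results map[[sha256.Size]byte]bool
}

func newVerificationCache() *verificationCache {
	return &verificationCache{results: make(map[[sha256.Size]byte]bool)}
}

// verifyOnce returns the cached outcome for match if one exists; otherwise it
// calls verify (a stand-in for a detector's check) and records the result.
func (c *verificationCache) verifyOnce(match string, verify func(string) bool) bool {
	key := sha256.Sum256([]byte(match))

	c.mu.Lock()
	if ok, seen := c.results[key]; seen {
		c.mu.Unlock()
		return ok
	}
	c.mu.Unlock()

	ok := verify(match) // the lock is not held during the slow call

	c.mu.Lock()
	c.results[key] = ok
	c.mu.Unlock()
	return ok
}

func main() {
	calls := 0
	fakeVerify := func(string) bool { calls++; return false } // e.g. a revoked key

	cache := newVerificationCache()
	for _, m := range []string{"key-A", "key-A", "key-B", "key-A"} {
		cache.verifyOnce(m, fakeVerify)
	}
	fmt.Println(calls) // 2: one verification per distinct key
}
```

Because the lock is released during verification, two goroutines that see the same new key at the same moment may both verify it; the sketch accepts that in exchange for not blocking other lookups. A real implementation would probably also avoid caching results from failed or indeterminate requests, so that a transient error does not mark a key as unverified for the rest of the session.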