trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials
https://trufflesecurity.com
GNU Affero General Public License v3.0

Inconsistent treatment of default file archive scanning behaviour depending upon data source #2506

Open: opened by 0x736E 6 months ago

0x736E commented 6 months ago


TruffleHog Version

v3.63.7 and later

Trace Output

N/A

Expected Behavior

TruffleHog should treat all data sources equally: unless a data source intrinsically does not support a given feature or behaviour, the default behaviour should be the same across all data sources.
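
A minimal sketch of the expected behaviour, assuming a single shared default that every data source starts from, an optional user override (which, as noted below, does not exist today), and a per-source capability flag for sources that cannot handle archives at all. The names `defaultSkipArchives` and `resolveSkipArchives` are hypothetical and are not part of TruffleHog's code or API.

```go
package main

import "fmt"

// defaultSkipArchives is a hypothetical default shared by every data source.
// false means archives are extracted and their contents scanned.
const defaultSkipArchives = false

// resolveSkipArchives (hypothetical) returns the effective skipArchives value
// for one data source. userSetting is nil when the user gave no preference;
// supportsArchives reports whether the source can handle archives at all.
func resolveSkipArchives(userSetting *bool, supportsArchives bool) bool {
	// A source with no intrinsic archive support always skips archives.
	if !supportsArchives {
		return true
	}
	// An explicit user choice takes precedence over the shared default.
	if userSetting != nil {
		return *userSetting
	}
	// Otherwise every source gets the same default.
	return defaultSkipArchives
}

func main() {
	skip := true
	fmt.Println(resolveSkipArchives(nil, true))   // false: shared default
	fmt.Println(resolveSkipArchives(&skip, true)) // true: user opted out
	fmt.Println(resolveSkipArchives(nil, false))  // true: no intrinsic support
}
```

Keeping the default in one place means adding a new data source cannot silently introduce a different archive behaviour.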

Actual Behavior

TruffleHog treats data sources differently when scanning for file archives. When scanning, the 'archive' handler may or may not be enabled, depending upon the data source. This results in secrets being found in file archives for some data sources but not others. There is also no mechanism for the user to enable or disable this behaviour.
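
To illustrate why the same archive yields findings under one setting and none under the other, here is a simplified sketch of an archive-aware scanning step. It is not TruffleHog's handler code: `scanFile` and `buildZip` are hypothetical helpers, only `.zip` files are handled, and the detector is a stand-in regular expression for the shape of an AWS access key ID. In this sketch a skipped archive is simply never opened.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"regexp"
	"strings"
)

// secretPattern is a stand-in detector: "AKIA" followed by 16 uppercase
// letters or digits, the shape of an AWS access key ID.
var secretPattern = regexp.MustCompile(`AKIA[0-9A-Z]{16}`)

// scanFile (hypothetical) matches non-archive files directly. A .zip file is
// either skipped entirely (skipArchives == true) or opened, with every entry
// decompressed and matched.
func scanFile(name string, data []byte, skipArchives bool) ([]string, error) {
	if !strings.HasSuffix(strings.ToLower(name), ".zip") {
		return secretPattern.FindAllString(string(data), -1), nil
	}
	if skipArchives {
		// The archive is never opened, so secrets inside it are not reported.
		return nil, nil
	}
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	var found []string
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		content, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return nil, err
		}
		found = append(found, secretPattern.FindAllString(string(content), -1)...)
	}
	return found, nil
}

// buildZip (hypothetical) returns an in-memory zip holding a single entry.
func buildZip(entryName, content string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create(entryName)
	if err != nil {
		return nil, err
	}
	if _, err := io.WriteString(w, content); err != nil {
		return nil, err
	}
	// Close writes the central directory; the archive is invalid without it.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip("creds.txt", "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n")
	if err != nil {
		panic(err)
	}
	scanned, _ := scanFile("secrets.zip", data, false)
	skipped, _ := scanFile("secrets.zip", data, true)
	fmt.Println(len(scanned), len(skipped)) // 1 0
}
```

TruffleHog's real handlers cover more archive formats and nested archives; the point here is only that the skipArchives value alone decides whether the contents are ever seen.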

The default skipArchives setting for each data source as of v3.63.7 is as follows (True means file archives are not scanned):

Data Source   skipArchives
-----------   ------------
CircleCI      False
Docker        False
FileSystem    False
GCS           False
Git           True
GitHub        False
GitLab        False
S3            False
SysLog        False
TravisCI      False
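
One way to guard against this kind of drift is a consistency check over the per-source defaults. The sketch below transcribes the table above into a Go map and reports every source whose default differs from the majority value. The map and the `outliers` function are hypothetical; they are not taken from TruffleHog's code or test suite.

```go
package main

import (
	"fmt"
	"sort"
)

// skipArchivesDefaults transcribes the v3.63.7 defaults listed in the issue.
var skipArchivesDefaults = map[string]bool{
	"CircleCI":   false,
	"Docker":     false,
	"FileSystem": false,
	"GCS":        false,
	"Git":        true,
	"GitHub":     false,
	"GitLab":     false,
	"S3":         false,
	"SysLog":     false,
	"TravisCI":   false,
}

// outliers (hypothetical) returns, sorted, the sources whose default differs
// from the majority value. Ties are resolved in favour of false.
func outliers(defaults map[string]bool) []string {
	counts := map[bool]int{}
	for _, v := range defaults {
		counts[v]++
	}
	majority := counts[true] > counts[false]
	var out []string
	for name, v := range defaults {
		if v != majority {
			out = append(out, name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	fmt.Println(outliers(skipArchivesDefaults)) // [Git]
}
```

A check like this, run in CI, would fail as soon as one data source diverges from the others.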

Comparing the findings from scans using the 'filesystem' and 'git' data sources shows the difference: the filesystem data source scans the zip file and reports all 10 secrets it contains, whereas the scan using the git data source does not report them:

[Screenshot: scan results from the filesystem and git data sources]

Steps to Reproduce

  1. Create a zip file containing some secrets
  2. Scan the folder containing the zip file with the filesystem data source configuration
  3. Scan the folder containing the zip file with the git data source configuration
  4. Compare the results


Additional Context

Root cause analysis is located here:


dgarozzo commented 2 weeks ago

I'm experiencing this issue too. I previously cloned repos and ran TruffleHog against them on the filesystem, but now that I point it at the repo directly, none of the findings previously reported in .zip files are found.

Is there a fix in the works?