trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials
https://trufflesecurity.com
GNU Affero General Public License v3.0

Native support for scanning docker images (transparent nested .tar unpacking) #674

Open hlein opened 2 years ago

hlein commented 2 years ago

Description

It would be nice if trufflehog could smartly scan nested .tar files, such as those found in docker images.

Problem to be Addressed

When scanning a docker image tarball (such as one saved with docker save ...), trufflehog currently just prints the top-level .tar filename for every hit. That does not show which component inside the image, or which file path inside a container launched from the image, contains the hit.

Description of the Preferred Solution

Best case, trufflehog would keep track of where it is when looking inside tar archives, and would do so recursively, because docker images are typically nested .tar files made up of multiple layers. It would then print that context on a hit, maybe something like:

File: foo.tar:b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20:etc/secrets

Maybe this would be something generalized, that makes trufflehog filesystem smarter. Or, it might have to be a dedicated mode, trufflehog archive or something. Uncompressed .tar is one thing; I expect compressed archives would be more painful.

Additional Context

There is a FUSE filesystem for mounting archives that also supports recursive/nested archives, https://github.com/mxmlnkn/ratarmount, which transparently turns archive files into subdirectories.

So for example:

mkdir -p some_container
ratarmount -c -r -o ro,allow_other some_container.tar some_container
trufflehog filesystem --directory=some_container 2>&1 | tee "trufflehog_some_container.out"

Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://user:host@foo:3128
File: some_container/e60a0dfc08a94dabb221d8a28c6fdbeaa7cab0c146d35e8eff8e50bc2e4c194b/layer.tar/usr/lib/python2.7/site-packages/urlgrabber/grabber.py

Found unverified result 🐷🔑❓
Detector Type: URI
Raw result: http://username:password@host.com:80/path
File: some_container/96e436883f4940841fc9f1f7e935bada3859d2ffb0e5455952438d844f8e9c26/layer.tar/usr/lib/python2.7/site-packages/pip/_vendor/urllib3/util/url.py

Found unverified result 🐷🔑❓
Detector Type: PrivateKey
Raw result: -----BEGIN PRIVATE KEY-----
MIICd[snip]
-----END PRIVATE KEY-----
File: some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl
...

Or for a large collection of them:

# Mount every image tarball read-only, with nested archives mounted recursively (-r).
# The original ran this loop from a root prompt.
for A in *.tar; do
  D="${A%.tar}"
  mkdir -p "$D"
  ratarmount -r -o ro,allow_other "$A" "$D"
done

# Scan each mounted image, skipping any that already has non-empty output.
for A in *.tar; do
  D="${A%.tar}"
  test -s "trufflehog_${D}.out" && continue
  echo "$D"
  trufflehog filesystem --directory="$D" 2>"trufflehog_${D}.err" | tee "trufflehog_${D}.out"
done

If adding native nested-archive support does not seem worth it/desirable, then perhaps just polish/improve this example and document it somewhere.

nyanshak commented 1 year ago

I was going to have a look into this but realized I probably don't have enough time to untangle this right now since it's tied to multiple things, so instead I'll try to leave some notes that might be helpful for anyone else looking into it.

Right now the Handler interface has FromFile(context.Context, io.Reader) chan ([]byte). For the archive handler, we might instead want each value sent on the channel to be a (path string, data []byte) pair. Then we could update some field on the chunk.SourceMetadata to record any sub-archive paths.

The problems that I see with it:

One suggestion would be to add something like ArchivePath to SourceMetadata directly, where you can set full paths, like some_container/b0d4d7051229875a2bfd9809c631c9899748f0e1fc6f408a446048dc6b60ca20/layer.tar:/usr/share/doc/perl-IO-Socket-SSL/example/simulate_proxy.pl. More generally it could look like PATH_TO_FILE_IN_ARCHIVE[:PATH_TO_FILE_IN_SUB_ARCHIVE]...
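A minimal sketch of that path format in Go, assuming a plain ":" separator. The SourceMetadata struct here is a stand-in for illustration only, JoinArchivePath and SplitArchivePath are hypothetical helpers, and the paths in main are shortened examples. One caveat with this format: a file name that itself contains ":" would split ambiguously, so a real implementation would need escaping or a different separator.

```go
package main

import (
	"fmt"
	"strings"
)

// SourceMetadata is a stand-in for illustration only; it is not the real
// trufflehog type. ArchivePath is the field proposed above.
type SourceMetadata struct {
	ArchivePath string
}

// JoinArchivePath builds PATH_TO_FILE_IN_ARCHIVE[:PATH_TO_FILE_IN_SUB_ARCHIVE]...
// from one path segment per archive level.
func JoinArchivePath(parts ...string) string {
	return strings.Join(parts, ":")
}

// SplitArchivePath reverses JoinArchivePath. It assumes no segment contains ":".
func SplitArchivePath(p string) []string {
	return strings.Split(p, ":")
}

func main() {
	md := SourceMetadata{
		ArchivePath: JoinArchivePath("some_container/b0d4/layer.tar", "/usr/share/doc/example.pl"),
	}
	fmt.Println(md.ArchivePath)
	for depth, part := range SplitArchivePath(md.ArchivePath) {
		fmt.Printf("level %d: %s\n", depth, part)
	}
}
```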