trufflesecurity / trufflehog

Find, verify, and analyze leaked credentials
https://trufflesecurity.com
GNU Affero General Public License v3.0

APK scanning error: `resources.arsc file not found in the APK archive` #3619

Open rgmz opened 4 days ago

rgmz commented 4 days ago

I've encountered an APK that does not include a resources.arsc file. It's unclear whether this is aberrant or a special type of .apk that should be accounted for (ignored?).

2024-11-18T12:05:24-05:00 error trufflehog error processing apk content {"repo": "https://github.com/facebook/buck.git", "commit": "15d6524", "path": "assets/android/agent.apk", "timeout": 60, "mime": "application/vnd.android.package-archive", "timeout": 60, "timeout": 60, "error": "resources.arsc file not found in the APK archive"}

https://github.com/facebook/buck/blob/5094f40f9461f27c836f3860140bfa8ffe4d3696/assets/android/agent.apk

The APK only contains the following files:

$ unzip agent.apk
Archive:  agent.apk
  inflating: AndroidManifest.xml
  inflating: classes.dex
  inflating: lib/armeabi-v7a/libagent.so
  inflating: lib/x86/libagent.so
  inflating: META-INF/CERT.SF
  inflating: META-INF/CERT.RSA
  inflating: META-INF/MANIFEST.MF

ahrav commented 4 days ago

@joeleonjr might be interested.

joeleonjr commented 4 days ago

My sense is this is not super common, but if it's important to support these situations, then I would suggest the following change:

In apk.go, if there is no resources.arsc file, we skip the resources.arsc scan, treat *.xml files as plaintext (even though most will be encoded), and process *.dex and all other file types as normal.
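
A minimal sketch of that per-entry decision, assuming the handler can tell up front whether resources.arsc is present. The `entryMode` type, its constants, and `classifyEntry` are hypothetical names used for illustration only; they are not taken from the actual apk.go.

```go
package main

import (
	"path"
	"strings"
)

// entryMode is a hypothetical label for how one APK entry gets scanned.
type entryMode int

const (
	modeResourceTable entryMode = iota // parse resources.arsc itself
	modeDecodedXML                     // decode XML with help from the resource table
	modePlaintextXML                   // fallback: scan the raw XML bytes as-is
	modeDex                            // dex-specific processing
	modeDefault                        // every other file type, handled as normal
)

// classifyEntry picks a mode for the entry called name. hasResourceTable
// says whether the archive contains resources.arsc at all.
func classifyEntry(name string, hasResourceTable bool) entryMode {
	switch ext := strings.ToLower(path.Ext(name)); {
	case name == "resources.arsc":
		return modeResourceTable
	case ext == ".xml" && hasResourceTable:
		return modeDecodedXML
	case ext == ".xml":
		// No resource table to decode against, so best effort only.
		return modePlaintextXML
	case ext == ".dex":
		return modeDex
	default:
		return modeDefault
	}
}
```

With the agent.apk listing above, `AndroidManifest.xml` would fall into the plaintext branch, while `classes.dex` and the `.so` files would be handled exactly as they are today.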

As an alt: we could add a third check in the isAPKFile() function in handlers.go and specifically search for resources.arsc. If it's not found, then the file would be treated as a zip. If we pull the *.dex logic out of the apk.go handler and make a generic dex.go handler, then any APK without a resources.arsc file, which would be treated like a zip, would still benefit from our best-effort processing.

Both get at the same result.

What do you all think?