hlein / distro-backdoor-scanner

GNU General Public License v2.0
39 stars 4 forks source link

distro-backdoor-scanner

Tools to scan OS distributions for backdoor indicators.

See USAGE for a rundown of how to use each script.

The toolkit used for the xz-utils backdoor is far too sophisticated to be a first draft. Were there earlier iterations of this, that shared some things in common but were slightly simpler, injected into other projects? Can we detect the style/"fist" of the author elsewhere? Moreso the delivery mechanics - backdooring codebases - than the contents of the extracted+injected malicious .so.

There need to be more search patterns, among other things; see TODO.

Distros supported:

Checking distfiles

Tools:

These scripts unpack the source packages for all of a distro repo's current packages, then scan them for content similar to the malware that was added to xz-utils.

Have been run over:

This gives a manageable amount of results (~hundreds of hits), digestable by a human. So far the only confirmed malicious results are... from the backdoored xz-utils versions.

Checking M4 macros

Tools:

These scripts harvest every iteration of every .m4 macro file ever committed to some specific repos considered "known good" (if, say, GNU automake upstream has already been trojaned, then the preppers were right, civilization is ending). Build an SQLite database of files, their embedded serial numbers (if any), their plain sha256 checksum plus a checksum of the file contents with comments and whitespace-only lines removed.

Then, for a given tree of sources (such as unpacked by package_unpack_all.sh), bash every .m4 file found against the known-good database. Alert on .m4 files that differ from any known upstream, and emit cut-and-paste-able git diff commands for human review (maybe the package customized it for good reason... or maybe to hide a trojan). Also warn about new .m4 files (nothing inherently wrong with a package shipping its own, but noteworthy). Generate a database of unknowns, so that package A's new.m4 and package B's nouveau.m4 can be recognized to be the same, suggesting a shared upstream and/or developer.

Running over the source trees of ~19k Gentoo packages containing 50k .m4 files finds about 5k that are unrecognized (new, or modified), with a little under 1k git diff commands to compare mismatches to a candidate upstream file. Note, the candidate selection in case of mismatch is pretty basic now; plan is to implement fuzzy matching, see https://github.com/hlein/distro-backdoor-scanner/issues/18

Comparing decompression output

Tools:

Compare the output of the backdoored xz-utils decompressing a large corpus of .xz files vs the output of an independent implementation, just in case of some fancy injection of malware into the output stream whenever a recognized block of tarred-up code is decompressed. Verrry unlikely to catch something, but easy to look for so why not. So far this has only caught minor bugs in other decompressors (upstream bugs will be filed, but not urgent).