redballoonsecurity / ofrak

OFRAK: unpack, modify, and repack binaries.
https://ofrak.com

Improve CPIO (and other archive) packer/unpacker pairs using `libarchive` #397

Open rbs-jacob opened 9 months ago

rbs-jacob commented 9 months ago

What is the use case for the feature?

In short: replacing some archive packer/unpacker pairs that use command line tools with ones that use libarchive will make those packers more robust.

Currently, some archive packer/unpacker pairs (such as CPIO) are lossy: the repacked binary differs from the original in critical ways. For example, a CPIO may contain files that must be unpacked to absolute paths. If OFRAK unpacks to absolute paths, it is a security risk, since unpacking a CPIO could overwrite critical files (or at least cause permissions errors that make unpacking fail). On the other hand, if OFRAK doesn't unpack to absolute paths, the repacked archive will not contain absolute paths, which may cause it to behave differently when used.

The general solution is to unpack in-memory, instead of to the local filesystem.
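As a rough illustration of what "unpack in-memory" means here, the sketch below round-trips an archive entry with an absolute path without ever writing to the local filesystem. It uses Python's standard-library `tarfile` and a TAR archive purely as a stand-in, since the standard library has no CPIO support; it is not OFRAK code and not the proposed libarchive backend. The helper names `unpack_in_memory` and `pack_in_memory` and the path `/etc/example.conf` are hypothetical.

```python
import io
import tarfile


def unpack_in_memory(archive_bytes):
    """Return (name, mode, data) for each regular file, without touching disk."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                # The name is kept exactly as stored, leading "/" included.
                entries.append((member.name, member.mode, data))
    return entries


def pack_in_memory(entries):
    """Build a TAR archive in memory from (name, mode, data) tuples."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, mode, data in entries:
            info = tarfile.TarInfo(name=name)
            info.mode = mode
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical example entry with an absolute path.
original = pack_in_memory([("/etc/example.conf", 0o600, b"key=value\n")])

# Unpack, modify, and repack entirely in memory.
entries = unpack_in_memory(original)
modified = [(n, m, d.replace(b"value", b"patched")) for n, m, d in entries]
repacked = pack_in_memory(modified)
roundtrip = unpack_in_memory(repacked)
```

Because nothing is extracted to disk, the absolute path can be preserved in the repacked archive without any risk of overwriting a real file on the host.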

Previously, we've tried to address this by using in-memory Python libraries for parsing archive formats. But often, those libraries fail in critical ways that the more robust, battle-tested command-line tools do not. The solution is to use an unpacking library that is as battle-tested as the command-line tools we currently rely on. libarchive is such a library: it is the basis of bsdtar and bsdcpio, which handle TAR and CPIO packing/unpacking on macOS and BSD.

How would you implement this feature?

Using the Python ctypes bindings for libarchive.
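For readers unfamiliar with ctypes, here is a minimal sketch of calling into libarchive directly from Python. It is not the bindings package referred to above, only a hand-rolled illustration: it looks for a shared libarchive on the system and calls the C function `archive_version_string()`. The helper name `libarchive_version` is hypothetical, and whether the library is found depends on the platform and how libarchive is installed.

```python
import ctypes
import ctypes.util


def libarchive_version():
    """Return libarchive's version string, or None if it cannot be loaded."""
    path = ctypes.util.find_library("archive")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # C signature: const char *archive_version_string(void);
    lib.archive_version_string.restype = ctypes.c_char_p
    lib.archive_version_string.argtypes = []
    return lib.archive_version_string().decode()


version = libarchive_version()
print(version if version else "libarchive not found on this system")
```

A real packer/unpacker would go through an existing bindings layer rather than declaring each function signature by hand like this.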

alchzh commented 2 months ago

I think it makes sense for us to switch to libarchive for all formats that it supports, for two reasons. First, it can be built on Windows. Second, essentially all of the archive formats, not just CPIO, lose some information when actually extracted to disk, and an internal representation avoids that.