Open rbs-jacob opened 11 months ago
I think it makes sense for us to switch to using libarchive for all formats that it supports, both because you can build it on Windows and because basically all of the archive formats, not just CPIO, have problems with some sort of information loss during actual extraction, which is avoided with an internal representation.
What is the use case for the feature?
In short: replacing some archive packer/unpacker pairs that use command line tools with ones that use libarchive will make those packers more robust.
Currently, some archive packers/unpackers (such as CPIO) are lossy, in that the repacked binary differs from the original in critical ways. For example, CPIOs may contain files that must be unpacked to absolute paths. If OFRAK unpacks to absolute paths, it is a security risk: unpacking CPIOs could overwrite critical files (or at least cause permissions errors that make unpacking fail). On the other hand, if OFRAK doesn't unpack to absolute paths, the repacked archive will not contain absolute paths, which may cause the repacked file to behave differently when used.
The general solution is to unpack in-memory, instead of to the local filesystem.
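To make the in-memory idea concrete, here is a minimal sketch. It uses Python's standard-library tarfile module and a TAR archive purely as a stand-in, not libarchive or CPIO and not OFRAK's actual components. The helper names unpack_in_memory and repack_in_memory and the path /etc/example.conf are hypothetical. Because entries are held in a dictionary keyed by their stored path, the absolute path is carried through to the repacked archive without anything being written to the local filesystem. Metadata such as modes and owners is ignored here for brevity.

```python
import io
import tarfile


def unpack_in_memory(archive_bytes):
    """Read regular files into a {stored path: contents} dict; nothing touches disk."""
    entries = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tar:
        for member in tar.getmembers():
            if member.isfile():
                # member.name is the path exactly as stored in the archive.
                entries[member.name] = tar.extractfile(member).read()
    return entries


def repack_in_memory(entries):
    """Build a new archive from the dict, reusing each stored path verbatim."""
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w") as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


# Hypothetical archive holding one file stored under an absolute path.
original = repack_in_memory({"/etc/example.conf": b"key=value\n"})

entries = unpack_in_memory(original)
entries["/etc/example.conf"] = b"key=patched\n"  # modify the in-memory copy
repacked = repack_in_memory(entries)
```

The host's real /etc is never touched, yet the repacked archive still names the file by its absolute path.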
Previously, we've tried to address this by using in-memory Python libraries for parsing archive formats. But often, the libraries fail in critical ways that the more robust, battle-tested command-line tools do not. The solution is to use a library for unpacking that is as battle-tested as the command-line tools we're currently relying on.
libarchive is such a library, as it is used by bsdtar and bsdcpio, which do TAR and CPIO packing/unpacking on macOS and BSD.
How would you implement this feature?
Using the Python ctypes bindings for libarchive.
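As a rough illustration of what reading an archive from memory through libarchive could look like, the sketch below calls the libarchive C API directly with the standard-library ctypes module. This is an assumption made for the example: the issue does not say which bindings package would be used, and an existing bindings package would normally hide these calls. The helper names load_libarchive and read_entries are hypothetical, warnings are treated as errors for simplicity, and the code assumes a libarchive 3.x shared library is installed and that entry path names are decodable as UTF-8.

```python
import ctypes
import ctypes.util

ARCHIVE_OK = 0
ARCHIVE_EOF = 1


def load_libarchive():
    """Return a ctypes handle to libarchive, or None if it is not installed."""
    path = ctypes.util.find_library("archive")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    vp = ctypes.c_void_p
    # Declare pointer-sized arguments and results so they are not truncated.
    lib.archive_read_new.restype = vp
    lib.archive_read_support_filter_all.argtypes = [vp]
    lib.archive_read_support_format_all.argtypes = [vp]
    lib.archive_read_open_memory.argtypes = [vp, ctypes.c_char_p, ctypes.c_size_t]
    lib.archive_read_next_header.argtypes = [vp, ctypes.POINTER(vp)]
    lib.archive_entry_pathname.argtypes = [vp]
    lib.archive_entry_pathname.restype = ctypes.c_char_p
    lib.archive_read_data.argtypes = [vp, vp, ctypes.c_size_t]
    lib.archive_read_data.restype = ctypes.c_ssize_t
    lib.archive_read_free.argtypes = [vp]
    return lib


def read_entries(lib, data):
    """Return {stored path: contents} for an archive held in `data` (bytes)."""
    archive = lib.archive_read_new()
    lib.archive_read_support_filter_all(archive)
    lib.archive_read_support_format_all(archive)
    try:
        if lib.archive_read_open_memory(archive, data, len(data)) != ARCHIVE_OK:
            raise RuntimeError("libarchive could not open the buffer")
        entries = {}
        entry = ctypes.c_void_p()
        buf = ctypes.create_string_buffer(65536)
        while True:
            status = lib.archive_read_next_header(archive, ctypes.byref(entry))
            if status == ARCHIVE_EOF:
                break
            if status != ARCHIVE_OK:
                raise RuntimeError("libarchive failed to read an entry header")
            name = lib.archive_entry_pathname(entry).decode()
            chunks = []
            while True:
                # Returns the byte count, 0 at end of entry, negative on error.
                n = lib.archive_read_data(archive, buf, len(buf))
                if n < 0:
                    raise RuntimeError("libarchive failed to read entry data")
                if n == 0:
                    break
                chunks.append(buf.raw[:n])
            entries[name] = b"".join(chunks)
        return entries
    finally:
        lib.archive_read_free(archive)
```

A caller would check that load_libarchive() did not return None and then pass the raw archive bytes to read_entries; no file is extracted to disk at any point.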