libarchive / libarchive

Multi-format archive and compression library
http://www.libarchive.org
Other
3.05k stars 772 forks source link

7z: failed extraction due to unsupported header/data type #2076

Open mehrabiworkmail opened 8 months ago

mehrabiworkmail commented 8 months ago

When attempting to use libarchive to extract a corpus of malicious SFX files, I found that libarchive fails to extract many SFX files that the 7z tool can extract just fine. I investigated the issue and found the following to be one of the causes.

After reading archive_read_support_format_7zip.c, I realized that libarchive currently does not support the following:

  1. Parsing the kAdditionalStreamsInfo header type (and extracting data from that stream, for that matter)
  2. Parsing kName values from FilesInfo that have external flag set (i.e., the data is external to the header)
  3. Parsing 'kAttributeswhen theexternal` flag is set

Unfortunately, it seems that when libarchive encountes one such data/header, it simply gives up processing the file, even if it is able to extract data from the main stream just fine. This might not tbe desired behaviour, depending on the application (e.g., for processing malicious files it is often desirable to extract as much info as possible, even if it means skipping some unsupported files)

Obviously, the ideal solution is to implement support for these header/data types. However, a workaround is to make libarchive skip such data/header types, but only if the users so chooses by setting specific compiler flags. Below are screenshots of such a solution I implemented.

Firstly, as the kAdditionalStreamsInfo header has basically the same structure as the kMainStreamsInfo header, we can reuse the latter's code to read the former: image

image

Similarly, libarchive already has partial support for skipping external data in the read_Times function in the same file (archive_read_support_format_7zip.c:2753). So, we can simply reuse the same code to skip external values in other places: image

image

Here's the git patch for my solution: 0002-7z-optional-compiler-flag-to-skip-additonalstreamsin.patch

Please let me know if you have any feedback or other suggestions.

kientzle commented 8 months ago

I prefer to avoid compiler flags for cases like this: It makes testing much more complex (do we now need to build and test both versions?) and means that only the few people who are able to compile their own library can benefit.

There is precedent elsewhere in libarchive for skipping unrecognized or malformed data with a warning. I think that would be the better approach here. (Obviously, the best answer is to implement support where possible.)

mehrabiworkmail commented 8 months ago

I have created a pull request for this issue that contains my kAdditionalHeaderInfo skip code (just in case someone is interested). It is closed for now as I intend to implement support for the header instead. Will reopen the pull-request once it is done.