libarchive / libarchive

Multi-format archive and compression library
http://www.libarchive.org

[Example request] Universal archive extractor (to disk)? #1624

Open SamuelMarks opened 2 years ago

SamuelMarks commented 2 years ago

How do I use libarchive to extract any archive it supports, and inflate to a specific directory on disk?

Attempts:

0.

int extract_archive(const char *archive_filepath, const char *output_folder) {
#define BUF_SIZE 16384
    FILE *fp;
    int exit_code;
    int r;
    ssize_t size;
    static char buff[BUF_SIZE];
    struct archive *a;
    struct archive_entry *ae;

    a = archive_read_new();
    archive_read_support_filter_all(a);
    /* archive_read_support_compression_all(a); <-- deprecated symbol used in your example */
    archive_read_support_format_raw(a);
    r = archive_read_open_filename(a, archive_filepath, BUF_SIZE);
    if (r != ARCHIVE_OK) {
        /* ERROR */
    }
    r = archive_read_next_header(a, &ae);
    if (r != ARCHIVE_OK) {
        /* ERROR */
    }

    for (;;) {
        size = archive_read_data(a, buff, BUF_SIZE);
        if (size < 0) {
            /* ERROR */
        }
        if (size == 0)
            break;
        write(1, buff, size);
    }

    archive_read_free(a);
#undef BUF_SIZE
    return EXIT_SUCCESS;
}
1. Tried modifying contrib/untar.c, but verify_checksum always fails on my archive (.zip). Is there a verify_checksum somewhere in libarchive that I should be using instead, one that verifies differently depending on the archive format?

Also in both examples I get a whole lot of binary text dumped to my stdout.

Thanks for any suggestions

kientzle commented 2 years ago

The "Examples" page in the Wiki walks through a series of increasingly-complex uses of libarchive, culminating in a "Complete Extractor" that is a good place to start from: https://github.com/libarchive/libarchive/wiki/Examples

In your first code above, I think you want archive_read_support_format_all instead of format_raw. Remember that libarchive is broken up into "format" modules that handle particular archive types and "filter" modules that deal with various encodings of the resulting archive. Your example code above enables all filters (which are generally used only by tar and cpio archives) but only the "raw" archive format (which is a specialized tool for handling things that are not actually archives at all).

You're also writing the resulting data to stdout via write(1, buff, size) which is why you're seeing binary data on your terminal. The "Complete Extractor" example shows how to use the "archive_write_disk" tools to push the data into a directory hierarchy. In essence, "archive_write_disk" treats a directory on disk as if it were an archive: "creating an entry" becomes "creating a file", etc. So the basic outline for extraction is to copy entries (and their data) from the input archive (the Zip or Tar file you want to read) to another (the directory tree you want to create). In between these, you can modify the "entry" in any way you wish (alter filenames, permissions, etc).

Similarly, examples/untar.c currently only enables archive_read_support_format_tar so it only handles tar format, which is why you're seeing checksum failures trying to extract something that is not a tar archive. You can change that to archive_read_support_format_zip to handle only zip archives or archive_read_support_format_all to enable (almost) all of the formats that libarchive supports.

You might find the examples/minitar example program a more useful starting point, as it is more complete than examples/untar.c

SamuelMarks commented 2 years ago

Thanks @kientzle; I've started incorporating that. Found a few type discrepancies in your wiki, will start to edit sometime today or tomorrow.

dandingol03 commented 2 years ago

You might find the examples/minitar example program a more useful starting point, as it is more complete than examples/untar.c

Which command in minitar extracts files into a specific folder?

SamuelMarks commented 2 years ago

@dandingol03 In my aforementioned project there's:

int extract_archive(enum Archive archive, const char *archive_filepath, const char *output_folder);