libarchive / libarchive

Multi-format archive and compression library
http://www.libarchive.org

Libarchive detects as tar a file with all zeroes #1778

Open CatyGreen opened 2 years ago

CatyGreen commented 2 years ago

Basic Information
Version of libarchive: All of them
How you obtained it (build from source, pre-packaged binary, etc.): build from source
Operating system and version: Linux CentOS 8
What compiler and/or IDE you are using (include version): GCC

Description of the problem you are seeing:
What did you do? Ran libarchive on a file with at least 512 bytes of zeroes at the beginning.
What did you expect to happen? The function archive_read_open_memory should not return ARCHIVE_OK.
What actually happened? The function archive_read_open_memory returns ARCHIVE_OK.
What log files or error messages were produced? None

How the libarchive developers can reproduce your problem:
What other software was involved?
What other files were involved? A test file with all zeroes
How can we obtain any of the above? Run dd if=/dev/zero of=sample_file bs=1 count=512

While debugging the code, I reached a function called archive_read_format_tar_bid. Inside that function there is the following check:

    /* If it's an end-of-archive mark, we can handle it. */
    if (h[0] == 0 && archive_block_is_null(h)) {
        /*
         * Usually, I bid the number of bits verified, but
         * in this case, 4096 seems excessive so I picked 10 as
         * an arbitrary but reasonable-seeming value.
         */
        return (10);
    }

The content of archive_block_is_null:

/*
 * Return true if this block contains only nulls.
 */
static int
archive_block_is_null(const char *p)
{
    unsigned i;

    for (i = 0; i < 512; i++)
        if (*p++)
            return (0);
    return (1);
}

Why does libarchive make this test?

Thanks.

kientzle commented 2 years ago

A block of 512 zero bytes is the standard end-of-archive marker in a tar file. So a file that starts with 512 (or more) zero bytes is a valid empty tar file.

CatyGreen commented 2 years ago

OK, yes. But if the file has more content after the zeroes, libarchive still considers it a valid tar file.
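To illustrate why trailing content makes no difference, here is a minimal standalone model of the end-of-archive branch quoted earlier in this thread. It is a sketch, not libarchive code: `model_empty_tar_bid` and `TAR_BLOCK_SIZE` are hypothetical names, and the real bidder takes different arguments. The point is only that the check looks at the first 512 bytes and nothing after them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512

/* Same logic as the archive_block_is_null quoted above. */
static int
block_is_null(const char *p)
{
    unsigned i;

    for (i = 0; i < TAR_BLOCK_SIZE; i++)
        if (*p++)
            return (0);
    return (1);
}

/*
 * Hypothetical, simplified model of the end-of-archive branch.
 * Returns 10 if the first 512 bytes are all zero, 0 otherwise
 * (including when fewer than 512 bytes are available; that choice
 * is an assumption of this sketch). Bytes past the first block
 * are never examined.
 */
static int
model_empty_tar_bid(const char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < TAR_BLOCK_SIZE)
        return (0);
    if (buf[0] == 0 && block_is_null(buf))
        return (10);
    return (0);
}
```

With this model, a 1024-byte buffer whose first 512 bytes are zero and whose second half holds arbitrary text still receives a bid of 10, because the second half is never read.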

emaste commented 2 years ago

It is a valid tar file with extra junk following it.

I don't see a reference to this case/behaviour in the documentation; I think it is something that ought to be added.
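Until the behaviour is documented or changed, an application that cares about the "extra junk" case could inspect the buffer itself before handing it to archive_read_open_memory. The following is a minimal sketch under that assumption; `classify_zero_prefix` and the enum are hypothetical application-side names, not part of libarchive.

```c
#include <assert.h>
#include <stddef.h>

#define TAR_BLOCK_SIZE 512

/* Hypothetical result codes for the application-side check below. */
enum zero_prefix_kind {
    NOT_ZERO_PREFIXED = 0, /* no full 512-byte zero block at the start */
    ALL_ZERO = 1,          /* zero block, and every later byte is zero */
    ZERO_THEN_DATA = 2     /* zero block followed by non-zero data */
};

/*
 * Hypothetical helper (not a libarchive function): report whether a
 * buffer starts with a 512-byte zero block and, if so, whether any
 * non-zero data follows it.
 */
static enum zero_prefix_kind
classify_zero_prefix(const unsigned char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL);
    if (len < TAR_BLOCK_SIZE)
        return (NOT_ZERO_PREFIXED);
    for (i = 0; i < TAR_BLOCK_SIZE; i++)
        if (buf[i] != 0)
            return (NOT_ZERO_PREFIXED);
    for (; i < len; i++)
        if (buf[i] != 0)
            return (ZERO_THEN_DATA);
    return (ALL_ZERO);
}
```

A caller that does not want to treat such inputs as archives could reject anything classified as ZERO_THEN_DATA (or ALL_ZERO as well) and only pass the rest on to libarchive.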

kientzle commented 2 years ago

I can see why some people would be surprised by the handling of a file that starts with 512 zeros.

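It might make sense to split our tar handling into two "formats": one that recognizes regular tar archives, and one that recognizes only the empty tar case (a file that starts with a block of zeros).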

We could then include both in archive_read_support_format_all() to preserve the existing behavior, but folks who specifically wanted to not recognize the empty tar case would have the option of only enabling the other format handler.

This is a small change if someone wanted to give it a shot: just split the existing archive_read_format_tar_bid to create a new bidder archive_read_format_tar_empty_bid and use that to construct a new archive_read_format_tar_empty.
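As a rough illustration of that split, the sketch below separates the two decisions into two standalone functions. Everything here is an assumption made for illustration: the names `sketch_tar_bid` and `sketch_tar_empty_bid`, the buffer-based signatures, the -1 return for short input, and the use of the "ustar" magic at header offset 257 as the only check in the regular bidder. The existing bidder has a different signature and performs more checks than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_BLOCK_SIZE 512
#define USTAR_MAGIC_OFFSET 257

/* True if the 512-byte block at p contains only zero bytes. */
static int
block_is_null(const char *p)
{
    unsigned i;

    for (i = 0; i < TAR_BLOCK_SIZE; i++)
        if (*p++)
            return (0);
    return (1);
}

/*
 * Hypothetical "empty tar" bidder: claims only inputs whose first
 * block is the end-of-archive marker.
 */
static int
sketch_tar_empty_bid(const char *h, size_t avail)
{
    assert(h != NULL);
    if (avail < TAR_BLOCK_SIZE)
        return (-1);
    if (h[0] == 0 && block_is_null(h))
        return (10);
    return (0);
}

/*
 * Hypothetical "regular tar" bidder with the end-of-archive branch
 * removed. Simplification: only the "ustar" magic is checked, and
 * the bid is the number of bits verified (5 bytes * 8).
 */
static int
sketch_tar_bid(const char *h, size_t avail)
{
    assert(h != NULL);
    if (avail < TAR_BLOCK_SIZE)
        return (-1);
    if (memcmp(h + USTAR_MAGIC_OFFSET, "ustar", 5) == 0)
        return (5 * 8);
    return (0);
}
```

With the logic separated this way, a reader that registers only the regular bidder would no longer claim a zero-filled file, while registering both would keep today's behaviour.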