rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

Opening Tar file with Zip format does not fail #235

Open dhananjay-gune opened 2 months ago

dhananjay-gune commented 2 months ago

I am writing a negative test case for a function like containsArchiveEntry().
So I created a .tar file and am trying to open it with bit7z::BitFormat::Zip.
I expected it to fail, but it partially succeeds 😮, i.e. it is able to iterate through some of the items, but not all.

I am trying to look for Sub Folder 2\Sub File 2.2.txt 👈

D:\MyFolder
|   Root File 2.txt
|   Root File 3.7z
|   RootFile1.txt
|
+---Sub Folder 2
|       Sub File 2.2.txt 👈
|       Sub File 2.3.tar
|       SubFile2.1.txt
|
\---SubFolder1
        Sub File 1.2.txt
        SubFile1.1.txt
        SubFile1.3.zip

The BitArchiveReader reports only these entries, but at the root level.
It does not report any other entries.

Sub File 1.2.txt
SubFile1.1.txt

Here is my code:

const BitInFormat & archiveFormat = GetInputArchiveFormat();
Bit7zLibrary lib(this->m_7ZipDllPath);
BitArchiveReader reader(lib, archivePath, archiveFormat); // 👉 here the archive file is a .tar, but archiveFormat is BitFormat::Zip
for (const auto& entry : reader) // standard range-based for instead of the MSVC-only "for each"
{
    auto itemPath = entry.path();
    auto compareResult = lstrcmpi(itemPath.c_str(), archiveEntry.c_str()); // case insensitive comparison
    bool isEqual =  compareResult == 0;
    if (isEqual)
    {
        return OpResult(true);
    }
}
return OpResult(false);

The tar file shows all the entries: 2024-08-19 11_55_01-

I have embedded the tar file inside a zip file here: Test_ContainsArchiveEntryContainingTarFile.zip

I am expecting it to throw an exception when the formats don't match.
Any clues?

rikyoz commented 2 months ago

Hi!

The BitArchiveReader reports only these entries, but at the root level. It does not report any other entries.

Sub File 1.2.txt
SubFile1.1.txt

In reality, something else is happening: you're actually reading the content of the only Zip file inside the Tar archive, SubFile1.3.zip. The content of this Zip archive happens to be the same as that of SubFolder1 (without the Zip file itself, of course), but this is only a coincidence.

Basically, since you requested to read a Zip archive, 7-Zip seeks through the Tar archive in search of the start of a Zip archive (i.e., the magic number PK...). Since the Tar archive is not compressed, 7-Zip can find it, and starts reading the Zip archive within the Tar. When the inner archive sits inside a compressed archive instead, the reader should throw an exception, as expected.
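
To make the mechanism above concrete, here is a minimal, self-contained sketch of a signature scan. It is not 7-Zip's actual code, whose handler logic is more involved; the names findZipSignature and fakeTarAround are hypothetical and exist only for this illustration. It relies on two facts: a Zip local file header starts with the bytes PK\x03\x04, and an uncompressed Tar stores member data verbatim after a 512-byte header block.

```cpp
#include <cstddef>
#include <string>

// Illustration only, not 7-Zip's implementation.
// Returns the offset of the first Zip local-file-header signature
// ("PK\x03\x04") in `bytes`, or std::string::npos if there is none.
std::size_t findZipSignature(const std::string& bytes) {
    static const std::string kZipMagic("PK\x03\x04", 4);
    return bytes.find(kZipMagic);
}

// Hypothetical helper: mimics an uncompressed Tar member by placing a
// 512-byte (zeroed) header block in front of the payload, unchanged.
std::string fakeTarAround(const std::string& payload) {
    return std::string(512, '\0') + payload;
}
```

Scanning fakeTarAround(zipBytes) finds the signature at offset 512, right after the header block, which is why a reader that searches for the Zip start can succeed on a plain Tar. In a compressed container the inner bytes are transformed, so the same scan would normally find nothing.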

I am expecting it to throw an exception when the formats don't match.

The fact that it doesn't throw in this particular case is mainly due to backwards compatibility with previous versions of bit7z. Making it throw would be a breaking change in the library's behavior, so I'm evaluating whether and how to include this change in a future minor/patch version of bit7z.

dhananjay-gune commented 2 months ago

Thanks for this detailed analysis.
It would be really nice to have some tweak 🙏, e.g. via a preprocessor macro or a bool in the class. You know best. :-)