srikanth-lingala / zip4j

A Java library for zip files and streams
Apache License 2.0
2.06k stars 312 forks source link

Errors while reading archives from java.util.zip via ZipInputStream #279

Closed TalgatAkhm closed 3 years ago

TalgatAkhm commented 3 years ago

There is an error while reading zip archive from standard java.util.zip archiver. Zip4j ZipInputStream successfully read only first header in archive, next it can not find any.

Here is a full code example (with zip generating, and reading via zip4j stream) and reasoning below:


    @Test
    public void test() throws IOException {
        // Arrange
        String archivePath = "C:/Users/Public/test.zipByJavaUtilZip";
        File zipArchive = new File(archivePath);

        File folderToZip = new File("C:/Users/Public/FolderToZip");
        int expectedNumberOfEntries;
        try (java.util.zip.ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(zipArchive))) {
            expectedNumberOfEntries = zipByJavaUtilZip(zos, folderToZip, folderToZip.listFiles());
            zos.flush();
        }

        // Act
        List<String> elements = new ArrayList<>();
        try(InputStream is = new FileInputStream(zipArchive);
                net.lingala.zip4j.io.inputstream.ZipInputStream zis = new ZipInputStream(is)) {

            LocalFileHeader fh;
            while((fh = zis.getNextEntry()) != null)
                elements.add(fh.getFileName());
        }

        // Act-assert
        // Lets prove that there are five elements in archive
        net.lingala.zip4j.ZipFile zf = new net.lingala.zip4j.ZipFile(zipArchive);
        Assertions.assertEquals(expectedNumberOfEntries, zf.getFileHeaders().size());
        Assertions.assertEquals(5, zf.getFileHeaders().size());

        // Assert
        // Check for mistakes in zip4j streams
        Assertions.assertEquals(expectedNumberOfEntries, elements.size()); // (elements.size() == 1) = true
    }

    private static int zipByJavaUtilZip(ZipOutputStream zos, File rootFolder, File[] files) throws IOException {
        int entriesNumber = 0;
        for (File f : files) {
            if (f.isDirectory())
                entriesNumber += zipByJavaUtilZip(zos, rootFolder, f.listFiles());
            else {
                entriesNumber++;
                String path = rootFolder.toPath().relativize(f.toPath()).toString().replaceAll("\\\\", "/");
                ZipEntry entry = new ZipEntry(path);
                zos.putNextEntry(entry);
                try (InputStream fis = new FileInputStream(f)) {
                    IOUtils.copy(fis, zos);
                }
                zos.closeEntry();
            }
        }
        return entriesNumber;
    }

However, comparing archives from zip4j and java.util.zip in hex viewer, I found, that there are not many differences between them. One of them is the general purpose flag, but the 3rd bit of it have been set in both archives. That means, that in LFH compression size is 0, and the actual size is written in the data descriptor (they are identical). So it is normal to dont know the actual size while reading LFH. But the problem is, then I call zis.getNextEntry() second time the result will be null, because in zis.getNextEntry() there is call of readUntilEndOfEntry(), in it I can see this code:

    if (localFileHeader.isDirectory() || localFileHeader.getCompressedSize() == 0) {
      return;
    }
.....

In this condition there is a leaving from function, because localFileHeader.getCompressedSize() is equal to zero. But after creating archive with the same content via zip4j ZipFile or ZipOutputStream, the test above will pass. So I found this as a strange behaviour.

Moreover, if I fix it in my code with the problems will appear in reading directories-lfh:

if (!lfh.isDirectory())
   zis.readAllBytes();

I don’t know exactly why this is happening and how to fix it (maybe it doesn't need to be fixed at all). The test archive in attach. Thank you! test.zip

TalgatAkhm commented 3 years ago

A quick fix is reading content of any element (no matter file or directory) after each call ZipInputStream.getNextEntry(), even if no one is going to use the data, like:

zis.getNextEntry();
zis.readAllBytes();

The reason for it: 3d bit of general purpose flag is set to 1, so compressedSize in LFH is set to zero, and after calling getNextEntry() second time readUntilEndOfEntry() doesn't do anything, because of compressedSize = 0. That's why zip4j try to read second LFH from first element content. So quick fix in library code is:

private void readUntilEndOfEntry() throws IOException {
    if (localFileHeader.isDirectory() || (localFileHeader.getCompressedSize() == 0 && !localFileHeader.isDataDescriptorExists())) {
      return;
    }
....

But, I think this only solve consequence but not cause

srikanth-lingala commented 3 years ago

Thanks for the detailed explanation and your analysis. Appreciate it. I have fixed the issue and will include the fix in the next release.

srikanth-lingala commented 3 years ago

Issue fixed in v2.7.0 which was released today.