spring-projects / spring-integration-extensions

The Spring Integration Extensions project provides extension components for Spring Integration
http://www.springintegration.org/

IOException while unzipping File #215

Closed idueppe closed 5 years ago

idueppe commented 5 years ago

I got an IOException while unzipping a zip file, caused by a FileNotFoundException. The cause is that a zip file can contain entries that are of type file but located in a subfolder. For instance, I received a zip file that has only one ZipEntry, "invoices/invoice_1.pdf". The UnZipTransformer will not create the subfolder invoices because the ZipEntry is of type file.

artembilan commented 5 years ago

You know we have a test-case on the matter:

    @Test
    public void unZipWithMultipleEntries() throws Exception {

        final Resource resource = resourceLoader.getResource("classpath:testzipdata/countries.zip");

And that countries.zip has a subfolder inside (screenshot omitted).

(Sorry for Windows and Russian 😄 )

Our config in the test has this code:

<int-file:outbound-channel-adapter id="write-file" channel="out" directory-expression="'${workDir}/' + headers.zip_entryPath" auto-create-directory="true"/>

You might use some old Spring Integration version which doesn't create subfolders in the target destination: https://jira.spring.io/browse/INT-3726 ...

idueppe commented 5 years ago

@artembilan this doesn't work. Currently, only zip files with a folder hierarchy are tested, like:

    directory_entry: subfolder/
    file_entry: textA.txt
    file_entry: textB.txt

My issue is about a flat file entry structure like:

This is a valid structure of zip files and the UnZipTransformer is breaking on such a zip file structure.

I enhanced the UnZipTransformerTests with an unzipFlatFileEntryZip test. This test creates a flat file-entry zip and then runs the UnZipTransformer on it.

artembilan commented 5 years ago

I don't understand what that flat file entry structure is. Does it really exist in the real world? Is it really possible to zip a complex folder into such an unusual structure?

On the other hand, our project is fully based on the https://github.com/zeroturnaround/zt-zip library. Why don't you think your specific use case should be supported over there? Once they have a fix, we would consume it transparently without any changes to our code.

I mean that I don't see a reason for changes on our side yet, because it looks like we would be trying to support something that is just made up.

Sorry for my ignorance, but it isn't clear to me what we are talking about here...

idueppe commented 5 years ago

The flat file entry structure exists in the real world. For instance, the German Telekom sends their invoices via email in a zip file with a flat file structure. In my project test data, I found more than 20 files from different companies that have a flat file entry structure. Even most Java jar files do not contain directory entries. Just try to find a META-INF folder entry in your jar file. It depends on the tool you use to build your jar file.

I don't think that it is an issue of ZT-ZIP because it is based on java.util.zip. Just have a look at java.util.zip.ZipEntry.isDirectory(). It defines that a zip entry with a trailing slash is a directory. But as far as I know, no spec demands to create such directory entries in a zip file.
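To illustrate the point about java.util.zip, here is a minimal sketch (not code from the project; the class name FlatZipDemo and its helper methods are made up for this example). It writes a zip with a single file entry and no directory entry, then reads it back and reports what isDirectory() returns for each entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of Spring Integration or ZT-ZIP.
class FlatZipDemo {

    // Builds an in-memory zip with one file entry and no "subfolder/" directory entry.
    static byte[] createFlatZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("subfolder/single.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Describes each entry as "<name> directory=<true|false>".
    static List<String> describeEntries(byte[] zipBytes) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                result.add(entry.getName() + " directory=" + entry.isDirectory());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Prints: subfolder/single.txt directory=false
        describeEntries(createFlatZip()).forEach(System.out::println);
    }
}
```

The archive is valid and contains exactly one entry; nothing in it announces the subfolder separately.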

Have a look at countries.zip from the test data with unzip -vl countries.zip

    Archive:  countries.zip
     Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
    --------  ------  ------- ---- ---------- ----- --------  ----
           6  Defl:N        8 -33% 06-21-2013 16:44 2f6a430c  pl.txt
           6  Defl:N        8 -33% 06-21-2013 16:43 590a2082  fr.txt
           7  Defl:N        9 -29% 06-21-2013 16:43 9515b8b7  de.txt
           0  Defl:N        2   0% 06-25-2013 00:11 00000000  continents/
           6  Defl:N        8 -33% 06-25-2013 00:12 4f3f5de8  continents/europe.txt
           4  Defl:N        6 -50% 06-25-2013 00:12 dad1b76d  continents/asia.txt
    --------          -------  ---                            -------
          29               41 -41%                            6 files

You can see that continents/ is an additional zip entry that just consumes some space in the zip file. The zip file would also be valid if the continents/ entry were missing. You can also see that the europe.txt and asia.txt entries carry the subfolder name in their names. But the implementation of UnZipTransformer in FILE mode depends on the existence of the additional directory entry.

In my test, I create the following valid zip file:

    Archive:  flatfileentry.zip
     Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
    --------  ------  ------- ---- ---------- ----- --------  ----
          25  Defl:N       27  -8% 06-17-2019 21:57 d885435d  subfolder/single.txt
    --------          -------  ---                            -------
          25               27  -8%                            1 file

So, in my opinion, ZT-ZIP should reflect the actual structure of the underlying zip file and shouldn't add virtual ZipEntries for missing directories.

But the UnZipTransformer shouldn't rely on the existence of additional directory entries.
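As a rough sketch of what "not relying on directory entries" could look like with plain java.util.zip (this is not the UnZipTransformer implementation or the actual fix; the class SafeUnzip and its unzip method are hypothetical names), the parent directory of every file entry is created on demand before the file is written:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, shown only to illustrate the idea.
class SafeUnzip {

    // Extracts every entry below targetDir, creating parent directories as needed.
    static void unzip(InputStream zipStream, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path target = root.resolve(entry.getName()).normalize();
                // Reject entries such as "../evil.txt" that would escape the target directory.
                if (!target.startsWith(root)) {
                    throw new IOException("Entry outside target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(target);
                } else {
                    // Do not assume that a directory entry was seen earlier.
                    Files.createDirectories(target.getParent());
                    // Copies the bytes of the current entry only; the zip stream stays open.
                    Files.copy(zip, target, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

With this approach, an archive that contains only subfolder/single.txt extracts the same way as one that also lists subfolder/ explicitly.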

Regards Ingo

artembilan commented 5 years ago

OK! Now it's getting clearer for me. Thank you for such a thorough explanation! Let's jump into your PR for review now! 😄