owncloud / core

:cloud: ownCloud web server core (Files, DAV, etc.)
https://owncloud.com
GNU Affero General Public License v3.0
8.35k stars 2.06k forks source link

Problems with Zip Files after upgrade to 7.0.0 #10001

Closed maddox-DX closed 8 years ago

maddox-DX commented 10 years ago

Steps to reproduce

  1. Login und go to files
  2. select a couple of files download the generated zip file
  3. try to decompress with Total Commander

    Expected behaviour

Files should be extracted

Actual behaviour

Total Commander gives CRC warnings

Server configuration

Operating system: debian 7

Web server: apache 2.2 Database: postgres 9.1 PHP version: 5.4 ownCloud version: (see ownCloud admin page) 7.0.0 Updated from an older ownCloud or fresh install: From 6.0.4

I got an hint from the Total Commander forum: http://ghisler.ch/board/viewtopic.php?t=40766 (in german)

The linked forum post says that the generated zip has two issues:

McNetic commented 9 years ago

Update: There is now a deflate branch in PHPZipStreamer (https://github.com/McNetic/PHPZipStreamer/tree/deflate) that implements deflate compression when pecl_http (should now also work with v >= 2.0) is available. It could now be tested if MacOS is happy with Zip files created this way. I'm still investigating how to make it work without pecl_http.

DeepDiver1975 commented 9 years ago

@LukasReschke @georgehrke my dear mac users - can you please test this branch? THX

McNetic commented 9 years ago

Another update: Looking at the deflate rfc, I noticed deflate can fall back to uncompressed blocks, which is totally simple to implement and imposes nearly no overhead compared to not using deflate at all, so I implemented an option to select deflate compression (COMPR::DEFLATE), but set compression level to none (COMPR::NONE). This should produce valid zip files (zip -t does not complain) with compression formally enabled, but without the need for pecl_http. Said branch now contains this addition.

DeepDiver1975 commented 9 years ago

awesome @McNetic THX

LukasReschke commented 9 years ago

Will test later.

LukasReschke commented 9 years ago

As I'm a big fan of digging in binary formats (not) I had the past few hours a look at the implementation and the ZIP specification (and realised that with > 10.8 even the store method might work and used compress => COMPR::STORE and level => COMPR::NORMAL, though maybe not for directories) but there seem to be some problems left:

screen shot 2015-02-13 at 19 39 58

Files are available at: https://s3.owncloud.com/owncloud/public.php?service=files&t=c8d0394f050872efc8f32bd15d0a4d84

What is interesting is that http://pablotron.org/software/zipstream-php/ works just fine (after changing the version to 0x000A because of some Finder madness). Can we maybe take a look at their implementation and see what makes the difference? (they seem to use gzdeflate and no ZIP64 but that is irrelevant for now)

@McNetic Do you happen to see on a first glance what coul dmake the difference there?

LukasReschke commented 9 years ago

I also compared some 7ZIP and ZipStreamer file using STORE, the output looks pretty much the same except the CRC (excerpt - more was not really relevant):

ZipStreamer Working directory ZIP created with 7zip
Local file header signature 50 4B 03 04 50 4B 03 04
Version needed to extract 14 00 14 00
General purpose bit flag 00 00 00 00
Compression method 00 00 00 00
File last modification time BA 74 BA 74
File last modification date 4D 46 4D 46
CRC-32 00 00 00 00 00 00 00 00
Compressed size 00 00 00 00 00 00 00 00
Uncompressed size 00 00 00 00 00 00 00 00
File name length (n) 05 00 05 00
Extra field length (m) 00 00 00 00
File name 74 65 73 74 2F 74 65 73 74 2F
Local file header signature 50 4B 03 04 50 4B 03 04
Version needed to extract (minimum) 0A 00 0A 00
General purpose bit flag 00 00 00 00
Compression method 00 00 00 00
File last modification time 9C 5B 9C 5B
File last modification date 4D 46 4D 46
CRC-32 00 00 00 00 5E AC 80 ED
Compressed size 00 00 00 00 A3 00 00 00
Uncompressed size 00 00 00 00 A3 00 00 00
File name length (n) 10 00 10 00
Extra field length (m) (maybe also filename - not sure) 00 00 00 00
File name 74 65 73 74 2F 77 65 6C 63 6F 6D 65 2E 74 78 74 74 65 73 74 2F 77 65 6C 63 6F 6D 65 2E 74 78 74
Data 57 65 6C 63 6F 6D 65 20 74 6F 20 79 6F 75 72 20 6F 77 6E 43 6C 6F 75 64 20 61 63 63 6F 75 6E 74 21 0A 0A 54 68 69 73 20 69 73 20 6A 75 73 74 20 61 6E 20 65 78 61 6D 70 6C 65 20 66 69 6C 65 20 66 6F 72 20 64 65 76 65 6C 6F 70 65 72 73 20 61 6E 64 20 67 69 74 20 75 73 65 72 73 2E 20 0A 54 68 65 20 70 61 63 6B 61 67 65 64 20 61 6E 64 20 72 65 6C 65 61 73 65 64 20 76 65 72 73 69 6F 6E 73 20 77 69 6C 6C 20 63 6F 6D 65 20 77 69 74 68 20 62 65 74 74 65 72 20 65 78 61 6D 70 6C 65 73 2E 0A 0A 57 65 6C 63 6F 6D 65 20 74 6F 20 79 6F 75 72 20 6F 77 6E 43 6C 6F 75 64 20 61 63 63 6F 75 6E 74 21 0A 0A 54 68 69 73 20 69 73 20 6A 75 73 74 20 61 6E 20 65 78 61 6D 70 6C 65 20 66 69 6C 65 20 66 6F 72 20 64 65 76 65 6C 6F 70 65 72 73 20 61 6E 64 20 67 69 74 20 75 73 65 72 73 2E 20 0A 54 68 65 20 70 61 63 6B 61 67 65 64 20 61 6E 64 20 72 65 6C 65 61 73 65 64 20 76 65 72 73 69 6F 6E 73 20 77 69 6C 6C 20 63 6F 6D 65 20 77 69 74 68 20 62 65 74 74 65 72 20 65 78 61 6D 70 6C 65 73 2E 0A 0A
??? 5E AC 80 ED A3 00 00 00 A3 00 00 00
Central directory file header signature 50 4B 01 02 50 4B 01 02
Version made by 3F 00
Version needed to extract (minimum) 14 00 14 00
General purpose bit flag 00 00 00 00
Compression method 00 00 00 00
File last modification time BA 74 BA 74
File last modification date 4D 46 4D 46
CRC-32 00 00 00 00 00 00 00 00
Compressed size 00 00 00 00 00 00 00 00
Uncompressed size 00 00 00 00 00 00 00 00
File name length (n) 05 00 05 00
Extra field length (m) 00 00 24 00
File comment length (k) 00 00 00 00
Disk number where file starts 00 00 00 00
Internal file attributes 00 00 00 00
McNetic commented 9 years ago

First of all, thank you for your effort.

So, what I'm not sure you did, but I'd like to know: do zip files work with one top-level directory everything is in, and compress == COMPR::DEFLATE and variying settings of level?

LukasReschke commented 9 years ago

First of all, thank you for your effort.

Nah, nothing to thank. You are the one doing all the hard work here. I'm just doing some monkey testing :smiley:

  • For compress == COMPR::STORE, level has no meaning. And we only had COMPR::STORE up until now, so I thought the idea was to use COMPR::DEFLATE (with level = COMPR::NONE if you don't have pecl_http available) to see if MacOS would like that.

Yep. Unfortunately it doesn't always like that it seems. Apple is doing great things…

  • I'm not sure I understand the part about files not in subdirectories. If I remember correctly, owncloud will always put all files in the zip into one top-level directory? Or did you test manually, without owncloud? The problem about the ADD flag is, that I have to provide the CRC and compressed size in the file header, so I would have to make two passes over the files, or hold them in memory. I also wonder that unsetting the ADD flag works at all. If I do this, unzip -t complains about "invalid compressed data to inflate". Maybe I misunderstood what you did?

There is a way to get files not in a top-level directory. To do that create two text files in the top directory and select them both with the checkbox. In the upper right of the table there will be a "Download" button then. In that case without the ADD flag it worked fine. However, that seems more like Finder accepting some probably not correct values but failing with correct ones :see_no_evil:

  • I had a look at that pablotron implementation (not sure I found this one when looking for a solution to the problem about two years ago), and for a moment, I wondered about my sanity when seeing the use of gzdeflate(). I'm still puzzled how they could not have noticed the problem with gzdeflate(), as they even have a special handling of "large" files. I suspect you only tried small files with their implementation. The problem arises, when the file is split in chunks to process, so the file must be larger than the chunk size (~1MB for ZipStreamer, configurable but default 20MB for pablotrons ZipStream) to trigger the problem. Every output of gzdeflate() contains a "final" block defined in the deflate specification, which makes concatenating the output impossible. unzip -t complains about bad CRC for files generated in that way (I checked with pablotrons ZipStream to confirm still it really does ;-)). Aside from that, they in fact do two passes over all files (and hold "small" files in memory), so they can add the CRC/compressed sizes to the header.

:insert_sighing_meme_about_all_this_madness_caused_by_apple_doing_some_things_terribly_wrong_since_ages:

So, what I'm not sure you did, but I'd like to know: do zip files work with one top-level directory everything is in, and compress == COMPR::DEFLATE and variying settings of level?

Let me post some test combinations here using COMPR::DEFLATE and ZIP64 support disabled:

The confusing thing is the error thrown in the console Error -60005 creating authorization which on OS X usually indicates that a process has not the correct permissions to create something which is why I first suspected that this has to do with some path values, however when looking at the working and the not working ZIP (as compared above) this makes no sense at all… :disappointed:

LukasReschke commented 9 years ago

@McNetic Btw. I just got access from @mmayer to a OS X VM (accessible via VNC etc…) which we can use for unit testing… I guess he won't have anything against if I grant you access to it so that you could do some live testing and ZIP support might work again?

If you're interested you know my mail address (or can look it up on my profile) ;-)

iliyasbasir commented 9 years ago

One of our enterprise customer having this issue, can you advise me do you have fix/Patch for this issue ? If “yes”...Can we have our internal QA test fix; we can provide Mac access if he/she needed.

Here is detail from case

“We are having a new issue since migrating to OC EE 7.0.5 - Specifically on Mac OS clients, people are reporting when multiple files or directories are downloaded from ownCloud, it results in a zip file that can not be opened on Mac OS system.

I'm wondering if there is a workaround we can implement server side until we can get to OC EE 8.2, and/or what your official recommendation is?

We have a lot of users which have Macs, and multi-file downloads are effectively broken for them right now.”

Thank you for help

LukasReschke commented 9 years ago

Ilyias, we do not have a patch at the moment that works on OS X reliably. For the moment users can use another unzipper such as Keka or StuffIt.

LukasReschke commented 9 years ago

(as also stated in the discussion above)

MorrisJobke commented 9 years ago

@iliyasbasir There is nothing more I can add.

McNetic commented 9 years ago

Ok, I found some time to have a look at the OSX problem with the VM provided by @mmayer, myself. I also found other sources complaining about the OSX incompatibility with some zip features. So my conclusion is that the OSX finder cannot handle zip general purpose flag bit 3 set (aka streaming zip files, part of official zip standard since 1993). I can not handle this problem in ZipStreamer itself, as it does also work on networked streams, which don't support fseek(). Only potential upside: maybe Finder does in fact support zip64 - I haven't tried yet.

Now, to solve the problem for owncloud, I could add an option to provide file size and crc32 to ZipStreamer from owncloud directly. There are still some questions how this whole thing should be designed:

McNetic commented 9 years ago

I just noticed: Technically, this issue was about a small bug that was fixed in Zipstreamer months ago. The OSX issues are not even mentioned in the issue report. There is also at least one separate issue about the OSX problems (#13669). I'm not quite sure what to do about that. One could implement the current Zipstreamer version (I'd do a release if necessary, it should not be a step backward related to the OSX issues), and close this issue, and follow up in #13669.

LukasReschke commented 9 years ago

Please apologize, @McNetic, that I get back to you that late. This simply slipped through on my huge backlog :-/

Now, to solve the problem for owncloud, I could add an option to provide file size and crc32 to ZipStreamer from owncloud directly. There are still some questions how this whole thing should be designed: should owncloud try to detect osx and fall back to the proposed solution while retaining the current behaviour otherwise?

Personally, I'd like to avoid a fallback as much as possible. This leads to confusing behaviour as soon as people start mailing / sharing the generated ZIP files. Unlikely, but can happen. – Yet, I assume that this would require quite more resources and we should do this though. Requires some measuring on my end.

could owncloud calculate and cache the crc32 (and size) for each file on upload (when missing, it wil still be needd to calculate on first time it is downloaded via zip file)?

I think we should avoid this. Files can also be modified with other means than through ownCloud (i.e. a mounted Windows Network Drive) and going this route is probably opening a can of worms as we need to ensure that all backends behave properly. :boom:

(and there are actually some cases of custom backends :-()

for crc32 calculation on download: is (and will be) the stream provided by Filesystem class always seek()able?

I'm not completely sure about that. @icewind1991 @DeepDiver1975 can one of you folks please comment on this? THX. – Also input on the other questions is appreciated.

I just noticed: Technically, this issue was about a small bug that was fixed in Zipstreamer months ago. The OSX issues are not even mentioned in the issue report. There is also at least one separate issue about the OSX problems (#13669). I'm not quite sure what to do about that. One could implement the current Zipstreamer version (I'd do a release if necessary, it should not be a step backward related to the OSX issues), and close this issue, and follow up in #13669.

We can do that if that would help. – Tagging a new release would also be appreciated. (we don't ship master when we can avoid it)

cdamken commented 9 years ago

After upgrade to 7.0.5 does not work either.

@MorrisJobke

00002808

cdamken commented 9 years ago

@maddox-DX @LukasReschke Any news?

LukasReschke commented 9 years ago

According to the issue tracker: 2015-06-25_10-44-20

I have asked questions in https://github.com/owncloud/core/issues/10001#issuecomment-102022858 which are unanswered and unless those are answered we can't proceed.

DeepDiver1975 commented 9 years ago

@cdamken this issue is scheduled for 8.2 - there can be no progress ...

bildschirmfoto von 2015-06-25 12-08-49

cdamken commented 9 years ago

@DeepDiver1975 Ok, thanks

@MorrisJobke 00002808 00003426 00003439

MTRichards commented 9 years ago

After all this, can we fix this for 8.2? Do we know what we want to do?

MTRichards commented 9 years ago

@cmonteroluque while this is tagged, it is in?

DeepDiver1975 commented 9 years ago

After all this, can we fix this for 8.2? Do we know what we want to do?

we rely on an upstream project which is fighting this issue for a long time. The problem at the end is: as soon as we fix it for one platform - it will fall a part on another. Long story short: zip seems to be a nightmare with respect to it's specification.

If I remember correctly this issue was meanwhile reported to Apple as well - so maybe they fix it? :speak_no_evil:

LukasReschke commented 9 years ago

Well @McNetic has asked for questions which still are unanswered at https://github.com/owncloud/core/issues/10001#issuecomment-94480285 :speak_no_evil:

If I remember correctly this issue was meanwhile reported to Apple as well - so maybe they fix it? :speak_no_evil:

Yeah. Maybe in unicorn land :speak_no_evil:

bboule commented 9 years ago

While it does not solve all of the problems can we at least address the one use case (multiple file downloads) with a small change in behavior i.e. Serial downloads on multi file select rather than zip them up? There are still a number of other impacted use cases (like downloading a folder) but perhaps we could provide some relief on one front?

PVince81 commented 9 years ago

@bboule serial download is not possible because browsers only support downloading one file at a time. This is the reason why zip is used. Or are you aware of any new tech that could allow that ?

DeepDiver1975 commented 9 years ago

Well @McNetic has asked for questions which still are unanswered at #10001 (comment) :speak_no_evil:

I see no reliable way to store the crc checksum for every file - this is too much of a change. Hmmm - this is really a pain ....

scolebrook commented 9 years ago

@bboule To my knowledge only FF supports multipart HTTP bodies allowing multiple downloads in one request. Perhaps there's a way for javascript to be used so the browser sends multiple separate requests. Could be security implications though with accidental (or deliberate) DOS.

So perhaps, offering both methods (make zip file and send it or stream) and allowing admins to choose via config.php which method is used under which circumstances (for selections of files, folders, selections over a certain size, etc) would be best. When a user base is Mac heavy, use the old way and just deal with needing the extra space to store the zips while they are downloaded. When there are few to no Mac users, take advantage of streaming.

If hell ever freezes over and Apple decides to invest 10 minutes of engineering effort into something that doesn't have flashy animations then we can leave the old method forever. But an end user doesn't care that it's Apple's fault. To them, it's broken and they want it fixed. Installing an alternate unarchive tool is just a band aid solution to them. Giving admins more power to handle the unique nature of each installation is better than simply waiting on Apple.

DeepDiver1975 commented 9 years ago

we might want to have a look at : https://github.com/barracudanetworks/ArchiveStream-php

PVince81 commented 9 years ago

@DeepDiver1975 "Only the Zip64 (version 4.5 of the Zip specification) format is supported.". Wasn't that the actual problem with Apple ?

DeepDiver1975 commented 9 years ago

Wasn't that the actual problem with Apple ?

yes - but there will be tar for mac then

bboule commented 9 years ago

@PVince81 Yes indeed but I was hoping we could control it via javascript magic (sounds like my hopes are dashed ;) )

PVince81 commented 9 years ago

@bboule I read somewhere that you can trigger downloads by appending hidden iframes. Problem is that some browsers will pop up the "save as" dialog. Imagine this if the user selected 100 files...

bboule commented 9 years ago

I see your point not ideal :)

gig13 commented 9 years ago

@MTRichards @bboule I would like to request a backport to OC7 if possible.

MorrisJobke commented 9 years ago

I would like to request a backport to OC7 if possible.

We don't even have a patch for master. Even an idea how to fix this properly is very hard :(

guruz commented 9 years ago

Why not generate different kind of zip files based on the client browser OS? The usual use case is probably downloading a directory and this would enable that.

MTRichards commented 9 years ago

Why not generate different kind of zip files based on the client browser OS? The usual use case is probably downloading a directory and this would enable that.

Can we do that? This would be ideal, and allow for future unknown OS support problems with certain file types.

DeepDiver1975 commented 9 years ago

Can we do that? This would be ideal, and allow for future unknown OS support problems with certain file types.

afaik not because MacOS does not support streamed zip archives - but we require zip streaming to have a low memory foodprint

butonic commented 9 years ago

I tried using the deflate branch and toyed with constructor options

            $zip = new ZipStreamer([
                'zip64' => false,
                'compress'=>\ZipStreamer\COMPR::DEFLATE,
                'level'=>\ZipStreamer\COMPR::NORMAL
            ]);

            $zip = new ZipStreamer([
                'zip64' => false,
                'compress'=>\ZipStreamer\COMPR::NONE
            ]);

            $zip = new ZipStreamer([
                'zip64' => false,
                'compress'=>\ZipStreamer\COMPR::DEFLATE,
                'level'=>\ZipStreamer\COMPR::NONE
            ]);

In all cases downloadad zips fail to open with the Archive Utility but allow me to use unzip (5.52 of 28 February 2005):

$ unzip download-11.zip
Archive:  download-11.zip
  inflating: Neue Textdatei.txt      
  inflating: üöäß⊂∫∀∃ℕℝ∂.txt  
  inflating: welcome.txt
$ zipinfo download-11.zip
Archive:  download-11.zip   1350 bytes   3 files
-rw-r--r--  4.5 unx     1487 bl defN  4-Aug-15 11:30 Neue Textdatei.txt
-rw-r--r--  4.5 unx        0 bl defN  4-Aug-15 11:30 üöäß⊂∫∀∃ℕℝ∂.txt
-rw-r--r--  4.5 unx      163 bl defN  4-Aug-15 11:30 welcome.txt
3 files, 1650 bytes uncompressed, 940 bytes compressed:  43.0%
$ zipinfo download-12.zip
Archive:  download-12.zip   2231 bytes   3 files
-rw-r--r--  4.5 unx     1658 bl stor  4-Aug-15 11:38 Neue Textdatei.txt
-rw-r--r--  4.5 unx        0 bl stor  4-Aug-15 11:38 üöäß⊂∫∀∃ℕℝ∂.txt
-rw-r--r--  4.5 unx      163 bl stor  4-Aug-15 11:38 welcome.txt
3 files, 1821 bytes uncompressed, 1821 bytes compressed:  0.0%
$ zipinfo download-13.zip
Archive:  download-13.zip   2256 bytes   3 files
-rw-r--r--  4.5 unx     1658 bl defN  4-Aug-15 11:41 Neue Textdatei.txt
-rw-r--r--  4.5 unx        0 bl defN  4-Aug-15 11:41 üöäß⊂∫∀∃ℕℝ∂.txt
-rw-r--r--  4.5 unx      163 bl defN  4-Aug-15 11:41 welcome.txt
3 files, 1821 bytes uncompressed, 1846 bytes compressed:  -1.4%

I was using the "Neue Textdatei.txt" for copy paste exchange ... thats why the size changed.

Setting 'zip64' => false makes the zip file use 0x0a as the min version required, and I fail to understand what the remaining problem with the Archiver Utility is ... so ... I don't know ... progress?

McNetic commented 9 years ago

The native OSX Tools (unzip + archive utility) both don't support the zip64 extension, which is part of zip standard since at least 2006. As a bonus, the archive tool also does not support streamed zip files, which is part of zip standard since 1993. Zipstreamer does always use streamed zip - as the name suggests). I think the best way is to detect OSX clients, and disable large zip downloads (with a manual override stating that third party tools are required to unpack).

butonic commented 9 years ago

@DeepDiver1975 I guess detecting OSX and then disabling 64bit support in zipstreamer is the first step. If we can detect if the selected files require using 64bit we can threw en exception. That should solve this for the majority of cases.

LukasReschke commented 9 years ago

I guess detecting OSX and then disabling 64bit support in zipstreamer is the first step. If we can detect if the selected files require using 64bit we can threw en exception. That should solve this for the majority of cases.

It's not that easy. It's also about streamed ZIPs which are unsupported as well. We do need a completely different solution here thus.

butonic commented 9 years ago

OK, can we agree on OSX being broken beyond sanity? Good. Now, the only thing we can do for end users is providing a non zip64 download when the file size allows. That would allow them to use unzip. That's it. No more (archive utility still does not work), no less (makes downloading collections minimally usable).

That being said, IIRC someone mentioned using tar(.gz) for OSX? That approach would still require detecting osx ...

LukasReschke commented 9 years ago

.tar stuff: https://github.com/owncloud/core/issues/17930

That said: Yes, sounds feasible. - Maybe also a notification on the download page or so about this.

MTRichards commented 9 years ago

Excellent! Do we have a plan? Sounds like it...

karlitschek commented 9 years ago

Not sure what the plan is. Switching to tar makes things not easier. There are a ton of different incompatible tar dialects as I have discovered with our software download files. We use zip files now because it has better compatibility across systems.

DeepDiver1975 commented 9 years ago

Not sure what the plan is. Switching to tar makes things not easier. There are a ton of different incompatible tar dialects as I have discovered with our software download files.

We would need to test the created tar on Mac and see if it works on this platform.

Tar is no understood by Windows - but we dont care since tar will be a OSX only solution