maennchen / ZipStream-PHP

:floppy_disk: PHP ZIP Streaming Library
https://maennchen.dev/ZipStream-PHP/
MIT License

.docx files are recognized as corrupted by Microsoft Word #55

Closed: stil closed this issue 7 years ago

stil commented 7 years ago

I tried to simply repackage an existing .docx file with ZipStream, but Microsoft Word cannot open the resulting document.

.docx files are valid ZIP archives; they just have a different file extension.

I'm going to open a PR fixing this bug shortly.

maennchen commented 7 years ago

Could you briefly explain why ZIP files inside a ZIP file are getting damaged? How does the proposed PR solve that problem?

stil commented 7 years ago

It seems that Microsoft Word checks the "version needed to extract" field for a specific value.

The "version needed to extract" field is documented in section 4.4.3 of PKWARE's APPNOTE.TXT: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
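To check what a given archive (for example, a .docx) actually declares, both version fields can be read back with Python's standard zipfile module. This is a minimal sketch, not part of ZipStream, and version_fields is a hypothetical helper name. It relies on how zipfile maps the central-directory bytes: create_version and create_system are the lower and upper bytes of "version made by", while extract_version and reserved are the lower and upper bytes of "version needed to extract".

```python
import io
import zipfile


def version_fields(source):
    """Return (name, version_made_by, version_needed) for every entry.

    `source` is a path or a binary file object (a .docx works too).
    Each 16-bit value is rebuilt as (upper byte << 8) | lower byte from
    the bytes zipfile parsed out of the central directory.
    """
    fields = []
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            made_by = (info.create_system << 8) | info.create_version
            needed = (info.reserved << 8) | info.extract_version
            fields.append((info.filename, made_by, needed))
    return fields


if __name__ == "__main__":
    # Build a one-entry archive in memory whose headers mimic the legacy
    # value (6 << 8) + 3 in both version fields, then read it back.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        info = zipfile.ZipInfo("word/document.xml")
        info.create_system, info.create_version = 6, 3
        info.reserved, info.extract_version = 6, 3
        zf.writestr(info, "<w:document/>")
    buf.seek(0)
    for name, made_by, needed in version_fields(buf):
        print(name, hex(made_by), hex(needed))
```

Pointing version_fields at a .docx saved by Word and at one produced by ZipStream makes the difference in these two fields easy to compare.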

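Previously it was set to (6 << 8) + 3 = 0x0603, which decodes as an upper byte of 6 (host file system: OS/2 HPFS) and a lower byte of 3 (ZIP specification version 0.3).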
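The arithmetic can be checked with a few lines of Python. This sketch follows the byte layout APPNOTE.TXT gives in section 4.4.2: the upper byte identifies the host system, and the lower byte encodes the specification version as major * 10 + minor. The host table is deliberately partial, and decode_version is a hypothetical helper name.

```python
# Host-system codes from APPNOTE.TXT section 4.4.2.2 (only a subset here).
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32)",
    3: "UNIX",
    6: "OS/2 H.P.F.S.",
    7: "Macintosh",
    10: "Windows NTFS",
    19: "OS X (Darwin)",
}


def decode_version(value):
    """Split a 16-bit ZIP version field into (host system, spec version)."""
    host = (value >> 8) & 0xFF  # upper byte: host system / file system
    spec = value & 0xFF         # lower byte: spec version as major * 10 + minor
    name = HOST_SYSTEMS.get(host, f"unknown ({host})")
    return name, f"{spec // 10}.{spec % 10}"


legacy = (6 << 8) + 3
print(hex(legacy), decode_version(legacy))  # 0x603 ('OS/2 H.P.F.S.', '0.3')
```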

Those were ancient values, and I'm not sure how they got there. For example, the file-system indicator can be used to determine the line-ending convention of compressed text files: Unix systems use \n, while Windows uses \r\n. Here, it pointed to HPFS, the file system of the OS/2 operating system. I don't think that is in use today, or was even 10 years ago.
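As an illustration of that point, here is a hypothetical extractor policy that picks a newline convention from the host-system byte when converting text entries. It is a sketch under stated assumptions, not how any particular tool behaves; the host codes follow APPNOTE.TXT 4.4.2.2, and OS/2 is grouped with DOS and Windows because it uses \r\n.

```python
# Hypothetical policy: which host systems store text with which newline.
CRLF_HOSTS = {0, 6, 10}  # MS-DOS/OS/2 FAT, OS/2 HPFS, Windows NTFS
LF_HOSTS = {3, 19}       # UNIX, OS X (Darwin)


def newline_for_host(host_system):
    """Guess the newline used by the archiving system, or None if unknown."""
    if host_system in CRLF_HOSTS:
        return "\r\n"
    if host_system in LF_HOSTS:
        return "\n"
    return None


def to_local_newlines(text, host_system, local="\n"):
    """Convert a text entry to `local` line endings based on its host byte."""
    source = newline_for_host(host_system)
    if source is None or source == local:
        return text  # unknown host or nothing to convert: leave as-is
    return text.replace(source, local)
```

With the legacy value, a policy like this would treat every entry as \r\n text from OS/2, regardless of the system that actually produced the archive.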

I bumped those values to ones that Microsoft Word and 7-Zip use and recognize.
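For comparison, repackaging an existing archive (as in the original report) while forcing both fields to newer values can be sketched with Python's zipfile. The concrete defaults below (UNIX host, specification 2.0, which APPNOTE.TXT lists as the minimum for Deflate) are assumptions chosen for illustration; the exact values used in the PR are not quoted in this thread. repackage and repackage_bytes are hypothetical helper names, not ZipStream API.

```python
import io
import zipfile


def repackage(src, dst, made_by=0x0314, needed=0x0014):
    """Copy every entry of `src` into `dst`, overriding both version fields.

    The defaults are illustrative assumptions, not the PR's values:
    made_by = UNIX host (3), spec 2.0; needed = spec 2.0.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            info = zipfile.ZipInfo(item.filename, date_time=item.date_time)
            info.compress_type = item.compress_type
            info.external_attr = item.external_attr
            # Upper byte -> host system / reserved, lower byte -> spec version.
            info.create_system, info.create_version = made_by >> 8, made_by & 0xFF
            info.reserved, info.extract_version = needed >> 8, needed & 0xFF
            zout.writestr(info, zin.read(item.filename))


def repackage_bytes(data, **kwargs):
    """In-memory variant: take archive bytes, return repackaged bytes."""
    out = io.BytesIO()
    repackage(io.BytesIO(data), out, **kwargs)
    return out.getvalue()
```

Usage, with placeholder file names: repackage("original.docx", "repackaged.docx"). Note that zipfile raises the needed version on its own when an entry requires ZIP64, bzip2, or LZMA.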