maennchen / ZipStream-PHP

:floppy_disk: PHP ZIP Streaming Library
https://maennchen.dev/ZipStream-PHP/
MIT License
1.74k stars 104 forks source link

Cascading subjects coming from ZIP_VERSION_MADE_BY #299

Closed arnapou closed 7 months ago

arnapou commented 7 months ago

Hello,

I developed my own code about this subject of streaming zip when I discovered there was already this lib. I didn't want to rewrite all my code, but I played with your lib to compare things.

In all humility, I may have discovered several possible issues which all started about the ZIP_VERSION_MADE_BY constant. I read the #84 but I still think it could be nice to bring to you what I discovered.

Zip test

Write a file

<?php

require __DIR__ . '/vendor/autoload.php';

$filename = '/tmp/php/hello.zip';

$zip = new ZipStream\ZipStream(
    outputName: $filename,
    outputStream: fopen($filename, 'wb'),
);
$zip->addFile(
    fileName: 'hello.txt',
    data: 'This is the contents of hello.txt',
);
$zip->finish();

zipinfo on it

$ zipinfo -v /tmp/php/hello.zip
Archive:  /tmp/php/hello.zip
There is no zipfile comment.

End-of-central-directory record:
-------------------------------

  Zip archive file size:                       197 (00000000000000C5h)
  Actual end-cent-dir record offset:           175 (00000000000000AFh)
  Expected end-cent-dir record offset:         175 (00000000000000AFh)
  (based on the length of the central directory and its expected offset)

  This zipfile constitutes the sole disk of a single-part archive; its
  central directory contains 1 entry.
  The central directory is 55 (0000000000000037h) bytes long,
  and its (expected) offset in bytes from the beginning of the zipfile
  is 120 (0000000000000078h).

Central directory entry #1:
---------------------------

  hello.txt

  offset of local header from start of archive:   0
                                                  (0000000000000000h) bytes
  file system or operating system of origin:      OS/2 or NT HPFS
  version of encoding software:                   0.3
  minimum file system compatibility required:     MS-DOS, OS/2 or NT FAT
  minimum software version required to extract:   4.5
  compression method:                             deflated
  compression sub-type (deflation):               normal
  file security status:                           not encrypted
  extended local header:                          yes
  file last modified on (DOS date/time):          2024 Feb 1 14:21:46
  32-bit CRC value (hex):                         eaca2cc2
  compressed size:                                33 bytes
  uncompressed size:                              33 bytes
  length of filename:                             9 characters
  length of extra field:                          0 bytes
  length of file comment:                         0 characters
  disk number on which file begins:               disk 1
  apparent file type:                             binary
  non-MSDOS external file attributes:             000000 hex
  MS-DOS file attributes (20 hex):                arc 

  There is no file comment.

[!important] Extract:

file system or operating system of origin:      OS/2 or NT HPFS
version of encoding software:                   0.3

The ZipStream constant is public const ZIP_VERSION_MADE_BY = 0x603;

Bytes:

  1. 06 means OS/2 or NT HPFS
  2. 03 means 0.3

Issue 1: very old documentation

ZipStream

You use in the phpdoc this link https://www.iana.org/assignments/media-types/application/zip

Date: (Jul 20, 93)

      version made by (2 bytes)

      The upper byte indicates the host system (OS) for the
      file.  Software can use this information to determine
      the line record format for text files etc.  The current
      mappings are:

      0 - MS-DOS and OS/2 (F.A.T. file systems)
      1 - Amiga                     2 - VAX/VMS
      3 - *nix                      4 - VM/CMS
      5 - Atari ST                  6 - OS/2 H.P.F.S.
      7 - Macintosh                 8 - Z-System
      9 - CP/M                      10 thru 255 - unused

      The lower byte indicates the version number of the
      software used to encode the file.  The value/10
      indicates the major version number, and the value
      mod 10 is the minor version number.

Pkware

I read the following doc https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

Date: (Nov 01, 2022)

   4.4.2 version made by (2 bytes)

        4.4.2.1 The upper byte indicates the compatibility of the file
        attribute information.  If the external file attributes 
        are compatible with MS-DOS and can be read by PKZIP for 
        DOS version 2.04g then this value will be zero.  If these 
        attributes are not compatible, then this value will 
        identify the host system on which the attributes are 
        compatible.  Software can use this information to determine
        the line record format for text files etc.  

        4.4.2.2 The current mappings are:

         0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
         1 - Amiga                     2 - OpenVMS
         3 - UNIX                      4 - VM/CMS
         5 - Atari ST                  6 - OS/2 H.P.F.S.
         7 - Macintosh                 8 - Z-System
         9 - CP/M                     10 - Windows NTFS
        11 - MVS (OS/390 - Z/OS)      12 - VSE
        13 - Acorn Risc               14 - VFAT
        15 - alternate MVS            16 - BeOS
        17 - Tandem                   18 - OS/400
        19 - OS X (Darwin)            20 thru 255 - unused

        4.4.2.3 The lower byte indicates the ZIP specification version 
        (the version of this document) supported by the software 
        used to encode the file.  The value/10 indicates the major 
        version number, and the value mod 10 is the minor version 
        number. 

Issue 2: change the lower byte of ZIP_VERSION_MADE_BY

Actual Proposed
0x0603 0x063F

0x3F = 63 = 6.3 which is 18 years old (Sept 29, 2006) if I understand properly the Pkware doc.

(tested with zipinfo : the number is accurate)

Issue 3: change the upper byte of ZIP_VERSION_MADE_BY

I am less confident here about this because I couldn't test it on every platform. I used it under ubuntu (GUI, cli, ...).

Actual Proposed
0x0603 0x0E3F

0x0E = 14 which correspond to VFAT regarding the Pkware doc.

[!important] What is very interesting here is that change obsoletes #196 about the tricky flag for Linux unzip.

I could unzip files with UTF-8 names without the trick. Tho "it works on my machine" ... is not enough : some other folks must test this on their computers too !

Finally

I can make a accordingly PR if you think that's useful : let me know πŸ™πŸ». I wanted your feedback before doing it because I may have missed important things regarding backward compatibility of your lib.

Unrelated but here is my own lib if you are curious about how I managed things in php 8.3 (sharing is caring πŸ€—).

NicolasCARPi commented 7 months ago

Hello,

Thanks for sharing. I don't understand what issue you're trying to fix here, could you expand on that please?

arnapou commented 7 months ago

Sorry if I was not clear.

To me :

  1. phpdoc about obsolete documentation
  2. The 0.3 version is not correct, either you want the ZipStream version 3.0 then you need to put 0x061E, not 0x0603. Either you want to put the Pkware version then something like 0x063F.
  3. The trick/hack flag for Linux unzip is unnecessary code which can be removed if you change the upper byte to something like VFAT 0x0E (to be more tested than what I did)
maennchen commented 7 months ago

@arnapou Thanks for your detailed analysis.

I'm generally open to adjust those things, but I'm also a bit apprehensive about it since we've had a lot of issues with various unarchiving tools and what we have right now seems to work with most of them.

If we change this, I would propose to add a config flag to enable the new behavior and see how it affects people before applying it for everyone.

arnapou commented 7 months ago

I understand, that's why I wasn’t confident in doing a PR. But I couldn't pass my way too.

[!note] These suggestions are not technically necessary to make ZipStream work.

The thing is, I'm a bit technically picky and I wanted to pass these information to you.

:point_right: Up to you to:

:hugs:

NicolasCARPi commented 7 months ago

I agree with @maennchen, if it ain't broken, don't fix it. I'm also afraid that this will have consequences, so it's best to leave it like that, except maybe the phpdoc because this cannot have an impact on users.

arnapou commented 7 months ago

Your lib, your choice πŸ™‚

I agree :100: with "if it ain't broken, don't fix it", but beware of tons of dust to clean after a few years πŸ˜…

πŸ‘‹ πŸ’™

PS: feel free to reopen or reach me if needed.