zlib-ng / minizip-ng

Fork of the popular zip manipulation library found in the zlib distribution.

-c works with -x but not -l #689

Open Jimmy-Z opened 1 year ago

Jimmy-Z commented 1 year ago

I have a zip file whose file names are encoded in CP936/GBK. -x -c 936 extracts the files with correct names, but -l -c 936 does not list the names correctly.

Piping the listing through iconv, as in minizip -l a.zip | iconv -f gbk -t utf8, works.

It seems -c doesn't affect -l in any way.

pmqs commented 1 year ago

I tested the attached zip file, folder.zip, on my Ubuntu setup, running a fresh minizip:

$ minizip -h  
minizip-ng 3.0.9 - https://github.com/zlib-ng/minizip-ng

The zip file contains the following

$ unzip -l -O cp936 folder.zip 
Archive:  folder.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2019-07-09 13:27   folder/
        0  2019-07-09 13:26   folder/新建文本文档.txt
        0  2019-07-09 13:27   folder/新建文档.docx
---------                     -------
        0                     3 files

First try listing its contents with minizip

$ minizip -c 936 -l  folder.zip 
minizip-ng 3.0.9 - https://github.com/zlib-ng/minizip-ng
---------------------------------------------------
-c -l folder.zip 
      Packed     Unpacked Ratio Method   Attribs Date     Time  CRC-32     Name
      ------     -------- ----- ------   ------- ----     ----  ------     ----
           0            0    0% stored        10 07-09-19 06:27 00000000   folder/
           0            0    0% stored        20 07-09-19 06:26 00000000   folder/�½��ı��ĵ�.txt
           0            0    0% stored        20 07-09-19 06:27 00000000   folder/�½��ĵ�.docx

I see the same encoding issue. Now use minizip to extract the contents of the zip file

$ minizip -c 936 -x  folder.zip 
minizip-ng 3.0.9 - https://github.com/zlib-ng/minizip-ng
---------------------------------------------------
-c -x folder.zip 
Archive folder.zip
Extracting folder/
Extracting folder/�½��ı��ĵ�.txt
Extracting folder/�½��ĵ�.docx

Note the encoding issue with the Extracting... lines
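The garbled names above are consistent with raw GBK bytes being printed to a UTF-8 terminal without conversion. The following is a minimal Python sketch of that effect, not minizip-ng code; the variable names are made up for illustration, and it assumes the terminal shows each invalid byte as a replacement character.

```python
# Illustration only: reproduces the mojibake pattern seen in the listing.
# Nothing here is part of minizip-ng.
original = "新建文本文档.txt"

# Bytes as they would be stored in a zip written on a CP936/GBK system.
raw = original.encode("gbk")

# Interpreting those bytes as UTF-8 yields replacement characters
# mixed with a few accidental two-byte matches.
mojibake = raw.decode("utf-8", errors="replace")

# Decoding with the right code page recovers the name.
recovered = raw.decode("gbk")

print(mojibake)
print(recovered)
```

This is the same conversion that the iconv -f gbk -t utf8 workaround performs on the whole listing.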

Check what was written to disk.

$ ls -l folder
total 0
-rw-rw-rw- 1 paul paul 0 Jul  9  2019 新建文本文档.txt
-rw-rw-rw- 1 paul paul 0 Jul  9  2019 新建文档.docx

That looks fine.

Looks like there are (at least) two places where the code isn't doing what is expected when the -c option is specified.

After a brief look at the code I see that mz_os_utf8_string_create is used to convert the filename to UTF-8. That function is only called from mz_zip_reader_save_all, which is part of the extract workflow.
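One possible shape for a fix, sketched in Python rather than C, is to route every place that shows or writes a name through a single conversion helper. All function names below are hypothetical and do not correspond to minizip-ng functions; the sketch only shows the idea of sharing the conversion between the list and extract paths.

```python
from typing import Iterable, List, Optional


def to_display_name(raw: bytes, codepage: Optional[str]) -> str:
    """Hypothetical helper: decode a stored name using the user's code page.

    Falls back to UTF-8 when no code page was given.
    """
    encoding = codepage or "utf-8"
    return raw.decode(encoding, errors="replace")


def list_names(raw_names: Iterable[bytes], codepage: Optional[str]) -> List[str]:
    """Hypothetical list path: convert names before printing them."""
    return [to_display_name(raw, codepage) for raw in raw_names]


def extract_messages(raw_names: Iterable[bytes], codepage: Optional[str]) -> List[str]:
    """Hypothetical extract path: build progress lines from converted names."""
    return ["Extracting " + to_display_name(raw, codepage) for raw in raw_names]


stored = ["folder/新建文档.docx".encode("gbk")]
print(list_names(stored, "cp936"))
print(extract_messages(stored, "cp936"))
```

With a shared helper, the listing and the Extracting lines cannot drift apart from the names written to disk.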

nmoinvaz commented 1 year ago

It would be helpful if you could submit a PR.