mattconnolly / ZipArchive

zip archive processing for Cocoa - iPhone and OS X
http://code.google.com/p/ziparchive/
MIT License
840 stars, 260 forks

Support unicode filenames #21

Closed: mronkko closed this issue 11 years ago

mronkko commented 11 years ago

This might be a one line change. (At least it works for me this way.)

Change

NSString * strPath = [NSString stringWithCString:filename encoding:NSASCIIStringEncoding];

to

NSString * strPath = [NSString stringWithCString:filename encoding:NSUTF8StringEncoding];

at

https://github.com/mattconnolly/ZipArchive/blob/master/ZipArchive.m#L293

mattconnolly commented 11 years ago

I did notice that, but it hasn't been a problem for me.

Could zip files have their file names in another character set?

Perhaps it would be better to make the filename encoding a property that the user can set. That way, if they know the file is coming from Windows, they can use NSWindowsCP1252StringEncoding, for example.

Thoughts?
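On the question of whether zip files can carry names in another character set: the ZIP format has a flag for exactly this. PKWARE's APPNOTE.TXT defines bit 11 of the general purpose bit flag (the "language encoding flag", EFS) to mean the filename is UTF-8. When the bit is clear, the spec's historical default is IBM Code Page 437, but in practice many archivers write the creator's local code page instead. The plain C sketch below (also valid Objective-C) reads that bit from a raw local file header. The function name `zip_header_name_is_utf8` is hypothetical and is not part of ZipArchive or minizip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from PKWARE's APPNOTE.TXT (ZIP file format specification). */
#define ZIP_LOCAL_HEADER_SIG 0x04034b50u /* "PK\3\4" read little-endian */
#define ZIP_LOCAL_HEADER_LEN 30u         /* fixed part of a local file header */
#define ZIP_FLAG_UTF8        0x0800u     /* general purpose bit 11 (EFS) */

static uint32_t read_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: inspect the fixed 30-byte local file header.
   Returns 1 if bit 11 says the filename is UTF-8, 0 if the bit is clear
   (name is CP437 per the spec, or some local code page in practice), and
   -1 if the buffer is too short or lacks the local file header signature. */
int zip_header_name_is_utf8(const unsigned char *hdr, size_t len) {
    assert(hdr != NULL);
    if (len < ZIP_LOCAL_HEADER_LEN || read_le32(hdr) != ZIP_LOCAL_HEADER_SIG)
        return -1;
    /* The general purpose bit flag is the 2-byte field at offset 6. */
    return (read_le16(hdr + 6) & ZIP_FLAG_UTF8) ? 1 : 0;
}
```

Inside ZipArchive's unzip path you would not need to parse bytes yourself. minizip's `unz_file_info` struct exposes the same 16-bit value in its `flag` field. A check along the lines of `fileInfo.flag & 0x0800` (variable name illustrative) could select `NSUTF8StringEncoding` first and fall back to a user-set encoding property only when the bit is clear.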

mronkko commented 11 years ago

A user-configurable encoding would work for me. In my application I do not know what files the users will be working with, though, so I would just set it to UTF-8. Wouldn't UTF-8 be a good default? I am not an expert on this, but it should be backwards compatible with ASCII, so if a filename really is ASCII, decoding it as UTF-8 produces the correct result.

Here is a test case:

https://dl.dropboxusercontent.com/u/694399/5XVQU5E3.zip

The file opens on my Mac, but the filename is garbage.

Mikkos-MacBook-Pro:Downloads mronkko$ unzip -l 5XVQU5E3.zip 
Archive:  5XVQU5E3.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
  2866661  08-06-13 14:24   Питер Ю, Мануэль Кардона - Основы физики полупроводников - 2002 [05].pdf
 --------                   -------
  2866661                   1 file

This does not open with the current version of ZipArchive, but it opens, with the same garbage filename, when I change the encoding to NSUTF8StringEncoding.
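The ASCII-compatibility argument holds by construction. UTF-8 encodes every code point below U+0080 as that same single byte. Every byte of a multi-byte sequence has its high bit set, so a pure-ASCII filename decodes identically under either encoding. A minimal encoder in plain C (also valid Objective-C) makes this visible. The name `utf8_encode` is illustrative and not taken from ZipArchive. Note that Cyrillic П (U+041F) becomes the two bytes 0xD0 0x9F, both outside the ASCII range. This is why an ASCII-only decoder cannot accept a name like the one in the test archive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
   Returns the number of bytes written, or 0 for values that are not
   Unicode scalar values (UTF-16 surrogates or anything above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    assert(out != NULL);
    if (cp < 0x80) {                  /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                 /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates have no UTF-8 form */
        return 0;
    if (cp < 0x10000) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```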

mattconnolly commented 11 years ago

What an interesting filename. I don't know what encoding that is in. I think it's lucky that UTF-8 accepted it.

UTF-8 has some rules about what is valid. You can't throw random bytes at it and expect them to mean something.

Would you happen to know what language that filename is in? Then we could guess the correct encoding.

Certainly looks like a configurable option is the way to go. Perhaps you could guess the encoding based on the user's locale settings or something.
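The rules Matt mentions can be made concrete. The validator below is a minimal plain C sketch (also valid Objective-C) of the UTF-8 well-formedness table from RFC 3629. The name `utf8_is_valid` is illustrative. It also suggests why trying UTF-8 first is a reasonable guess: bytes written in a legacy 8-bit code page tend not to form valid multi-byte sequences by accident. For example, "Пи" encoded in CP1251 (0xCF 0xE8) is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s[0..len) is well-formed UTF-8 (RFC 3629), else 0.
   Rejects stray continuation bytes, truncated sequences, overlong
   forms, UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF. */
int utf8_is_valid(const unsigned char *s, size_t len) {
    assert(s != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;                           /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of 2nd byte */
        if (c < 0x80) { i++; continue; }    /* ASCII is always valid */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }  /* no overlongs */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }  /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }  /* no overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }  /* cap at U+10FFFF */
        else return 0;              /* 0x80..0xC1, 0xF5..0xFF never lead */
        if (len - i <= n) return 0;                /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return 0;
        for (size_t k = 2; k <= n; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += n + 1;
    }
    return 1;
}
```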

mronkko commented 11 years ago

The filename is in Russian, written in Cyrillic characters. It may work with NSWindowsCP1251StringEncoding, but I have not tested it.

A user-configurable encoding is a good idea because it would allow fallback options, or even presenting the user with a UI to choose the correct encoding if it cannot be determined otherwise.
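One step of such a fallback chain is converting the CP1251 guess (untested for this particular archive) to UTF-8 when the UTF-8 check fails. On Apple platforms `NSWindowsCP1251StringEncoding` does this directly. The plain C sketch below (also valid Objective-C) only illustrates the shape of that step. It is deliberately partial: only ASCII and the basic Cyrillic letters at 0xC0-0xFF (А..я) are mapped, and any other byte makes it fail so the caller can try the next candidate or ask the user. The function name `cp1251_cyrillic_to_utf8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Convert a NUL-terminated CP1251 string to UTF-8 (partial mapping).
   Maps ASCII 0x00-0x7F unchanged and 0xC0-0xFF to U+0410..U+044F (А..я).
   Any other byte, e.g. 0xA8 'Ё' or 0xB8 'ё', is treated as unsupported.
   Returns the UTF-8 length written (excluding the NUL), or -1 on an
   unsupported byte or when out_size is too small. */
long cp1251_cyrillic_to_utf8(const char *in, char *out, size_t out_size) {
    assert(in != NULL && out != NULL);
    if (out_size == 0) return -1;
    size_t o = 0;
    for (const unsigned char *p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            if (o + 1 >= out_size) return -1;          /* keep room for NUL */
            out[o++] = (char)*p;
        } else if (*p >= 0xC0) {
            unsigned cp = 0x0410u + (*p - 0xC0u);      /* U+0410..U+044F */
            if (o + 2 >= out_size) return -1;
            out[o++] = (char)(0xC0 | (cp >> 6));       /* 2-byte UTF-8 */
            out[o++] = (char)(0x80 | (cp & 0x3F));
        } else {
            return -1;              /* byte outside the mapped subset */
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

A full implementation would use a complete 256-entry CP1251 table, or let Foundation or iconv do the conversion. This sketch only shows that the same Cyrillic letter is one byte in CP1251 and two bytes in UTF-8.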

mattconnolly commented 11 years ago

Added property in https://github.com/mattconnolly/ZipArchive/commit/11c879a8b7b6c57f1ae45d2aa7fd98d44149b02e

Can you please verify this works for you?

mronkko commented 11 years ago

Works just fine.

I will include this in the test version of my app and let you know if any of my testers encounter any problems.

mattconnolly commented 11 years ago

Released an updated CocoaPod, version 1.2.0, including this update.

mronkko commented 11 years ago

This change, or some other change, has caused problems for some of my testers. I have also confirmed cases myself where, after encoding filenames with UTF-8, the zip file can no longer be opened, either with ZipArchive on iOS or on my Mac.

I am still investigating the exact cause.

mattconnolly commented 11 years ago

Can you please clarify: If you set the stringEncoding to ascii it works, right? Like so:

    ZipArchive* zip = [[ZipArchive alloc] init];
    // stringEncoding controls how entry filenames are decoded
    zip.stringEncoding = NSASCIIStringEncoding;

mronkko commented 11 years ago

I need to do some more tests, and will post the results.

mronkko commented 11 years ago

I created an empty project and set up ZipArchive with CocoaPods, and this setup works just fine with UTF-8, so the corrupted zip files are caused by something else.

mronkko commented 11 years ago

I have done some testing, and it seems that I cannot create valid zip archives with any version of ZipArchive installed with CocoaPods. Unzipping existing archives works just fine.

I set up a very simple iOS app that compresses a PDF file in the AppDelegate and then uncompresses the same file. The uncompressing step fails, and when I run this in the simulator, I can uncompress the archive, but the file inside is corrupted.

https://dl.dropboxusercontent.com/u/694399/ZA.zip
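To narrow down this kind of corruption, it helps to check whether the bytes that come out match the bytes that went in. ZIP stores a CRC-32 of each entry's uncompressed data. `unzip -t` verifies it, and `unzip -lv` lists the stored value. Comparing that value with a CRC-32 of the original PDF shows whether the archive recorded the right data. The plain C sketch below (also valid Objective-C) computes the same CRC-32 variant that ZIP uses. The name `zip_crc32` is hypothetical. In an app that already links zlib, as minizip does, zlib's own `crc32()` function would be the natural choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used by ZIP: reflected polynomial 0xEDB88320,
   initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. Slow but dependency-free.
   Start with crc = 0; to process data in chunks, pass the previous
   return value back in as crc (same convention as zlib's crc32()). */
uint32_t zip_crc32(uint32_t crc, const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    crc = ~crc;                       /* undo the final XOR from last call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Because the function can be fed chunk by chunk, a large file such as the PDF here can be hashed while it is read, without holding it all in memory.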