pombreda / libarchive

Automatically exported from code.google.com/p/libarchive

Add support for UTF-8 extra header in ZIP. #381

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Create files whose names contain language-specific characters such as ö, 
ä, å on a Windows machine.
2. Compress them to ZIP using the built-in Windows compression tool.
3. Try to extract them with libarchive.

What is the expected output?
Correct file names.

What do you see instead?
Wrong characters in the file name.

What version are you using?
ToT (tip of tree) and the latest stable release; both fail.

On what operating system?
Linux / Chrome OS.

How did you build?  (cmake, configure, or pre-packaged binary)
make

What compiler or development environment (please include version)?
NaCl ports.

Please provide any additional information below.

I investigated the code, and it seems that libarchive handles UTF-8 file names 
in ZIP archives as long as they are stored in the primary file name field.

However, on Windows the primary file name field seems to use the OEM code page 
(which depends on the system language), while the UTF-8 name is stored in a 
special extra field with ID 0x7075.

Original issue reported on code.google.com by mtomasz@chromium.org on 6 Nov 2014 at 9:01
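
The 0x7075 field described above is documented by Info-ZIP as the Unicode Path 
Extra Field: a 1-byte version (currently 1), a 4-byte CRC-32 of the primary 
file name field, then the UTF-8 name, all little-endian. The sketch below is 
not libarchive's code or the patch from the pull request; it is a standalone 
illustration, built around a hypothetical find_unicode_path() helper, of how a 
reader could locate that field in an entry's extra data and check that its CRC 
still matches the primary name before trusting it.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (reflected polynomial 0xEDB88320), as used by ZIP. */
static uint32_t crc32_bytes(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static unsigned read_le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: scan one entry's extra-field block for the Info-ZIP
 * Unicode Path field (ID 0x7075, version 1). On success, point *utf8 at the
 * UTF-8 name, store its length in *utf8_len and return 1. Return 0 if no
 * usable field is found (absent, truncated, unknown version) or if the
 * field's CRC does not match the primary name, i.e. the field is stale.
 */
static int find_unicode_path(const unsigned char *extra, size_t extra_len,
                             const unsigned char *name, size_t name_len,
                             const unsigned char **utf8, size_t *utf8_len)
{
    size_t off = 0;
    while (off + 4 <= extra_len) {
        unsigned id = read_le16(extra + off);
        size_t size = read_le16(extra + off + 2);
        const unsigned char *data = extra + off + 4;
        if (off + 4 + size > extra_len)
            return 0; /* truncated extra field */
        if (id == 0x7075 && size >= 5 && data[0] == 1) {
            if (read_le32(data + 1) != crc32_bytes(name, name_len))
                return 0; /* primary name changed after the field was written */
            *utf8 = data + 5;
            *utf8_len = size - 5;
            return 1;
        }
        off += 4 + size; /* skip to the next header ID */
    }
    return 0;
}
```

If the helper returns 0, a reader would fall back to interpreting the primary 
name in the archive's legacy (OEM) encoding.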

GoogleCodeExporter commented 9 years ago
I'll be happy to add support for it, if you are interested in accepting the 
patch.

Original comment by mtomasz@chromium.org on 6 Nov 2014 at 9:02

GoogleCodeExporter commented 9 years ago
This sounds like a very good idea.

Please send us a pull request on github with your patches.

On a related note, I believe the Info-ZIP maintainers are considering writing 
UTF-8 in the primary file name field when they create archives; I wonder if 
libarchive should do the same?

Original comment by kientzle@gmail.com on 7 Nov 2014 at 1:43
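
For the writer-side question: the ZIP specification (APPNOTE) defines bit 11 
of the general-purpose flag, the language encoding flag (EFS), to mean that the 
file name and comment are UTF-8. A writer that stores UTF-8 in the primary 
field would set that bit. The sketch below is a hypothetical illustration, not 
libarchive's writer: flags_for_name() and is_valid_utf8() are made-up helpers 
that set bit 11 only when the name is well-formed UTF-8 and contains non-ASCII 
bytes (pure ASCII names are identical in UTF-8 and the common OEM code pages).

```c
#include <stddef.h>
#include <stdint.h>

#define ZIP_FLAG_EFS 0x0800u /* general-purpose bit 11: name/comment are UTF-8 */

/* Return 1 if s[0..n) is well-formed UTF-8 (no overlong forms, surrogates,
 * or code points above U+10FFFF), else 0. */
static int is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        uint32_t cp;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return 0; /* stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF */
        if (n - i < len) return 0; /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;
        i += len;
    }
    return 1;
}

/* Hypothetical helper: compute general-purpose flags for an entry whose
 * name bytes are stored as-is in the primary file name field. */
static uint16_t flags_for_name(uint16_t flags, const unsigned char *name, size_t n)
{
    int has_non_ascii = 0;
    for (size_t i = 0; i < n; i++)
        if (name[i] >= 0x80) has_non_ascii = 1;
    if (has_non_ascii && is_valid_utf8(name, n))
        flags |= ZIP_FLAG_EFS;
    else
        flags &= (uint16_t)~ZIP_FLAG_EFS; /* ASCII or legacy-encoded name */
    return flags;
}
```

A name that is non-ASCII but not valid UTF-8 (for example CP437 bytes) gets 
bit 11 cleared, since claiming UTF-8 for it would mislead readers.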

GoogleCodeExporter commented 9 years ago
I can add read support, as that's what I need now. Would you be OK with 
merging read support only for this extra field?

Original comment by mtomasz@chromium.org on 7 Nov 2014 at 1:49

GoogleCodeExporter commented 9 years ago
A patch for only read support would be welcome.

Original comment by kientzle@gmail.com on 7 Nov 2014 at 1:55

GoogleCodeExporter commented 9 years ago
I created a pull request at:
https://github.com/libarchive/libarchive/pull/93

Original comment by mtomasz@chromium.org on 10 Nov 2014 at 7:34

GoogleCodeExporter commented 9 years ago
Hi guys. It would be great if we could merge this patch quickly, as it's 
blocking crbug.com/429987. Thank you.

Original comment by mtomasz@chromium.org on 13 Nov 2014 at 1:28

GoogleCodeExporter commented 9 years ago
Thanks for the reminder.

Could you please provide a small example archive that tickles the problem so I 
can add it to the test suite?  We want to make sure this feature doesn't get 
broken with future changes.

Original comment by kientzle@gmail.com on 13 Nov 2014 at 2:09

GoogleCodeExporter commented 9 years ago
@kientzle: I think the one attached to crbug.com/429987 should be perfect.

https://chromium.googlecode.com/issues/attachment?aid=4299870003001&name=Test+-+created+with+winrar+5.20.zip&token=ABZ6GAfKW7ci14bKEDFt6VZ0hDZNKHVjOg%3A1415844752176

Original comment by mtomasz@chromium.org on 13 Nov 2014 at 2:14

GoogleCodeExporter commented 9 years ago
Hmmm...  I built a simple test around the first of the sample Zip archives 
attached to that bug and your patch seems to make no difference at all.

Looking more carefully, that file does not actually use the 0x7075 extension.  
I haven't disassembled the file in any detail, but the filenames are certainly 
not UTF-8 and I don't see anything obvious pointing out what encoding those 
filenames are using.

I switched to the second file (the one created with WinRAR).  That one does 
seem to use the 0x7075 extension, but it's still not working for me on MacOS; 
what platform(s) have you tried it on?  I won't have time to dig any further 
until next week.

Original comment by kientzle@gmail.com on 14 Nov 2014 at 5:06
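
To check whether a given archive actually carries the 0x7075 field, one can 
look at each entry's local file header: the general-purpose flags are at 
offset 6, the name length at offset 26, the extra-field length at offset 28, 
the name starts at offset 30, and the extra fields follow it. The sketch below 
uses a hypothetical inspect_local_header() helper (not a libarchive API) on an 
in-memory buffer that starts at a local file header, and reports bit 11 and 
whether a 0x7075 field is present. Note that the central directory keeps its 
own copy of the extra fields, which this sketch does not examine.

```c
#include <stddef.h>

struct local_header_info {
    unsigned flags;       /* general-purpose bit flags */
    int name_is_utf8;     /* bit 11 (EFS) is set */
    int has_unicode_path; /* a 0x7075 extra field is present */
};

static unsigned le16(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Return 0 on success, -1 if buf does not hold a complete local file header
 * (signature 0x04034b50, stored little-endian as "PK\3\4"). */
static int inspect_local_header(const unsigned char *buf, size_t len,
                                struct local_header_info *info)
{
    if (len < 30 || buf[0] != 'P' || buf[1] != 'K' || buf[2] != 3 || buf[3] != 4)
        return -1;
    size_t name_len = le16(buf + 26);
    size_t extra_len = le16(buf + 28);
    if (len < 30 + name_len + extra_len)
        return -1;
    info->flags = le16(buf + 6);
    info->name_is_utf8 = (info->flags & 0x0800u) != 0;
    info->has_unicode_path = 0;
    const unsigned char *extra = buf + 30 + name_len;
    for (size_t off = 0; off + 4 <= extra_len; ) {
        unsigned id = le16(extra + off);
        size_t size = le16(extra + off + 2);
        if (id == 0x7075)
            info->has_unicode_path = 1;
        off += 4 + size; /* next header ID; loop ends if size overruns */
    }
    return 0;
}
```

Running this over the first local header of each sample archive would show 
directly which of them use the 0x7075 field and which set bit 11.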

GoogleCodeExporter commented 9 years ago
@kientzle: Sorry, I must have attached the incorrect file. Did you set the 
locale to UTF-8 in the program that uses libarchive, e.g. setlocale(LC_ALL, 
"en_US.UTF-8")?

I tried on Linux, and it worked fine.

Original comment by mtomasz@chromium.org on 14 Nov 2014 at 5:10
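
The locale question matters because a C program runs in the "C" locale until 
it calls setlocale(), and, as far as I understand, libarchive converts entry 
names to the character set of the current locale when a program asks for 
archive_entry_pathname(). A minimal sketch of making sure a UTF-8 locale is 
active before reading, using only setlocale() and POSIX nl_langinfo(CODESET); 
ensure_utf8_locale() and codeset_is_utf8() are hypothetical helpers, and the 
candidate locale names are assumptions that may not be installed everywhere.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the current locale's codeset is reported as "UTF-8". */
static int codeset_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && strcmp(cs, "UTF-8") == 0;
}

/* Try the user's environment first, then a few common UTF-8 locale names
 * (assumed; availability varies by system). Return 1 once a UTF-8 codeset
 * is active, 0 if none of the candidates worked. */
static int ensure_utf8_locale(void)
{
    static const char *const candidates[] = { "", "C.UTF-8", "en_US.UTF-8" };
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        if (setlocale(LC_ALL, candidates[i]) != NULL && codeset_is_utf8())
            return 1;
    }
    return 0;
}
```

Call it once at program start, before opening the archive. If it returns 0, 
expect non-ASCII entry names to come back wrong or not at all.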