mholt / archiver

DEPRECATED. Please use mholt/archives instead.
https://github.com/mholt/archives
MIT License

messy code with chinese file name #33

Closed: chrislearn closed this issue 6 years ago

chrislearn commented 7 years ago

When an archive contains a file with a Chinese name, the extracted file name is garbled.


I am using Windows 10 Pro with Go 1.8.1 (64-bit).

halfcrazy commented 7 years ago

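If you are using a Chinese-language Windows system, file names are treated as GBK-encoded. If you want zip to support this case, you can set bit 11 of the zip header's general purpose flags to mark names as UTF-8 (`header.Flags |= 1 << 11`, ORing the bit in so other flags are kept), or convert your file names to UTF-8.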

ref: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT

D.2 If general purpose bit 11 is unset, the file name and comment should conform to the original ZIP character encoding. If general purpose bit 11 is set, the filename and comment must support The Unicode Standard, Version 4.1.0 or greater using the character encoding form defined by the UTF-8 storage specification. The Unicode Standard is published by the The Unicode Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files is expected to not include a byte order mark (BOM).
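The rule quoted above implies that a writer should only set bit 11 when the name really is UTF-8, and that the bit only matters for names containing non-ASCII bytes. A simplified sketch of that decision follows. `wantUTF8Flag` is a hypothetical helper, and this is not a copy of any library's exact logic (real implementations may also treat a few ASCII characters specially).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// wantUTF8Flag (hypothetical) reports whether bit 11 should be set for name.
// The bit promises UTF-8, so the name must be valid UTF-8; and it is only
// needed when the name contains non-ASCII bytes, since plain ASCII reads the
// same in most legacy encodings.
func wantUTF8Flag(name string) bool {
	if !utf8.ValidString(name) {
		return false // setting the bit would mislabel a non-UTF-8 name
	}
	for i := 0; i < len(name); i++ {
		if name[i] >= utf8.RuneSelf {
			return true
		}
	}
	return false
}

func main() {
	// ASCII, UTF-8 Chinese, and raw non-UTF-8 bytes (as a GBK-encoded name would be).
	for _, name := range []string{"readme.txt", "中文.txt", "\xd6\xd0\xce\xc4.txt"} {
		fmt.Printf("%q -> %v\n", name, wantUTF8Flag(name))
	}
}
```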

Update: it seems a commit in Go 1.9rc1 sets this flag: https://github.com/golang/go/commit/0a3f3e166d702f477863a5260779fa0357c72302