python / cpython

The Python programming language
https://www.python.org

Zipfile couldn't recognize the character set correctly. #84587

Open ghost opened 4 years ago

ghost commented 4 years ago
BPO 40407
Nosy @Yhg1s, @serhiy-storchaka

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['type-bug', 'library', '3.9']
title = 'Zipfile couldn`t recognized character set rightly.'
updated_at =
user = None
```

bugs.python.org fields:

```python
activity =
actor = 'iritkatriel'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation =
creator = '\xea\xb9\x80\xec\xa7\x80\xed\x9b\x88'
dependencies = []
files = []
hgrepos = []
issue_num = 40407
keywords = []
message_count = 1.0
messages = ['367429']
nosy_count = 4.0
nosy_names = ['twouters', 'alanmcintyre', 'serhiy.storchaka', '\xea\xb9\x80\xec\xa7\x80\xed\x9b\x88']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue40407'
versions = ['Python 3.9']
```

ghost commented 4 years ago

Hi,

I am not a developer. However, when I previously asked about abnormal behavior in an open source program, I was told that the problem was in Python's zipfile module, so I would like to ask about it here.

I am Korean and a Windows user. There are several useful Windows compression programs in Korea. However, for archives created with those programs, Debian's unzip utility detects the character set correctly, but Python does not.

If you look at the attached file (it is too large to attach here, so I uploaded it elsewhere: https://kutt.it/2F2Xec), it is a compressed file that contains other compressed files. Each inner file is named after the compression program that created it.

As far as I have seen, the default compression settings of each program produce the following encodings:

- 7zip: UTF-8
- Alzip: UTF-8
- BandiZip: EUC-KR
- BreadZip: EUC-KR
- PKZip: UTF-8
- StarZip: EUC-KR
- WinRAR: UTF-8
- WinZIP: EUC-KR
- Zipware: EUC-KR

BandiZip and Alzip are the two competing programs in Korea. I use BandiZip, which has few ads and supports multi-core compression. StarZip is also a Korean program, but its market share is not high. BreadZip is another Korean program; it used to be widely used, but it has been discontinued and is now used by only some people.

In any case, compression software in Korea uses both EUC-KR and UTF-8. However, the zipfile module does not recognize this properly.
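
For readers who hit the same symptom, here is a minimal sketch of a possible workaround, not an official fix. It relies on how the zipfile module reads member names: if general-purpose flag bit 11 (0x800) is set, the name is decoded as UTF-8; otherwise it is decoded as cp437. The helper names `decode_member_name` and `make_sample_archive`, and the sample name `한글.txt`, are made up for illustration. The sample archive is built in memory because the attached file is not reproduced here, and it is assumed that the affected programs store EUC-KR bytes without setting the UTF-8 flag.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: member name is stored as UTF-8


def decode_member_name(info, legacy_encoding="euc-kr"):
    """Return the member name, re-decoding names that lack the UTF-8 flag.

    Hypothetical helper: legacy_encoding is a guess supplied by the caller.
    """
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # zipfile already decoded this as UTF-8
    # zipfile decoded the raw bytes as cp437; undo that, then decode again.
    raw = info.filename.encode("cp437")
    return raw.decode(legacy_encoding)


def make_sample_archive():
    """Build an in-memory ZIP whose member name is EUC-KR with no UTF-8 flag.

    zipfile itself only writes ASCII or UTF-8 names, so an ASCII placeholder
    of the same byte length is written first and then patched in the bytes.
    """
    placeholder = "XXXX.txt"
    real = "한글.txt".encode("euc-kr")
    assert len(real) == len(placeholder)  # header lengths must stay valid
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(placeholder, "hello")
    patched = buf.getvalue().replace(placeholder.encode("ascii"), real)
    return io.BytesIO(patched)


with zipfile.ZipFile(make_sample_archive()) as zf:
    # info.filename is mojibake here; decode_member_name recovers the name.
    decoded = [decode_member_name(info) for info in zf.infolist()]
```

If the guessed encoding is wrong for a given archive, `raw.decode` may raise `UnicodeDecodeError` or silently produce different mojibake, so the caller still has to know or guess the encoding. On Python 3.11 and later, opening the archive with `zipfile.ZipFile(path, metadata_encoding="euc-kr")` should make this round trip unnecessary for reading.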