unar does not create files with correct names for (at least) utf-8 characters.

GoogleCodeExporter commented 9 years ago

test.7z contains a single file named "ö"

>>>>> Excerpt from kcharselect >>>>
> Various Useful Representations of "ö"
> UTF-8: 0xC3 0xB6
> UTF-16: 0x00F6
> C octal escaped UTF-8: \303\266
> XML decimal entity: &#246;
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

When extracting, unar does not produce this file name. Instead, the name it 
creates is the single byte with octal value 366 (hex 0xF6, decimal 246). See 
the following example run:

>>>>> Console output, example run >>>>
1> [jx@localhost test]$ ./unar test.7z 
2> test.7z: 7-Zip
3>   ö  (7 B)... OK.
4> Successfully extracted to "./m".
5> [jx@localhost test]$ ls -b
6> \366  test.7z  unar
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
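
For reference, the byte values quoted above can be checked with a small Python sketch. It only compares encodings of "ö"; it is not what unar or GNUstep actually runs, and the Latin-1 fallback is an assumption based on the single byte 0xF6 seen in the listing.

```python
# Byte-level check of the values quoted in the report for "ö" (U+00F6).
name = "ö"

utf8_bytes = name.encode("utf-8")      # what a UTF-8 filesystem expects
latin1_bytes = name.encode("latin-1")  # assumed single-byte fallback encoding


def octal_escape(data: bytes) -> str:
    """Render every byte as a C-style octal escape, e.g. \\366."""
    return "".join(f"\\{b:03o}" for b in data)


print(utf8_bytes.hex(" "))         # c3 b6
print(octal_escape(utf8_bytes))    # \303\266
print(octal_escape(latin1_bytes))  # \366
```

If the name were converted with a Latin-1-like default encoding, that would match the \366 shown by ls -b. That is a guess from the byte values, not something confirmed in this report.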

Notes:
Interestingly, it manages to get the correct string on line 3.
The path on line 4 is random, which I believe is an indication of an 
uninitialized variable. lsar misbehaves in the same way.

For some archives with file names containing Japanese characters, extraction 
fails completely: no file is created because of an uncaught exception.

> unar: Uncaught exception NSCharacterConversionException, reason: Can't get 
cString from Unicode string.
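
As an analogy only, the same kind of failure can be reproduced in Python: a string converts fine to UTF-8 but cannot be represented in a single-byte charset. The helper `to_filesystem_bytes` and the file name are hypothetical, and the choice of Latin-1 is an assumption; this is not GNUstep's conversion code.

```python
# Analogy only: Python's codecs stand in for GNUstep's string conversion.
def to_filesystem_bytes(name: str, encoding: str) -> bytes:
    """Convert a file name to bytes in the given charset (hypothetical helper)."""
    return name.encode(encoding)


japanese_name = "日本語.txt"  # hypothetical file name, not taken from test_jap.7z

try:
    # Latin-1 has no code points for these characters, so this raises.
    to_filesystem_bytes(japanese_name, "latin-1")
except UnicodeEncodeError as err:
    print("conversion failed:", err.reason)

# UTF-8 can represent every Unicode character, so this succeeds.
print(to_filesystem_bytes(japanese_name, "utf-8"))
```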

Original issue reported on code.google.com by Jeol.For...@gmail.com on 18 Jul 2013 at 4:57

Attachments:

test.7z

GoogleCodeExporter commented 9 years ago

That sounds like a GNUstep bug rather than an unar bug. unar will just ask 
GNUstep to convert to the correct charset for your computer's filesystem. Or 
perhaps your locale settings are not correct and that makes GNUstep confused?

Original comment by paracel...@gmail.com on 18 Jul 2013 at 5:04

GoogleCodeExporter commented 9 years ago

Test archive also illustrating the uncaught exception with Japanese file names.

Original comment by Jeol.For...@gmail.com on 18 Jul 2013 at 5:10

Attachments:

test_jap.7z

GoogleCodeExporter commented 9 years ago

Oh, okay. Thanks and sorry for the ml-noise.

Original comment by Jeol.For...@gmail.com on 18 Jul 2013 at 5:12

GoogleCodeExporter commented 9 years ago

If you have access to another system with some other Linux distro, try it there 
and see if it works any better. Or try to figure out how GNUstep determines 
what character set to use; I have no idea there.

Original comment by paracel...@gmail.com on 18 Jul 2013 at 5:14

GoogleCodeExporter commented 9 years ago

Okay, I managed to avoid the uncaught exception and get correct file names by 
setting an environment variable:
export GNUSTEP_STRING_ENCODING=NSUTF8StringEncoding

It still displays the last character of a file name incorrectly (unless it is 
ASCII) when it prompts for how to handle collisions, but since that seems to be 
purely cosmetic, I think my bug report can be closed.
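
For anyone calling unar from a script instead of a shell, the same workaround can be sketched in Python by passing the variable in the child process environment. The function names `gnustep_utf8_env` and `extract` are made up for illustration, and the only unar invocation assumed is the plain `unar <archive>` form shown earlier in this report.

```python
import os
import subprocess


def gnustep_utf8_env(base=None):
    """Return a copy of the environment with the GNUstep encoding override set."""
    env = dict(os.environ if base is None else base)
    env["GNUSTEP_STRING_ENCODING"] = "NSUTF8StringEncoding"
    return env


def extract(archive, unar="unar"):
    """Run unar on one archive with the override applied (hypothetical wrapper)."""
    return subprocess.run([unar, archive], env=gnustep_utf8_env(), check=False)
```

Usage would look like `extract("test.7z", unar="./unar")`. The variable is set only for the child process, so the rest of the session is left untouched.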

Original comment by Jeol.For...@gmail.com on 18 Jul 2013 at 7:11

GoogleCodeExporter commented 9 years ago

It might be useful to take it up with the GNUstep devs; it seems like things 
should work more smoothly than that.

Original comment by paracel...@gmail.com on 18 Jul 2013 at 7:12

Changed state: Invalid

jianlinwei / theunarchiver

unar does not create files with correct names for (at least) utf-8 characters. #684