ip7z / 7zip

7-Zip

Select OEM/ANSI code page according to system locale setting #36

Open · unxed opened 1 month ago

unxed commented 1 month ago

Fixes:

https://sourceforge.net/p/sevenzip/bugs/2473/
https://sourceforge.net/p/sevenzip/bugs/1060/
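The PR title describes choosing the OEM/ANSI code page from the system locale. The patch itself is not shown here, so what follows is only a minimal sketch of that idea. It assumes the locale name is read from LC_ALL, LC_CTYPE or LANG (the usual POSIX precedence) and maps it through a small hand-picked table of Windows ANSI/OEM code pages. CodePages, CurrentLocaleName and CodePagesForLocale are hypothetical names, not 7-Zip functions.

```cpp
#include <cstdlib>
#include <initializer_list>
#include <string>

// Hypothetical pair of Windows code pages for a locale.
struct CodePages {
  unsigned ansi;  // ANSI code page, e.g. 1251
  unsigned oem;   // OEM code page, e.g. 866
};

// Returns the locale name that governs character handling, following the
// POSIX precedence LC_ALL > LC_CTYPE > LANG; empty values count as unset.
std::string CurrentLocaleName() {
  for (const char *var : {"LC_ALL", "LC_CTYPE", "LANG"}) {
    const char *value = std::getenv(var);
    if (value && *value)
      return value;
  }
  return "C";
}

// Maps a locale such as "ja_JP.UTF-8" to Windows ANSI/OEM code pages.
// Illustrative subset only (assumption); unknown locales fall back to the
// US English pair 1252/437.
CodePages CodePagesForLocale(const std::string &locale) {
  struct Entry {
    const char *name;  // language ("ja") or language_territory ("zh_CN")
    CodePages cp;
  };
  static const Entry kTable[] = {
      {"ja", {932, 932}},    {"ko", {949, 949}},
      {"zh_CN", {936, 936}}, {"zh_TW", {950, 950}},
      {"ru", {1251, 866}},   {"pl", {1250, 852}},
      {"el", {1253, 737}},   {"tr", {1254, 857}},
      {"de", {1252, 850}},   {"fr", {1252, 850}},
  };
  // Strip codeset and modifier: "ja_JP.UTF-8" -> "ja_JP" -> "ja"
  const std::string base = locale.substr(0, locale.find_first_of(".@"));
  const std::string lang = base.substr(0, base.find('_'));
  for (const Entry &e : kTable) {
    if (base == e.name || lang == e.name)
      return e.cp;
  }
  return {1252, 437};
}
```

For example, CodePagesForLocale(CurrentLocaleName()) would return {932, 932} under LANG=ja_JP.UTF-8 when LC_ALL and LC_CTYPE are unset. On Windows, GetACP and GetOEMCP supply these values directly; Linux has no equivalent, which is why a lookup table of some kind is needed.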

LupinThidr commented 3 weeks ago

Hello, thank you for this. I've seen your progress across multiple issue trackers regarding Linux and 7-Zip path encoding. I came across your comments while trying to diagnose why 7-Zip can't decode Shift-JIS (CP932) encoded paths in LZH files. I had thought this patch would apply to those, but it seems ZipItem is specific to the .zip handler. Other software, such as unar, doesn't support the LZH spec as fully as 7-Zip does. I think your iconv conversion should be implemented as a separate function in StringConvert so that other classes can easily use it. I was able to make LzhHandler use it instead of the standard MultiByteToUnicodeString, but I don't fully understand the 7-Zip codebase, so I've hardcoded it to CP932 rather than taking the code page from an mcp argument.

#include <iconv.h>

// Converts a CP932 (Shift-JIS) encoded name to UString via iconv.
// The source code page is hardcoded to CP932 rather than taken from an mcp argument.
void MultiByteToUnicodeString3_iconv(UString &res, const AString &s)
{
  res.Empty();
  if (s.IsEmpty())
    return;

  iconv_t cd = iconv_open("UTF-8", "CP932");
  if (cd != (iconv_t)-1) {
    AString sUtf8;

    unsigned slen = s.Len();
    char* src = s.Ptr_non_const(); // iconv() needs a non-const input pointer

    // Each CP932 byte yields at most 3 UTF-8 bytes, so 4x plus a terminator is enough
    unsigned dlen = slen * 4 + 1;
    char* dst = sUtf8.GetBuf_SetEnd(dlen);
    const char* dstStart = dst;
    memset(dst, 0, dlen);

    size_t slen_size_t = static_cast<size_t>(slen);
    size_t dlen_size_t = static_cast<size_t>(dlen);
    size_t done = iconv(cd, &src, &slen_size_t, &dst, &dlen_size_t);
    iconv_close(cd);

    if (done != (size_t)-1) {
      // Null-terminate the converted bytes
      *dst = '\0';

      AString sUtf8CorrectLength;
      size_t dstCorrectLength = dst - dstStart;
      sUtf8CorrectLength.SetFrom(sUtf8, static_cast<unsigned>(dstCorrectLength));
      if (ConvertUTF8ToUnicode(sUtf8CorrectLength, res) /*|| ignore_Utf8_Errors*/)
        return;
    }
  }

  // iconv_open failed, iconv rejected the input, or the result was not valid UTF-8:
  // fall back to the default behavior instead of returning an empty name
  res.Empty();
  MultiByteToUnicodeString2(res, s, 932);
}
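The comment above suggests moving the iconv conversion into StringConvert as a separate function. A standalone sketch of that idea, written outside 7-Zip with std::string instead of AString/UString, could take the source charset as a parameter and report failure so the caller can fall back. ConvertToUtf8 is a hypothetical name, and the sketch assumes a POSIX iconv whose iconv() takes a non-const char** input, as glibc's does.

```cpp
#include <iconv.h>
#include <string>
#include <vector>

// Hypothetical helper: converts bytes in `fromCharset` (an iconv charset
// name such as "CP932") to UTF-8. Returns false if the charset is unknown or
// the input contains an invalid or incomplete sequence, so the caller can
// fall back to its existing conversion.
bool ConvertToUtf8(const std::string &src, const char *fromCharset,
                   std::string &out) {
  out.clear();
  if (src.empty())
    return true;

  iconv_t cd = iconv_open("UTF-8", fromCharset);
  if (cd == (iconv_t)-1)
    return false;  // charset not supported by this iconv build

  // POSIX iconv() wants a non-const char**, so work on a copy of the input.
  std::vector<char> in(src.begin(), src.end());
  // 4 output bytes per input byte covers legacy code pages -> UTF-8.
  std::vector<char> buf(src.size() * 4);

  char *inPtr = in.data();
  size_t inLeft = in.size();
  char *outPtr = buf.data();
  size_t outLeft = buf.size();

  size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
  if (rc != (size_t)-1)
    rc = iconv(cd, nullptr, nullptr, &outPtr, &outLeft);  // flush shift state
  iconv_close(cd);
  if (rc == (size_t)-1)
    return false;

  out.assign(buf.data(), static_cast<size_t>(outPtr - buf.data()));
  return true;
}
```

Inside 7-Zip, the UTF-8 result would still go through ConvertUTF8ToUnicode, with MultiByteToUnicodeString2 as the fallback whenever this returns false.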

and then in LzhHandler's GetProperty:

UString dst;
MultiByteToUnicodeString3_iconv(dst, item.GetName());

UString s = NItemName::WinPathToOsPath(dst);
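The snippet above hardcodes CP932 where an mcp code page argument would normally be used. Passing a numeric Windows code page through would require mapping it to an iconv charset name. Here is a minimal sketch, with CodePageToIconvName as a hypothetical helper:

```cpp
#include <string>

// Hypothetical helper: turns a numeric Windows code page into an iconv
// charset name, so the conversion need not hardcode "CP932".
std::string CodePageToIconvName(unsigned codePage) {
  if (codePage == 65001)
    return "UTF-8";  // Windows code page 65001 is UTF-8
  // Assumption: the iconv implementation accepts "CPnnn" aliases
  // (glibc does for common ones such as CP932 and CP1251).
  return "CP" + std::to_string(codePage);
}
```

The result could then be passed as iconv_open("UTF-8", CodePageToIconvName(cp).c_str()) in place of the hardcoded "CP932". Whether a given CPnnn name is accepted depends on the iconv implementation, so an iconv_open failure should still trigger the existing fallback.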

If you'd like to try implementing it for LZH, here's a sample file: https://archive.org/download/narcissu/na_sabun.lzh

Current output of 7zz l (first line) and expected output (second line):

2007-06-03 22:25:18 .....          537          392  na_sabun/▒C▒▒▒▒▒▒.txt
2007-06-03 22:25:18 .....          537          392  na_sabun/修正差分.txt

(I noticed that the Debian package maintainer for 7zz is Japanese, so he may have some experience with this.)