rikyoz / bit7z

A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
https://rikyoz.github.io/bit7z
Mozilla Public License 2.0

[Bug]: problem with unicode characters in archive item names #65

Closed vladimir-kraus closed 2 years ago

vladimir-kraus commented 2 years ago

bit7z version

3.1.x

7-zip version

v19.00

7-zip DLL used

7z.dll

MSVC version

2019

Architecture

x86_64

Which version of Windows are you using?

Windows 10

Bug description

bit7z seems to fail when archive item names contain Unicode characters. I use bit7z 3.1.5 built with MSVC 2019 and load the 64-bit 7z.dll version 21.07 (I selected 19.00 in the ticket fields only because 21.07 is not available in the list). I use the same code that is shown in the "Reading archive metadata" example on the main project page. Here I am opening a zip file, but the same problem occurs with the 7z format. I have not yet tested other formats.

#include "../include/bitarchiveinfo.hpp"
#include "../include/bitexception.hpp"

#include <iostream>

using namespace std;
using namespace bit7z;

int main()
{
    Bit7zLibrary lib{ L"7z.dll" };
    BitArchiveInfo arc{ lib, L"unicode.zip", BitFormat::Zip };

    //printing archive metadata
    wcout << L"Archive properties" << endl;
    wcout << L" Items count: "   << arc.itemsCount() << endl;
    wcout << L" Folders count: " << arc.foldersCount() << endl;
    wcout << L" Files count: "   << arc.filesCount() << endl;
    wcout << L" Size: "          << arc.size() << endl;
    wcout << L" Packed size: "   << arc.packSize() << endl;
    wcout << endl;

    //printing archive items metadata
    wcout << L"Archive items";
    auto arc_items = arc.items();
    for ( auto& item : arc_items ) {
        wcout << endl;
        wcout << L" Item index: "   << item.index() << endl;
        wcout << L"  Name: "        << item.name() << endl;
        wcout << L"  Extension: "   << item.extension() << endl;
        wcout << L"  Path: "        << item.path() << endl;
        wcout << L"  IsDir: "       << item.isDir() << endl;
        wcout << L"  Size: "        << item.size() << endl;
        wcout << L"  Packed size: " << item.packSize() << endl;
    }
}

Here is the attached archive file: unicode.zip

The output from my program is this:

Archive properties
 Items count: 8
 Folders count: 0
 Files count: 8
 Size: 0
 Packed size: 0

Archive items
 Item index: 0
  Name: ┴rvφzt

So it correctly reports the number of items and the archive metadata (the files are empty, so a size of 0 is correct). It then stops while writing out the name of the first file. No exception is thrown, and the program finishes with exit code 0. When I call 7z.exe to list the contents of the archive, it works well. Note that this archive was also created with 7z.exe.

These are the names of the files in the archive:

Árvíztűrő.txt
şoföre çabucak.txt
ψυχοφθόρα.txt
фальшивый.txt
イロハニホヘト チリヌルヲ.txt
いろはにほへとちりぬるを.txt
הקליטה.txt
ปฏิบัติประพฤติก.txt
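
The symptom described above (output cut off in the middle of the first name, no exception, exit code 0) is consistent with the output stream entering a failed state: once a character cannot be converted, the stream sets its error bit and silently drops everything written afterwards. Below is a minimal, portable sketch of that mechanism. `NarrowOnlyBuf` and `write_checked` are hypothetical names made up for illustration, and the buffer only simulates a console that rejects characters above U+00FF; that is an assumption about what happens here, not the actual runtime code.

```cpp
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for a console that can only encode characters
// up to U+00FF (an assumption made purely for this illustration).
class NarrowOnlyBuf : public std::wstreambuf {
public:
    std::wstring written; // everything the "console" accepted

protected:
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof())) {
            return traits_type::not_eof(ch); // nothing to write
        }
        const wchar_t c = traits_type::to_char_type(ch);
        if (static_cast<unsigned long>(c) > 0xFFul) {
            return traits_type::eof(); // simulated conversion failure
        }
        written.push_back(c);
        return ch;
    }
};

// Writes text and reports whether the stream survived the write.
// On failure the error state is cleared so later output is not dropped.
bool write_checked(std::wostream& os, const std::wstring& text) {
    os << text;
    if (os.fail()) {
        os.clear();
        return false;
    }
    return true;
}
```

Against the real `wcout`, the same kind of check (testing `wcout.fail()` after writing a name) would surface the problem instead of letting the program finish as if nothing had gone wrong.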

rikyoz commented 2 years ago

Hi! Actually, the problem is not due to bit7z, which by default supports Unicode without any issues. It's the example code that doesn't work, and more specifically, the real problem is that printing Unicode strings in console programs on Windows is a nightmare. On Stack Overflow, you can find some possible fixes for this (e.g., https://stackoverflow.com/questions/2492077/output-unicode-strings-in-windows-console-app). I tested one of them with the example code, and it seems to work using the file you attached:

#include "../include/bitarchiveinfo.hpp"
#include "../include/bitexception.hpp"

#include <cstdio>   // for _fileno and stdout
#include <iostream>
#include <fcntl.h>  // for _O_U16TEXT
#include <io.h>     // for _setmode

using namespace std;
using namespace bit7z;

int main()
{
    _setmode(_fileno(stdout), _O_U16TEXT); // switch stdout to UTF-16 text mode

    Bit7zLibrary lib{ L"./7z.dll" };
    BitArchiveInfo arc{ lib, L"./unicode.zip", BitFormat::Zip };

    //printing archive metadata
    wcout << L"Archive properties" << endl;
    wcout << L" Items count: "   << arc.itemsCount() << endl;
    wcout << L" Folders count: " << arc.foldersCount() << endl;
    wcout << L" Files count: "   << arc.filesCount() << endl;
    wcout << L" Size: "          << arc.size() << endl;
    wcout << L" Packed size: "   << arc.packSize() << endl;
    wcout << endl;

    //printing archive items metadata
    wcout << L"Archive items";
    auto arc_items = arc.items();
    for ( auto& item : arc_items ) {
        wcout << endl;
        wcout << L" Item index: "   << item.index() << endl;
        wcout << L"  Name: "        << item.name() << endl;
        wcout << L"  Extension: "   << item.extension() << endl;
        wcout << L"  Path: "        << item.path() << endl;
        wcout << L"  IsDir: "       << item.isDir() << endl;
        wcout << L"  Size: "        << item.size() << endl;
        wcout << L"  Packed size: " << item.packSize() << endl;
    }
}

Output:

Archive properties
 Items count: 8
 Folders count: 0
 Files count: 8
 Size: 0
 Packed size: 0

Archive items
 Item index: 0
  Name: Árvíztűrő.txt
  Extension: txt
  Path: unicode\Árvíztűrő.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 1
  Name: şoföre çabucak.txt
  Extension: txt
  Path: unicode\şoföre çabucak.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 2
  Name: ψυχοφθόρα.txt
  Extension: txt
  Path: unicode\ψυχοφθόρα.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 3
  Name: фальшивый.txt
  Extension: txt
  Path: unicode\фальшивый.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 4
  Name: הקליטה.txt
  Extension: txt
  Path: unicode\הקליטה.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 5
  Name: ปฏิบัติประพฤติก.txt
  Extension: txt
  Path: unicode\ปฏิบัติประพฤติก.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 6
  Name: いろはにほへとちりぬるを.txt
  Extension: txt
  Path: unicode\いろはにほへとちりぬるを.txt
  IsDir: 0
  Size: 0
  Packed size: 0

 Item index: 7
  Name: イロハニホヘト チリヌルヲ.txt
  Extension: txt
  Path: unicode\イロハニホヘト チリヌルヲ.txt
  IsDir: 0
  Size: 0
  Packed size: 0
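
A different route that is often suggested for the same problem is to convert the names to UTF-8 and write plain bytes through `std::cout`. The sketch below shows only the conversion step, under stated assumptions: `to_utf8` is a hypothetical helper (not part of bit7z), and on Windows one would typically also call `SetConsoleOutputCP(CP_UTF8)` from `<windows.h>` before printing. Whether the glyphs actually render still depends on the console font, and this variant was not tested in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encodes a wide string as UTF-8.
// It combines UTF-16 surrogate pairs (what wchar_t holds on Windows) and
// accepts plain code points (what wchar_t holds on most other platforms).
// Unpaired surrogates are not validated; they are encoded as-is.
std::string to_utf8(const std::wstring& wide) {
    std::string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(wide[i]);

        // Merge a high surrogate followed by a low surrogate.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size()) {
            const std::uint32_t low = static_cast<std::uint32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        if (cp < 0x80) {                       // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With such a helper, a line like `cout << to_utf8(item.name()) << endl;` would replace the `wcout` calls, assuming `item.name()` returns a `std::wstring` as it appears to in the example above.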

> (note that I checked 19.00 version in the ticket params, but I did it because 21.07 is not available in the list)

Yeah, I need to update the issue templates.

> I use the same code which is displayed on the main project page in "Reading archive metadata" example.

I'll probably need to update the example in the README, or at least point out that this issue can occur.

vladimir-kraus commented 2 years ago

Wow, such a swift response... Thank you for clarification. Your library is amazing!

rikyoz commented 2 years ago

> Wow, such a swift response... Thank you for clarification.

No problem!

> Your library is amazing!

Thank you! 🙏