stachenov / quazip

Qt/C++ wrapper over minizip
Other
552 stars 229 forks source link

Test failing with Qt4 #127

Open WOnder93 opened 2 years ago

WOnder93 commented 2 years ago

When quazip 1.1 is built against Qt4 on Linux, the TestJlCompress::extractDir(Japaneses) subtest is failing as follows:

1: ********* Start testing of TestJlCompress *********
1: Config: Using QTest library 4.8.7, Qt 4.8.7
1: PASS   : TestJlCompress::initTestCase()
1: PASS   : TestJlCompress::compressFile()
1: PASS   : TestJlCompress::compressFiles()
1: PASS   : TestJlCompress::compressDir()
1: PASS   : TestJlCompress::extractFile()
1: PASS   : TestJlCompress::extractFiles()
1: QWARN  : TestJlCompress::extractDir(Japaneses) Couldn't create 
1: FAIL!  : TestJlCompress::extractDir(Japaneses) Couldn't create test files
1:    Loc: [/builddir/build/BUILD/quazip-1.1/qztest/testjlcompress.cpp(347)]
1: PASS   : TestJlCompress::zeroPermissions()
1: PASS   : TestJlCompress::symlinkHandling()
1: PASS   : TestJlCompress::cleanupTestCase()
1: Totals: 9 passed, 1 failed, 0 skipped
1: ********* Finished testing of TestJlCompress *********

Is it some known issue of Qt4? Perhaps the test vector should just be skipped under Qt4?

WOnder93 commented 2 years ago

Hm... I cannot reproduce it locally - might be some issue in the build environment where it happened. Let me investigate...

WOnder93 commented 2 years ago

So this happens only when the locale is set to "C" (or something else non-UTF-8). Overriding it to e.g. "C.UTF-8" makes the test pass. A workaround is to run the test with LC_ALL=C.UTF-8 in the environment. It is probably just some quirk of Qt4 (as Qt5 and 6 pass every time), so I'm closing this issue.

stachenov commented 2 years ago

I don't think it's a “quirk.” This test tries to create some files with Japanese names, zip them and extract them back. Obviously it'll fail when it's not possible to create a file with Japanese name. But in Linux, file names are just byte sequences, with no encoding info attached whatsoever. It's up to the apps to interpret those bytes as characters, and it's usually done using the default encoding for the locale. That's why it only works when the locale is set to something that can handle Japanese. It doesn't have to be UTF-8. Any Japanese encoding, like EUC-JP or S-JIS, would do just fine, I believe.

A similar test with Cyrillic names probably succeeds because they are somehow converted to some garble that Linux can handle just fine. The same probably happens with Qt 5 and 6 and Japanese names. So it's actually a quirk that these tests pass with the C locale.

WOnder93 commented 2 years ago

After some digging, it seems that the main difference is that Qt 5+ respects the codec chosen by libicu, which returns UTF-8 also for the plain C locale, at least on my system (Fedora 34):

$ LC_ALL=C icuinfo
 <icuSystemParams type="icu4c">
[...]
    <param name="locale.default">en_US_POSIX</param>
    <param name="locale.default.bcp47">en-US-u-va-posix</param>
    <param name="converter.default">UTF-8</param>
[...]
 </icuSystemParams>
[...]

But it's true that if the system was configured differently (probably rare these days), the test could still fail even on Qt 5+.

So it might be a good idea to call QTextCodec::setCodecForLocale(QTextCodec::codecForName("UTF-8")); at the beginning of the test when running on Linux, so that the file names are always successfully encoded to something. I just tried to add this to qztest.cpp and it really made the test pass with Qt 4 + LC_ALL=C.

stachenov commented 2 years ago

I'm not sure it's a good idea...

First off, I'd definitely like the tests to respect the default codec, at least when running with Qt 5+. No problem here, I could easily add this call only for Linux and Qt <5.

But there's another concern. Using setCodecForLocale is generally a bad idea because it overrides the default encoding set by the system. The system encoding is almost always the correct one, so overriding it makes little sense, and hence few apps actually do it, unless they need it for some very specific reasons.

Now what this failing test tells us is that with Qt 4 and the locale set incorrectly, any app using Qt 4 and QuaZip will fail to work with localized file names. So what's the point of making the tests pass when the same code in most real apps will fail? The tests will simply lie to the user in this case.

What would make sense is to make QuaZip somehow respect the libicu codec, like Qt 5 does. This way the tests will pass, and in a real app, it'll also work fine. But I've no idea how to approach it, since I'm not familiar with libicu. Maybe Qt's sources is a good place to start, but I'm unsure how to deal with the additional dependency on libicu. I don't think it's a good idea to directly link to it, as it may be unavailable...

cen1 commented 3 months ago

Interestingly enough, this is reproducible with Qt5 static build and qtzlib but not Qt6, even though both containers are prepared in essentially the same way. Pipeline ref: https://github.com/cen1/quazip/actions/runs/8460301121/job/23178247490

Given that Qt5 is also affected I am leaving this open, I need to deep dive into it and come to a conclusion.