Tests Fail On Linux With Non-UTF Locale

guzba / zippy

Pure Nim implementation of deflate, zlib, gzip and zip.

MIT License

246 stars 29 forks source link

Tests Fail On Linux With Non-UTF Locale #77

Closed c-blake closed 1 month ago

c-blake commented 1 month ago

export LC_ALL=C nimble test should reproduce the failure in most environments.

More specifically, the test code tries to open "/tmp/zippy/Nim-1.6.6/tests/misc/#U00e5#U00e4#U00f6.nim" (that literal string with '#' 'U' and so on in it). I did not track where within the bowels of test/nimble/zippy string handling the utf8 gets converted to that expanded representation in this mode, though.

guzba commented 1 month ago

I have no idea what export LC_ALL=C nimble test does, what Nim versions that does something on, or what is even being tested here.

I believe I should just remove the Nim zip archive test and move on? This test was not there to test random Unicode stuff, just to be a more complex archive to validate.

c-blake commented 1 month ago

Well, I don't know if you want to know, but briefly LC_ALL is just this Unix "Locale { L)o-C)ale } setting" device via environment variables. (As written, I am probably missing a ';' or have an extra export, but that's POSIX sh BS.). "C" is the regular old ASCII-only locale, not UTF8 and that test archive has a file whose name is utf8 (to test Nim's ability to import such modules).

I believe I should just remove the Nim zip archive test and move on?

Well, that would probably work to close this issue. Fine by me. I only stumbled on it trying to get the Nim CI to work with a new hash(x:string).

guzba commented 1 month ago

Thanks for the clarification. If a developer has disabled UTF-8 but tries to open an archive with UTF-8 file names in it, what should happen? I sort of expect this test to fail in that case.

c-blake commented 1 month ago

Could well be it should fail. I suppose it depends upon the promises one wants to make as well as the platform (Win/Lin/etc.). I'm not a Zip format user and so I don't even know what various other tools do.

Setting that is a relatively common thing to do on Linux (I don't have any stats/a study) if you don't deal with a lot of UTF8 stuff to get max performance out of system utilities like grep (the searcher) and sed {a s)tream ed)itor} where a character is just a byte. There was a time when max/min perf was like literally 50..100X and very, very noticeable. This is why I personally had it set.

It's also conceivable there is some Windows registry setting (for similar performance reasons &| back.compat) to emulate setting LC_ALL=C. So, not using that archive (or dropping the test entirely) might also help in such cases.

guzba commented 1 month ago

Setting that is a relatively common thing to do on Linux (I don't have any stats/a study) if you don't deal with a lot of UTF8 stuff to get max performance out of system utilities like grep (the searcher) and sed {a s)tream ed)itor} where a character is just a byte.

This fact reflects quite poorly on these tools as ASCII is UTF-8 (though UTF-8 is not always ASCII). There is no reason for such a huge perf difference without implicating the software and I've done this myself in https://github.com/guzba/sunny and https://github.com/guzba/unicody

Regardless, I'll remove the test since IMO, sorry to be harsh, but this does not deserve any effort and is a total distraction for me.

guzba commented 1 month ago

Replaced the test zip archive in https://github.com/guzba/zippy/releases/tag/0.10.13 with one that does not appear to have any unicode file names.