Open mgorny opened 9 months ago
Apparently the difference is that glibc rejects codes for "user-defined" Big5 characters, whereas musl accepts them. If I shorten the string to "Märchen", I can reproduce the same problem on a glibc system.
Just to spell it out (please correct me if I am wrong), "ä" (a with umlaut) is 0xe4 in iso-8859-1, which is a valid start byte for Big5 (in the "Less frequently used characters" set). If the following byte is in 0x40-0x7e or 0xa1-0xfe, the pair can be a valid Big5 sequence, so e.g. "är" will pass as Big5, whereas "ä." (or a string ending with "ä") will fail.
So zbar will favour Big5 in such cases, although it should favour iso-8859-1, which is the default for QR codes per the standard.
"ü" (u with umlaut) is 0xfc in iso-8859-1, which is a valid start byte for Big5 in the "Reserved for user-defined characters" set. This fails on glibc but passes as Big5 on musl (independently of the following byte?).
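To make the byte-range argument above concrete, here is a minimal Python sketch of a purely structural Big5 check. It is not what glibc, musl, or zbar actually do; they consult real mapping tables. The trail-byte ranges are the ones quoted above. The lead-byte range 0x81-0xfe and the user-defined lead range 0xfa-0xfe are assumptions taken from the commonly published Big5 layout, and both function names are made up for illustration.

```python
def is_structurally_valid_big5(data: bytes) -> bool:
    """Return True if every byte >= 0x80 starts a well-formed Big5 pair.

    Assumption: lead bytes are 0x81-0xfe; trail bytes are 0x40-0x7e or
    0xa1-0xfe. No mapping table is consulted, so this is only a model.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # plain ASCII, single byte
            i += 1
            continue
        if not 0x81 <= lead <= 0xFE:
            return False
        if i + 1 >= len(data):       # lead byte at end of input
            return False
        trail = data[i + 1]
        if not (0x40 <= trail <= 0x7E or 0xA1 <= trail <= 0xFE):
            return False
        i += 2
    return True


def has_user_defined_lead(data: bytes) -> bool:
    """Return True if a pair starts with a lead byte in 0xfa-0xfe.

    Assumption: that range is the "reserved for user-defined characters"
    area that 0xfc falls into. Call this only on structurally valid input.
    """
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            i += 1
            continue
        if 0xFA <= data[i] <= 0xFE:
            return True
        i += 2
    return False


for text in ("är", "ä.", "ä", "Märchen", "für"):
    raw = text.encode("iso-8859-1")
    print(text, is_structurally_valid_big5(raw))
```

Under this model "är" and "Märchen" look like Big5, "ä." and a trailing "ä" do not, and "für" is structurally valid but lands in the assumed user-defined range, which is where the two libcs are reported to disagree.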
And if we were to ignore Big5, "är" would pass as valid SJIS which zbar currently favours over iso-8859-1.
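The same kind of structural check can illustrate the SJIS point. This is a rough sketch, assuming the usual Shift_JIS byte layout (lead bytes 0x81-0x9f and 0xe0-0xfc, trail bytes 0x40-0x7e and 0x80-0xfc, single-byte half-width katakana in 0xa1-0xdf). It ignores which code points are actually assigned, so a real iconv() may be stricter, and the function name is hypothetical.

```python
def is_structurally_valid_sjis(data: bytes) -> bool:
    """Return True if the input is well-formed under an assumed
    Shift_JIS byte layout. No mapping table is consulted."""
    i = 0
    while i < len(data):
        b = data[i]
        # Single-byte cases: ASCII range and half-width katakana.
        if b < 0x80 or 0xA1 <= b <= 0xDF:
            i += 1
            continue
        # Anything else must be a double-byte lead.
        if not (0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC):
            return False
        if i + 1 >= len(data):       # lead byte at end of input
            return False
        trail = data[i + 1]
        if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFC):
            return False
        i += 2
    return True


for text in ("är", "ä."):
    print(text, is_structurally_valid_sjis(text.encode("iso-8859-1")))
```

Here "är" (0xe4 0x72) is a well-formed double-byte sequence, while "ä." (0xe4 0x2e) is not, matching the Big5 case.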
The wrong detection as big5 is also reported in #212.
When running on musl libc, segno incorrectly detects iso-8859-1 encoded QR codes as "big5". I originally noticed this through a test failure in the segno package.
An example QRcode file is:
On a glibc system this is decoded correctly:
On a musl system, it gets decoded as:
From debugging, I've established that the problem lies in zbar trying big5 first, and expecting iconv() to fail for this string, as it does on glibc:
However, it doesn't fail on musl libc:
Confirmed with zbar as of a549566ea11eb03622bd4458a1728ffe3f589163, musl 1.2.3 (Gentoo) and 1.2.4_git20230717 (Alpine).