Log::Dispatch::Screen: fix encoding test

mauke commented 1 year ago

The STD{IN,OUT,ERR} default handles start out in some unspecified (platform-specific) text encoding. The 'utf8' option of Log::Dispatch::Screen manually encodes logged messages into UTF-8 bytes before writing them to the output handle. Thus, unless the handle is in binary mode, those UTF-8 bytes get re-encoded by whatever the platform's native text encoding is; on a handle that is already UTF-8 (e.g. under PERL_UNICODE=S), they end up encoded twice.
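A minimal sketch of that double encoding, using only core Perl. It does not load Log::Dispatch: the call to Encode's encode() is an assumed stand-in for what the 'utf8' option does, and an in-memory filehandle opened with a :utf8 layer stands in for a STDOUT that PERL_UNICODE=S has switched to UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A message with one non-ASCII character, U+00E9 (e with acute accent).
my $message = "caf\x{e9}";

# Assumed stand-in for the 'utf8' option: convert characters to UTF-8 bytes.
# U+00E9 becomes the two bytes C3 A9.
my $bytes = encode('UTF-8', $message);

# Stand-in for STDOUT under PERL_UNICODE=S: an in-memory handle with a :utf8 layer.
open my $fh, '>:utf8', \my $written or die "open: $!";

# The layer treats each byte as a character and encodes it again,
# so C3 A9 turns into C3 83 C2 A9 in the output.
print {$fh} $bytes;
close $fh;

# Dotted hex dump of the bytes that actually reached the "stream".
printf "%vX\n", $written;
```

The dump shows C3.83.C2.A9 where a single UTF-8 encoding of U+00E9 would be C3.A9.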

The correct way to handle this is either to leave 'utf8' unset (and rely on the encoding layer of the handle) or to set 'utf8' and call binmode() on the handle (or otherwise apply a ':raw' layer) so that the bytes are written as-is, without further re-encoding.
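Both fixes can be sketched the same way, again assuming an in-memory :utf8 handle as a stand-in for STDOUT and Encode's encode() as a stand-in for the 'utf8' option; neither is the module's actual code.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $message = "caf\x{e9}";

# Fix 1: leave 'utf8' unset and print characters; the handle's layer encodes them once.
open my $text_fh, '>:utf8', \my $via_layer or die "open: $!";
print {$text_fh} $message;
close $text_fh;

# Fix 2: encode to UTF-8 bytes first, then binmode the handle to ':raw',
# which switches off the :utf8 layer so the bytes are written as-is.
open my $bin_fh, '>:utf8', \my $via_binmode or die "open: $!";
binmode $bin_fh, ':raw' or die "binmode: $!";
print {$bin_fh} encode('UTF-8', $message);
close $bin_fh;

# Both dumps should show a single UTF-8 encoding of U+00E9 (C3 A9).
printf "%vX\n%vX\n", $via_layer, $via_binmode;
```

The first variant keeps encoding decisions in the I/O layer; the second keeps the manual encoding but makes the handle byte-transparent, which is what the patch does for the test.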

Fixes #32.

autarch commented 1 year ago

Hi @mauke,

Thanks for this PR.

It's been a while since I thought about this, but looking at the discussion in #32, I'm not sure this is a good change. Here's a summary of relevant points from the discussion:

- There are a lot of different env vars that can affect how the Perl interpreter works. In the case of this test, PERL_UNICODE seems to be the most relevant one.
- It's not realistic (IMO) to expect CPAN module authors to account for all of these env vars in their tests, where "account for" could mean either resetting their effects, as this patch does, or trying to test with all possible variations of them.
- The fact that this test fails with certain PERL_UNICODE settings might be useful information for people. They could run into the same issue when using the Log::Dispatch::Screen module: if PERL_UNICODE is set, then passing utf8 => 1 will cause double encoding.

What do you think?

mauke commented 1 year ago

I have some objections.

- My patch does not reset the effects of PERL_UNICODE.
- PERL_UNICODE is not some weird corner case that completely changes how Perl works (and that authors have to compensate for); it is simply one way to set the default encoding of the STD* streams.
- Writing binary data to a text stream is a bug, even if it happens to "work" on some platforms some of the time.
- The test fails because the code is wrong. People looking for examples of how to use Log::Dispatch::Screen with utf8 are better served by a working test than by a broken one, IMHO.

Quoting perldoc -f binmode:

Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files.

... which is all systems that use Unicode, like mine (with LANG=en_US.UTF-8)!

For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes.

In other words: regardless of platform, use "binmode" on binary data, like images, for example.

My use case is explicitly supported by how binmode is supposed to be used according to the documentation. The code in the test file violates that implicit contract by writing binary data to a text stream. Making it use binmode fixes things everywhere, regardless of the value of PERL_UNICODE or whether the platform uses (a superset of) ASCII or not.
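A sketch of the portability claim, under the same stand-in assumptions: the same UTF-8 bytes are written to two simulated STDOUT handles, one with no encoding layer and one with a :utf8 layer as under PERL_UNICODE=S, each with and without binmode. Only the binmode variant produces the same bytes for both.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed stand-in for what the 'utf8' option produces.
my $bytes = encode('UTF-8', "caf\x{e9}");
my %out;

# '>' stands in for a STDOUT with no encoding layer;
# '>:utf8' for one that PERL_UNICODE=S has switched to UTF-8.
for my $layer ('>', '>:utf8') {
    for my $use_binmode (0, 1) {
        open my $fh, $layer, \my $buf or die "open: $!";
        if ($use_binmode) {
            binmode $fh, ':raw' or die "binmode: $!";
        }
        print {$fh} $bytes;
        close $fh;
        $out{$layer}[$use_binmode] = $buf;
    }
}

# Without binmode the output depends on the layer; with it, it does not.
for my $layer (sort keys %out) {
    printf "%-7s without binmode: %vX  with binmode: %vX\n",
        $layer, $out{$layer}[0], $out{$layer}[1];
}
```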

(Also, just once I'd like to be able to say cpan Dist::Zilla and have it actually go through without errors.)

houseabsolute / Log-Dispatch

Log::Dispatch::Screen: fix encoding test #68