houseabsolute / Log-Dispatch

Dispatches messages to one or more outputs
https://metacpan.org/release/Log-Dispatch/
Other
12 stars 29 forks source link

Log::Dispatch:Screen: fix encoding test #68

Open mauke opened 1 year ago

mauke commented 1 year ago

The STD{IN,OUT,ERR} default handles start out in some unspecified (platform-specific) text encoding. The 'utf8' option of Log::Dispatch::Screen manually encodes the logged messages as bytes in UTF-8 format before writing them to the output handle. Thus, by default, the UTF-8 bytes get re-encoded by whatever the native text encoding is on the platform.

The correct way to handle this is to either not set 'utf8' (and rely on the encoding layer of the handle) or set 'utf8' and call binmode() on the handle (or otherwise apply a ':raw' layer) to ensure that bytes are written as is (without further re-encoding).

Fixes #32.

autarch commented 1 year ago

Hi @mauke,

Thanks for this PR.

It's been a while since I thought about this, but looking at the discussion in #32, I'm not sure this is a good change. Here's a summary of relevant points from the discussion:

What do you think?

mauke commented 1 year ago

I have some objections.

Quoting perldoc -f binmode:

Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files.

... which is all systems that use Unicode, like mine (with LANG=en_US.UTF-8)!

For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes.

In other words: regardless of platform, use "binmode" on binary data, like images, for example.

My use case is explicitly supported by how binmode is supposed to be used according to the documentation. The code in the test file violates that implicit contract by writing binary data to a text stream. Making it use binmode fixes things everywhere, regardless of the value of PERL_UNICODE or whether the platform uses (a superset of) ASCII or not.

(Also, just once I'd like to be able to say cpan Dist::Zilla and have it actually go through without errors.)