Open mauke opened 1 year ago
Hi @mauke,
Thanks for this PR.
It's been a while since I thought about this, but looking at the discussion in #32, I'm not sure this is a good change. Here's a summary of relevant points from the discussion:
PERL_UNICODE
is the most relevant one.PERL_UNICODE
settings might be useful information for people. They could have a corresponding issue when using the Log::Dispatch::Screen
module, where if PERL_UNICODE
is set than passing utf8 => 1
will cause double encoding.What do you think?
I have some objections.
PERL_UNICODE
.PERL_UNICODE
is not some weird corner case that completely changes how Perl works (and that authors have to compensate for); it is simply one way to set the default encoding of STD* streams.utf8
are better served by a working test, not a broken one, IMHO.Quoting perldoc -f binmode
:
Arranges for FILEHANDLE to be read or written in "binary" or "text" mode on systems where the run-time libraries distinguish between binary and text files.
... which is all systems that use Unicode, like mine (with LANG=en_US.UTF-8
)!
For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes.
In other words: regardless of platform, use "binmode" on binary data, like images, for example.
My use case is explicitly supported by how binmode
is supposed to be used according to the documentation. The code in the test file violates that implicit contract by writing binary data to a text stream. Making it use binmode
fixes things everywhere, regardless of the value of PERL_UNICODE
or whether the platform uses (a superset of) ASCII or not.
(Also, just once I'd like to be able to say cpan Dist::Zilla
and have it actually go through without errors.)
The STD{IN,OUT,ERR} default handles start out in some unspecified (platform-specific) text encoding. The 'utf8' option of Log::Dispatch::Screen manually encodes the logged messages as bytes in UTF-8 format before writing them to the output handle. Thus, by default, the UTF-8 bytes get re-encoded by whatever the native text encoding is on the platform.
The correct way to handle this is to either not set 'utf8' (and rely on the encoding layer of the handle) or set 'utf8' and call binmode() on the handle (or otherwise apply a ':raw' layer) to ensure that bytes are written as is (without further re-encoding).
Fixes #32.