Closed by youpong.
A similar case, adapted from @xenu's comment in https://github.com/Perl/perl5/issues/21841:
$ cat xpodlators-25.pl
#use v5.38;
use utf8;
use Pod::Text;
my $doc = <<'EOF';
=encoding utf8
=head1 NAME
😀Grinning face
EOF
my $doc2 = <<'EOF';
=encoding utf8
=head1 NAME
ä½ å¥½
EOF
binmode STDOUT, ":utf8";
my $p = Pod::Text->new();
$p->parse_string_document($doc);
my $q = Pod::Text->new();
$q->parse_string_document($doc2);
$ perlbrew use perl-5.36.0
$ perl xpodlators-25.pl
NAME
😀Grinning face
NAME
ä½ å¥½
$ perlbrew use perl-5.38.0
$ perl xpodlators-25.pl
NAME
ðGrinning face
NAME
你好
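The "ð" in the 5.38 output is what double encoding typically looks like. Below is a minimal sketch that reproduces the symptom without Pod::Text. It assumes the cause is text that has already been encoded to UTF-8 bytes being printed through a ":utf8" layer, which encodes it a second time. bytes_through_utf8_layer is a hypothetical helper used only for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: write $string through a ":utf8" layer into an
# in-memory buffer and return the bytes that land there, as lowercase hex.
sub bytes_through_utf8_layer {
    my ($string) = @_;
    open my $fh, '>:utf8', \my $buffer or die "open: $!";
    print {$fh} $string;
    close $fh or die "close: $!";
    return unpack 'H*', $buffer;
}

my $char  = "\x{1F600}";             # U+1F600 GRINNING FACE, one character
my $bytes = encode('UTF-8', $char);  # the same emoji pre-encoded to 4 bytes

# Characters in: the layer encodes once and writes the correct 4 bytes.
my $ok = bytes_through_utf8_layer($char);
# Bytes in: each byte is treated as a Latin-1 character and encoded again.
my $double = bytes_through_utf8_layer($bytes);

print "characters: $ok\n";      # f09f9880
print "bytes:      $double\n";  # c3b0c29fc298c280
```

When decoded for display, c3b0 is "ð", and c29f, c298 and c280 are C1 control characters that most terminals render as nothing. That matches the "ðGrinning face" line above.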
Now, suppose that in the binmode statement we use the ":encoding(UTF-8)" layer instead of ":utf8".
$ cat ypodlators-25.pl
use 5.14.0;
use warnings;
use utf8;
use Pod::Text;
say "Perl version: $^V";
my $doc = <<'EOF';
=encoding utf8
=head1 NAME
😀Grinning face
EOF
my $doc2 = <<'EOF';
=encoding utf8
=head1 NAME
ä½ å¥½
EOF
binmode STDOUT, ":encoding(UTF-8)";
my $p = Pod::Text->new();
$p->parse_string_document($doc);
my $q = Pod::Text->new();
$q->parse_string_document($doc2);
$ diff xpodlators-25.pl ypodlators-25.pl
24c24
< binmode STDOUT, ":utf8";
---
> binmode STDOUT, ":encoding(UTF-8)";
perldoc -f binmode has this to say about the difference between the two layers:
To mark FILEHANDLE as UTF-8, use ":utf8" or ":encoding(UTF-8)".
":utf8" just marks the data as UTF-8 without further checking,
while ":encoding(UTF-8)" checks the data for actually being
valid UTF-8. More details can be found in PerlIO::encoding.
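To see the difference the documentation describes, here is a small sketch that prints the layer stack each choice produces on a fresh output handle. It uses the built-in PerlIO::get_layers. Writing to File::Spec->devnull is only a convenience, and layers_for is a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: apply $layer to a fresh output handle and return
# the layer names PerlIO::get_layers() reports for it (bottom layer first).
# The ":utf8" flag is reported as a pseudo-layer named "utf8".
sub layers_for {
    my ($layer) = @_;
    open my $fh, '>', File::Spec->devnull or die "open: $!";
    binmode $fh, $layer or die "binmode $layer: $!";
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

for my $layer (':utf8', ':encoding(UTF-8)') {
    printf "%-18s %s\n", $layer, join(' ', layers_for($layer));
}
```

On a typical build, ":utf8" only adds the utf8 pseudo-layer, which is a flag on the existing buffering layer. ":encoding(UTF-8)" pushes a real encoding(...) layer. The exact names of the lower layers can vary by platform.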
Let's run this.
$ perl ypodlators-25.pl
Perl version: v5.36.0
NAME
😀Grinning face
NAME
ä½ å¥½
$ perl ypodlators-25.pl
Perl version: v5.38.0
NAME
😀Grinning face
NAME
ä½ å¥½
So ":encoding(UTF-8)" DWIMs, whereas ":utf8" does not.
This turned out to be caused by not explicitly loading PerlIO, which has to be done for PerlIO::F_UTF8 to exist. Many thanks to @haarg for finding that. Complicating the problem, loading Test::More also loads PerlIO, so the problem never occurs in the tests.
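The following is a minimal sketch of this kind of check. It assumes the handle is inspected with PerlIO::get_layers(..., details => 1). handle_is_utf8 is a hypothetical name, and this is not the code from #28. The explicit `use PerlIO ();` is what matters here: PerlIO::get_layers is built in, but per the explanation above, PerlIO::F_UTF8 only exists once PerlIO has been loaded.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use PerlIO ();   # load PerlIO.pm so that PerlIO::F_UTF8 is defined

# Hypothetical helper (not the code from #28): return 1 if the topmost
# output layer of $fh has the UTF-8 flag set, 0 otherwise.
sub handle_is_utf8 {
    my ($fh) = @_;
    # With details => 1, get_layers returns (name, args, flags) triplets,
    # bottom layer first, so the last element holds the top layer's flags.
    my @details = PerlIO::get_layers($fh, output => 1, details => 1);
    return 0 unless @details;
    my $flags = $details[-1] // 0;
    return ($flags & PerlIO::F_UTF8()) ? 1 : 0;
}

for my $layer (':raw', ':utf8', ':encoding(UTF-8)') {
    open my $fh, '>', File::Spec->devnull or die "open: $!";
    binmode $fh, $layer or die "binmode $layer: $!";
    printf "%-18s %s\n", $layer, handle_is_utf8($fh) ? 'UTF-8' : 'bytes';
    close $fh;
}
```

Without the `use PerlIO ();` line, a check like this can fail under exactly the conditions described above, while still passing in any test file that loads Test::More first.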
Closed in #28.
I have confirmed that the bug is fixed by pull request #28. Thanks to @jkeenan for the tip. Until the next release I am specifying ":encoding(UTF-8)", and I will revert to ":utf8" in the binmode statement once it is out.
Describe the bug
The output of the parse_string_document method is incorrect when an I/O layer is specified with binmode.
To Reproduce
Running this script displays the result of converting a string written in Pod format. The grinning face emoji is expected to be displayed, but it is garbled. If the binmode function call is commented out, the output is as expected.
Expected behavior
With the binmode call left in place, the grinning face emoji is displayed correctly.
Additional context
It behaved as expected up to v4.14. The problem reproduces with every version since v5.00, including the latest sources.