rra / podlators

Format POD source into various output formats
https://www.eyrie.org/~eagle/software/podlators/

Output is incorrect when I/O layer is specified by binmode() #25

Closed youpong closed 6 months ago

youpong commented 8 months ago

Describe the bug

The output of the parse_string_document method is incorrect when an I/O layer is set on STDOUT with binmode().

To Reproduce

Running this script prints the result of converting a string written in Pod format. The expected output contains the grinning face emoji, but the emoji comes out garbled. If the binmode function call is commented out, the output is as expected.

use v5.38;
use utf8;
use Pod::Text;

my $doc = <<'EOF';
=encoding utf8

=head1 NAME
    😀Grinning face
EOF

binmode STDOUT, ":utf8";

my $p = Pod::Text->new();
$p->parse_string_document($doc);

Expected behavior

With the binmode call left in place, the grinning face emoji is displayed correctly.

Additional context

It behaved as expected up to v4.14. The bug reproduces from v5.00 through the latest sources.

jkeenan commented 7 months ago

A similar instance, adapted from @xenu's comment in https://github.com/Perl/perl5/issues/21841:

$ cat xpodlators-25.pl 
#use v5.38;
use utf8;
use Pod::Text;

my $doc = <<'EOF';
=encoding utf8

=head1 NAME
    😀Grinning face
EOF

my $doc2 = <<'EOF';
=encoding utf8

=head1 NAME

你好

EOF

binmode STDOUT, ":utf8";

my $p = Pod::Text->new();
$p->parse_string_document($doc);

my $q = Pod::Text->new();
$q->parse_string_document($doc2);
$ perlbrew use perl-5.36.0
$ perl xpodlators-25.pl 
NAME
    😀Grinning face
NAME
    你好

$ perlbrew use perl-5.38.0
$ perl xpodlators-25.pl 
NAME
    ðGrinning face
NAME
    你好

jkeenan commented 7 months ago

Now, suppose that, in the binmode statement, instead of the ":utf8" layer, we use ":encoding(UTF-8)".

$ cat ypodlators-25.pl 
use 5.14.0;
use warnings;
use utf8;
use Pod::Text;

say "Perl version: $^V";

my $doc = <<'EOF';
=encoding utf8

=head1 NAME
    😀Grinning face
EOF

my $doc2 = <<'EOF';
=encoding utf8

=head1 NAME

你好

EOF

binmode STDOUT, ":encoding(UTF-8)";

my $p = Pod::Text->new();
$p->parse_string_document($doc);

my $q = Pod::Text->new();
$q->parse_string_document($doc2);

$ diff xpodlators-25.pl ypodlators-25.pl 
24c24
< binmode STDOUT, ":utf8";
---
> binmode STDOUT, ":encoding(UTF-8)";

perldoc -f binmode has this to say about that difference in the layers:

To mark FILEHANDLE as UTF-8, use ":utf8" or ":encoding(UTF-8)".
":utf8" just marks the data as UTF-8 without further checking,
while ":encoding(UTF-8)" checks the data for actually being
valid UTF-8. More details can be found in PerlIO::encoding.

Let's run this.

$ perl ypodlators-25.pl 
Perl version: v5.36.0
NAME
    😀Grinning face
NAME
    你好

$ perl ypodlators-25.pl 
Perl version: v5.38.0
NAME
    😀Grinning face
NAME
    你好

So ":encoding(UTF-8)" DWIMs, where ":utf8" does not.
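
The garbled "ð" in the v5.38.0 output for ":utf8" is consistent with the text being encoded to UTF-8 twice: once by the formatter, and once more by the ":utf8" layer on STDOUT. That reading is an assumption, not something the thread confirms. The sketch below only illustrates what double encoding does to the emoji; double_encode is a hypothetical helper, not part of Pod::Text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string to UTF-8 bytes, then
# treat each of those bytes as a character and encode again, which is
# what happens when already-encoded bytes pass through a UTF-8 layer.
sub double_encode {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);    # characters -> UTF-8 bytes
    return encode('UTF-8', $bytes);        # each byte re-encoded as a character
}

# U+1F600 is F0 9F 98 80 in UTF-8. Encoding those four bytes again gives
# C3 B0 C2 9F C2 98 C2 80, which a terminal renders as "ð" followed by
# three non-printing control characters.
my $garbled = double_encode("\x{1F600}");
printf "%vX\n", $garbled;
```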

rra commented 6 months ago

This turned out to be caused by not explicitly loading PerlIO, which has to be done for PerlIO::F_UTF8 to exist. Thank you very much to @haarg for finding that. Complicating the problem, loading Test::More also loads PerlIO, so this problem never occurs in tests.
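
As a minimal sketch of the kind of check involved (not the actual podlators code), the following tests whether a file handle's top layer has the UTF-8 flag set. The name handle_is_utf8 is hypothetical. The relevant detail is the explicit "use PerlIO": PerlIO::get_layers is built into perl, but the PerlIO::F_UTF8 constant is defined in PerlIO.pm and only exists once that module has been loaded.

```perl
use strict;
use warnings;
use PerlIO;    # loads PerlIO.pm, which defines PerlIO::F_UTF8

# Hypothetical helper: return 1 if the topmost layer on $fh carries the
# UTF-8 flag, 0 otherwise.
sub handle_is_utf8 {
    my ($fh) = @_;

    # With details => 1, get_layers returns (name, arguments, flags)
    # triples, so the last element is the flags of the topmost layer.
    my $flags = (PerlIO::get_layers($fh, details => 1))[-1];
    return ($flags && ($flags & PerlIO::F_UTF8())) ? 1 : 0;
}

binmode(STDOUT, ':utf8');
print "STDOUT is UTF-8: ", handle_is_utf8(\*STDOUT), "\n";
```

Without the "use PerlIO" line, the call to PerlIO::F_UTF8() can fail with an undefined subroutine error unless some other module has already loaded PerlIO. If that call sits inside an eval, the failure is silent and the handle is treated as having no UTF-8 layer.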

rra commented 6 months ago

Closed in #28.

youpong commented 6 months ago

I have confirmed that the bug has been fixed by pull request #28. Until the next release, I will specify ":encoding(UTF-8)" in the binmode statement, and I will revert to ":utf8" after that release. Thanks to @jkeenan for the tip.