bbkr closed this issue 2 years ago
Did you forget to call binmode *STDOUT, ':utf8'; ? Without it, say cannot print a Unicode string to STDOUT in UTF-8.
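The suggestion above can be sketched as follows. This is a minimal example, not the reporter's actual script: the helper print_address is hypothetical, and the literal strings stand in for what $parsed->header_str('from') returns. It uses the stricter ':encoding(UTF-8)' layer; the ':utf8' layer named above also silences the warning.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper: prints a decoded display name plus address
# to the given filehandle.
sub print_address {
    my ($fh, $name) = @_;
    say {$fh} "$name <x\@example.com>";
}

# Tell Perl to encode everything written to STDOUT as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');

# Stand-ins for the three decoded From headers in this thread.
print_address(\*STDOUT, "\x{107}");         # c with acute
print_address(\*STDOUT, "\x{e9}");          # e with acute
print_address(\*STDOUT, "\x{e9}\x{107}");   # both
```

With the layer in place, all three lines should be written as UTF-8 and the "Wide character" warning should not appear.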
Thanks, that was it!
Single ć as name:
my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C4=87?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str('from');
ć <x@example.com>
(perfect, works)
Nope, it does not work. It produces the warning Wide character in say, which you probably disabled or ignored. This warning is important here because it says that Perl could not print the string in the configured encoding (which is probably ISO-8859-1) and printed it in UTF-8 instead.
$ perl -W -Mstrict -MEmail::MIME -Mfeature=say -e 'my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C4=87?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str("from");'
Wide character in say at -e line 1.
ć <x@example.com>
Single é as name:
my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C3=A9?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str('from');
� <x@example.com>
(broken, decoding replaced valid UTF-8 character with replacement character fffd)
This is not broken; it works correctly. You have not explicitly configured *STDOUT to print in UTF-8, so the default encoding (ISO-8859-1) was used. Your terminal is probably not configured for Perl's default encoding (ISO-8859-1) and therefore prints this garbage.
You can verify that the output is correct in ISO-8859-1 by piping perl's output to iconv, which converts it from ISO-8859-1 to UTF-8 (I guess your terminal is in UTF-8):
$ perl -W -Mstrict -MEmail::MIME -Mfeature=say -e 'my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C3=A9?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str("from");' | iconv -f latin1 -t utf-8
é <x@example.com>
(my terminal is UTF-8, perl printed in latin1 = ISO-8859-1, and iconv converted the output from perl's encoding to my terminal's encoding)
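The same check can be done without a terminal by looking at the bytes directly. The sketch below uses the core Encode module; the variable names are illustrative only, and the sample string stands in for the decoded header value.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $e_acute = "\x{e9}";    # the decoded character U+00E9

# One character, two possible byte representations.
my $latin1 = encode('ISO-8859-1', $e_acute);   # what the unconfigured STDOUT emitted
my $utf8   = encode('UTF-8',      $e_acute);   # what a UTF-8 terminal expects

print 'ISO-8859-1 bytes: ', unpack('H*', $latin1), "\n";
print 'UTF-8 bytes:      ', unpack('H*', $utf8),   "\n";

# Interpreting the Latin-1 byte as UTF-8 (as a UTF-8 terminal would)
# hits a malformed sequence, which Encode replaces with U+FFFD.
my $as_seen = decode('UTF-8', "$latin1 <x\@example.com>");
print "misread as UTF-8: contains U+FFFD\n" if $as_seen =~ /\x{FFFD}/;
```

The single byte e9 is valid ISO-8859-1 but not valid UTF-8 on its own, which is consistent with the replacement character shown in the terminal output.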
Combined éć as name:
my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C3=A9=C4=87?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str('from');
éć <x@example.com>
(é suddenly works?)
It also produces a warning:
$ perl -W -Mstrict -MEmail::MIME -Mfeature=say -e 'my $parsed = Email::MIME->new(q{From: =?UTF-8?Q?=C3=A9=C4=87?= <x@example.com>}."\r\n\r\n"); say $parsed->header_str("from");'
Wide character in say at -e line 1.
éć <x@example.com>
But what happens here? Why did Perl automatically do something different (with a warning) in the third case? This has nothing to do with Email::MIME or Email::Address::XS. Unfortunately, this is standard Perl behavior. It is a bug that will never be fixed because of backward compatibility, and it is documented as The "Unicode Bug".
If you have not read about it then look into perlunicode documentation: https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22
It is important to understand how Perl works with Unicode, because it differs from other programming languages and misunderstanding it may lead to other bugs.
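Why the three cases differ can be sketched by asking whether each string fits into single bytes. This is a simplified illustration, not Perl's actual output code: the helper fits_latin1 is hypothetical, and it relies on the built-in utf8::downgrade, which reports failure when a string contains a character above 0xFF.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of the string fits in
# one byte (code point <= 0xFF), so it can be written without a
# "Wide character" warning on a handle that has no encoding layer.
sub fits_latin1 {
    my ($copy) = @_;                          # work on a copy
    return utf8::downgrade($copy, 1) ? 1 : 0; # second argument: fail quietly
}

my @cases = (
    [ 'c-acute only', "\x{107}" ],
    [ 'e-acute only', "\x{e9}" ],
    [ 'both',         "\x{e9}\x{107}" ],
);

for my $case (@cases) {
    my ($label, $string) = @$case;
    printf "%-13s %s\n", $label,
        fits_latin1($string)
            ? 'fits in one byte per character, printed as ISO-8859-1'
            : 'has a wide character, printed as UTF-8 with a warning';
}
```

Only the string containing é alone fits; as soon as ć is present, the whole string (including é) is written as UTF-8, which is why é appears to "suddenly work" in the combined case.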
This bug happens when decoding Quoted-Printable headers, and it is completely weird:
Single ć as name: (perfect, works)
Single é as name: (broken, decoding replaced valid UTF-8 character with replacement character fffd)
Combined éć as name: (é suddenly works?)
Email::MIME 1.949 Email::Address::XS 1.04