jwilk closed this 2 years ago
Hello! Non-7-bit characters are allowed to fully support Internationalized Email Headers as defined in RFC 6532.
Email::Address::XS is also used for storing Unicode email addresses, which are then processed by a MIME encoder to convert them to a fully 7-bit ASCII object (the same Unicode data in a different representation, per RFC 2047).
Fair enough (although the address in my example is neither valid per RFC 6532 nor could it be MIME-encoded). But if this is intentional, it should be documented.
although the address in my example is neither valid per RFC 6532
I do not see a reason why. U+FF is a fully valid Unicode code point. It is ÿ - LATIN SMALL LETTER Y WITH DIAERESIS. In UTF-8 it is encoded as 0xC3 0xBF.
Let's see:
$ perl -MEncode -e 'my $unicode = "\xFF"; my $utf8 = encode("UTF-8", $unicode); print $utf8;' | xxd -g 1
00000000: c3 bf .. ..
Oh, sure, I could have encoded U+00FF as UTF-8; but that's not what I did.
$ perl -MEmail::Address::XS -E 'say Email::Address::XS->parse("\xFF\@jwilk.net")->format' > addr
$ xxd < addr
00000000: ff40 6a77 696c 6b2e 6e65 740a .@jwilk.net.
$ isutf8 < addr
(standard input): line 1, char 0, byte 0: Expecting bytes in the following ranges: 00..7F C2..F4.
Yes, and this is the infamous problem: whether the API input is in Unicode or in UTF-8. Thankfully, this XS module is written so that all non-7-bit characters are passed through as-is, and Perl's internal utf8 flag is respected and correctly propagated. So not checking for characters >= 0x80 (non-7-bit ASCII) simply makes things work correctly, without any need to define whether the API takes Unicode, UTF-8, or any other encoding backward compatible with 7-bit ASCII.
So... I do not see any issue there. The user just has to know how to use Unicode in Perl correctly.
You cannot print a Unicode string to stdout or a file. A Unicode string is just a sequence of ordinals (code points, numbers) with no specific format for how those numbers are encoded into a byte stream. UTF-8 is one specific encoding of Unicode strings into a byte stream (there are many others). So if you have a Perl Unicode string (a sequence of ordinals) and want to save it to a file, you first need to convert it to a byte stream.
If you try to print or store something that is not a byte stream, the result is garbage in, garbage out.
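To make the point above concrete, here is a minimal sketch that encodes a Perl character string to UTF-8 bytes before writing it out. It uses only the core Encode module; the address `example.org` and the variable names are illustrative assumptions, not taken from the report.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A Perl character string: the first character is the single code point U+00FF.
# (The address is a made-up example.)
my $unicode = "\xFF\@example.org";

# Convert the sequence of code points to a UTF-8 byte stream.
# U+00FF becomes the two bytes 0xC3 0xBF; the ASCII characters are unchanged.
my $bytes = encode("UTF-8", $unicode);

# Show the resulting bytes as hex so the encoding is visible.
my $hex = join " ", map { sprintf "%02x", ord } split //, $bytes;
print "$hex\n";

# Only the encoded byte string is suitable for writing to a file or stdout.
binmode(STDOUT);    # raw bytes, no extra I/O layers
print $bytes, "\n";
```

An alternative with the same effect is to leave the string unencoded and put an encoding layer on the handle, e.g. `binmode(STDOUT, ":encoding(UTF-8)")`, so that Perl encodes on output.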
I updated the documentation in commit a844a70ff96d62a9dff1db66a4a9622ff24f870b to address Internationalized Email Headers and Unicode.
This code:
prints 1.
But RFC 5322 addresses are ASCII-only.
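If a caller needs strict ASCII-only behaviour, one option is to check the input before handing it to the parser. This is only a sketch under that assumption: `is_ascii_only` is a hypothetical helper written for illustration, not part of Email::Address::XS.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Email::Address::XS):
# returns 1 if every character is in the 7-bit ASCII range, 0 otherwise.
sub is_ascii_only {
    my ($str) = @_;
    return $str =~ /[^\x00-\x7F]/ ? 0 : 1;
}

print is_ascii_only("user\@example.org"), "\n";   # plain ASCII address
print is_ascii_only("\xFF\@jwilk.net"), "\n";     # contains the character 0xFF
```

A caller could run such a check first and reject the address before parsing whenever it returns 0.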