sisimai / p5-sisimai

Mail Analyzing Interface for email bounce: A Perl module that parses RFC5322 bounce mails and generates structured data as JSON from the parsed results. Formerly known as bounceHammer 4: an error mail analyzer.
https://libsisimai.org
BSD 2-Clause "Simplified" License

Cannot create object from UTF-8 mail address at Sisimai::Address #106

Open azumakuniyuki opened 8 years ago

azumakuniyuki commented 8 years ago

https://github.com/Exim/exim/blob/master/test/mail/4223.%E0%A4%AF%E0%A4%B9%E0%A4%B2%E0%A5%8B%E0%A4%97%E0%A4%B9%E0%A4%BF%E0%A4%A8%E0%A5%8D%E0%A4%A6%E0%A5%80%E0%A4%95%E0%A5%8D%E0%A4%AF%E0%A5%8B%E0%A4%82%E0%A4%A8%E0%A4%B9%E0%A5%80%E0%A4%82%E0%A4%AC%E0%A5%8B%E0%A4%B2%E0%A4%B8%E0%A4%95%E0%A4%A4%E0%A5%87%E0%A4%B9%E0%A5%88%E0%A4%82

$VAR1 = {
          'feedbacktype' => '',
          'deliverystatus' => '5.0.0',
          'rhost' => 'the.local.host.name',
          'timestamp' => 920367873,
          'diagnostictype' => 'SMTP',
          'addresser' => 'यहलोगहिन्दीक्योंनहींबोलसकतेहैं@japanese.なぜみんな日本語を話してくれないのか.local',
          'listid' => '',
          'diagnosticcode' => 'host 127.0.0.1 [127.0.0.1]',
          'reason' => '',
          'subject' => 'test',
          'action' => 'failed',
          'lhost' => 'the.local.host.name',
          'alias' => '',
          'timezoneoffset' => '+0000',
          'recipient' => 'userz@test.ex',
          'smtpcommand' => '',
          'softbounce' => 0,
          'messageid' => 'E10HmaX-0005vi-00@the.local.host.name',
          'smtpagent' => 'Exim'
        };

Sisimai::Address->new cannot create an object from the value of "addresser" above.

azumakuniyuki commented 8 years ago

Sisimai::Address->parse() could not parse the UTF-8 address. https://github.com/azumakuniyuki/p5-Sisimai/blob/master/lib/Sisimai/Address.pm#L106

 next if $e =~ m/[^\x20-\x7e]/;

parse() does not handle an email address that is not encoded with Punycode: the check above skips any candidate containing a character outside printable ASCII (0x20-0x7e).
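
To illustrate why the quoted line rejects such addresses, here is a minimal sketch that applies the same regular expression to two sample strings. The helper name `has_non_ascii` and the sample list are assumptions made for illustration; this is not the actual body of Sisimai::Address->parse().

```perl
use strict;
use warnings;
use utf8;   # this source contains a literal UTF-8 character

# Hypothetical helper mirroring the quoted check: returns 1 when the string
# contains any character outside printable ASCII (0x20-0x7e), otherwise 0.
sub has_non_ascii {
    my $e = shift;
    return $e =~ m/[^\x20-\x7e]/ ? 1 : 0;
}

my @candidates = (
    'userz@test.ex',
    '🐈@neko.nyaan.jp',
);

binmode STDOUT, ':encoding(UTF-8)';
for my $e (@candidates) {
    # A candidate flagged here is what "next if ..." would skip in a loop.
    printf "%s => %s\n", $e, has_non_ascii($e) ? 'skipped' : 'kept';
}
```

With a check like this, the ASCII-only address is kept and the address with a non-ASCII local part is skipped before any object can be built from it.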

azumakuniyuki commented 8 years ago

For example, "🐈@neko.nyaan.jp" should be encoded as "=?utf-8?B?8J+QiA==?=@neko.nyaan.jp" by Punycode. This issue will be closed soon.
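
For reference, the encoded string quoted above has the shape of an RFC 2047 encoded-word (Base64 of the UTF-8 bytes, wrapped in "=?utf-8?B?...?="). A minimal sketch that reproduces it with core modules follows; the helper name `b_encoded_word` is an assumption for illustration and is not part of Sisimai.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: wrap a character string as a Base64 ("B") encoded-word.
sub b_encoded_word {
    my $text  = shift;
    my $bytes = encode('UTF-8', $text);        # characters -> UTF-8 octets
    # The second argument '' suppresses the trailing newline encode_base64 adds.
    return '=?utf-8?B?' . encode_base64($bytes, '') . '?=';
}

my $local = b_encoded_word("\x{1F408}");       # U+1F408 CAT
print $local, '@neko.nyaan.jp', "\n";          # prints =?utf-8?B?8J+QiA==?=@neko.nyaan.jp
```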

hatukanezumi commented 8 years ago

RFC 6532 extends RFC 2045 to allow raw UTF-8 in the address fields of message headers (Punycode should not be used anyway).

Additionally, RFC 6533 defines new "utf-8-addr-xtext" and "utf-8-addr-unitext" encodings to use UTF-8 addresses in delivery reports.
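
As a rough sketch of the 7-bit "utf-8-addr-xtext" form, based on my reading of RFC 6533: printable ASCII characters other than "+", "\" and "=" are kept as-is, and every other character is written as "\x{HEXPOINT}" with its code point in uppercase hexadecimal. The helper name `to_utf8_addr_xtext` is hypothetical, and the character ranges should be checked against the RFC's ABNF before relying on them.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a decoded character string (not bytes) into a
# utf-8-addr-xtext style value. Characters outside 0x21-0x7e, plus "+", "\"
# and "=", are replaced by \x{HEXPOINT}.
sub to_utf8_addr_xtext {
    my $addr = shift;
    $addr =~ s/([^\x21-\x2a\x2c-\x3c\x3e-\x5b\x5d-\x7e])/sprintf('\\x{%02X}', ord($1))/ge;
    return $addr;
}

# U+1F408 CAT in the local part
print to_utf8_addr_xtext("\x{1F408}\@neko.nyaan.jp"), "\n";   # prints \x{1F408}@neko.nyaan.jp
```

In a delivery report this value would follow the "utf-8;" address-type prefix, for example in an Original-Recipient field.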

azumakuniyuki commented 8 years ago

Thanks for the comment. I had not followed these RFCs. A short while ago, I added 3 emails that have "Cat" in the local part of the recipient address.