rjbs / Email-MIME

perl library for parsing MIME messages
20 stars 30 forks source link

regression, data may be lost in address-list headers #72

Closed dakkar closed 4 years ago

dakkar commented 4 years ago
my $m=Email::MIME->create(header_str=>[To=>"foo"],body=>"bar");
print $m->as_string;

with Email::MIME 1.940 prints:

To: foo
Date: Tue, 18 Aug 2020 10:52:39 +0000
MIME-Version: 1.0

bar

but with 1.949 prints:

Argument contains empty address at /data0/www/adcourier.broadbean/.perlbrew/libs/perl-5.18.4@delayed-email/lib/perl5/Email/MIME/Encode.pm line 71.
To: 
Date: Tue, 18 Aug 2020 10:53:25 +0000
MIME-Version: 1.0

bar

Email::MIME::Header::AddressList removes malformed elements; I don't think this is the expected behaviour, but at least it should be clearly documented.

pali commented 4 years ago

There is already documentation that header_str takes UNICODE string and do MIME-encoding for specified header. You have provided malformed/invalid UNICODE string for To header and the only thing which MIME-encoder could do is to remove invalid elements. Other option would be to call Perl's die.

If you want to pass pre-formatted 7bit string, therefore not UNICODE string, but rather already correctly MIME-encoded, then you should use header_raw. And not header_str.

Some kind of documentation for both header_str and header_raw already exists in Email::MIME but seems it could be improved. Hope it helps.

dakkar commented 4 years ago

I'm not sure I understand your comment. the string "foo" is a valid character string. It just happens to be a malformed email address.

A different test:

use v5.20;
use strict;
use warnings;
use Email::MIME;

my $e = Email::MIME->new(<<'EOF');
From: foo
To: bar
Subject: test

boom
EOF

say "parsed:\n",$e->as_string;

$e = Email::MIME->create(
    header_raw => [
        From => 'foo',
        To => 'bar',
        Subject => 'test',
    ],
    body => "boom\n",
);

say "\ncreate, raw:\n",$e->as_string;

$e = Email::MIME->create(
    header_str => [
        From => 'foo',
        To => 'bar',
        Subject => 'test',
    ],
    body => "boom\n",
);

say "\ncreate, str:\n",$e->as_string;

this prints:

parsed:
From: foo
To: bar
Subject: test

boom

create, raw:
Date: Sun, 23 Aug 2020 12:23:30 +0100
MIME-Version: 1.0

boom

Argument contains empty address at /home/dakkar/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/Email/MIME/Encode.pm line 71.
Argument contains empty address at /home/dakkar/perl5/perlbrew/perls/perl-5.24.0/lib/site_perl/5.24.0/Email/MIME/Encode.pm line 71.

create, str:
From: 
To: 
Subject: test
Date: Sun, 23 Aug 2020 12:23:30 +0100
MIME-Version: 1.0

boom

So, independently of header_raw or header_str, malformed address lists are lost; also, sometimes the entire header is lost. But not when the email object is build by parsing a string.

pali commented 4 years ago

I'm not sure I understand your comment. the string "foo" is a valid character string. It just happens to be a malformed email address.

I mean that it is not valid UNICODE content for To header. As you wrote it is malformed.

So, independently of header_raw or header_str, malformed address lists are lost; also, sometimes the entire header is lost. But not when the email object is build by parsing a string.

Well, now I see that API is more complicated. header_raw is method on object. But parameter for ->create method is called just header, not header_raw. So try following code if you want to pass preformatted 7bit headers as is:

my $e = Email::MIME->create(
header => [
        From => 'foo',
        To => 'bar',
        Subject => 'test',
    ],
    body => "boom\n",
);

say "\ncreate, raw:\n",$e->as_string;
dakkar commented 4 years ago

Ok, I was using the wrong attribute. Fine.

But my headers are correct unicode/character strings. They contain malformed addresses, but they are valid strings. They can be encoded (as it's clear from my examples: the MIME encoding of To: foo is To: foo, it's 7-bit clean, there's nothing special to do)

My problem is that Email::MIME::Header::AddressList does not round-trip, and loses data:

$ perl -MEmail::MIME::Header::AddressList -E 'say Email::MIME::Header::AddressList->from_string(shift)->as_string' 'foo, foo@bar'
Element at index 0 contains empty host portion of address at -e line 1.
foo@bar

I don't have the code with me right now, it's work code, I'll check tomorrow if I can work around that problem.

pali commented 4 years ago

But my headers are correct unicode/character strings. They contain malformed addresses, but they are valid strings. They can be encoded (as it's clear from my examples: the MIME encoding of To: foo is To: foo, it's 7-bit clean, there's nothing special to do)

Intension of header_str is to provide UNICODE string on input and let module to do encoding/decoding.

From, To, Cc and other address headers needs special handling of content and doing special MIME encoding. It is little bit complicated how it works.

First content of these headers needs to parsed/split into tokens according to RFC2822 grammar, then tokens individually encoded by RFC2047 (+separate encoding for IDN domains) and after that encoded tokens are combined back to one string via rules defined in RFC2822 grammar.

And your input To: foo cannot be encoded to MIME according to those RFC documents.

Note that even 7-bit clean ASCII input does not have to be 1:1 mapping after applying above MIME encoding (there are e.g. 7-bit characters which needs to be escaped...).

My problem is that Email::MIME::Header::AddressList does not round-trip, and loses data:

Yes, because this module is RFC compliant (or rather intension is to be) and for invalid/malformed data it throws warning and tries to parse as much as possible.

Email::MIME::Header::AddressList provides object representation of your input. Via object API you can access to also invalid/malformed members (as it can be useful), but ->as_string on invalid/malfromed objects throws warning as module does not how to format malformed object to RFC compabitle format (because it is not possible).

All this is to ensure that email generators (like Email::MIME) would output only RFC compliant emails as which is requirement by RFC2822 for such email software/library.

pali commented 4 years ago

Simple code how to access all attributes of parsed email addresses on input (including those invalid/malformed which was able to parse at least partially):

$ perl -MEmail::MIME::Header::AddressList -E 'say "is_valid=" . ($_->is_valid ? 1 : 0) . " user=" . ($_->user // "(undefined)") . " host=" . ($_->host // "(undefined)") . " phrase=" . ($_->phrase // "(undefined)") . " comment=" . ($_->comment // "(undefined)") foreach Email::MIME::Header::AddressList->from_string(shift)->addresses()' 'foo, foo@bar'
is_valid=0 user=foo host=(undefined) phrase=(undefined) comment=(undefined)
is_valid=1 user=foo host=bar phrase=(undefined) comment=(undefined)

Method ->as_string works only with valid objects and for invalid it returns nothing (undef in scalar context) and throws warning.

dakkar commented 4 years ago

ok, using the objects directly does what I need, thank you for that hint! I'm closing this, but please consider making it clear in the documentation that Email::MIME::Header::AddressList->from_string($str)->to_string does not round-trip