rjbs / Email-Simple

the Email-Simple perl distribution

Email::MIME v1.926 doesn't decode soft line breaks in quoted-printable content #13

Closed adeconsulting closed 7 years ago

adeconsulting commented 8 years ago

As a very simple example, a raw message string that includes the following headers is used to create an Email::MIME object:

MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I then use the body_str() method to retrieve the decoded content:

my $parsed = Email::MIME->new($msg_raw);
my $content = $parsed->body_str();

Almost everything seems to be decoded correctly, e.g. "=3D" and "=20" are replaced with "=" and " " respectively. However, the soft line break characters "= " remain.

The sample email is attached.

Thanks for any assistance with this issue!

_SampleEmail_QuotedPrintable_notDecodingCompletely.txt

rjbs commented 7 years ago

I don't think this is a bug. I'm looking at the sample data. Here's a single line from the file:

The undersigned has a good faith belief that use of FOX's property in the m= anner described herein is not authorized by FOX, its agents or the law. Al= so, we hereby state that the information in this notification is accurate a= nd, under penalty of perjury under the laws of the State of California and = the United States, that the undersigned is authorized to act on behalf of F= OX with respect to this matter.=20

The "=" characters in that line are not "soft line breaks" because they do not appear at EOL: each one is followed by a space and more text on the same line. See RFC 2045 §6.7 for the definition of MIME's quoted-printable. There are a number of relevant passages, including:

(2)   An "=" followed by a character that is neither a
    hexadecimal digit (including "abcdef") nor the CR
    character of a CRLF pair is illegal.  This case can be
    the result of US-ASCII text having been included in a
    quoted-printable part of a message without itself
    having been subjected to quoted-printable encoding.  A
    reasonable approach by a robust implementation might be
    to include the "=" character and the following
    character in the decoded data without any
    transformation and, if possible, indicate to the user
    that proper decoding was not possible at this point in
    the data.
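
To make the distinction concrete, here is a minimal sketch using the core MIME::QuotedPrint module. It assumes Email::MIME's quoted-printable decoding behaves the same way for these inputs, which this thread does not confirm; the two sample strings are shortened, made-up variants of the line quoted above.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# "=" immediately before the line ending: a genuine soft line break.
my $soft = "in the m=\r\nanner described herein=20";

# "=" followed by a space in the middle of a line, as in the sample
# file: per RFC 2045 this is not a soft line break.
my $not_soft = "in the m= anner described herein=20";

my $decoded_soft     = decode_qp($soft);
my $decoded_not_soft = decode_qp($not_soft);

# Brackets make the trailing space decoded from "=20" visible.
print "[$decoded_soft]\n";
print "[$decoded_not_soft]\n";
```

With a reasonably recent MIME::QuotedPrint, the first string should come back joined as "in the manner described herein " while the second keeps its "= " untouched, which matches the "include the "=" character and the following character" approach the RFC suggests.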