zbateson / mail-mime-parser

An email parser written in PHP
https://mail-mime-parser.org/
BSD 2-Clause "Simplified" License

Plain text part of a DSN is treated as attachment #137

Status: Open. ThomasLandauer opened this issue 4 years ago.

ThomasLandauer commented 4 years ago

I have this:

```
MIME-Version: 1.0
From: <postmaster@example.com>
Date: Wed, 27 Dec 2017 12:33:40 +0100
Content-Type: multipart/report; report-type=delivery-status;
    boundary="__TOP_LEVEL_BOUNDARY__"

--__TOP_LEVEL_BOUNDARY__
Content-Type: multipart/alternative; differences=Content-Type;
    boundary="__DSN_FIRST_PART_BOUNDARY__"

--__DSN_FIRST_PART_BOUNDARY__
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

Delivery Error

--__DSN_FIRST_PART_BOUNDARY__
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

<html>
<body>
<p>Delivery Error</p>
</body>
</html>=

--__DSN_FIRST_PART_BOUNDARY__--

--__TOP_LEVEL_BOUNDARY__
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com
Received-From-MTA: dns;mail.example.com
Arrival-Date: Wed, 27 Dec 2017 11:33:40 +0000

Original-Recipient: rfc822;office@example.com
Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10
Diagnostic-Code: smtp;550 5.1.10 RESOLVER.ADR.RecipientNotFound; Recipient not found by SMTP address lookup

--__TOP_LEVEL_BOUNDARY__
Content-Type: message/rfc822

Date: Wed, 27 Dec 2017 12:33:39 +0100
From: <somebody@example.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;
    boundary="__ORIGINAL_MESSAGE_BOUNDARY__"

--__ORIGINAL_MESSAGE_BOUNDARY__
Content-Type: multipart/alternative;
    boundary="_=_swift_1514374419_1ce670f0230b7cd79cecbfa5c7688d4a_=_"

--_=_swift_1514374419_1ce670f0230b7cd79cecbfa5c7688d4a_=_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

Hello folks!

--_=_swift_1514374419_1ce670f0230b7cd79cecbfa5c7688d4a_=_
Content-Type: text/html; charset="utf-8"
Content-Transfer-Encoding: quoted-printable

<html>
<body>
<p>Hello folks!</p>
</body>
</html>

--_=_swift_1514374419_1ce670f0230b7cd79cecbfa5c7688d4a_=_--

--__ORIGINAL_MESSAGE_BOUNDARY__
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

ABCD...

--__ORIGINAL_MESSAGE_BOUNDARY__--

--__TOP_LEVEL_BOUNDARY__--
```

It looks like a normal Delivery Status Notification (DSN, see RFC 3461) to me, though I didn't double-check whether everything is valid.
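For a rough structural check, here is a Python sketch using the standard-library `email` package (not mail-mime-parser's PHP API). It tests the layout that multipart/report (RFC 6522) and message/delivery-status (RFC 3464) describe: a human-readable part, a machine-readable message/delivery-status part, and optionally the original message or its headers. `RAW` is a trimmed copy of the message above, and `check_dsn` is a hypothetical helper that looks only at the part layout, not at the individual DSN fields.

```python
import email
from email import policy

# Trimmed copy of the DSN from the issue: boundaries shortened, some headers
# dropped, the original message's HTML alternative removed, PDF body stubbed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="TOP"

--TOP
Content-Type: multipart/alternative; boundary="DSN1"

--DSN1
Content-Type: text/plain; charset="iso-8859-1"

Delivery Error

--DSN1
Content-Type: text/html; charset="iso-8859-1"

<p>Delivery Error</p>

--DSN1--

--TOP
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com

Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10

--TOP
Content-Type: message/rfc822

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ORIG"

--ORIG
Content-Type: text/plain; charset="utf-8"

Hello folks!

--ORIG
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

JVBERi0=

--ORIG--

--TOP--
"""

# Accepted types for the optional third part: the original message or its headers.
ORIGINAL_TYPES = {"message/rfc822", "text/rfc822-headers"}


def check_dsn(raw):
    """Return a list of layout problems (empty if none were found).

    Only the top-level part layout is checked; the fields inside the
    message/delivery-status part are not validated."""
    msg = email.message_from_string(raw, policy=policy.default)
    problems = []
    if msg.get_content_type() != "multipart/report":
        problems.append("top level is not multipart/report")
    if msg.get_param("report-type") != "delivery-status":
        problems.append("report-type is not delivery-status")
    parts = msg.get_payload() if msg.is_multipart() else []
    if len(parts) not in (2, 3):
        problems.append(f"expected 2 or 3 sub-parts, got {len(parts)}")
    if len(parts) >= 2 and parts[1].get_content_type() != "message/delivery-status":
        problems.append("second part is not message/delivery-status")
    if len(parts) == 3 and parts[2].get_content_type() not in ORIGINAL_TYPES:
        problems.append("third part is not message/rfc822 or text/rfc822-headers")
    return problems


if __name__ == "__main__":
    print(check_dsn(RAW) or "layout matches a DSN")
```

For the trimmed sample, `check_dsn(RAW)` returns an empty list.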

My question is: How many attachments are there?

The third part of the DSN is the original message, which contained an attached PDF. The rest (i.e. the first two parts of the DSN) is just text.
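The nesting can be made visible with a short Python sketch (standard-library `email` package, used here only for illustration; `walk_tree` and `describe` are hypothetical helpers). It prints each part's content type and Content-Disposition for a trimmed copy of the message above. Python's parser splits a message/delivery-status body into one sub-message per header block, so the walk stops there instead of listing those blocks as parts.

```python
import email
from email import policy

# Trimmed copy of the DSN from the issue: boundaries shortened, some headers
# dropped, the original message's HTML alternative removed, PDF body stubbed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="TOP"

--TOP
Content-Type: multipart/alternative; boundary="DSN1"

--DSN1
Content-Type: text/plain; charset="iso-8859-1"

Delivery Error

--DSN1
Content-Type: text/html; charset="iso-8859-1"

<p>Delivery Error</p>

--DSN1--

--TOP
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com

Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10

--TOP
Content-Type: message/rfc822

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ORIG"

--ORIG
Content-Type: text/plain; charset="utf-8"

Hello folks!

--ORIG
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

JVBERi0=

--ORIG--

--TOP--
"""


def walk_tree(part, depth=0):
    """Yield (depth, part) pairs, descending into multiparts and message/rfc822."""
    yield depth, part
    if part.get_content_type() == "message/delivery-status":
        # The header blocks Python creates here are not MIME parts; stop.
        return
    if part.is_multipart():
        for sub in part.get_payload():
            yield from walk_tree(sub, depth + 1)


def describe(raw):
    """Return one indented line per part: content type [disposition or '-']."""
    msg = email.message_from_string(raw, policy=policy.default)
    lines = []
    for depth, part in walk_tree(msg):
        disposition = part.get_content_disposition() or "-"
        lines.append(f"{'  ' * depth}{part.get_content_type()} [{disposition}]")
    return lines


if __name__ == "__main__":
    print("\n".join(describe(RAW)))
```

Only the PDF carries `Content-Disposition: attachment`. Every other part, including message/delivery-status and message/rfc822, has no disposition, so counting anything else as an attachment depends on a rule based on content type or position rather than on the headers.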

Thunderbird shows 2 attachments:

  1. The entire original message. This needs to be an attachment IMO, to allow users to save it as .eml (and extract the PDF from there).
  2. The PDF. Although this is technically not an attachment of the main message (but rather an attachment of the first attachment), presenting it this way makes some sense to me, since it allows easier access to it.
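One way to arrive at Thunderbird's two entries is sketched below in Python (standard-library `email`). The rule is an assumption invented to match the list above, not Thunderbird's actual logic: an embedded message/rfc822 part counts as one attachment, and parts inside it explicitly marked `Content-Disposition: attachment` are surfaced as well. `thunderbird_like` is a hypothetical name, and `RAW` is a trimmed copy of the message above.

```python
import email
from email import policy

# Trimmed copy of the DSN from the issue: boundaries shortened, some headers
# dropped, the original message's HTML alternative removed, PDF body stubbed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="TOP"

--TOP
Content-Type: multipart/alternative; boundary="DSN1"

--DSN1
Content-Type: text/plain; charset="iso-8859-1"

Delivery Error

--DSN1
Content-Type: text/html; charset="iso-8859-1"

<p>Delivery Error</p>

--DSN1--

--TOP
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com

Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10

--TOP
Content-Type: message/rfc822

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ORIG"

--ORIG
Content-Type: text/plain; charset="utf-8"

Hello folks!

--ORIG
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

JVBERi0=

--ORIG--

--TOP--
"""


def thunderbird_like(part):
    """Hypothetical rule: an embedded message/rfc822 is one attachment, and
    explicitly marked attachments (at any depth) are listed by filename."""
    found = []
    ctype = part.get_content_type()
    if ctype == "message/rfc822":
        found.append(ctype)
    elif part.get_content_disposition() == "attachment":
        found.append(part.get_filename() or ctype)
    # Do not descend into the header blocks of message/delivery-status.
    if ctype != "message/delivery-status" and part.is_multipart():
        for sub in part.get_payload():
            found.extend(thunderbird_like(sub))
    return found


if __name__ == "__main__":
    msg = email.message_from_string(RAW, policy=policy.default)
    print(thunderbird_like(msg))
```

For the sample this yields `['message/rfc822', 'foobar.pdf']`.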

Your library gives 2 different attachments:

php-mime-mail-parser gives 3 attachments:

At https://github.com/zbateson/mail-mime-parser/issues/87#issuecomment-474031958 you said:

If an email specifically tells me a part is an attachment, I consider it an attachment. Otherwise it's inline and should be part of the message.

But as far as I can see, the message/delivery-status part doesn't tell you it's an attachment ;-)
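The rule quoted from #87 can be checked against the sample with a Python sketch like this (standard-library `email`; `explicit_attachments` is a hypothetical helper, and `RAW` is a trimmed copy of the message above). It confirms that the message/delivery-status part has no Content-Disposition header, and that under a disposition-only rule the PDF inside the original message is the only attachment.

```python
import email
from email import policy

# Trimmed copy of the DSN from the issue: boundaries shortened, some headers
# dropped, the original message's HTML alternative removed, PDF body stubbed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="TOP"

--TOP
Content-Type: multipart/alternative; boundary="DSN1"

--DSN1
Content-Type: text/plain; charset="iso-8859-1"

Delivery Error

--DSN1
Content-Type: text/html; charset="iso-8859-1"

<p>Delivery Error</p>

--DSN1--

--TOP
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com

Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10

--TOP
Content-Type: message/rfc822

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ORIG"

--ORIG
Content-Type: text/plain; charset="utf-8"

Hello folks!

--ORIG
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

JVBERi0=

--ORIG--

--TOP--
"""


def explicit_attachments(part):
    """Disposition-only rule: a part is an attachment only if its
    Content-Disposition header says so. Returns the content types found."""
    found = []
    if part.get_content_disposition() == "attachment":
        found.append(part.get_content_type())
    # Do not descend into the header blocks of message/delivery-status.
    if part.get_content_type() != "message/delivery-status" and part.is_multipart():
        for sub in part.get_payload():
            found.extend(explicit_attachments(sub))
    return found


if __name__ == "__main__":
    msg = email.message_from_string(RAW, policy=policy.default)
    status_part = msg.get_payload()[1]  # second top-level part
    print(status_part.get_content_type(), status_part.get_content_disposition())
    print(explicit_attachments(msg))
```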

So what's your reasoning?

zbateson commented 4 years ago

Ah, yes, I think my comment was wrong, but I'd have to consult the code... There's definitely some documentation missing there.

I believe my comment in #87 only applies to text/plain and text/html parts, and anything that's not text/plain or text/html is an attachment. I'll look it up, though, and update the docblocks and/or put something on the main README or the website.
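The rule described here can be sketched as follows, under stated assumptions: it is written in Python with the standard-library `email` package rather than taken from mail-mime-parser's code, multipart containers are descended into, and message/* parts are treated as leaves (whether the library descends into message/rfc822 is not stated in this thread). A non-multipart part counts as an attachment if it is marked `Content-Disposition: attachment` or if its type is anything other than text/plain or text/html. `content_type_rule` is a hypothetical name, and `RAW` is a trimmed copy of the message above.

```python
import email
from email import policy

# Trimmed copy of the DSN from the issue: boundaries shortened, some headers
# dropped, the original message's HTML alternative removed, PDF body stubbed.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/report; report-type=delivery-status; boundary="TOP"

--TOP
Content-Type: multipart/alternative; boundary="DSN1"

--DSN1
Content-Type: text/plain; charset="iso-8859-1"

Delivery Error

--DSN1
Content-Type: text/html; charset="iso-8859-1"

<p>Delivery Error</p>

--DSN1--

--TOP
Content-Type: message/delivery-status

Reporting-MTA: dns;mail.example.com

Final-Recipient: rfc822;office@example.com
Action: failed
Status: 5.1.10

--TOP
Content-Type: message/rfc822

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="ORIG"

--ORIG
Content-Type: text/plain; charset="utf-8"

Hello folks!

--ORIG
Content-Type: application/pdf; name="foobar.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foobar.pdf"

JVBERi0=

--ORIG--

--TOP--
"""

# Types this hypothetical rule treats as inline body text.
INLINE_TYPES = {"text/plain", "text/html"}


def content_type_rule(part):
    """Descend into multipart containers; any other part (message/* included,
    treated as a leaf) is an attachment if explicitly marked as one or if its
    type is not text/plain or text/html. Returns the content types found."""
    if part.get_content_maintype() == "multipart":
        found = []
        for sub in part.get_payload():
            found.extend(content_type_rule(sub))
        return found
    ctype = part.get_content_type()
    if part.get_content_disposition() == "attachment" or ctype not in INLINE_TYPES:
        return [ctype]
    return []


if __name__ == "__main__":
    msg = email.message_from_string(RAW, policy=policy.default)
    print(content_type_rule(msg))
```

On the trimmed sample this returns `['message/delivery-status', 'message/rfc822']`: the delivery-status part is plain text in form, but its media type is not text/plain, so a type-based rule counts it as an attachment.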