stalwartlabs / mail-parser

Fast and robust e-mail parsing library for Rust
https://docs.rs/mail-parser/
Apache License 2.0

RFC8621 nonconformance #67

Open sftse opened 9 months ago

sftse commented 9 months ago

When parsing the example legacy/034.eml, the returned parts are:

html_body: [3, 4, 5], text_body: [2, 4, 5], attachments: [4, 5]

This seems at odds with RFC 8621:

o attachments: "EmailBodyPart[]" (immutable)

  A list, traversing depth-first, of all parts in "bodyStructure"
  that satisfy either of the following conditions:

  *  not of type "multipart/*" and not included in "textBody" or
     "htmlBody"

  *  of type "image/*", "audio/*", or "video/*" and not in both
     "textBody" and "htmlBody"

  None of these parts include subParts, including "message/*" types.
  Attached messages may be fetched using the "Email/parse" method
  and the "blobId".

  Note that a "text/html" body part [[HTML](https://datatracker.ietf.org/doc/html/rfc8621#ref-HTML)] may reference image parts
  in attachments by using "cid:" links to reference the Content-Id,
  as defined in [[RFC2392](https://datatracker.ietf.org/doc/html/rfc2392)], or by referencing the Content-Location.

Parts 4 and 5 are image/png and appear in both text_body and html_body, so they fit neither of the criteria listed and should not be in attachments.
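
To make the mismatch concrete, here is a minimal sketch of the two quoted conditions applied to the part lists above. It is not mail-parser's implementation: `Part`, `is_inline_media`, and `rfc8621_attachments` are hypothetical names, the content types of parts 2 and 3 are assumed, and "not included in textBody or htmlBody" is read as "in neither list".

```rust
use std::collections::HashSet;

/// Hypothetical, simplified view of a body part: just an id and a MIME type.
struct Part {
    id: usize,
    content_type: &'static str,
}

/// Returns true for "image/*", "audio/*", or "video/*".
fn is_inline_media(content_type: &str) -> bool {
    ["image/", "audio/", "video/"]
        .iter()
        .any(|prefix| content_type.starts_with(prefix))
}

/// Applies the two quoted RFC 8621 conditions to a flat list of parts.
fn rfc8621_attachments(parts: &[Part], text_body: &[usize], html_body: &[usize]) -> Vec<usize> {
    let text: HashSet<usize> = text_body.iter().copied().collect();
    let html: HashSet<usize> = html_body.iter().copied().collect();

    parts
        .iter()
        .filter(|p| {
            let in_text = text.contains(&p.id);
            let in_html = html.contains(&p.id);
            let not_multipart = !p.content_type.starts_with("multipart/");
            // Condition 1: not multipart/* and in neither body list.
            let cond1 = not_multipart && !in_text && !in_html;
            // Condition 2: image/audio/video and not present in both body lists.
            let cond2 = is_inline_media(p.content_type) && !(in_text && in_html);
            cond1 || cond2
        })
        .map(|p| p.id)
        .collect()
}

fn main() {
    // Part ids taken from the output above; the types of parts 2 and 3 are assumed.
    let parts = [
        Part { id: 2, content_type: "text/plain" },
        Part { id: 3, content_type: "text/html" },
        Part { id: 4, content_type: "image/png" },
        Part { id: 5, content_type: "image/png" },
    ];
    let attachments = rfc8621_attachments(&parts, &[2, 4, 5], &[3, 4, 5]);
    println!("attachments: {:?}", attachments);
}
```

Under this reading, the attachments list for this message comes out empty, whereas the parser returns [4, 5].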

sftse commented 9 months ago

I'm not sure why the RFC includes the second criterion, though. Just before the attachments definition, it defines textBody and htmlBody as:

o textBody: "EmailBodyPart[]" (immutable)

  A list of "text/plain", "text/html", "image/*", "audio/*", and/or
  "video/*" parts to display (sequentially) as the message body,
  with a preference for "text/plain" when alternative versions are
  available.

o htmlBody: "EmailBodyPart[]" (immutable)

  A list of "text/plain", "text/html", "image/*", "audio/*", and/or
  "video/*" parts to display (sequentially) as the message body,
  with a preference for "text/html" when alternative versions are
  available.

This seems to suggest that image/*, audio/*, and video/* parts are always in both textBody and htmlBody, in which case the second condition for a part being an attachment could never be satisfied.