purebred-mua / purebred-email

A fast email parsing library implemented in Haskell
https://hackage.haskell.org/package/purebred-email
GNU Affero General Public License v3.0

FileParseError "<path to email>" "string" #23

Closed: romanofski closed this issue 5 years ago

romanofski commented 5 years ago

Opening some of my older emails in purebred, I run into the error above. I have not investigated what exactly is causing it. The mail in question is unfortunately personal, so I can't just post it here, but I'm happy to forward it :)

romanofski commented 5 years ago

Ah, I think the following is noteworthy. I found it when opening up the mail itself:

X-Amavis-Alert: BAD HEADER SECTION, Duplicate header field: "MIME-Version"

and the structure looks like this:

  I     1 <no description>                                                                               [multipa/related, 7bit, 175K] 
  I     2 ├─><no description>                                                                     [text/html, base64, iso-8859-1, 25K] 
  I     3 ├─>1-659919855.jpg                                                                                 [image/jpeg, base64, 25K] 
  I     4 ├─>446644101.jpg                                                                                   [image/jpeg, base64, 15K] 
  I     5 ├─>1-30643500.gif                                                                                    [image/gif, base64, 62] 
  I     6 ├─>1-1069223518.PNG                                                                                 [image/png, base64, 74K] 
  I     7 └─>187552305.jpg                                                                                   [image/jpeg, base64, 33K] 
  I     8 <no description>                                                                            [multipa/alternativ, 7bit, 2.8K] 
  I     9 ├─><no description>                                                                        [text/plain, quoted, utf-8, 1.2K] 
  I    10 └─><no description>                                                                         [text/html, quoted, utf-8, 1.3K]

frasertweedale commented 5 years ago

Could you "sanitise" the mail so that it still reproduces the problem?

romanofski commented 5 years ago

@frasertweedale Will try and report back

romanofski commented 5 years ago

Yeah, it's definitely not the duplicate MIME-Version header.

romanofski commented 5 years ago

I've truncated the binary bits of the mail and obfuscated some of the "sensitive items". This should reproduce the issue: reproducerParserFailureString.txt

frasertweedale commented 5 years ago

Thanks @romanofski. This is failing at:

Fail "\nContent-Type: multipart/alternative; boundary=\"===============507826...

which is just after the end of the multipart/related part, i.e. right after its close delimiter:

------=_Part_25370_322769725.1493831323929--

--===============8936515836623095288==
Content-Type: multipart/alternative; boundary="===============5078260389750...

I note that there are two blank lines after the close delimiter of the preceding body part. I need to check whether that is legal, but I think it is, in which case this is indeed a bug in the parser.
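For reference, RFC 2046 section 5.1.1 ends a multipart body with `close-delimiter transport-padding [CRLF epilogue]`, and the epilogue is discarded text, so extra blank lines after a close delimiter should simply become epilogue. A minimal splitter sketch illustrating that reading is below. It works on plain `String` and ignores transport padding; `breakOn` and `splitMultipart` are hypothetical names for illustration, not purebred-email's API.

```haskell
import Data.List (stripPrefix)
import Data.Maybe (fromMaybe)

-- | Split at the first occurrence of a needle, returning the text before
-- the needle and the text after it (needle removed).
breakOn :: String -> String -> Maybe (String, String)
breakOn needle = go []
  where
    go acc s
      | Just rest <- stripPrefix needle s = Just (reverse acc, rest)
      | otherwise = case s of
          []     -> Nothing
          (c:cs) -> go (c : acc) cs

-- | Hypothetical sketch: split a multipart body into (preamble, parts,
-- epilogue) following the RFC 2046 grammar, where each delimiter is
-- CRLF "--" boundary and the close delimiter adds a trailing "--".
-- Simplification: transport padding after boundary lines is not handled.
splitMultipart :: String -> String -> Maybe (String, [String], String)
splitMultipart boundary body = do
  let dash  = "--" ++ boundary
      delim = "\r\n" ++ dash
  -- With no preamble, the first boundary line has no preceding CRLF.
  (preamble, afterFirst) <- case stripPrefix dash body of
    Just rest -> Just ("", rest)
    Nothing   -> breakOn delim body
  firstPart <- stripPrefix "\r\n" afterFirst
  go delim preamble [] firstPart
  where
    go delim preamble acc s = do
      -- A body part runs up to (not including) the CRLF of the next delimiter.
      (part, afterDelim) <- breakOn delim s
      let acc' = part : acc
      case stripPrefix "--" afterDelim of
        -- Close delimiter: what follows is "[CRLF epilogue]", so any
        -- extra blank lines end up in the (discarded) epilogue.
        Just trailer ->
          Just (preamble, reverse acc', fromMaybe trailer (stripPrefix "\r\n" trailer))
        Nothing -> stripPrefix "\r\n" afterDelim >>= go delim preamble acc'

main :: IO ()
main =
  -- Inner multipart followed by an extra blank line before the enclosing delimiter.
  print (splitMultipart "inner" "--inner\r\n\r\nhello\r\n--inner--\r\n\r\n")
  -- prints Just ("",["\r\nhello"],"\r\n")
```

Under this reading the surplus blank line is epilogue, not part content, so it should not affect where the next outer delimiter is found.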

frasertweedale commented 5 years ago

Hmm, removing the extra newline does not make a difference. More investigation needed...

frasertweedale commented 5 years ago

Perhaps it is related to the multipart-within-multipart scenario...

frasertweedale commented 5 years ago

Minimal reproducer:

Content-Type: multipart/mixed; boundary="===============8936515836623095288=="
MIME-Version: 1.0 
Date: Thu, 4 May 2017 03:08:43 +1000 (EST)
From: foo@bar
To: my@email.test
Subject: Pretty subject

--===============8936515836623095288==
Content-Type: multipart/alternative; boundary="===============5078260389750838226=="

--===============5078260389750838226==
Content-Type: text/plain

Bla bla bla 
--===============5078260389750838226==--
--===============8936515836623095288==--

frasertweedale commented 5 years ago

Putting a blank line between the two close delimiters makes it parse correctly, i.e.:

Content-Type: multipart/mixed; boundary=boundary1
MIME-Version: 1.0
Date: Thu, 4 May 2017 03:08:43 +1000 (EST)
From: foo@bar
To: my@email.test
Subject: Pretty subject

--boundary1
Content-Type: multipart/alternative; boundary=boundary2

--boundary2
Content-Type: text/plain

Bla bla bla
--boundary2--

--boundary1--

Investigating whether two close delimiters separated by a single newline are legal or not...
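For what it's worth, the RFC 2046 section 5.1.1 grammar defines `delimiter := CRLF dash-boundary` and `close-delimiter := delimiter "--"`, i.e. the CRLF before a boundary line belongs to that delimiter. A nested body part therefore ends just before the CRLF of the enclosing delimiter, which suggests a single newline between the two close delimiters is legal. One hypothesis consistent with the observations above (not checked against purebred-email's source) is that the inner multipart parser greedily consumes the optional CRLF that may start its `[CRLF epilogue]`, leaving the outer close delimiter without its leading CRLF; with a blank line in between there is a spare CRLF, so the parse succeeds. The sketch below models only that single step on plain `String`; all function names are hypothetical.

```haskell
import Data.List (isPrefixOf, stripPrefix)
import Data.Maybe (fromMaybe)

-- Hypothetical model: 'rest' is the input a streaming parser sees right
-- after it has consumed an inner close delimiter such as "--boundary2--".

-- Greedy: always consume the optional CRLF that may start "[CRLF epilogue]".
afterInnerCloseGreedy :: String -> String
afterInnerCloseGreedy rest = fromMaybe rest (stripPrefix "\r\n" rest)

-- Careful: leave that CRLF in place when it begins the enclosing delimiter,
-- because per RFC 2046 the CRLF belongs to the delimiter that follows it.
afterInnerCloseCareful :: String -> String -> String
afterInnerCloseCareful outer rest
  | ("\r\n--" ++ outer) `isPrefixOf` rest = rest
  | otherwise = fromMaybe rest (stripPrefix "\r\n" rest)

-- Does the outer close delimiter (CRLF "--" boundary "--") come next?
outerCloseNext :: String -> String -> Bool
outerCloseNext outer rest = ("\r\n--" ++ outer ++ "--") `isPrefixOf` rest

main :: IO ()
main = do
  let adjacent = "\r\n--boundary1--\r\n"      -- single newline between closes
      gapped   = "\r\n\r\n--boundary1--\r\n"  -- blank line between closes
  print (outerCloseNext "boundary1" (afterInnerCloseGreedy adjacent))              -- prints False
  print (outerCloseNext "boundary1" (afterInnerCloseGreedy gapped))                -- prints True
  print (outerCloseNext "boundary1" (afterInnerCloseCareful "boundary1" adjacent)) -- prints True
  print (outerCloseNext "boundary1" (afterInnerCloseCareful "boundary1" gapped))   -- prints True
```

The greedy variant reproduces the pattern seen here (adjacent closes fail, a gap succeeds), while the careful variant accepts both. A real parser would also have to stop a non-empty epilogue before the enclosing delimiter, which this sketch does not model.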