purebred-mua / purebred-email

A fast email parsing library implemented in Haskell
https://hackage.haskell.org/package/purebred-email
GNU Affero General Public License v3.0

FileParseError "<path to mail>" "Failed reading: satisfy" #24

Closed romanofski closed 2 years ago

romanofski commented 5 years ago

I ran into another FileParseError when opening up an older mail (April 2016). Have not investigated further. Structure looks like this:


  I     1 <no description>    [text/html, base64, iso-8859-1, 31K]
  I     2 1-659919855.jpg     [image/jpeg, base64, 89K]
  I     3 476654895.jpg       [image/jpeg, base64, 29K]
  I     4 1-30643500.gif      [image/gif, base64, 62]
  I     5 1-1021401154.gif    [image/gif, base64, 2.4K]
  I     6 1127529074.gif      [image/gif, base64, 12K]
  A     7  welcome pack - 21- [applica/pdf, base64, 6.2M]

romanofski commented 5 years ago

I think this one seems to be caused by a malformed From header:

From foo@bar.test  Sun Apr 24 20:20:44 2016                                                          
Return-Path: <foo@bar.test>

There is actually another, correctly formatted From header.

frasertweedale commented 5 years ago

So... close this not-a-bug then?

romanofski commented 5 years ago

Hm... well, mutt seems to ignore the junk and parse it just fine (maybe; it displays the mail at least). Would a solution be to ignore random bytestrings that are not correctly formatted headers, or is this a minefield?

frasertweedale commented 5 years ago

It's malformed... there is no end in sight if we try to handle every kind of malformed email we encounter.

frasertweedale commented 5 years ago

Minimal reproducer:

Content-Type: multipart/mixed; boundary="===============8936515836623095288=="
MIME-Version: 1.0 
Date: Thu, 4 May 2017 03:08:43 +1000 (EST)
From: foo@bar
To: my@email.test
Subject: Pretty subject

--===============8936515836623095288==
Content-Type: multipart/alternative; boundary="===============5078260389750838226=="

--===============5078260389750838226==
Content-Type: text/plain

Bla bla bla 
--===============5078260389750838226==--
--===============8936515836623095288==--

frasertweedale commented 5 years ago

I've got to leave it there for today. Getting closer though :)

romanofski commented 5 years ago

Fair enough if you want to throw it out. Btw. I think you mistakenly commented on the wrong bug :)

frasertweedale commented 5 years ago

Yes I did. Whups :) Closing this now.

tomjaguarpaw commented 2 years ago

I have just hit this. For the record, it seems that the From header reported by @romanofski is part of the mbox format. Strangely if, in mutt, I issue <pipe>cat > /tmp/out<enter> then /tmp/out ends up containing this mbox From header. I wonder if that's a bug in mutt.

frasertweedale commented 2 years ago

@tomjaguarpaw can you please attach a complete mail? Maybe we can add some (optional?) preprocessing to handle messages stored in mbox format.
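
A minimal sketch of what such preprocessing could look like for a single message, assuming LF line endings and only the bytestring package. The name stripMboxEnvelope is hypothetical and not part of purebred-email; it only drops a leading mbox "From " separator line (note the space, which distinguishes it from a "From:" header) and leaves everything else untouched.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Drop a leading mbox "From " separator line, if there is one.
-- Hypothetical helper for illustration; not part of purebred-email.
-- A real "From:" header does not match, because the separator is
-- "From" followed by a space rather than a colon.
stripMboxEnvelope :: B.ByteString -> B.ByteString
stripMboxEnvelope input
  | B.pack "From " `B.isPrefixOf` input =
      -- Skip to the end of the first line, then past the newline itself.
      B.drop 1 (B.dropWhile (/= '\n') input)
  | otherwise = input
```

The result could then be handed to the existing parser unchanged, so the parser itself would not need to know about mbox.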

tomjaguarpaw commented 2 years ago

Sure, here's what one looks like. It would be great if, instead of parse (message mime), one could write, for example, parse (mbox (message mime)).

From redacted@redacted Tue Nov  9 08:00:02 2021
Received: from tom (uid 1000)
        (envelope-from redacted@redacted)
        id 807fa
        by cloudinit-builder (DragonFly Mail Agent v0.11);
        Tue, 09 Nov 2021 08:00:02 +0000
Date: Tue, 9 Nov 2021 08:00:02 +0000
From: Redacted <redacted@redacted>
To: Tom <tom@cloudinit-builder>
Subject: Test
Message-ID: <20211109080002.GC18419@cloudinit-builder>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.10.1 (2018-07-13)

Example

frasertweedale commented 2 years ago

> I have just hit this. For the record, it seems that the From header reported by @romanofski is part of the mbox format. Strangely if, in mutt, I issue <pipe>cat > /tmp/out<enter> then /tmp/out ends up containing this mbox From header. I wonder if that's a bug in mutt.

FWIW, the pipe feature in my mutt package (mutt-2.1.3 from official FreeBSD repos) does not include the mbox preamble when piping the message. When saving the message to local file, it does include the mbox preamble.

Thinking it through, I think this is best handled by a separate library that understands mbox format and can extract the message(s) from it. Let me see how much work it would be to implement the mbox format.
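
As a rough sketch of the extraction step only, not a full mbox implementation: the hypothetical splitMbox below splits mbox contents at every line starting with "From " and drops those separator lines. It assumes LF line endings and that the writer quoted body lines starting with "From " (for example as ">From "), so every unquoted "From " line is a separator. It does not undo that quoting, and it does not handle the other mbox variants.

```haskell
import qualified Data.ByteString.Char8 as B

-- | Split mbox contents into raw messages, dropping each "From "
-- separator line. Hypothetical helper for illustration only.
-- Anything before the first separator line is discarded.
splitMbox :: B.ByteString -> [B.ByteString]
splitMbox = map B.unlines . go . B.lines
  where
    go ls = case dropWhile (not . isSeparator) ls of
      []         -> []
      (_ : rest) ->
        -- Everything up to the next separator belongs to this message.
        let (msg, more) = break isSeparator rest
        in msg : go more
    isSeparator = B.isPrefixOf (B.pack "From ")
```

Each element of the result is one raw message that could be passed to a mail parser on its own. The blank line that mbox places before the next separator stays attached to the end of the preceding message.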

frasertweedale commented 2 years ago

Relevant docs / specs:

tomjaguarpaw commented 2 years ago

I'm using Mutt 1.10.1 from Debian 10. Perhaps the time has come to upgrade to 11.

frasertweedale commented 2 years ago

@tomjaguarpaw after some consideration, I'm not really keen to implement this myself. It does not seem very complicated, but I view it as a distraction from our core work. If you want to tackle it, I would be happy to review and adopt it under the purebred project umbrella, and (co)maintain it.

tomjaguarpaw commented 2 years ago

Fair enough. It's easy enough for me to work around for now.

frasertweedale commented 2 years ago

OK, I'm going to close it again. Thanks for your contributions, @tomjaguarpaw.