joeyates / imap-backup

Backup and Migrate IMAP Email Accounts
MIT License

Replacing ">From" as email separator with RFC 4155 standard? #109

Closed tinyapps closed 2 years ago

tinyapps commented 2 years ago

Thanks so much for crafting imap-backup and especially for the recent iCloud fix.

When saving messages, the RFC 4155 standard mentioned by the Postsack developer may be worth considering as the message delineator rather than ">From" (which can cause the sender to appear as blank when importing, as described in Issue #103, "No sender included in emails."):

Each message in the mbox database MUST be immediately preceded by a single separator line, which MUST conform to the following syntax:

  • The exact character sequence of "From";

  • a single Space character (0x20);

  • the email address of the message sender (as obtained from the message envelope or other authoritative source), conformant with the "addr-spec" syntax from RFC 2822;

  • a single Space character;

  • a timestamp indicating the UTC date and time when the message was originally received, conformant with the syntax of the traditional UNIX 'ctime' output sans timezone (note that the use of UTC precludes the need for a timezone indicator);

  • an end-of-line marker.
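
As an illustration of the quoted syntax, here is a minimal Python sketch that assembles such a separator line. The helper name `from_line` and the example address are hypothetical; the only assumption is that `time.asctime` on a UTC `struct_time` produces the traditional ctime layout without a timezone.

```python
import time

def from_line(sender: str, received_epoch: float) -> str:
    """Build a separator line in the RFC 4155 layout (hypothetical helper)."""
    # asctime() on a UTC struct_time gives the ctime layout with no timezone,
    # for example "Thu Jan  1 00:00:00 1970".
    stamp = time.asctime(time.gmtime(received_epoch))
    # "From", one space, the sender address, one space, the timestamp, end of line.
    return f"From {sender} {stamp}\n"

print(from_line("alice@example.com", 0), end="")
```

Note that this only covers the separator line itself; the quoted text says nothing about how 'From ' lines inside the message are handled.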

joeyates commented 2 years ago

Hi @tinyapps

Thanks for your kind words.

I think there may be some confusion as to what imap-backup is writing to disk. The intention is to follow the 'mboxrd' standard.

The best source for the mboxrd standard seems to be this page: https://web.archive.org/web/20201231011750/https://jdebp.eu/FGA/mail-mbox-formats.html

What follows is a description of the process and its motivation.

From_ line

As you state, each message is preceded by a 'separator line', in the format you indicate.

Other lines in the Original Message Starting with 'From '

Prepending 'From ' to messages, though, is problematic as one needs to decide how to treat other lines in the message which start with 'From '.

If these lines were written to the mbox file as received, they could be interpreted as separator lines. This would mean that it would not be possible to split the mbox file back into its component messages. The process of saving messages to disk would be 'irreversible'.
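
To make the problem concrete, here is a small Python sketch of a deliberately naive splitter (the function `naive_split` and the sample data are made up for illustration). It treats every line starting with 'From ' as a separator, so an unescaped body line breaks one message into two.

```python
from typing import List

def naive_split(mbox: str) -> List[str]:
    """Split an mbox string on every line that starts with 'From ' (illustrative only)."""
    messages: List[str] = []
    for line in mbox.splitlines(keepends=True):
        if line.startswith("From "):
            # Any such line is taken to be a separator, even inside a body.
            messages.append("")
        elif messages:
            messages[-1] += line
    return messages

# One message whose body happens to contain a line starting with 'From '.
mbox = (
    "From a@example.com Thu Jan  1 00:00:00 1970\n"
    "Subject: one\n"
    "\n"
    "From now on, call me Al.\n"
    "Bye\n"
)
parts = naive_split(mbox)
print(len(parts))
```

The single stored message comes back as two fragments, and the body line itself is lost.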

The 'mboxo' format half solves this problem by prepending a '>' to any existing lines in the message which start with 'From '. Unfortunately, this system is not reversible, as a reader can no longer distinguish between a line '>From ' which existed in the original message and a line modified by the 'mboxo' process.
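
A short Python sketch of that ambiguity, assuming the mboxo rule is exactly "prefix '>' to lines starting with 'From '" (the function name `mboxo_quote` is hypothetical):

```python
def mboxo_quote(body: str) -> str:
    """Apply the mboxo rule: add '>' only to lines that start with 'From '."""
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines(keepends=True)
    )

original_a = "From here on, things differ.\n"
original_b = ">From here on, things differ.\n"

# Two different originals produce the same stored text,
# so the stored text cannot tell us which one was written.
print(mboxo_quote(original_a) == mboxo_quote(original_b))
```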

mboxrd solves the reversibility problem by also prepending an extra '>' to lines that already begin with one or more '>' characters followed by 'From '.

Thanks to this additional action, the mboxrd serialization process is reversible. To obtain the original message, one does 2 things:

  1. Remove the 'From_ line' at the start of the message,
  2. Remove 1 '>' character from the start of any remaining line that starts with 1 or more '>' characters plus 'From '.
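
The two steps above can be sketched in Python as follows. This is only an illustration under the assumption that a single entry is passed in as a string with '\n' line endings; `mboxrd_decode` is a hypothetical name, not a function from imap-backup.

```python
import re

# One '>' followed by zero or more '>' and then 'From ', at the start of a line.
QUOTED_FROM = re.compile(r"^>(>*From )", re.MULTILINE)

def mboxrd_decode(entry: str) -> str:
    """Recover the original message from one mboxrd entry (hypothetical helper)."""
    # Step 1: drop the From_ line at the start of the entry.
    _separator, _, rest = entry.partition("\n")
    # Step 2: remove exactly one '>' from every line matching /^>+From /.
    return QUOTED_FROM.sub(r"\1", rest)

entry = (
    "From a@example.com Thu Jan  1 00:00:00 1970\n"
    "Subject: hi\n"
    "\n"
    ">From the start\n"
    ">>From a quoted reply\n"
    "> an ordinary quoted line\n"
)
print(mboxrd_decode(entry), end="")
```

Lines that start with '>' but are not followed by 'From ' are left alone, which is what keeps ordinary quoted replies intact.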

TL;DR

When writing to disk, a compatible implementation of mboxrd does 3 things:

  1. adds a From_ line at the start of the message,
  2. modifies any existing line that starts with 'From ', prepending a '>',
  3. modifies any line matching /^>+From /, prepending another '>'.

This is what imap-backup does, so I believe it is behaving correctly.
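
For readers who want to see the writing side in code, here is a minimal Python sketch. It is not taken from imap-backup's source; `mboxrd_encode` and the sample data are hypothetical, and the sketch assumes '\n' line endings. Steps 2 and 3 collapse into one rule: any line matching /^>*From / gets one more '>'.

```python
import re

# Zero or more '>' followed by 'From ', at the start of a line.
FROM_OR_QUOTED = re.compile(r"^(>*From )", re.MULTILINE)

def mboxrd_encode(message: str, separator: str) -> str:
    """Serialize one message as an mboxrd entry (hypothetical helper)."""
    # Steps 2 and 3: prepend one '>' to 'From ' lines and to already quoted ones.
    quoted = FROM_OR_QUOTED.sub(r">\1", message)
    # Step 1: put the From_ line in front of the message.
    return separator + "\n" + quoted

message = (
    "From: alice@example.com\n"
    "Subject: hi\n"
    "\n"
    "From me\n"
    ">From you\n"
    ">>From them\n"
)
stored = mboxrd_encode(message, "From alice@example.com Thu Jan  1 00:00:00 1970")
print(stored, end="")
```

The 'From:' header is untouched because the rule requires a space directly after 'From'; only the three body lines gain a '>'.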

tinyapps commented 2 years ago

Thanks so much for your patient response, @joeyates.

One of the main drawbacks of mboxrd/delineating messages with ">From" is that the sender will not appear in at least some mail clients after importing from an mbox generated by imap-backup; in Thunderbird, such messages do not display the "From" field when opened, while in Mail.app, the From field displays "No Sender".

Removing the leading angle bracket before importing (e.g., via sed) resolves the issue, but there may be edge cases where a message body has a line starting with "From ", causing the importer to mark it as the start of a new message. Such cases might be avoided by separating messages with the RFC 4155 standard instead.

joeyates commented 2 years ago

Hi @tinyapps

I've changed the Thunderbird export in line with your suggestions.

While imap-backup's own format remains the same (mboxrd), during export to Thunderbird, '>' prefixes are removed from header and body lines starting 'From '.

Now, Thunderbird should correctly recognize the 'From' header.

I've released it as version 4.2.0.

tinyapps commented 2 years ago

Thanks so much for the fast update, @joeyates. Just tested the Thunderbird export feature in 4.2.0 (importing into Thunderbird 91.4.1) and senders appeared correctly. Do you accept donations, and if so, how best to send?

joeyates commented 2 years ago

Good to hear that's sorted out.

I've been thinking about donations, and now that DHH has come out against them, I think I'll definitely sort something out to start taking them :)