uchicago-library / attachment-converter

Attachment Converter: tool for batch converting attachments in an email mailbox
GNU General Public License v2.0

Revisit Mr. Mime as an email parsing backend #47

Closed bufordrat closed 1 year ago

bufordrat commented 2 years ago


When Owen Price Skelly began work on this project, the library we were using for an email parsing backend was Mr. Mime. In the fall of 2021, we switched to using OCamlnet as our email parsing backend, mainly due to challenges getting everything up and running with Mr. Mime.

The main challenge was that Mr. Mime, astonishingly, does not come with a function to serialize parse trees back into email strings. We learned the reason after talking with the author of the library, and it is an interesting one: Mr. Mime mainly exists to help developers write mail user agents in OCaml in a way that is compatible with the Mirage unikernel ecosystem. A mail user agent just needs to parse emails; once it has the data it needs, it works with that directly, so there is no need to serialize a parse tree back into a well-formed email. Our use case simply never came up.
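To make the missing piece concrete, here is a minimal sketch of what "serializing a parse tree back into an email string" means. The `tree` type, `string_of_header`, and `to_string` are toy names invented for illustration; they are not Mr. Mime's or OCamlnet's actual types, and a real serializer would also have to handle header folding, encodings, preambles, and so on.

```ocaml
(* Toy model of an email parse tree. These types are assumptions made
   for illustration, not the types of any real library. *)
type header = (string * string) list

type tree =
  | Leaf of header * string                   (* headers and body *)
  | Multipart of header * string * tree list  (* headers, boundary, parts *)

let crlf = "\r\n"

(* Render each header field as "Name: value" followed by CRLF. *)
let string_of_header (h : header) : string =
  String.concat "" (List.map (fun (k, v) -> k ^ ": " ^ v ^ crlf) h)

(* Turn a tree back into message text: headers, a blank line, then
   either the body or the boundary-delimited parts. *)
let rec to_string = function
  | Leaf (h, body) -> string_of_header h ^ crlf ^ body
  | Multipart (h, boundary, parts) ->
    let delim = "--" ^ boundary in
    let render part = delim ^ crlf ^ to_string part ^ crlf in
    string_of_header h ^ crlf
    ^ String.concat "" (List.map render parts)
    ^ delim ^ "--" ^ crlf
```

This is the direction Attachment Converter needs after it has swapped a converted attachment into the tree, and it is the direction a mail user agent never has to go.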

Getting back to Attachment Converter, we would ideally like it to have multiple email parsing backends. There are a few reasons for this. One is that it would give us the flexibility to try parsing each email twice. Since the email specification is so complex, it is inevitable that any two email parsing libraries will differ at least slightly in which emails they consider syntactically well-formed. If Attachment Converter could try parsing with one backend and then fall back to another when the first fails, it would be able to convert a wider range of attachments than it could otherwise. Who knows what we will come across in the wild? It's probably best to be as prepared as we can.
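The fallback idea could look roughly like the sketch below. Everything in it is hypothetical: the `backend` record, `try_backends`, and the two toy backends `strict` and `lenient` are stand-ins made up for illustration, not code from Attachment Converter, Mr. Mime, or OCamlnet. The toy backends only check for a blank line between header and body, which is enough to show two parsers disagreeing about the same input.

```ocaml
(* A backend is modeled as a name plus a function that either succeeds
   with a result or fails with an error message. Hypothetical type. *)
type 'a backend = {
  name : string;
  run : string -> ('a, string) result;
}

(* Try each backend in order; return the first success together with
   the name of the backend that produced it. *)
let rec try_backends backends input =
  match backends with
  | [] -> Error "no backend accepted this email"
  | b :: rest ->
    (match b.run input with
     | Ok result -> Ok (b.name, result)
     | Error _ -> try_backends rest input)

(* Naive substring search, to keep the sketch free of dependencies. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* Toy backends: one insists on CRLF line endings, one accepts bare LF. *)
let strict = {
  name = "strict";
  run = (fun s ->
    if contains ~sub:"\r\n\r\n" s then Ok s
    else Error "no CRLF blank line after the header");
}

let lenient = {
  name = "lenient";
  run = (fun s ->
    if contains ~sub:"\n\n" s then Ok s
    else Error "no blank line after the header");
}
```

With these definitions, `try_backends [strict; lenient] "A: b\n\nbody"` is rejected by `strict` and then accepted by `lenient`, so the email is not lost just because one parser is pickier than the other.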

Starter Code Part 1: Practicum Code

In the original Winter 2020 practicum version, the project did not yet have the structure it currently has. However, for your reference, I have put the original Mr. Mime code into a new branch called [mrmime-starter-code](https://github.com/uchicago-library/attachment-converter/tree/mrmime-starter-code). In that branch, `mrmime_starter_code.ml` contains the source for the email parsing code using Mr. Mime, and `mrmime_todos.org` contains notes on how far the Mr. Mime version of the email parsing backend progressed.

Here's a quick summary of what those notes say the starter code can do:

The remaining core functionality would involve:

In other words, it would involve implementing a module inhabiting the `CONVERT` signature for Mr. Mime. Maybe call it `Conversion_mrmime`, to follow our prior naming scheme?

Starter Code Part 2: From The Author Himself

Romain Calascibetta was gracious enough to provide us with example code that creates a new email from scratch. In principle, we should be able to use this to build the original email back up again from the converted attachment.

Here is a link to that code: https://github.com/mirage/mrmime/blob/master/examples/attachment.ml