uchicago-library / attachment-converter

Attachment Converter: tool for batch converting attachments in an email mailbox
GNU General Public License v2.0

Preliminary MBOX-level conversion #22

Closed bufordrat closed 2 years ago

bufordrat commented 2 years ago

Thus far, we have been developing Attachment Converter to work on a per-email basis. In this issue, we'll take our first baby steps towards making the application do what it is actually going to do, namely: provide a map from an input MBOX to an output MBOX.

The final version will most likely include certain optimizations. For example, it might stream the emails in from the input MBOX one at a time, so that no conversion process has more than one email loaded into memory at once, and it might also perform as many conversions as it can in parallel. (Conceptually that's easy, given that no conversion will depend on the output of any other conversion, but we all know that parallelism can be harder to pull off safely than it looks. So I guess I'll say I'm cautiously optimistic!)
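For illustration only, the streaming idea could look roughly like the sketch below. It lazily groups lines into messages so that only one message is held in memory at a time. The names `stream_messages` and `line_source_of_channel` are hypothetical and do not come from the Attachment Converter code base. The sketch also assumes that every message starts with a line beginning with "From " and that body lines starting with "From " have already been escaped.

```ocaml
(* Hypothetical sketch, not the project's code. *)

(* A line opens a new message if it begins with "From ". *)
let is_from_line line =
  String.length line >= 5 && String.sub line 0 5 = "From "

(* Turn an input channel into a line source that yields None at end of
   input. input_line strips the trailing newline. *)
let line_source_of_channel (ic : in_channel) () : string option =
  try Some (input_line ic) with End_of_file -> None

(* Lazily group lines into messages. The resulting sequence is ephemeral:
   it reads from next_line as it is consumed, so traverse it only once.
   Assumes next_line keeps returning None once the input is exhausted. *)
let stream_messages (next_line : unit -> string option) : string Seq.t =
  (* Holds a "From " line that was read while finishing the previous
     message, so it can open the next one. *)
  let carry = ref None in
  let rec read_message acc =
    match next_line () with
    | None -> acc
    | Some line when is_from_line line && acc <> [] ->
        carry := Some line;
        acc
    | Some line -> read_message (line :: acc)
  in
  let rec next () =
    let start =
      match !carry with
      | Some line -> (carry := None; [ line ])
      | None -> []
    in
    match read_message start with
    | [] -> Seq.Nil
    | lines -> Seq.Cons (String.concat "\n" (List.rev lines), next)
  in
  next
```

A caller could then run something like `Seq.iter handle (stream_messages (line_source_of_channel ic))`, where `handle` is whatever per-email conversion step is already in place.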

However, this preliminary version doesn't yet need to be optimized in either of those ways. It can load the entire input MBOX into memory, and it can perform every conversion it's going to perform sequentially. The focus for this go-round should just be on making the logic correct.

Once we have working code to promote the per-email functionality of the application to the MBOX level, we can then tackle the fun task of making it as efficient as we can. The code can go either in lib/lib.ml or in a separate file called lib/mbox.ml. If it goes in lib/lib.ml, it should probably go in its own module. (The assignee could call the module Mbox, or something to that effect.)
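As a rough sketch of what the unoptimized version might look like, the module below loads the whole MBOX as one string, splits it into messages, applies a per-email conversion to each in sequence, and joins the results back together. Everything here is an assumption made for illustration: the module name `Mbox_sketch`, the functions `split_messages` and `convert_mbox`, and the idea that the per-email conversion has type `string -> string`. It does not use the project's existing functions, whose signatures are not shown in this issue.

```ocaml
(* Hypothetical sketch: names and types are illustrative and are not
   taken from the Attachment Converter code base. *)
module Mbox_sketch = struct
  (* A line opens a new message if it begins with "From ". This assumes
     body lines starting with "From " have already been escaped. *)
  let is_from_line line =
    String.length line >= 5 && String.sub line 0 5 = "From "

  (* Split the full text of an MBOX into messages. Each message keeps its
     "From " separator line. Any text before the first separator is
     returned as a message of its own. *)
  let split_messages (mbox : string) : string list =
    let lines = String.split_on_char '\n' mbox in
    (* Push the lines collected so far onto acc as one message. *)
    let flush current acc =
      match current with
      | [] -> acc
      | _ -> String.concat "\n" (List.rev current) :: acc
    in
    let rec go current acc = function
      | [] -> List.rev (flush current acc)
      | line :: rest when is_from_line line ->
          go [ line ] (flush current acc) rest
      | line :: rest -> go (line :: current) acc rest
    in
    go [] [] lines

  (* Apply a per-email conversion to every message, sequentially, and
     reassemble the output MBOX. *)
  let convert_mbox (convert_email : string -> string) (mbox : string) : string =
    split_messages mbox |> List.map convert_email |> String.concat "\n"
end
```

With the identity function as the conversion, `Mbox_sketch.convert_mbox (fun m -> m) mbox` returns the input unchanged, which makes a handy first sanity check before plugging in the real per-email logic.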

One final note: don't forget about the to_mbox function, introduced into the code as part of #11. It ought to be useful for getting this code up and running.