Feature: Add a '-recurse' flag to 'mox import maildir'

andreasheil commented 5 months ago

Since mail accounts commonly have lots of subfolders, it would be nice to have mox recurse into subdirectories when importing.

For example (Dovecot), if I want to use mox import on this:

/vmail/example.com/accountOne/
/vmail/example.com/accountOne/.Inbox.Folder 1/
/vmail/example.com/accountOne/.Inbox.Folder 2/
/vmail/example.com/accountOne/.Sent/
/vmail/example.com/accountOne/.Junk/
/vmail/example.com/accountOne/.Trash/

I have to import every single folder:

./mox import maildir accountOne Inbox /vmail/example.com/accountOne
./mox import maildir accountOne 'Inbox/Folder 1' '/vmail/example.com/accountOne/.Inbox.Folder 1'
./mox import maildir accountOne 'Inbox/Folder 2' '/vmail/example.com/accountOne/.Inbox.Folder 2'
./mox import maildir accountOne Sent /vmail/example.com/accountOne/.Sent
./mox import maildir accountOne Junk /vmail/example.com/accountOne/.Junk
./mox import maildir accountOne Trash /vmail/example.com/accountOne/.Trash

Not to mention the translated names of some folders, like accountOne/.Entw&APw-rfe (German: Entwürfe) for Drafts.

mjl- commented 3 months ago

hi @andreasheil, it's been a while, but this seems like a useful feature, thanks for reporting.

it seems the examples you gave are in "maildir++" format, with a leading dot and dot-separated (instead of slash-separated) folder names. https://en.wikipedia.org/wiki/Maildir mentions it as maildir++, and https://www.courier-mta.org/maildir.html describes the dot details. an alternative seems to be submaildirs nested under the main maildir, as described in https://doc.dovecot.org/configuration_manual/mail_location/Maildir/.
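To make the maildir++ naming concrete, here is a minimal sketch of mapping a subdirectory name to a mailbox name, assuming the layout from the original report (the top-level maildir itself is Inbox, and `.Inbox.Folder 1` becomes `Inbox/Folder 1`). `maildirPPName` is a hypothetical helper, not existing mox code, and it leaves IMAP UTF-7 names such as `.Entw&APw-rfe` encoded.

```go
package main

import (
	"fmt"
	"strings"
)

// maildirPPName converts a maildir++ subdirectory name such as
// ".Inbox.Folder 1" into a slash-separated mailbox name such as
// "Inbox/Folder 1". The second result is false for names without a
// leading dot and for names with empty components (".", "..", ".Foo.").
func maildirPPName(dir string) (string, bool) {
	if !strings.HasPrefix(dir, ".") {
		return "", false
	}
	parts := strings.Split(dir[1:], ".")
	for _, p := range parts {
		if p == "" {
			return "", false
		}
	}
	return strings.Join(parts, "/"), true
}

func main() {
	for _, d := range []string{".Inbox.Folder 1", ".Sent", "cur"} {
		name, ok := maildirPPName(d)
		fmt.Printf("%q -> %q (%v)\n", d, name, ok)
	}
}
```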

it seems safe to automatically import submaildirs following either scheme: as long as they have cur, new and tmp subdirs, there shouldn't be any confusion about whether a directory is a maildir.

the imaputf7 parsing could be enabled with a cli flag.

the maildir import code currently expects a single directory to treat as a maildir. we can also import maildir/mbox from zip & tar files. tar files are streamed, so we may have to relax the "needs all of new/cur/tmp directories" rule into "needs just cur or new", importing any message files in those subdirs as we encounter them. because the code expects a single directory, we'll need to do some refactoring. the web-based zip/tar import code can already import from multiple mbox files, so it's not entirely new.
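A sketch of the relaxed "cur or new" rule for streamed archives, assuming entry paths are slash-separated as in tar and zip: each file is judged on its own path as it streams by, without needing to have seen the sibling `tmp` directory first. `maildirMessage` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"path"
)

// maildirMessage applies the relaxed rule to one archive entry path
// (slash-separated, as in tar and zip): a file whose parent directory is
// "cur" or "new" is a maildir message, and the directory above that is
// the maildir it belongs to. Files under "tmp" (partial deliveries) and
// everything else are rejected. Callers should pass regular-file entries
// only; a directory entry like "x/cur/" would otherwise be misread.
func maildirMessage(p string) (maildir string, ok bool) {
	dir := path.Dir(p)
	switch path.Base(dir) {
	case "cur", "new":
		return path.Dir(dir), true
	}
	return "", false
}

func main() {
	for _, p := range []string{
		"backup/.Sent/cur/1700000000.M1P2.host:2,S",
		"backup/new/1700000001.M3P4.host",
		"backup/tmp/1700000002.M5P6.host",
		"backup/notes.txt",
	} {
		md, ok := maildirMessage(p)
		fmt.Printf("%s -> %q %v\n", p, md, ok)
	}
}
```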

the cli import command could also benefit from an "import mbox" variant that imports messages from all ".mbox" files in a directory into mailboxes. but that could be out of scope.

i don't have a lot of time available at the moment, so it will take a while. if you want to give implementing it a try, i can give some pointers.

andreasheil commented 3 months ago

Ok, sounds good. I would like to implement that. Some pointers would be nice.

mjl- commented 2 months ago

Excellent. I looked at the code just now. We have import.go (at the root) that implements the cli import commands. And we have webaccount/import.go, handling the import through the webaccount interface. Then there is store/import{,_test}.go implementing iterators over a single mbox or single maildir.

It turns out webaccount/import.go, which imports from zip/tar/tgz, can already handle multiple mbox files and multiple maildirs. That code could be used as an example. It doesn't decode imaputf7 in mailbox names. It could be useful to add a checkbox "Decode UTF-7 in mailbox names" to the webaccount import frontend, and pass its value to webaccount/import.go so it can decode names. The checkbox would go around https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/webaccount/account.ts#L1240, and the imaputf7 parsing at the beginning of importFile, around https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/webaccount/import.go#L688. We already have a UTF-7 decoder at https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/imapserver/utf7.go#L28; it just needs to be exported.

The cli import commands, "mox import mbox" and "mox import maildir", would need the most work. There are also unlisted variants "mox ximport mbox" and "mox ximport maildir": they operate without a running mox, but really just set up the process so it looks like a mox is running.

All cli-based importing starts at import.go's importctl(), https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/import.go#L160. Above that code are the cmd functions that serve as the cli command entrypoints, parsing flags. They call ctlcmd functions that write to the mox server's ctl unix domain socket, served by ctl.go, which ends up calling our importctl().

It seems to me that importing multiple mbox files/maildirs from a directory is strictly better than importing a single mbox/maildir, so we don't need to keep the old behaviour. We can modify importctl() to take an account and a directory (no longer a mailbox name), introduce a flag for decoding imaputf7, and import anything that looks like an .mbox file or messages in a cur/new maildir directory. That means we can reduce the two variants for mbox/maildir to a single "mox import" command.

The cli import code will work more like webaccount/import.go. It makes sense to start by looking at that code, and copy it or refactor it for reuse. The webaccount import code currently processes mbox/maildir messages from a zip or tar file. Those zip and tar files could be abstracted behind a Go interface that keeps returning the next file to process. Then we just need an implementation that iterates over the directory we are importing, which shouldn't be too hard. I think store.NewMaildirReader would become unused, but it will still be useful to keep around.

The webaccount import code sends updates over SSE (counts per mailbox), while the cli command just writes textual updates; that will need some changes. The SSE event-sending code is at the start of importMessages, at https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/webaccount/import.go#L291. The cli variant could write lines like it does now (https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/import.go#L145), but it's probably better to use a ctlwriter to stream formatted output to the command (https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/ctl.go#L144 and https://github.com/mjl-/mox/blob/3bbd7c7d9b9b25c10ecab3ea23e32e1e7f922b86/ctl.go#L128). A formatted line could be printed every 1000 messages, or when a new mailbox is encountered, similar or identical to the webaccount import code.

Hope this helps. I can explain more about the mechanisms of the ctl socket if needed, and can also do an interactive chat/call session to clarify.

andreasheil commented 2 months ago

Thanks @mjl- I'll start right away with this.

mjl- / mox

Feature: Add a '-recurse' flag to 'mox import maildir' #134