seanbreckenridge / HPI

Human Programming Interface - a way to unify, access and interact with all of my personal data [my modules]
https://beepb00p.xyz/hpi.html
MIT License

add my.mail.mbox module: to parse mbox files #30

Closed seanbreckenridge closed 2 years ago

seanbreckenridge commented 2 years ago

See discussion in #15

low priority but would be nice to add

The addon mentioned for Thunderbird (ImportExportTools NG) now supports periodic backups, but only in MBOX format.

I've seen this solution to parse MBOX files: https://github.com/chronicle-app/chronicle-email/blob/master/lib/chronicle/email/mbox_extractor.rb

@krillin666 feel free to repost or describe the process here -- I'll probably have to do it once myself to test the format, and I probably want to create a doc of some kind for handling email files -- it's not obvious how to set it up right now

seanbreckenridge commented 2 years ago

Currently you can use my.mail.imap with thunderbird by configuring it using the 'Plain Text Format', and running that on a schedule

https://addons.thunderbird.net/en-US/thunderbird/addon/importexporttools-ng/

I use it with https://github.com/LukeSmithxyz/mutt-wizard/, which just parses my local mbsync/neomutt files -- those update every 5 minutes

krillin666 commented 2 years ago

Thank you for opening the issue. The problem with that addon on Thunderbird is that scheduled automatic backups only output MBOX files and not plain text which my.mail.imap can parse

seanbreckenridge commented 2 years ago

Seems that there's a module in the stdlib that can parse mbox files:

https://docs.python.org/3/library/mailbox.html

And it seems that an mbox file is just a collection of imap or other messages (with some caveats/differences? but hopefully the stdlib accounts for that) in a single file, so it should be possible to open it, split it into individual messages, and then pass those off to my.mail.imap, which would parse the inputs as normal:

import email
import email.policy
import mailbox
mbox = mailbox.mbox(mboxfile, factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default), create=False)
for msg in mbox:
    ...

From https://stackoverflow.com/a/61708272/9348376

Looks pretty promising
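
A minimal, self-contained sketch of the approach above, using only the standard library. It writes a tiny two-message mbox file and reads it back with the same factory. The sample messages and the helper names `write_sample_mbox` and `iter_mbox_messages` are made up for illustration and are not part of HPI.

```python
import email
import email.policy
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path
from typing import Iterator

# Two made-up messages in mbox format: each one starts with a "From " line.
SAMPLE_MBOX = (
    "From alice@example.com Sat Mar 19 10:00:00 2022\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: first\n"
    "\n"
    "hello\n"
    "\n"
    "From bob@example.com Sat Mar 19 11:00:00 2022\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: second\n"
    "\n"
    "hi back\n"
)


def write_sample_mbox(directory: Path) -> Path:
    """Write the sample mbox into `directory` and return its path."""
    path = directory / "INBOX"
    path.write_bytes(SAMPLE_MBOX.encode("ascii"))
    return path


def iter_mbox_messages(mboxfile: Path) -> Iterator[EmailMessage]:
    """Yield every message in an existing mbox file as an EmailMessage."""
    mbox = mailbox.mbox(
        str(mboxfile),
        # parse with the modern policy so we get EmailMessage objects
        factory=lambda f: email.message_from_binary_file(f, policy=email.policy.default),
        create=False,  # do not create the file if it is missing
    )
    try:
        yield from mbox
    finally:
        mbox.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for msg in iter_mbox_messages(write_sample_mbox(Path(tmp))):
            print(msg["From"], "|", msg["Subject"])
```

Each yielded message could then be handed to whatever already parses single messages, which is the idea described above.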

seanbreckenridge commented 2 years ago

If anything in the instructions here looks wrong, let me know @krillin666

just writing up the instructions to make sure I'm exporting and parsing the correct files.

Looks like:

[screenshot of the exported files]

krillin666 commented 2 years ago

The instructions are very clear! Regarding your screenshot, the MBOX-type files are the ones without an extension. So maybe it's best to put a filter in the source for all the other files I'm seeing there: .msf, .sbd and .dat

If I can help with anything more, please say so!
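
A rough sketch of the kind of filter suggested above, not the actual my.mail.mbox code. The function name `find_mbox_files` and the suffix set are assumptions taken from the file types mentioned in this comment. Note that it only skips *files* with those suffixes: directories such as `[Gmail].sbd` still get searched, since they contain mbox files like `All Mail`.

```python
import tempfile
from pathlib import Path
from typing import Iterator

# Suffixes reported next to the mbox files in the export; this set is an
# assumption based on the comment above, not an exhaustive list.
NON_MBOX_SUFFIXES = {".msf", ".sbd", ".dat"}


def find_mbox_files(root: Path) -> Iterator[Path]:
    """Yield files under `root` whose suffix is not a known non-mbox one."""
    for path in sorted(root.rglob("*")):
        # directories (including "*.sbd" ones) are skipped here, but
        # rglob still descends into them
        if path.is_file() and path.suffix not in NON_MBOX_SUFFIXES:
            yield path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "[Gmail].sbd").mkdir()
        for name in ("INBOX", "INBOX.msf", "[Gmail].sbd/All Mail", "[Gmail].sbd/All Mail.msf"):
            (root / name).write_text("")
        for found in find_mbox_files(root):
            print(found.relative_to(root))
```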

seanbreckenridge commented 2 years ago

I've pushed something to the branch -- would appreciate a test from you to make sure nothing's broken?

still a bit of work to be done but my.mail.mbox is parsing stuff without errors, so --

I think you have a local clone of this setup? else uninstall and reinstall this repo, cloning it locally, so you can:

git pull
git checkout -b mail.mbox origin/mail.mbox

To checkout the changes (if you have this installed as editable you just have to checkout the branch, otherwise, you need to pip install . in the directory)

If you have issues installing, see the troubleshooting doc

then -- setup a config block as described here:

https://github.com/seanbreckenridge/HPI/blob/mail.mbox/doc/MAIL_SETUP.md

test if files are being matched:

$ hpi query my.mail.mbox.files -s   
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/ImapMail/imap.gmail.com/INBOX"
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/ImapMail/imap.gmail.com/[Gmail].sbd/All Mail"
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/ImapMail/imap.gmail.com/[Gmail].sbd/Sent Mail"
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/ImapMail/imap.gmail.com/[Gmail].sbd/Important"
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/Mail/Local Folders/Trash"
"/home/sean/Downloads/importExportBackup/0z3cru8i.default-release-20220318-2157/Mail/Local Folders/Unsent Messages"

and you should be able to test with something like:

hpi --debug query my.mail.mbox.raw_mail -o repl

hpi --debug query my.mail.mbox.mail -o repl

if it fails to import, just use python to see what's going on:

 $ python3 -i
Python 3.10.2 (main, Jan 15 2022, 19:56:27) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import my.mail.mbox

seanbreckenridge commented 2 years ago

I've also added a my.mail.all module -- which combines results from both of them (see the mail setup docs for more info)

that works if you have one module setup, or both -- so it can be seen as a sort of entrypoint to the whole module -- it catches any errors/defaults to an empty list of messages if you don't have one of the modules setup
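
To illustrate the idea only (this is not how my.mail.all is actually implemented), a generic sketch of merging several sources while treating a source that isn't set up as empty. `combine_sources`, `imap_source` and `mbox_source` are hypothetical names.

```python
import warnings
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def combine_sources(*sources: Callable[[], Iterable[T]]) -> Iterator[T]:
    """Chain results from every source, skipping sources that raise."""
    for source in sources:
        try:
            yield from source()
        except Exception as e:
            # a source that is not configured contributes nothing further;
            # anything it yielded before failing is kept
            warnings.warn(f"skipping {source.__name__}: {e!r}")


def imap_source() -> Iterable[str]:
    # stand-in for a module that is not set up
    raise ImportError("imap source is not configured")


def mbox_source() -> Iterable[str]:
    # stand-in for a module that works
    return ["mbox message 1", "mbox message 2"]


if __name__ == "__main__":
    for item in combine_sources(imap_source, mbox_source):
        print(item)
```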

seanbreckenridge commented 2 years ago

Regarding possible issues configuring the all.py module, I created a related issue: https://github.com/karlicoss/HPI/issues/223

krillin666 commented 2 years ago

Hey @seanbreckenridge, thanks so much for all the work! I just followed your instructions to set up the mbox source and all seems to work as expected. Note that I refrained from using the all.py module because I prefer the mbox source due to the automation in Thunderbird. To use this in Promnesia you still have to update the source there, right?

seanbreckenridge commented 2 years ago

Yeah, I plan to make this work with promnesia

once this is merged in I'll use the my.mail.all source in promnesia, which will grab Email entries from either imap or mbox depending on which one you're using/have configured (you're even able to disable sources in your config, to avoid warning messages, if you're not planning on using them)

Planning on moving some of this parsing code from promnesia to my.mail.common -- it could be useful for people who don't plan on using promnesia -- just to extract the text body or small descriptions of email
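
As a sketch of what "extract the text body or a small description" could look like with just the stdlib (this is not the promnesia or my.mail.common code; `text_body` and `short_description` are hypothetical helpers). It assumes messages were parsed with `email.policy.default`, as in the factory shown earlier in this thread, so that `get_body` is available.

```python
from email.message import EmailMessage
from typing import Optional


def text_body(msg: EmailMessage) -> Optional[str]:
    """Return the text/plain body of a message, or None if it has none."""
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    return part.get_content().strip()


def short_description(msg: EmailMessage, limit: int = 80) -> str:
    """Subject plus the first line of the plain-text body, truncated to `limit` characters."""
    subject = str(msg["Subject"] or "")
    body = text_body(msg) or ""
    first_line = body.splitlines()[0] if body else ""
    return f"{subject}: {first_line}"[:limit]


if __name__ == "__main__":
    # made-up message with both a plain-text and an HTML alternative
    msg = EmailMessage()
    msg["Subject"] = "Lunch"
    msg.set_content("See you at noon.\nBring the docs.")
    msg.add_alternative("<p>See you at noon.</p>", subtype="html")
    print(short_description(msg))
```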

seanbreckenridge commented 2 years ago

Thanks for testing, I want to figure out this issue, but will probably merge this in the next few days regardless of progress on that end

krillin666 commented 2 years ago

Fantastic! I really appreciate the work done here. Once it's merged and integrated into promnesia, I'm sure it'll be one of the most useful sources for Thunderbird users who also use promnesia. Please let me know when this is integrated into promnesia, thank you for your time!

seanbreckenridge commented 2 years ago

@krillin666 merged it into master here -- and updated the import to use my.mail.all in promnesia:

https://github.com/seanbreckenridge/promnesia/commit/ff26be42b12ccb66a1a682c452f3506c57a75631

I also updated the file to be sources.mail instead of sources.imap -- with a stub file fallback module warning you to update the import

If you're getting errors using my.mail.all, then until this issue has been fixed, make sure your config matches what's described here -- for now you need both modules in your class mail in your config block -- otherwise my.mail.all errors

krillin666 commented 2 years ago

Hi @seanbreckenridge! I've just tested and everything is working correctly, including the promnesia import. Thank you so much for your work!