seanbreckenridge / HPI

Human Programming Interface - a way to unify, access and interact with all of my personal data [my modules]
https://beepb00p.xyz/hpi.html
MIT License

How to use IMAP function 📧 #15

Closed: krillin666 closed this issue 2 years ago

krillin666 commented 2 years ago

Hello,

Thanks for this amazing extension of HPI, which I just discovered. I was trying to set this up with Promnesia for my emails, but nothing is getting indexed:

[INFO    2021-10-18 23:17:26 promnesia extract.py:49] extracting via promnesia_sean.sources.imap:index () {} ... ...
[INFO    2021-10-18 23:17:26 promnesia extract.py:82] extracting via promnesia_sean.sources.imap:index () {} ...: got 0 visits

I am using this Thunderbird add-on and have tried exporting with:

All of these exports are in my ~/.local/share/mail directory. I even hardcoded the path in the __init__ file instead of using path.join, but nothing works.

Thank you so much !

seanbreckenridge commented 2 years ago

Hmm, like most modules in HPI, this supports any kind of path (an absolute path, a Path object, a string, or something like ~/.local/share/mail); mine looks like this. That shouldn't really matter much, though: whatever folder you give the imap module, it drills down and searches everything, unless your mail is in hidden folders. Just to compare, my folder looks like this:

$ pwd
/home/sean/.local/share/mail
$ ls -1
seanbrecke@gmail.com

The part that determines which files are used is a recursive glob, so it should search every folder listed in your configuration and try every file. You could also try the following, to confirm it's not matching anything:

$ python3
Python 3.9.7 (default, Aug 31 2021, 13:28:12)
[GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import my.imap
>>> list(my.imap.mailboxes())
>>> list(my.imap.files())

Those should tell you which files it has computed as targets.
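To make the "recursive glob" concrete, here is a minimal stdlib sketch of expanding configured folders into candidate files. It is an illustration under assumptions, not HPI's actual code: expand_mail_files is a hypothetical name, and it mimics the behavior described above where mail in hidden folders is not found (Python's glob skips names starting with "." by default).

```python
import glob
import os
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def expand_mail_files(mailboxes: Union[PathLike, Iterable[PathLike]]) -> List[Path]:
    """Expand configured folder(s) into every regular file beneath them (hypothetical helper)."""
    if isinstance(mailboxes, (str, Path)):
        mailboxes = [mailboxes]
    files: List[Path] = []
    for entry in mailboxes:
        # accept "~/..." style paths as well as absolute paths and Path objects
        root = os.path.expanduser(str(entry))
        # "**" with recursive=True descends into every subdirectory; like shell globs,
        # it skips names starting with ".", so files inside hidden folders are ignored
        pattern = os.path.join(glob.escape(root), "**", "*")
        for match in sorted(glob.glob(pattern, recursive=True)):
            if os.path.isfile(match):
                files.append(Path(match))
    return files
```

If the assumptions hold, expand_mail_files("~/.local/share/mail") should list roughly the same files that my.imap.files() reports; an empty list points at the path (or hidden folders) rather than the mail format.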

I'm not sure what format the Thunderbird export tool uses, but it seems to be EML, which I don't think is a raw email file, as far as I understand. I'm not totally sure; email formats have always been confusing to me. I do know for sure that Thunderbird can sync over IMAP, but I'm not sure whether it stores all your mail locally. I can install it later today to see if I can figure that out.

As a visual comparison, here is what one of my locally synced IMAP files looks like:

If you have something similar, pointing it at the top-level folder that contains all of those should work.

seanbreckenridge commented 2 years ago

I tested out the add-on, and I think I got it to work. Use the 'Plain Text Format' option and export to a folder somewhere.

It takes a while to do so.

I just put that in ~/Downloads/mailexport for this demo.

In my config, I put:

# locally synced IMAP mailboxes using mbsync
class imap:
    # path[s]/glob to the mailboxes/IMAP files
    mailboxes = "~/Downloads/mailexport/"

And then:

>>> import my.imap
>>> next(my.imap.mail())._serialize()
{'filepath': PosixPath('/home/sean/Downloads/mailexport/Inbox_20211019-0046/messages/20211011-Re_[ActivityWatch_aw-watcher-window] Update macOS window title logic (#49)-13430.txt'), 'bcc': [], ...

If the imap.mail function works, promnesia should work fine, since it's a thin wrapper around that function that extracts the URLs from each message.
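A rough sketch of what such a thin wrapper might look like, assuming mail objects that carry a filepath, a datetime and a body. Mail, Visit and index here are hypothetical stand-ins, not promnesia's or my.imap's real classes, and the URL regex is deliberately simplified. Messages without a datetime are skipped, since a visit with no timestamp has nowhere to go on the timeline.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Simplified URL pattern (an assumption, not the one promnesia uses)
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


@dataclass
class Mail:
    """Hypothetical stand-in for an object returned by my.imap.mail()."""
    filepath: str
    dt: Optional[datetime]
    body: str


@dataclass
class Visit:
    """Hypothetical stand-in for a promnesia visit."""
    url: str
    dt: datetime
    locator: str


def index(mails: Iterable[Mail]) -> Iterator[Visit]:
    """Turn each parsed email into one visit per URL found in its body."""
    for mail in mails:
        if mail.dt is None:
            # no timestamp: nothing to place on the timeline, so skip this message
            continue
        for url in URL_RE.findall(mail.body):
            yield Visit(url=url, dt=mail.dt, locator=mail.filepath)
```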

This does mean you'd have to export periodically, but there isn't a great way around that with Thunderbird. For context, I use mutt-wizard, which uses mbsync under the hood, so my mail gets synced to a local folder every 5 minutes.

seanbreckenridge commented 2 years ago

Ah, there may also be an issue with the different date format that the Thunderbird add-on uses; looking into that.

seanbreckenridge commented 2 years ago

Alright, yeah: the dates in the emails that the Thunderbird add-on created weren't RFC 2822 compliant, so I added a wrapper that parses them manually when the standard parser can't: https://github.com/seanbreckenridge/HPI/commit/c4d87b783c83b9a69168d2f8178a108c3fdd4617
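The linked commit uses dateparser for the fallback. The sketch below illustrates the same "try RFC 2822 first, then fall back" idea with only the stdlib, assuming the add-on writes day-first dates like 06/10/2021, 09:28 (the form that shows up in an export later in this thread). The fallback format string is a guess for illustration, and parse_mail_date is a hypothetical name, not the function in the commit.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumed non-compliant format written by the export, e.g. "06/10/2021, 09:28"
FALLBACK_FORMATS = ("%d/%m/%Y, %H:%M",)


def parse_mail_date(raw: str) -> Optional[datetime]:
    """Try RFC 2822 parsing first, then fall back to known non-compliant formats."""
    try:
        return parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # older Pythons raise TypeError on unparseable input, newer ones ValueError
        pass
    for fmt in FALLBACK_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue
    # still unparseable: callers should treat the mail as having no datetime
    return None
```

Note that RFC 2822 dates with an offset come back timezone-aware, while the fallback yields naive datetimes; a real implementation would need to decide how to localize those.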

I also updated the promnesia module, so you may have to git pull/reinstall that in addition to this repo.

I added dateparser to the dependencies, so you'll need to pip install dateparser.

Using the add-on export, I now get visits from promnesia:

$ hpi query promnesia_sean.sources.imap | jq length
105

If mail was parsing for you before, this line may actually have been causing the issue: since the mail objects didn't have any datetimes, promnesia ignored them. Hopefully that's fixed now.

seanbreckenridge commented 2 years ago

Ah, I also just remembered: since it takes about 30 minutes to run on my machine, I cache the result and only refresh it once per month, so new URLs are picked up periodically rather than immediately. If you want me to make that configurable, let me know.
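HPI does this caching with cachew; the sketch below is not cachew's API, only a stdlib illustration of a once-per-month cache, assuming the results can be stored as JSON under a key that changes each calendar month. cached_monthly and month_key are hypothetical names.

```python
import json
from datetime import date
from pathlib import Path
from typing import Callable, List


def month_key(today: date) -> str:
    """Cache key that changes once per calendar month, e.g. '2021-10'."""
    return today.strftime("%Y-%m")


def cached_monthly(cache_file: Path, compute: Callable[[], List[str]], today: date) -> List[str]:
    """Return cached results if they were computed this month, else recompute and store them."""
    key = month_key(today)
    if cache_file.exists():
        stored = json.loads(cache_file.read_text())
        if stored.get("key") == key:
            # same month as the last run: reuse the expensive result
            return stored["data"]
    data = compute()
    cache_file.write_text(json.dumps({"key": key, "data": data}))
    return data
```

The trade-off is the one described above: a slow 30-minute extraction runs at most once a month, at the cost of new URLs showing up late. Deleting the cache file forces a recompute.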

So if it still doesn't seem to be working for you, you may have to delete the sqlite cachew file between tests to check whether promnesia is working. That'd be in ~/.cache/my or ~/.cache/cachew. For me, that's:

rm ~/.cache/cachew/promnesia_sean.sources.imap:index

To figure out where that is, run:

python3 -c 'from my.core.cachew import cache_dir; print(cache_dir())'

Let me know if you have any other issues; hopefully this isn't too confusing.

krillin666 commented 2 years ago

Wow, that was fast 😅. First of all, thank you for fixing this and laying out the steps.

I've now been able to index one of my inboxes to test how it displays in Promnesia. However, it wasn't what I was expecting, and maybe you could improve it (I can try too, but I'm not a coder).

Before using your IMAP source I was using the plaintext source from Promnesia, and it at least displayed the surrounding text of the email body next to the URL. With your source I just get the file name.

Nevertheless, yours has the advantage of including the email date! What I was trying to accomplish (and thought yours implemented) was to display the surrounding text along with the Date, From, Subject and To fields in the Promnesia plugin.

I have two more comments:

  1. When I installed your Promnesia package, the sources folder wasn't installed, so I had to run svn checkout https://github.com/seanbreckenridge/promnesia/trunk/promnesia_sean/sources inside the promnesia_sean folder.
  2. I came up with a question about security. Maybe I should direct it to karli, but I think you're in a position to answer it too: when using sources that index sensitive information (like email), would it be possible to add an option for promnesia index to prompt for a password to encrypted folders, which could then be passed to some sources (like IMAP, or plaintext/auto)? We could put it in the config.py file, but that defeats the purpose, because then it would be readable by an attacker.

Thank you so much again for your help and work !

seanbreckenridge commented 2 years ago

displaying surrounding text of the email body next to the URL

There's an option in my promnesia module to display the email text as the context, but with 8000 emails the sqlite database grows pretty fast (it was something like 30GB on my end, since it copies the text for every URL it finds).

See https://github.com/seanbreckenridge/promnesia/blob/master/promnesia_sean/sources/imap.py#L30

To enable that, in your config you can do something like:

Source(imap, body_as_context=True), instead of just imap as in my config

Perhaps extracting just a few lines around each URL in the message would be preferable? That would increase the complexity/runtime a bit, though. Will think about this.
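A minimal sketch of "a few lines around each URL", assuming the plain-text body is available as a string. urls_with_context and the window parameter are hypothetical, and the URL regex is a simplification, not the one promnesia's plaintext/markdown sources use.

```python
import re
from typing import Iterator, Tuple

# Simplified URL pattern (an assumption, not promnesia's)
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def urls_with_context(body: str, window: int = 2) -> Iterator[Tuple[str, str]]:
    """Yield (url, context) pairs, where context is up to `window` lines
    before and after the line containing the URL."""
    lines = body.splitlines()
    for i, line in enumerate(lines):
        for match in URL_RE.finditer(line):
            start = max(0, i - window)
            context = "\n".join(lines[start:i + window + 1])
            yield match.group(0), context
```

Storing only this window instead of the whole body keeps the per-URL copy small, which is what drives the database size, at the cost of one extra pass over each body.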

svn checkout

Hmm, I'm not sure whether this is related to it being a namespace package, but I sort of doubt it. I don't have any experience with svn.
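One way to check whether the sources subpackage actually got installed, without guessing about namespace packages, is to ask Python's import system directly. module_installed is a hypothetical helper built on importlib.util.find_spec.

```python
import importlib.util


def module_installed(name: str) -> bool:
    """Return True if `name` (possibly dotted, e.g. 'pkg.sub.mod') can be found on sys.path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec imports parent packages first; a missing parent lands here
        return False
```

For example, module_installed("promnesia_sean.sources.imap") returning False after a fresh install would confirm the sources folder was left out of the package.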

question pertaining to security

Yeah, I've thought about this a bit as well. Perhaps the best solution I've come across for something like this is to use a tool like pass, which PGP-encrypts your passwords, and use that to store the decryption key for some local encrypted zip. The PGP key's passphrase can typically be cached by an agent while the computer is in use, so it acts as an initial barrier and everything isn't just plaintext. Then again, this could also be solved by encrypting your main drive. It adds a bunch of complexity, but core.structure already exists, which abstracts away some of the unzipping/extraction.
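A minimal sketch of the pass idea, assuming a pass entry (the name mail/zip-key below is hypothetical) holds the decryption key. pass prints the password on the first line of `pass show <entry>`, and gpg-agent handles the passphrase prompt, so nothing is stored in config.py. read_secret is a hypothetical helper; the command is parameterized only so it can be swapped out.

```python
import subprocess
from typing import Sequence


def read_secret(entry: str, command: Sequence[str] = ("pass", "show")) -> str:
    """Return the first line printed by `pass show <entry>` (where pass keeps the password)."""
    result = subprocess.run(
        [*command, entry], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    if not lines:
        raise ValueError(f"no secret stored for {entry!r}")
    return lines[0]
```

The key could then be handed to whatever unpacks the archive. Note that Python's zipfile can only decrypt legacy ZipCrypto archives (via the pwd argument), not AES-encrypted ones, so the archive format matters here.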

Since everything is local-first, I don't really see a huge issue, but if you want to bring it up, the best place would probably be here.

krillin666 commented 2 years ago

Thanks for the guidance on the security part! As for svn checkout, I just used it to pull the sources folder from your git repo. The important thing here is that when I follow your install procedure, the promnesia_sean folder installed on my PC does not contain the sources folder 😅

Thanks for the tip on IMAP. I understand now how your database grew so large: it is displaying not only the text of the whole email but also all the text from threads (when they exist). Is it not possible to index just the surrounding text, as promnesia.auto, promnesia.markdown, promnesia.plaintext (etc.) do? That would keep the database from growing so huge and would also show only the relevant text in the Promnesia plugin, so the sidebar isn't cluttered.

As for extracting and displaying the From, To and Subject fields, do you have any idea how to implement this? I think a simple open() and then iterating through the lines (line.strip()) with conditionals to store each field might suffice?

krillin666 commented 2 years ago

Since all emails in plaintext begin like so:

Subject: Promnesia
From: John Doe
Date: 06/10/2021, 09:28
To: Someone

It would suffice (???) to use something simple like this for each email file:

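headers = {"From": "", "To": "", "Subject": "", "Date": ""}
with open(email_file, encoding="utf-8") as in_file:
    for line in in_file:
        line = line.strip()
        # split "From: John Doe" into ("From", ":", " John Doe")
        key, sep, value = line.partition(":")
        # only keep the first match, so a quoted "From:" further down a thread doesn't overwrite it
        if sep and key in headers and not headers[key]:
            # keep only the text after "From:", "To:", etc.
            headers[key] = value.strip()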

I'm sure you'll know the best way, though!

seanbreckenridge commented 2 years ago

relation to extracting and displaying the From, To, Subject

I think it's already doing this?


Relevant code is here

It displays that as the Locator description, not the body; I don't think that should make a difference, though, since I think that's always shown.

I'll take a look at the markdown/plaintext modules from promnesia to see how they do it and whether I can figure out the surrounding text; I'll leave this issue open for that purpose.

seanbreckenridge commented 2 years ago

It would suffice (???) to use something simple like this for each email file :

Just as an FYI, mail-parser (a library that wraps the stdlib email library) is what my.imap uses, and it already parses all that info out nicely, so there's no need to do it manually:

https://github.com/seanbreckenridge/HPI/blob/master/my/imap.py#L72-L95
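For a rough idea of what that header parsing involves, here is a stdlib-only sketch using email.message_from_string and email.utils. It is not mail-parser's API and not the code in my/imap.py; summarize_headers is a hypothetical name, and it assumes a raw RFC 5322 message as input.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr
from typing import Dict, Union


def summarize_headers(raw: str) -> Dict[str, Union[str, list]]:
    """Pull From/To/Subject out of a raw RFC 5322 message using only the stdlib."""
    # default (compat32) policy: header values come back as plain strings
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        # To may appear more than once and hold several comma-separated addresses
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "subject": msg.get("Subject", ""),
    }
```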

krillin666 commented 2 years ago

relation to extracting and displaying the From, To, Subject

I think it's already doing this?


Relevant code is here

It displays that as the Locator description, not the body; I don't think that should make a difference, though, since I think that's always shown.

I'll take a look at the markdown/plaintext modules from promnesia to see how they do it and whether I can figure out the surrounding text; I'll leave this issue open for that purpose.

OK, I see. But what I was trying to achieve was something like:


From: Sean
To: Krillin
Subject: IMAP issue

"Relevant text around url from email body here"
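A sketch of how that layout could be assembled as a single context string, assuming the header fields and the URL's surrounding text have already been extracted. format_context is hypothetical; where the string would go in promnesia's visit/context fields is not shown here.

```python
from typing import Iterable


def format_context(sender: str, recipients: Iterable[str], subject: str, snippet: str) -> str:
    """Render a From/To/Subject summary followed by the quoted text around the URL."""
    header = "\n".join([
        f"From: {sender}",
        f"To: {', '.join(recipients)}",
        f"Subject: {subject}",
    ])
    # blank line between the header summary and the quoted snippet
    return f'{header}\n\n"{snippet}"'
```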

I see that the date is already nicely displayed in the Promnesia add-on, in the same location as for browser history, so there's no need to extract that line 👍

Thank you again, and looking forward to seeing how this goes 😀

seanbreckenridge commented 2 years ago

Since I think the config issues around IMAP/loading the text files have generally been solved here, I'm going to move this to an issue on the promnesia repo.

krillin666 commented 2 years ago

I tested out the add-on, and I think I got it to work. Use the 'Plain Text Format' option and export to a folder somewhere.

It takes a while to do so.

I just put that in ~/Downloads/mailexport for this demo.

This does mean you'd have to export periodically, but there isn't a great way around that with Thunderbird. For context, I use mutt-wizard, which uses mbsync under the hood, so my mail gets synced to a local folder every 5 minutes.

Sorry for commenting on a closed issue. I've found a way to automate this process for Thunderbird users; however, it relies on MBOX files. The Thunderbird add-on mentioned above (ImportExportTools) now supports periodic backups, but only in MBOX format. I've seen this solution for parsing MBOX files: https://github.com/chronicle-app/chronicle-email/blob/master/lib/chronicle/email/mbox_extractor.rb

Maybe this could be ported to promnesia; I don't know how laborious that would be. Thank you!
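For what it's worth, Python's stdlib mailbox module can already read MBOX files, so a port of the Ruby extractor may not be strictly necessary. The sketch below is an illustration under assumptions, not an existing HPI or promnesia module: mbox_urls is a hypothetical name, it only looks at text/plain parts, and its URL regex is simplified.

```python
import mailbox
import re
from pathlib import Path
from typing import Iterator, Tuple

# Simplified URL pattern (an assumption, not promnesia's)
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def mbox_urls(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (subject, url) for every URL in the text/plain parts of an mbox file."""
    box = mailbox.mbox(str(path), create=False)
    try:
        for message in box:
            subject = message.get("Subject", "")
            # walk() yields the message itself for single-part mail, or every sub-part
            for part in message.walk():
                if part.get_content_type() != "text/plain":
                    continue
                payload = part.get_payload(decode=True)  # bytes; None for multipart containers
                if payload is None:
                    continue
                text = payload.decode(part.get_content_charset() or "utf-8", errors="replace")
                for url in URL_RE.findall(text):
                    yield subject, url
    finally:
        box.close()
```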

seanbreckenridge commented 2 years ago

Fine for you to comment here, all good

https://github.com/seanbreckenridge/HPI/blob/master/CHANGELOG.md

Ah, I see. I updated the module names/structure here, so the imap module is now my.mail.imap.

I think it would make sense to add something like my.mail.mbox and parse those files there. It's probably a totally different format, or maybe some of it could be reused.

I can create an issue to track it :+1: