terhechte / postsack

Visually cluster your emails by sender, domain, and more to identify waste
MIT License
418 stars 14 forks

Improvements to filters to help finding huge emails #28

Open Byron opened 2 years ago

Byron commented 2 years ago

When receiving emails with attachments, the mailbox may get bloated.

Especially in conjunction with eventually deleting emails as in #16, it would be useful to add filters specific to the size of an email.

Possible filters

Items marked with MVP seem desirable to have in the first incarnation.

Possible GUI Improvements

Extended statistics about displayed emails

Along with knowing how many emails are currently displayed, it would certainly be useful to know how much storage space they require. Maybe this information could be added to the top of the GUI.

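A minimal sketch of how such a summary line could be computed, assuming the size of every displayed email is already known in bytes. The function names `total_size_bytes`, `human_readable`, and `summary_line` are hypothetical and not part of postsack.

```rust
/// Sums the sizes (in bytes) of all currently displayed emails.
pub fn total_size_bytes(sizes: &[u64]) -> u64 {
    sizes.iter().sum()
}

/// Formats a byte count using binary units (1 KiB = 1024 bytes).
pub fn human_readable(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KiB", "MiB", "GiB", "TiB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    // Step up one unit at a time until the value fits or units run out.
    while value >= 1024.0 && unit < UNITS.len() - 1 {
        value /= 1024.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

/// Builds the text that could be shown at the top of the GUI.
pub fn summary_line(sizes: &[u64]) -> String {
    format!(
        "{} emails, {}",
        sizes.len(),
        human_readable(total_size_bytes(sizes))
    )
}
```

Since the sum is a single pass over the sizes, it could be recomputed whenever the set of displayed emails changes.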

Real-time updates when changing the filter

This would make the 'apply' button unnecessary and would probably make it fun to play with sliders. It might also be an interesting exercise in async programming to handle possibly longer filtering times gracefully. If filtering through 650k emails gets sluggish, one might consider throwing multiple threads into the mix.
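To illustrate the multi-threading idea, here is a rough sketch using only the standard library's scoped threads (Rust 1.63 or newer). It assumes the email sizes are available as an in-memory slice; the function name `parallel_filter_indices` and its parameters are made up for this example and say nothing about how postsack stores or queries its data.

```rust
use std::thread;

/// Returns the indices of all entries with at least `min_bytes`,
/// splitting the slice into chunks that are filtered on separate threads.
pub fn parallel_filter_indices(sizes: &[u64], min_bytes: u64, workers: usize) -> Vec<usize> {
    let workers = workers.max(1);
    // Ceiling division; `max(1)` keeps `chunks` from panicking on empty input.
    let chunk_len = ((sizes.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn one scoped thread per chunk; scoped threads may borrow `sizes`.
        let handles: Vec<_> = sizes
            .chunks(chunk_len)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                scope.spawn(move || {
                    let offset = chunk_idx * chunk_len;
                    chunk
                        .iter()
                        .enumerate()
                        .filter_map(|(i, &size)| (size >= min_bytes).then_some(offset + i))
                        .collect::<Vec<usize>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the resulting indices sorted.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("filter worker panicked"))
            .collect()
    })
}
```

For real-time slider updates, a GUI would additionally want to run this off the UI thread and discard results from outdated filter settings; that part is left out here.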

terhechte commented 2 years ago

Hey, thanks for the feedback. I agree that having a size filter would be useful. Initially I wanted to collect attachment information during the mail parsing (e.g. number of attachments, maybe even file type) but this led to a slowdown in email parsing. So I wonder if just having the size of the email (which would be very fast) is enough, or if it is useful to also scan detailed attachment information (maybe as an optional setting?). It certainly would be nice to be able to see all emails containing PDF files > 2 MB.

Byron commented 2 years ago

Great to hear!

So I wonder if just having the size of the email (which would be very fast) is enough […]

This looks like an MVP to me: it's fast, it's easy to implement, and it's a first step in this direction.
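A size-only filter along these lines could be as small as the following sketch. `EmailMeta` and `SizeFilter` are hypothetical types invented for illustration; their fields are assumptions and do not reflect postsack's actual schema.

```rust
/// Hypothetical, minimal view of an imported email (not postsack's schema).
#[derive(Debug, Clone, PartialEq)]
pub struct EmailMeta {
    pub sender: String,
    pub size_bytes: u64,
}

/// Inclusive size range in bytes; `None` leaves that bound open.
#[derive(Debug, Clone, Copy, Default)]
pub struct SizeFilter {
    pub min_bytes: Option<u64>,
    pub max_bytes: Option<u64>,
}

impl SizeFilter {
    /// True if the email's size lies within the configured bounds.
    pub fn matches(&self, email: &EmailMeta) -> bool {
        self.min_bytes.map_or(true, |min| email.size_bytes >= min)
            && self.max_bytes.map_or(true, |max| email.size_bytes <= max)
    }
}

/// Returns references to all emails that pass the size filter.
pub fn filter_by_size<'a>(emails: &'a [EmailMeta], filter: &SizeFilter) -> Vec<&'a EmailMeta> {
    emails.iter().filter(|email| filter.matches(email)).collect()
}
```

With `min_bytes: Some(2_000_000)` and `max_bytes: None`, this would keep only emails of roughly 2 MB and above.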

I wonder if it would be fast to learn about 'attachments - yes or no' instead of parsing them in detail. That would certainly help to differentiate huge emails with swaths of text from those that have a big attachment.
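One cheap way to approximate 'attachments - yes or no' might be a plain byte scan of the raw message for a `Content-Disposition: attachment` header, without parsing the MIME structure. The sketch below is only a heuristic under that assumption: it can report false positives when the phrase appears in the body text, and it misses inline attachments or headers written without the space after the colon. The function names are hypothetical.

```rust
/// ASCII case-insensitive substring search over raw bytes.
fn contains_ignore_ascii_case(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` would panic, so an empty needle never matches here.
    !needle.is_empty()
        && haystack
            .windows(needle.len())
            .any(|window| window.eq_ignore_ascii_case(needle))
}

/// Heuristic: does the raw message declare at least one attachment part?
pub fn probably_has_attachment(raw_message: &[u8]) -> bool {
    contains_ignore_ascii_case(raw_message, b"content-disposition: attachment")
}
```

Whether this is actually faster than the detailed parsing mentioned above would need to be measured.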

Personally I wouldn't worry too much about parsing speed as it already seems fast enough (except for the initial silence where it seems to extract an archive), and there seem to be some opportunities for improvements as well.

terhechte commented 2 years ago

Awesome, this sounds like a good initial approach. Once the app implements its own imap client (via imap) this can be accomplished with two imap queries:

Byron commented 2 years ago

That's an interesting proposition. To me postsack is a batch processor with the performance benefits that go with it. imap seemed to be useful for features like the eventual deletion of emails, and querying information mail-by-mail seems too slow to be used in a filter.

Maybe I misunderstood and imap would be another avenue for the importer to extract mail information though.