lefcha / imapfilter

IMAP mail filtering utility
MIT License

[support needed] Deduplicating emails #285

Open arminposchmann opened 7 months ago

arminposchmann commented 7 months ago

Hi all, sorry to ask a question that has been asked before, but I did not find a solution. I am using an IMAP server that offers only one INBOX but several mail aliases. When a mail comes in addressed to two or more aliases, it is duplicated once per alias. With imapfilter I can move and copy messages to subfolders depending on the To or Cc field, but this moves all the copies. So I need a way to first dedup the messages. How can I achieve this? What I found by searching is an extension that covers a similar case:

messages = myaccount.INBOX:select_all()
results = Set {}
for _, message in ipairs(messages) do
        mailbox, uid = table.unpack(message)
        messageId = mailbox[uid]:fetch_header('Message-Id')
        if seen[messageId] then
                table.insert(results, uid)
        else
                seen[messageId] = true
        end
end
results:delete_messages()

I tried it, but it won't run because I am obviously missing something. Can anyone explain what I am missing?

lefcha commented 5 months ago

I think this example makes some assumptions and has missing parts. A more correct and complete version would be this one (ref: https://github.com/lefcha/imapfilter/issues/106#issuecomment-388796226):

seen = {}
duplicates = Set {}
results = account["Inbox"]:select_all()
for _, message in ipairs(results) do
        mailbox, uid = table.unpack(message)
        messageId = mailbox[uid]:fetch_field("Message-Id")
        -- Remove prefix to ignore Id/ID difference.
        messageId = string.sub(messageId, 12)
        if seen[messageId] then
                table.insert(duplicates, {mailbox, uid})
        else
                seen[messageId] = true
        end
end
duplicates:mark_seen()
duplicates:move_messages(account["dups"])

But instead of doing mark_seen() and move_messages() at the end, you can do:

duplicates:delete_messages()
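
The core of the loop above is the "seen table" technique: remember every Message-Id once, and flag any later message that carries the same one. A minimal sketch of just that logic in plain Lua, runnable without imapfilter, might look like the following. The helper names normalize_message_id and find_duplicates are hypothetical, and the guard for a missing or empty header is an assumption about what a fetch can return, not something confirmed in this thread.

```lua
-- Hypothetical helper: strip the "Message-Id:" field name (any Id/ID
-- capitalisation) and surrounding whitespace, so only the value is compared.
local function normalize_message_id(header)
    if header == nil then
        return nil
    end
    -- "-" is a magic character in Lua patterns, hence the "%-" escape.
    local value = header:gsub("^[Mm]essage%-[Ii][Dd]:", "")
    -- Trim leading/trailing whitespace, including a trailing CRLF.
    value = value:match("^%s*(.-)%s*$")
    if value == "" then
        return nil
    end
    return value
end

-- Hypothetical helper: given a list of {uid, raw_header} pairs, return the
-- uids of every message whose Message-Id was already seen earlier in the list.
local function find_duplicates(messages)
    local seen, duplicates = {}, {}
    for _, message in ipairs(messages) do
        local uid, header = message[1], message[2]
        local id = normalize_message_id(header)
        -- Messages without a usable id are never treated as duplicates.
        if id ~= nil then
            if seen[id] then
                table.insert(duplicates, uid)
            else
                seen[id] = true
            end
        end
    end
    return duplicates
end

-- Made-up sample data: 101 and 102 share an id, 104 has no header at all.
local sample = {
    {101, "Message-Id: <a@example.org>\r\n"},
    {102, "Message-ID: <a@example.org>\r\n"},
    {103, "Message-Id: <b@example.org>\r\n"},
    {104, nil},
}
local dups = find_duplicates(sample)
for _, uid in ipairs(dups) do
    print("duplicate uid: " .. uid)
end
```

In an imapfilter config the pairs would come from the mailbox, uid and fetch_field() calls shown above rather than from a hard-coded table. The first copy of each message is always the one that is kept.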

dngray commented 1 week ago

> So I need a way to first dedup the messages

I wonder if running mail-deduplicate first is an idea?