ui / django-post_office

A Django app that allows you to send email asynchronously in Django. Supports HTML email, database backed templates and logging.
MIT License

disk space usage by duplicate attachments #465

Open patroqueeet opened 4 months ago

patroqueeet commented 4 months ago

symptom

When I send a personalised mail to many users with the same attachment, the attachment file is duplicated for each mail. Sending to thousands of recipients therefore uses a large amount of disk space.

expected solution

Have only one Attachment object per file, linked to many Emails.

workaround as of now

Run a script regularly to detect and consolidate duplicates:

import hashlib
import os

from post_office.models import Attachment


def file_md5(field_file):
    """Return the MD5 hex digest of a FieldFile, or None if it is missing on disk."""
    if not os.path.exists(field_file.path):
        return None
    with open(field_file.path, "rb") as fh:
        return hashlib.md5(fh.read()).hexdigest()


removed = set()  # pks already merged into another attachment
for a in Attachment.objects.order_by("pk"):
    if a.pk in removed:
        continue
    hash0 = file_md5(a.file)
    if hash0 is None:
        continue
    # Candidates share the name; content is compared by hash below.
    for attachment in Attachment.objects.filter(name=a.name).exclude(pk=a.pk):
        if file_md5(attachment.file) != hash0:
            continue
        print(f"{attachment} ({attachment.pk}) is duplicate of {a} ({a.pk})")
        # Re-point every email at the kept attachment before deleting the duplicate.
        for email in attachment.emails.all():
            print(f"for {email.pk} add {a} ({a.pk})")
            email.attachments.add(a)
        # Never remove the file that the kept attachment still points to.
        if attachment.file.path != a.file.path:
            os.remove(attachment.file.path)
        removed.add(attachment.pk)
        attachment.delete()
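
Before deleting anything, it may help to do a dry run that only reports duplicates and how much space they occupy. The following is a minimal sketch using only the standard library, not part of post_office. The helper names `file_digest`, `find_duplicates` and `reclaimable_bytes` are hypothetical, and it assumes the files are reachable through a local filesystem path. It hashes in chunks so large attachments are not read into memory in one go.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Hash a file in chunks so large attachments are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(named_paths):
    """Group (name, path) pairs by (name, content digest).

    Returns a dict mapping each key to its list of paths, keeping only
    groups with more than one file. Missing files are skipped.
    """
    groups = defaultdict(list)
    for name, path in named_paths:
        if os.path.exists(path):
            groups[(name, file_digest(path))].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates):
    """Bytes freed if every group keeps only its first file."""
    return sum(os.path.getsize(p) for paths in duplicates.values() for p in paths[1:])
```

With post_office, the input could presumably be built as `[(a.name, a.file.path) for a in Attachment.objects.all()]`, matching the attributes the script above already uses.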