un1t / django-cleanup

Automatically deletes old file for FileField and ImageField. It also deletes files on models instance deletion.
MIT License
1.11k stars 79 forks source link

When copying Model-Instances as per django documentation, adjusting either deletes the File for the other #98

Closed Grosskopf closed 1 year ago

Grosskopf commented 1 year ago

Hi, The Django documentation mentions this method for copying Model-Instance: https://docs.djangoproject.com/en/4.0/topics/db/queries/#copying-model-instances as per default Django behaviour this results in a non-unique filename. If one of them is afterwards adjusted, #51 takes place and The file is deleted from the other copy. Could you please mention in your documentation/enforce in some way Uniqueness in FileFields so that Files with the same name are not deleted as easily? :)

Grosskopf commented 1 year ago

I'm currently considering adding this pre_save signal:

def pre_save_file_field_containing_model(
        sender: Type[Model], instance: Model, **kwargs):
    """
    Handle the pre-save signal of a Model with a FileField.
    These Models should all check, if the FileField they contain contains
    a non-unique FileField and enforce uniqueness.
    """
    if instance.pk is not None:
        return
    file_fields = list(get_fields_for_model(get_model_name(sender)))
    for file_field in file_fields:
        if getattr(instance, file_field):
            other_exist = sender.objects.filter(
                **{file_field: getattr(instance, file_field)}).exists()
            if other_exist:
                file = getattr(instance, file_field)
                new_path = file.storage.get_available_name(file.path)
                shutil.copyfile(file.path, new_path)
                new_path = path.relpath(new_path, file.storage.location)
                setattr(instance, file_field, new_path)

Propably it's not the best way to write this in a long shot but it gets the job done, enforcing uniqueness in future copy operations

vinnyrose commented 1 year ago

This comes up occasionally, past issues mention this #85, #51, maybe others I can't find. I can update documentation to call out that we don't support this out of the box and provide recommendations.

Grosskopf commented 1 year ago

Thank you :)

A specific mention of "unique" or that this happens when copying instances would have been awesome to stop this in future from happening but a bit of text on this topic allready helps a lot :)

For anyone finding this Issue later down the line because they duplicated a model as per django documentation and now noticed that a file is missing:

What we did after figuring this out

We ran this code to find out how many hundreds of files were affected across the years:

from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil
count_not_found = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        instances = list(model.objects.all().exclude(Q(**{file_field:''})|Q(**{file_field:None})))
        for instance in instances:
            file = getattr(instance, file_field)
            if not os.path.exists(file.path):
                count_not_found += 1
                print(f'{model_name} | {file_field}  | {str(instance.pk)} | {file.path}')

We attached the above mentioned Signal to one of our apps, that helps stop this from happening.

carefull on this one, this can result in more file-loss

We regenerated our few hundred files from years of backups and after trying it on local testing copies of the live system because the warnings are shown after the file is already gone (renamed) ran this code to get the few hundred duplicate files unique:

from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil
count = 0
count_files = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        files = model.objects.values(file_field).annotate(count=Count('id')).order_by(file_field).filter(count__gt=1).exclude(Q(**{file_field:''})|Q(**{file_field:None}))
        for file in files:
            filename = file.get(file_field)
            instances = list(model.objects.filter(**{file_field: filename}))
            first_file = getattr(instances[0], file_field)
            if filename in filenames:
                print('warning: found file twice')
                print(filename)
                print(instances)
            else:
                filenames.append(filename)
            if not os.path.exists(first_file.path):
                continue
            print(filename)
            count_files += 1
            save_objects = True
            for instance in instances:
                file_field_instance = getattr(instance, file_field)
                storage = file_field_instance.storage
                current_base_path = file_field_instance
                new_path = storage.get_available_name(file_field_instance.path)
                shutil.copyfile(file_field_instance.path, new_path)
                updated_path = os.path.relpath(new_path, storage.location)
                if len(updated_path) < 100:
                    setattr(instance, file_field, updated_path)
                else:
                    save_objects = False
                count += 1
                print(f'{model_name} | {file_field}  | {str(instance.pk)}')
            if save_objects:
                for instance in instances:
                    instance.save()
github-actions[bot] commented 1 year ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.