Closed Grosskopf closed 1 year ago
I'm currently considering adding this pre_save signal:
import shutil
from os import path
from typing import Type

from django.db.models import Model
from django_cleanup.cache import get_fields_for_model, get_model_name


def pre_save_file_field_containing_model(
        sender: Type[Model], instance: Model, **kwargs):
    """
    Handle the pre-save signal of a model with a FileField.

    Before a new instance is first saved, check each of its FileFields:
    if another instance already references the same file, copy the file
    to a unique name so the two instances no longer share it.
    """
    if instance.pk is not None:
        # Only act on newly created instances.
        return
    file_fields = list(get_fields_for_model(get_model_name(sender)))
    for file_field in file_fields:
        if getattr(instance, file_field):
            other_exist = sender.objects.filter(
                **{file_field: getattr(instance, file_field)}).exists()
            if other_exist:
                file = getattr(instance, file_field)
                # Copy the shared file to a fresh, unique name ...
                new_path = file.storage.get_available_name(file.path)
                shutil.copyfile(file.path, new_path)
                # ... and point this instance at the copy, using the path
                # relative to the storage root as stored in the database.
                new_path = path.relpath(new_path, file.storage.location)
                setattr(instance, file_field, new_path)
Probably it's not the best way to write this by a long shot, but it gets the job done: it enforces uniqueness in future copy operations.
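For context, `storage.get_available_name()` (used above) returns a name that is not yet taken. A simplified, hypothetical sketch of that behaviour with a numeric suffix is below; Django's real `FileSystemStorage` checks `storage.exists()` and appends a random seven-character suffix instead:

```python
import itertools
import os


def get_available_name_sketch(name, existing):
    """Return `name` if unused, else append _1, _2, ... before the extension.

    `existing` is a set of taken names. This is an illustrative stand-in,
    not Django's actual implementation.
    """
    if name not in existing:
        return name
    root, ext = os.path.splitext(name)
    for i in itertools.count(1):
        candidate = f'{root}_{i}{ext}'
        if candidate not in existing:
            return candidate
```

For example, `get_available_name_sketch('photo.jpg', {'photo.jpg'})` returns `'photo_1.jpg'`.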
This comes up occasionally; past issues (#85, #51, and maybe others I can't find) mention it. I can update the documentation to call out that we don't support this out of the box and provide recommendations.
Thank you :)
A specific mention of "unique", or of the fact that this happens when copying instances, would have been awesome to stop this from happening in the future, but a bit of text on this topic already helps a lot :)
For anyone finding this issue later down the line because they duplicated a model instance as per the Django documentation and have now noticed that a file is missing:
We ran this code to find out how many hundreds of files were affected across the years:
from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil

count_not_found = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        instances = list(model.objects.all().exclude(
            Q(**{file_field: ''}) | Q(**{file_field: None})))
        for instance in instances:
            file = getattr(instance, file_field)
            if not os.path.exists(file.path):
                count_not_found += 1
                print(f'{model_name} | {file_field} | {str(instance.pk)} | {file.path}')
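The duplicate detection used further below (`.values(field).annotate(count=Count('id')).filter(count__gt=1)`) can be illustrated without Django using `collections.Counter`; the sample data here is hypothetical:

```python
from collections import Counter

# Hypothetical sample data: instance pk -> stored file name.
rows = {1: 'img/a.jpg', 2: 'img/a.jpg', 3: 'img/b.jpg', 4: ''}

# Rough equivalent of .values(field).annotate(count=Count('id'))
# .filter(count__gt=1), excluding empty values.
counts = Counter(name for name in rows.values() if name)
duplicates = {name for name, n in counts.items() if n > 1}
print(duplicates)  # {'img/a.jpg'}
```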
We attached the above-mentioned signal to one of our apps; that helps stop this from happening.
We regenerated our few hundred files from years of backups, and then, after first trying it on local testing copies of the live system (because the warnings are only shown after the file is already gone, i.e. renamed), ran this code to make the few hundred duplicate files unique:
from django_cleanup import cache
from django.db.models import Count, Q
import os
import shutil

count = 0
count_files = 0
filenames = []
for model in cache.cleanup_models():
    model_name = cache.get_model_name(model)
    file_fields = list(cache.get_fields_for_model(model_name))
    for file_field in file_fields:
        files = model.objects.values(file_field) \
            .annotate(count=Count('id')).order_by(file_field) \
            .filter(count__gt=1) \
            .exclude(Q(**{file_field: ''}) | Q(**{file_field: None}))
        for file in files:
            filename = file.get(file_field)
            instances = list(model.objects.filter(**{file_field: filename}))
            first_file = getattr(instances[0], file_field)
            if filename in filenames:
                print('warning: found file twice')
                print(filename)
                print(instances)
            else:
                filenames.append(filename)
            if not os.path.exists(first_file.path):
                continue
            print(filename)
            count_files += 1
            save_objects = True
            for instance in instances:
                file_field_instance = getattr(instance, file_field)
                storage = file_field_instance.storage
                new_path = storage.get_available_name(file_field_instance.path)
                shutil.copyfile(file_field_instance.path, new_path)
                updated_path = os.path.relpath(new_path, storage.location)
                if len(updated_path) < 100:
                    setattr(instance, file_field, updated_path)
                else:
                    # Path exceeds the field's max_length; skip saving.
                    save_objects = False
                count += 1
                print(f'{model_name} | {file_field} | {str(instance.pk)}')
            if save_objects:
                for instance in instances:
                    instance.save()
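The core of the repair loop, stripped of the ORM, is "copy the shared file to a fresh name and point each record at its own copy". A self-contained sketch with plain files (hypothetical helper, simplified from the script above):

```python
import os
import shutil
import tempfile


def split_shared_file(paths):
    """Give each entry in `paths` (a dict key -> file path) its own copy.

    Entries after the first that reference an already-seen path get the
    file copied to a new unique name, mirroring what the script above
    does with storage.get_available_name() and shutil.copyfile().
    """
    seen = set()
    result = {}
    for key, path in paths.items():
        if path in seen:
            root, ext = os.path.splitext(path)
            i = 1
            new_path = f'{root}_{i}{ext}'
            while os.path.exists(new_path):
                i += 1
                new_path = f'{root}_{i}{ext}'
            shutil.copyfile(path, new_path)
            path = new_path
        seen.add(path)
        result[key] = path
    return result


# Usage: two records sharing one file end up with two distinct files.
with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, 'doc.txt')
    with open(shared, 'w') as fh:
        fh.write('payload')
    fixed = split_shared_file({1: shared, 2: shared})
    assert fixed[1] != fixed[2]
    assert all(os.path.exists(p) for p in fixed.values())
```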
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
Hi, the Django documentation mentions this method for copying model instances: https://docs.djangoproject.com/en/4.0/topics/db/queries/#copying-model-instances. With default Django behaviour this results in a non-unique filename: if one of the copies is adjusted afterwards, #51 takes place and the file is deleted out from under the other copy. Could you please mention in your documentation, or enforce in some way, uniqueness in FileFields, so that files with the same name are not deleted as easily? :)
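To make the failure mode concrete without Django: copying an instance duplicates the row but not the file, so two records reference one path; when a cleanup-on-change deletes the "old" file for one record, the other record's file is gone too. A plain-file sketch of the behaviour described above (not django-cleanup's actual code):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    shared = os.path.join(tmp, 'upload.txt')
    with open(shared, 'w') as fh:
        fh.write('original')

    # Copying a model instance per the Django docs (pk = None; save())
    # duplicates the row but NOT the file: both rows store the same name.
    record_a = {'file': shared}
    record_b = {'file': shared}  # the copy

    # Replacing record_b's file makes the cleanup delete the old one --
    # which record_a still references.
    os.remove(record_b['file'])          # cleanup of the "old" file
    record_b['file'] = shared + '.new'   # the replacement

    print(os.path.exists(record_a['file']))  # False: record_a's file is gone
```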