ic-labs / django-icekit

GLAMkit is a next-generation Python CMS by the Interaction Consortium, designed especially for the cultural sector.
http://glamkit.com
MIT License

Two file uploads with the same filename will overwrite each other #161

Open: cogat opened this issue 7 years ago

cogat commented 7 years ago

I thought Django ensured uploaded image filenames were unique?

Ideally, filenames would be unique unless you're replacing the file in the same ImageField, but just ensuring uniqueness will do.

mrmachine commented 7 years ago

It looks like django-storages changes this behaviour, which is surprising (at least to someone who hasn't read its docs closely).

https://github.com/jschneier/django-storages/blob/e20f98512e15354a0d1d96ef6c813903adbdcd9f/storages/backends/s3boto3.py#L577-L582
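To make the difference concrete, here is a simplified plain-Python model of the two naming policies. Django's stock `Storage.get_available_name()` keeps appending an underscore plus seven random letters/digits to the file root while the name is taken; django-storages' `S3Boto3Storage` skips that check when its `file_overwrite` option is on (the `AWS_S3_FILE_OVERWRITE` setting, which defaults to `True`). The `existing` set and the function names `overwrite_name` / `unique_name` are hypothetical stand-ins, not the real storage API.

```python
import os
import secrets
import string

# Same character set as Django's get_random_string() default.
ALPHABET = string.ascii_letters + string.digits


def overwrite_name(name, existing):
    # Hypothetical model of django-storages with file_overwrite on:
    # the requested name is reused as-is, so a second upload replaces the first.
    return name


def unique_name(name, existing):
    # Hypothetical, simplified model of Django's default get_available_name():
    # while the name is taken, append "_" plus 7 random characters to the root.
    dir_name, file_name = os.path.split(name)
    root, ext = os.path.splitext(file_name)
    while name in existing:
        suffix = "".join(secrets.choice(ALPHABET) for _ in range(7))
        name = os.path.join(dir_name, f"{root}_{suffix}{ext}")
    return name


existing = {"images/photo.jpg"}
print(overwrite_name("images/photo.jpg", existing))  # images/photo.jpg
print(unique_name("images/photo.jpg", existing))     # e.g. images/photo_aB3xK9q.jpg
```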

In ixc-whitenoise we already have HashedMediaMixin, which assigns a unique name to every file based on its content, so that we can cache it forever on CloudFront or just set far-future expiry headers.

https://github.com/ic-labs/django-icekit/blob/d27b6394e6f81a4e12187dc050752deb1532dc43/icekit/utils/storage.py#L58
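The linked mixin isn't reproduced here. As a rough illustration of content-based naming only, a hashed name could be derived as below. The hypothetical helper `hashed_name`, the choice of SHA-256, and keeping the directory and extension are all assumptions and are not necessarily what HashedMediaMixin does.

```python
import hashlib
import os


def hashed_name(name, content):
    # Hypothetical helper: replace the file root with a SHA-256 digest of the
    # file's bytes, keeping the directory and extension. Identical content
    # always maps to the same name; different content gets a different name
    # (barring a hash collision), so two "photo.jpg" uploads can't clash.
    dir_name, file_name = os.path.split(name)
    _, ext = os.path.splitext(file_name)
    digest = hashlib.sha256(content).hexdigest()
    return os.path.join(dir_name, digest + ext)


a = hashed_name("images/photo.jpg", b"first upload")
b = hashed_name("images/photo.jpg", b"second upload")
print(a != b)  # True: same original filename, different content, no overwrite
```

Because the name changes whenever the content changes, the URL for a given name never serves different bytes, which is what makes "cache forever" safe.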

I think we should make this the default in ICEkit. It will mean gibberish-looking filenames, but for the most part we shouldn't care what media filenames are. When we do care, we should retain the original filename (or an explicitly chosen, editable name) in our file library and set it during download via HTTP headers.
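Setting the name "during download via HTTP headers" means the `Content-Disposition` response header. A minimal sketch, assuming the file library stores the original filename alongside the hashed storage name; `content_disposition` is a hypothetical helper, and in a Django view its return value would be assigned to `response["Content-Disposition"]`.

```python
from urllib.parse import quote


def content_disposition(original_name, attachment=True):
    # Hypothetical helper: build a Content-Disposition value telling the
    # browser to save the file under its original name, regardless of the
    # (hashed) name it has in storage.
    disposition = "attachment" if attachment else "inline"
    if not original_name.isascii():
        # Non-ASCII names use the RFC 6266 / RFC 5987 filename* parameter,
        # percent-encoding the UTF-8 bytes.
        return f"{disposition}; filename*=UTF-8''{quote(original_name, safe='')}"
    # ASCII names go in a quoted filename parameter; escape \ and ".
    escaped = original_name.replace("\\", "\\\\").replace('"', '\\"')
    return f'{disposition}; filename="{escaped}"'


print(content_disposition("Annual report 2016.pdf"))
# attachment; filename="Annual report 2016.pdf"
print(content_disposition("café.jpg"))
# attachment; filename*=UTF-8''caf%C3%A9.jpg
```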

cogat commented 7 years ago

My little bit of googling:

https://docs.djangoproject.com/en/1.8/ref/files/storage/#django.core.files.storage.Storage.get_available_name
http://stackoverflow.com/questions/2673647/enforce-unique-upload-file-names-using-django

Rather than replace the whole filename with a hash, it would be nicer to append a small amount of gibberish (or perhaps the asset ID) to the original filename. We don't care what image filenames are, but we do care what download filenames are. I'm guessing we can use the HTML5 download attribute to set the download name to the original name (http://stackoverflow.com/questions/3102226/how-to-set-name-of-file-downloaded-from-browser).
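The two ideas in the comment above, an asset-ID suffix on the original filename and the HTML5 `download` attribute, might look roughly like this. Both helpers (`name_with_asset_id`, `download_link`) are hypothetical illustrations, not existing ICEkit code.

```python
import html
import os


def name_with_asset_id(name, asset_id):
    # Hypothetical scheme: keep the readable original filename and append the
    # asset's primary key, which is unique per asset, before the extension.
    dir_name, file_name = os.path.split(name)
    root, ext = os.path.splitext(file_name)
    return os.path.join(dir_name, f"{root}-{asset_id}{ext}")


def download_link(url, original_name, text):
    # The HTML5 "download" attribute suggests a save-as filename, but browsers
    # generally ignore it for cross-origin URLs (e.g. media served directly
    # from an S3 bucket or CDN domain).
    return '<a href="{}" download="{}">{}</a>'.format(
        html.escape(url), html.escape(original_name), html.escape(text)
    )


print(name_with_asset_id("images/photo.jpg", 42))  # images/photo-42.jpg
print(download_link("https://example.com/media/images/photo-42.jpg", "photo.jpg", "Download"))
# <a href="https://example.com/media/images/photo-42.jpg" download="photo.jpg">Download</a>
```

One catch with the ID approach: a new object's primary key usually isn't assigned until it has been saved, so the final filename may need to be set in a second step. The cross-origin limitation of `download` is also a reason to prefer the `Content-Disposition` header when media lives on S3.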

mrmachine commented 7 years ago

When we don't care about the filename, naming files by a hash of their content also lets us de-duplicate the data and skip the save (upload to S3) operation entirely, which appending a counter, ID or hash to the original filename does not allow. The downside is poor human readability when browsing the file system directly (not via our file library).
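The de-duplication argument can be sketched with an in-memory stand-in for a bucket: if the content-hashed name already exists, the bytes are (barring a hash collision) identical, so the upload is skipped. `DedupStore` and its attributes are hypothetical and only simulate upload operations.

```python
import hashlib
import os


class DedupStore:
    # Hypothetical in-memory stand-in for a remote bucket such as S3.
    def __init__(self):
        self.objects = {}  # storage name -> bytes
        self.uploads = 0   # number of simulated upload operations

    def save(self, name, content):
        # Name the object after a hash of its content; if that name already
        # exists, skip the (simulated) upload and reuse the existing object.
        _, ext = os.path.splitext(name)
        key = hashlib.sha256(content).hexdigest() + ext
        if key not in self.objects:
            self.objects[key] = content
            self.uploads += 1
        return key


store = DedupStore()
first = store.save("photo.jpg", b"same bytes")
second = store.save("copy of photo.jpg", b"same bytes")
print(first == second, store.uploads)  # True 1
```

With a suffix scheme, the second save would get a new name and trigger a second upload of the same bytes.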

Either way, we should be able to store the original name in the file library and use it for downloads. In most cases a unique suffix might be acceptable for downloads, and de-duplication might not be required, but ideally we should be able to use the exact original or specified name (even if it's not unique).