ic-labs / django-icekit

GLAMkit is a next-generation Python CMS by the Interaction Consortium, designed especially for the cultural sector.
http://glamkit.com
MIT License
47 stars 11 forks source link

Use public-read for S3-enabled storage of thumbnails by default #217

Open jmurty opened 7 years ago

jmurty commented 7 years ago

When the ENABLE_S3_MEDIA setting is applied, continue to use "private" S3 storage for media assets by default, but switch to using "public" S3 storage for thumbnails generated by the thumbnailing system.

Here "private" storage means that images are only accessible through time-limited signed S3 URLs, while "public" means that thumbnails will be accessible through non-time-limited standard S3 URLs and the thumbnail images will be stored with the public-read ACL behind the scenes.

As part of this change, the S3DefaultStorage storage class is renamed to S3DefaultPrivateStorage to be explicit (the original class name remains available for backwards-compatibility) and the new S3DefaultPublicStorage class is added.

coveralls commented 7 years ago

Coverage Status

Coverage decreased (-0.04%) to 72.892% when pulling 7719b26e2e89c2fd2be327d47de612cd50135a58 on add-public-read-s3-storage-for-thumbnails into 1efc39c31b8b191112079f892a473b1faeb960b9 on develop.

mrmachine commented 7 years ago

@james you might be right that S3's default behaviour is a bit of a distraction, but we might also have tunnel vision by focusing on thumbnails (in the context of ICEkit, at least)... the default storage backend stores files, images, and thumbnails (repurposed images)...

without S3, we have typically configured the media directory (where the file storage saves all files) as public, served by nginx, and we've only either used a private storage class (attached to model fields in a project, not in a lib like ICEkit) that saves files outside this directory and is protected with a Django view (in the project) or nginx internal redirects...

since we started moving everything to S3, I wanted to keep the config with the project (and not require S3 buckets to be manually configured or special nginx config), and I wanted to avoid setting whole buckets public when we have often stored things like DB dump and log backups on S3, so I decided to stick with S3's private by default... I had just fixed up our use of django-storages to make it consistent (previously sometimes we got signed URLs, sometimes we didn't) and I thought, if we just use the storage backend whenever we access files, we don't need to care about the S3 object being private... and that works fine, except for the case which has come up where a direct link is cached for longer than the expiry (which is max 7 days)...

somewhat separately to that, we now have singular concrete image and file libraries in icekit, which themselves have no concept of public/private file access and use a single storage backend, so we can't easily change the storage class for a single project, and we can't easily store both public and private files in these image/file libraries... this is probably a feature limitation of icekit which may be somewhat orthogonal to the immediate problem, and not one we often run up against... but now that we are using S3 storage for everything, our options to deal with it are more limited because nginx internal redirects for protection won't work anymore without local file access...

what I am disagreeing with is the assumption that a "thumbnail" (an image repurposed by easy-thumbnails) should always be public... that's only one use-case where we want to protect the original high res file and allow smaller (traditional "thumbnails") to be public, and not all repurposed images are small, anyway... I think a repurposed image should have the same public/private status as its original...

in general I am in favour of "safe by default" (e.g. private) and explicit exceptions in code in the cases where we know it is appropriate... I don't care if it's an original image, or a zip/pdf/other file, or a thumbnail repurposed by easy thumbnails... I think private by default, and having to explicitly render a public permalink via template tag or other method, without having to hack the concrete models in ICEkit, aligns with this "safe by default" pattern and should be more easily defined/customised/overridden in project codebase ...

so before we change ICEkit, perhaps we should lay out all the other known use cases for public/private restrictions on image/file library content, and then discuss changes to support them... the most common one might be that all media can probably be public, but if implementing that as a default storage backend makes it harder to keep some files private because the concrete image/file library models are in ICEkit, then it might not be the best way forward...

jmurty commented 7 years ago

Thanks for the feedback @mrmachine.

Leaving aside the implementation details of S3 as a backend, and thumbnails as a specific kind of media content, it sounds like we might need to take a step back and properly (or at least more clearly) handle different classes of stored assets. So we need:

The key goal in everything above is to percolate the concepts of two storages – private and public – all through ICEkit, and select the appropriate one to use at points in code where it matters.

The next step might be to add explicit configuration options for the various ICEkit features – such as IIIF Image API, collection image processing, whatever else – so you can configure whether you want to use the public or private storage engine (or some variation) for the different use cases.

As an aside, I don't think we should use easy-thumbnails as a general toolkit for repurposing images anyway. I tried this with the IIIF Image API work and it simply didn't give us the control we needed, and was too opaque and poorly document for me to easily understand what it did do. The intended use of easy-thumbnails is to generate smaller representations of images for inclusion in web pages. Using it for anything other than this is asking for trouble, or at least confusion.

mrmachine commented 7 years ago

easy-thumbnails has additional limitations when paired with a general purpose image library and remote storage:

So I'm not wedded to easy-thumbnails, but I don't know that anything else will have a better solution to the above problem, and besides that issue, easy-thumbnails appears to be pretty widely used and works reasonably well.

I agree that having ICEkit itself be able to store both public and private files, and possibly determine which via django-guardian integration, sounds like a promising way to go.

I think having the user assign public permissions in the file or image library change form page (checkbox, or django-guardian integration), and having one storage backend that abides by that for files, images and thumbnails, without having to use special template tags or Python code for each generated link, would be ideal.