guardian / grid

The Guardian’s image management system
https://www.theguardian.com/info/developer-blog/2015/aug/12/open-sourcing-grid-image-service
Apache License 2.0

upsert images by projection #4116

Closed andrew-nowak closed 1 year ago

andrew-nowak commented 1 year ago

What does this change?

Add a new endpoint and matching Thrall HTML forms that allow an image to be submitted for "upsert by projection".

For context, "projection" is the process of rebuilding the image data by ingesting the original image file, "replaying" the metadata changes made via the Grid UI, and attaching all of the other Grid-specific data (leases, photoshoots, usages, etc.), i.e. rebuilding the data stored in Elasticsearch without referring to Elasticsearch. Projection was originally implemented to make migration useful, but it would occasionally be helpful to update (or insert!) a single image into the Elasticsearch index "fresh".
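To make the idea concrete, here is a minimal sketch of projection in Python. It is not the Grid implementation (which is Scala): the function name `project_image`, the record layout, and the representation of edits as plain dicts are all assumptions made for illustration. The point is only that the record is assembled from the original file's metadata, the recorded edits, and the Grid-specific data, without ever reading the index.

```python
from copy import deepcopy


def project_image(image_id, original_metadata, edits,
                  leases=(), usages=(), photoshoot=None):
    """Rebuild an image record from its sources (hypothetical layout).

    Nothing here reads the search index; every input comes from
    elsewhere (the original file, stored edits, leases, usages).
    """
    # Step 1: start from what ingesting the original file would give us.
    record = {"id": image_id, "metadata": deepcopy(original_metadata)}

    # Step 2: replay the recorded edits in order, so later edits win.
    for edit in edits:
        record["metadata"].update(edit)

    # Step 3: attach the Grid-specific data held outside the index.
    record["leases"] = list(leases)
    record["usages"] = list(usages)
    if photoshoot is not None:
        record["photoshoot"] = photoshoot

    return record
```

For example, calling `project_image("abc", {"title": "Old"}, [{"title": "New"}])` yields a record whose metadata carries the edited title, with empty lease and usage lists.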

In our specific case, we have removed some images from the index but have not yet removed their image files from S3, and now want to restore them seamlessly. Running a full migration for a handful of images is quite heavy, and the list of images to migrate is generated by querying the current index, so it wouldn't even really help here! So we're implementing the feature at long last to unblock ourselves.

There is a UI flow, which fetches the projection and diffs it against what's in the index (if there is something there! If there's not, the diff will be 100% additions :) ). Services with API keys can also make requests directly against the endpoint that does the work, if that's useful.
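The diff step can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this PR: `diff_projection` and the added/removed/changed layout are hypothetical, and it only compares top-level fields. It does show why a missing index entry produces a diff made up entirely of additions.

```python
def diff_projection(projected, indexed):
    """Compare a projected record with the indexed one (hypothetical helper).

    `indexed` may be None when the image is absent from the index,
    in which case every projected field is reported as an addition.
    """
    indexed = indexed or {}
    diff = {"added": {}, "removed": {}, "changed": {}}

    # Walk every top-level key seen on either side.
    for key in sorted(set(projected) | set(indexed)):
        if key not in indexed:
            diff["added"][key] = projected[key]
        elif key not in projected:
            diff["removed"][key] = indexed[key]
        elif projected[key] != indexed[key]:
            diff["changed"][key] = {"from": indexed[key], "to": projected[key]}

    return diff
```

With `indexed=None`, the `removed` and `changed` sections stay empty and `added` holds the whole projection, which matches the "100% additions" case described above.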

Potential other things to do:

I haven't added a link to this page from the Thrall dashboard - should I? (probably yes)

While implementing this I figured out what was preventing CSRF protections on the other Thrall dashboard endpoints. I think I'll raise another PR after this one which removes the +nocsrfs

How should a reviewer test this change?

Deploy to TEST if not already deployed. Delete an image from Elasticsearch without deleting it from S3 and Dynamo (not possible in the Grid UI! You must delete directly against Elasticsearch; feel free to ask for help if you've not done this before). Check the image is now "not-found" in the Grid UI. Open /upsertProject on the Thrall endpoint, paste the image ID into the form and submit. Check that the diff looks as you expect (it should be all additions), and then submit again. Is the image back on Grid? With all the data it previously had?

I think this PR fixes all the problems with the CSRF protection that made it unusable on the other Thrall endpoints - do you see any problems?

How can success be measured?

We can restore the images we wanted to restore.

Tested? Documented?

twrichards commented 1 year ago

Worth forcing them through the projection diff endpoint first!

prout-bot commented 1 year ago

Seen on leases, media-api (merged by @andrew-nowak 10 minutes and 18 seconds ago) Please check your changes!

prout-bot commented 1 year ago

Seen on auth, metadata-editor, thrall, cropper, collections, kahuna (merged by @andrew-nowak 10 minutes and 32 seconds ago) Please check your changes!

prout-bot commented 1 year ago

Seen on image-loader, usage (merged by @andrew-nowak 10 minutes and 38 seconds ago) Please check your changes!