The 'reaper' has been turned off for many years and its implementation was never the most efficient. Thanks to the work in previous PRs (notably https://github.com/guardian/grid/pull/3926) we can easily filter for images which meet the criteria for 'reaping'.
The new approach first 'soft deletes' reapable images for two weeks. During that time they're not visible in the grid UI (unless the `is:deleted` filter is applied; note all queries have `-is:deleted` by default), which gives us a place to view the candidates for hard deletion before they're actually hard deleted.
This PR introduces four new `media-api` endpoints:
- `/images/nextIdsToBeSoftReaped` and `/images/nextIdsToBeHardReaped`, which retrieve the next X IDs of reapable images (to facilitate the two phases of reaping), where X is the `size` query param (max. 1000)
- `/images/batchSoftDelete` and `/images/batchHardDelete`, which take a JSON array of IDs in the body, then:
  - delete from S3 (main image, thumbs and optimised PNG [if applicable]) in bulk using the S3 bulk delete API (max. 1000)
  - process just the ES mutations in `thrall` (via mixin of a new `BatchExternalThrallMessage` message type), making use of the 'update-by-query' and 'delete-by-query' ES operations respectively
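To illustrate the 1000-key cap on the S3 bulk delete mentioned above, here is a rough Python sketch of splitting a list of object keys into request payloads. It is not the code in this PR: the function name `delete_payloads` is hypothetical, the keys are taken as given (how an image ID maps to its S3 keys is not shown here), and the dict shape is the one accepted by the `Delete` argument of boto3's `delete_objects`.

```python
from typing import Dict, Iterator, List, Sequence

MAX_KEYS_PER_REQUEST = 1000  # S3 bulk delete accepts at most 1000 keys per call


def delete_payloads(
    keys: Sequence[str], max_keys: int = MAX_KEYS_PER_REQUEST
) -> Iterator[Dict[str, object]]:
    """Yield one bulk-delete payload per chunk of at most `max_keys` keys.

    Hypothetical helper: each payload could be passed as the `Delete`
    argument of boto3's `delete_objects`.
    """
    if not 1 <= max_keys <= MAX_KEYS_PER_REQUEST:
        raise ValueError(f"max_keys must be between 1 and {MAX_KEYS_PER_REQUEST}")
    for start in range(0, len(keys), max_keys):
        chunk: List[str] = list(keys[start:start + max_keys])
        # Quiet mode asks S3 to report only the keys that failed to delete.
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}
```

Since each image can have up to three objects (main image, thumb, optimised PNG), a batch of 1000 IDs may need more than one such payload.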
These endpoints will be primarily utilised by a re-write of the scheduled 'reaper' lambda (see https://github.com/guardian/grid/pull/4135).
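As a rough illustration of how a scheduled caller might chain the four endpoints, here is a minimal Python sketch of one reaping pass. It is not the lambda from the linked PR. It assumes the `nextIdsToBe…` endpoints return a plain JSON array of IDs, and it leaves the HTTP details (method, auth, host) to two hypothetical injected callables, `fetch_ids` and `send_ids`.

```python
from typing import Callable, Dict, List

MAX_SIZE = 1000  # upper bound on the `size` query param

# Hypothetical transport hooks, injected so the sketch needs no network access.
FetchIds = Callable[[str], List[str]]       # requests a path, returns the parsed ID list
SendIds = Callable[[str, List[str]], None]  # sends a JSON array of IDs to a path

PHASES = (
    ("soft", "/images/nextIdsToBeSoftReaped", "/images/batchSoftDelete"),
    ("hard", "/images/nextIdsToBeHardReaped", "/images/batchHardDelete"),
)


def reap_once(fetch_ids: FetchIds, send_ids: SendIds, size: int = MAX_SIZE) -> Dict[str, int]:
    """Run one soft-reap pass, then one hard-reap pass; return IDs handled per phase."""
    if not 1 <= size <= MAX_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_SIZE}")
    counts: Dict[str, int] = {}
    for phase, next_path, batch_path in PHASES:
        ids = fetch_ids(f"{next_path}?size={size}")
        if ids:  # skip the batch call when nothing is reapable
            send_ids(batch_path, ids)
        counts[phase] = len(ids)
    return counts
```

Injecting the transport keeps the two-phase flow easy to exercise with fakes, whatever HTTP client the real caller uses.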
Note, thanks to #4128 we have a way to restore images even once they've been hard deleted 🪄