guardian / grid

The Guardian’s image management system
https://www.theguardian.com/info/developer-blog/2015/aug/12/open-sourcing-grid-image-service
Apache License 2.0
1.44k stars, 120 forks

script to fetch all ElasticSearch IDs #4114

Closed by twrichards 1 year ago

twrichards commented 1 year ago

We need to cross-reference the IDs available in the Grid UI (i.e. ElasticSearch) before reaping loads of files from S3. ElasticSearch has no direct way to list every ID; it has to be done with a query, and the most efficient form is a 'scan and scroll' (see https://stackoverflow.com/a/30855670). This PR adds a script to do just that and write the IDs to a file, for example a CSV file for upload to AWS Athena (see #4111).
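For readers unfamiliar with the pattern, here is a rough sketch of a scroll-based ID dump. It is not the script added in this PR. It assumes the standard Elasticsearch scroll REST endpoints (`_search?scroll=...` and `_search/scroll`) and sorts by `_doc`, which replaced the older `search_type=scan` in later Elasticsearch versions. The names `scroll_ids`, `write_ids_csv` and `http_json`, and the cluster URL and index in the usage note, are all hypothetical.

```python
import csv
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Optional


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scroll_ids(
    es_url: str,
    index: str,
    page_size: int = 1000,
    keep_alive: str = "1m",
    request: Callable[..., dict] = http_json,
) -> Iterator[str]:
    """Yield every document ID in `index`, one scroll page at a time."""
    page = request("POST", f"{es_url}/{index}/_search?scroll={keep_alive}", {
        "size": page_size,
        "_source": False,        # only the IDs are needed, skip the documents
        "sort": ["_doc"],        # index order: cheapest way to walk everything
        "query": {"match_all": {}},
    })
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            page = request("POST", f"{es_url}/_search/scroll", {
                "scroll": keep_alive,
                "scroll_id": scroll_id,
            })
            # The scroll ID can change between pages, so always keep the latest.
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        if scroll_id:
            # Release the server-side scroll context once we are done.
            request("DELETE", f"{es_url}/_search/scroll", {"scroll_id": scroll_id})


def write_ids_csv(ids: Iterable[str], path: str) -> int:
    """Write one ID per row to a CSV file and return the number of rows written."""
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for image_id in ids:
            writer.writerow([image_id])
            count += 1
    return count
```

Against a hypothetical local cluster this would be called as `write_ids_csv(scroll_ids("http://localhost:9200", "images"), "es-ids.csv")`. The `request` parameter exists only so the HTTP layer can be swapped out, for instance to add authentication or to run against a stub.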

Seems to work nicely for TEST (finished in a couple of minutes). [image]

prout-bot commented 1 year ago

Seen on auth, image-loader, metadata-editor, thrall, cropper, collections, kahuna (merged by @twrichards 9 minutes and 36 seconds ago) Please check your changes!

prout-bot commented 1 year ago

Seen on leases, usage, media-api (merged by @twrichards 9 minutes and 42 seconds ago) Please check your changes!