guardian / crossword-uploader

Repository for the AWS lambda functions used to upload crosswords

Have the uploaders also archive files that are not xml and not pdf #32

Closed andrew-nowak closed 11 months ago

andrew-nowak commented 1 year ago

Because they will never get uploaded by either.

What does this change?

#31 was fantastic: it cleared up a bunch of XML and PDF files that had lingered in the bucket in an unprocessable state. However, there's still a whole bunch of files in there which aren't targets for either uploader (i.e. they don't have xml or pdf extensions), so they aren't targeted for archival. This PR adds an extra step to the XML lambda, which will detect any objects without an xml or pdf extension and archive those as part of a cleanup.
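For illustration, the detection step could look roughly like the sketch below. It is not the code in this PR: the names `HANDLED_EXTENSIONS`, `is_unhandled` and `keys_to_archive` are hypothetical, and it assumes that extension matching is case-insensitive and that keys ending in `/` (folder placeholders) should be left alone. It only works on a list of object keys and leaves the actual bucket listing and archiving out.

```python
from pathlib import PurePosixPath

# Extensions the two uploaders already target, per the description above.
HANDLED_EXTENSIONS = {".xml", ".pdf"}


def is_unhandled(key: str) -> bool:
    """Return True if neither uploader would ever pick up this object key."""
    # Assumption: keys ending in "/" are folder placeholders, not files.
    if key.endswith("/"):
        return False
    # Assumption: extensions are compared case-insensitively.
    return PurePosixPath(key).suffix.lower() not in HANDLED_EXTENSIONS


def keys_to_archive(keys: list[str]) -> list[str]:
    """Filter a bucket listing down to the keys that should be archived."""
    return [key for key in keys if is_unhandled(key)]
```

Given a listing such as `["a.xml", "b.PDF", "notes.txt", "README", "incoming/"]`, this selects only `notes.txt` and `README` for archival.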