wellcomecollection / editorial-photography-ingest


divide large shoots by maximum total size #33

Closed paul-butcher closed 1 week ago

paul-butcher commented 2 weeks ago

What does this change?

Fixes https://github.com/wellcomecollection/editorial-photography-ingest/issues/31

Previously, because most shoots were relatively small, this process downloaded, zipped, and uploaded each shoot in discrete steps. I have now encountered some shoots that are too large for Lambda to cope with in that way.

To handle this, I now download only as many files as will fit on disk (alongside the zip they become), upload that partial shoot, then clean up and download the next batch.

This could be optimised further by doing it all in memory, but I have chosen to work with the filesystem, partly because it makes problems easier to diagnose when running locally.
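The batching step above can be sketched as follows. This is a minimal, hypothetical illustration, not the repository's actual code: the names `batch_by_size` and `MAX_BATCH_BYTES` are assumptions, and the download/zip/upload/cleanup calls around it are omitted.

```python
# Hypothetical sketch of splitting a shoot into size-bounded batches.
# MAX_BATCH_BYTES is an assumed disk budget, leaving headroom for the
# zip that the downloaded files become.
MAX_BATCH_BYTES = 2 * 1024 ** 3


def batch_by_size(files, max_bytes=MAX_BATCH_BYTES):
    """Split (key, size) pairs into batches whose total size stays
    under max_bytes. Each batch would be downloaded, zipped, uploaded,
    and cleaned up before the next batch begins.

    A single file larger than max_bytes still gets its own batch."""
    batch, total = [], 0
    for key, size in files:
        if batch and total + size > max_bytes:
            yield batch
            batch, total = [], 0
        batch.append(key)
        total += size
    if batch:
        yield batch
```

For example, with a 100-byte budget, files of sizes 40, 40, 40, and 10 split into two batches of two files each.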

How to test

Push the remaining shoots onto the restore queue. They should all now transfer successfully.

How can we measure success?

This should mark the end of having to make code changes to accommodate the vagaries of this process.

Have we considered potential risks?

It's possible that the size I have set is too big for Archivematica to handle. If that's the case, then I can revisit the numbers.