uga-libraries / web-aip

Downloads web content captured using Archive-It and gets it ready to be transformed into AIPs for preservation.
Creative Commons Attribution Share Alike 4.0 International

Use existing WARCs when restarting #28

Open amhanson9 opened 1 year ago

amhanson9 commented 1 year ago

If the API times out or the script breaks while creating an AIP, the partially created AIP currently has to be deleted before the script runs again so that it can be finished correctly. For AIPs with many WARCs, that can waste a lot of time. Is there a way to make the script smarter about restarting, so that it reuses existing metadata and WARC files that were logged as successful? Or is it safer to start over?
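
As a starting point for discussion, here is a minimal sketch of how a restart could decide which WARCs to reuse. It is not how the script works today. The function name, the shape of the log (a dict of filename to status) and the "success" status value are all assumptions made for illustration.

```python
from pathlib import Path


def warcs_still_needed(expected_warcs, logged_status, warc_dir):
    """Return the WARC filenames that must be downloaded (again).

    expected_warcs: filenames the seed should have (hypothetical input).
    logged_status: dict of filename -> status string from a previous run
                   (hypothetical log format; "success" marks a good download).
    warc_dir: folder where WARCs from the previous run were saved.
    """
    needed = []
    for name in expected_warcs:
        logged_ok = logged_status.get(name) == "success"
        on_disk = (Path(warc_dir) / name).is_file()
        # Reuse a WARC only if it was logged as successful AND is still present.
        if not (logged_ok and on_disk):
            needed.append(name)
    return needed
```

A log entry alone does not prove the file on disk is complete, so if a checksum is available for each WARC, comparing it before reuse would probably be the safer check.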

amhanson9 commented 1 year ago

The reset_aip() function already exists to delete everything about a seed that exists but did not complete in the previous iteration of the script. Can it be reworked to delete only what had an error and leave what did not?
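
One possible shape for that rework, as a rough sketch only: it does not reflect the current reset_aip() code, and the function name, the folder layout and the status dict (relative path to status, with "success" meaning the file finished) are hypothetical.

```python
from pathlib import Path


def reset_failed_only(aip_dir, file_status):
    """Delete only the files that did not finish successfully.

    aip_dir: folder holding the partially built AIP (hypothetical layout).
    file_status: dict of relative path -> status from the previous run
                 (hypothetical; anything other than "success" counts as failed).
    Returns the sorted list of relative paths that were deleted.
    """
    aip_dir = Path(aip_dir)
    deleted = []
    # Build the full file list first so deleting does not disturb the walk.
    for path in sorted(p for p in aip_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(aip_dir).as_posix()
        # Files with no log entry are treated as failed, since an interrupted
        # download can leave a partial file that was never logged.
        if file_status.get(rel) != "success":
            path.unlink()
            deleted.append(rel)
    return deleted
```

Treating unlogged files as failed is the cautious choice here: it may re-download something that was actually fine, but it avoids keeping a truncated file.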