sul-dlss / was-registrar-app

Rails app to organize downloaded web archiving data and trigger preassembly/accessioning when appropriate

Register .arc.gz files #499

Closed by lwrubel 2 years ago

lwrubel commented 2 years ago

Work from #486 surfaced the need to register .arc.gz files in addition to .warc.gz files. These are older web archives in the FOIA collection that were created with Archive-It. We may receive more .arc.gz files in the future if further backfilling occurs.

Example filename: ARCHIVEIT-924-STANFORD-FOIA-20090126205218-00370-crawling015.us.archive.org.arc.gz

One example of code to edit: https://github.com/sul-dlss/was-registrar-app/blob/a27ed35d3deb6a6819f811080e88ccc4e8066845/app/services/web_archive_glob.rb#L6

Adjust WRA to register these files.
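
For illustration, a minimal sketch of one way a file glob could pick up both extensions. This is not the actual contents of web_archive_glob.rb; the constant `WEB_ARCHIVE_EXTENSIONS` and the method `web_archive_files` are hypothetical names, and the only library behavior relied on is the brace alternation that Ruby's `Dir.glob` supports.

```ruby
# Hypothetical helper, NOT the real web_archive_glob.rb implementation.
# Extensions that should be treated as web archive files (assumed list).
WEB_ARCHIVE_EXTENSIONS = %w[warc.gz arc.gz].freeze

# Returns the sorted paths of all .warc.gz and .arc.gz files found
# anywhere under the given directory.
def web_archive_files(directory)
  # Builds a pattern such as "<directory>/**/*.{warc.gz,arc.gz}".
  # Dir.glob expands the {a,b} alternation, and "**/" recurses into
  # subdirectories (including the top-level directory itself).
  pattern = File.join(directory, '**', "*.{#{WEB_ARCHIVE_EXTENSIONS.join(',')}}")
  Dir.glob(pattern).sort
end
```

With a pattern like this, a file such as the example above would be matched by the `arc.gz` alternative, while other files in the same directory (for example a .txt file) would be ignored.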

mjgiarlo commented 2 years ago

@justinlittman Is there remaining work to do on this or is it done (looking at the merged PRs above)?

justinlittman commented 2 years ago

I think it is done, unless someone wants to test prior to closing.

mjgiarlo commented 2 years ago

We believe this is done. Closing. Peter will run a couple more tests and will re-open if any errors are found.