sul-dlss / was-registrar-app

Rails app to organize downloaded web archiving data and trigger preassembly/accessioning when appropriate

Rake task for listing WARCs #564

Closed edsu closed 1 year ago

edsu commented 1 year ago

Why was this change made? 🤔

To help resolve a production issue, we wanted to see which WARC files were available from Archive-It for a given collection. This adds a task that fetches metadata for the WARC files available from the WASAPI provider for a given collection druid:

RAILS_ENV=production bin/rake warcs[druid:yh729xg0576]

The resulting CSV contains these columns: filename, md5, sha1, size, crawl_time, crawl_start, store_time, and location.
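As a rough illustration of that CSV shape (not the code in this PR), here is a minimal sketch that flattens WASAPI-style file records into rows with those columns. The helper names `warc_row` and `warcs_to_csv` are hypothetical, and the record keys (`checksums`, `crawl-time`, `crawl-start`, `store-time`, `locations`) are an assumption about what the WASAPI provider returns; taking only the first location is also an assumption.

```ruby
require "csv"

# Column order taken from the description above.
WARC_CSV_HEADERS = %w[filename md5 sha1 size crawl_time crawl_start store_time location].freeze

# Hypothetical helper: flattens one WASAPI-style file record (a Hash with
# string keys) into a row matching WARC_CSV_HEADERS.
# Assumption: checksums are nested under "checksums" and "locations" is a list.
def warc_row(file)
  checksums = file.fetch("checksums", {})
  [
    file["filename"],
    checksums["md5"],
    checksums["sha1"],
    file["size"],
    file["crawl-time"],
    file["crawl-start"],
    file["store-time"],
    Array(file["locations"]).first # only the first location is kept
  ]
end

# Hypothetical helper: builds the full CSV text, header row first.
def warcs_to_csv(files)
  CSV.generate do |csv|
    csv << WARC_CSV_HEADERS
    files.each { |file| csv << warc_row(file) }
  end
end
```

Missing keys simply become empty cells, so a record without checksums still produces a row.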

How was this change tested? 🤨

By running the rake task in development with a copy of the production database and configuration.

edsu commented 1 year ago

After getting this working (and thanks for the reviews) I'm having second thoughts about merging it.

The current code has a symmetry in the way that WasapiWarcLister and SdrWarcLister return filenames. I could update SdrWarcLister to return objects but that seems like a bridge too far, at least right now.
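To make the symmetry concrete, here is a minimal sketch, assuming both listers hand back plain arrays of filenames. `missing_filenames` and the `WarcFile` struct are hypothetical names for illustration, not code from was-registrar-app.

```ruby
# Hypothetical stand-in for a richer record a lister might return.
WarcFile = Struct.new(:filename, :size, keyword_init: true)

# With filenames on both sides, the comparison is a plain array difference.
def missing_filenames(wasapi_filenames, sdr_filenames)
  (wasapi_filenames - sdr_filenames).sort
end

# If only one side returned objects, callers would have to map back to
# filenames before comparing, which is the asymmetry in question.
wasapi_files = [
  WarcFile.new(filename: "a.warc.gz", size: 10),
  WarcFile.new(filename: "b.warc.gz", size: 20)
]
sdr_filenames = ["a.warc.gz"]

missing = missing_filenames(wasapi_files.map(&:filename), sdr_filenames)
```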

I'm leaning towards putting what I've learned here to use in creating a wasapi library & client, which (if it works) could eventually be used by was-registrar-app (if it makes sense).

mjgiarlo commented 1 year ago

@edsu

Sounds reasonable to me. Worth bringing up as a Slack discussion?

edsu commented 1 year ago

This was helpful for getting info from Archive-It as part of an FR task this week, but I feel like it clutters up otherwise easy-to-read auditing code.

I'm gonna create another tool for getting information from WASAPI, and maybe (someday) integrate it with was-registrar-app.