sul-dlss / libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other LibSys workflows
Apache License 2.0

Retrieve and process Digital Bookplate Metadata #1217

Closed jermnelson closed 2 months ago

jermnelson commented 2 months ago

Fixes #1177

In this approach, each purl URL gets its own task, which retrieves the Cocina and extracts the title (label), image filename, and fund name. I initially throttled the number of active tasks to 5; retrieving and extracting 817 druids took 22:25 (mm:ss). I will do a second run with the active task limit raised to 10 to see whether that improves performance.
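As a rough illustration of the per-druid work described above, here is a minimal sketch in plain Python. It is not the code in this PR. The purl JSON URL pattern, the nested `structural.contains` layout, the list of image extensions, and especially the location of the fund name (assumed here to be a `description.title` entry with `type` set to `alternative`) are all assumptions, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, Optional
from urllib.request import urlopen

# Assumption: file extensions that count as a bookplate image.
IMAGE_EXTENSIONS = (".jp2", ".jpg", ".jpeg", ".png", ".tif", ".tiff")


def purl_json_url(druid: str) -> str:
    """Build the purl JSON URL for a druid (URL pattern is an assumption)."""
    bare_druid = druid.split(":")[-1]  # accept "druid:xyz" or "xyz"
    return f"https://purl.stanford.edu/{bare_druid}.json"


def fetch_cocina(druid: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Retrieve the Cocina JSON for one druid (hypothetical helper)."""
    with urlopen(purl_json_url(druid), timeout=timeout) as response:
        return json.load(response)


def extract_bookplate_metadata(cocina: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull title (label), image filename, and fund name out of a Cocina dict."""
    # Image filename: first file with an image extension, assuming
    # file sets and files are both nested under "structural" -> "contains".
    image_filename = None
    for file_set in cocina.get("structural", {}).get("contains", []):
        for file_info in file_set.get("structural", {}).get("contains", []):
            name = file_info.get("filename", "")
            if name.lower().endswith(IMAGE_EXTENSIONS):
                image_filename = name
                break
        if image_filename:
            break

    # Fund name: ASSUMED to be a title entry typed "alternative".
    fund_name = None
    for title in cocina.get("description", {}).get("title", []):
        if title.get("type") == "alternative":
            fund_name = title.get("value")
            break

    return {
        "title": cocina.get("label"),
        "image_filename": image_filename,
        "fund_name": fund_name,
    }
```

In an Airflow DAG, a function like this could be wrapped in a mapped task, with the number of concurrently running task instances capped through the operator-level `max_active_tis_per_dag` argument. Whether this PR does the throttling that way is not stated here.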

The second run, with 10 active tasks and running on Stanford's network, brought the time down to 6:41 (mm:ss).
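For a quick comparison of the two runs, the arithmetic can be sketched as below. It assumes both timings are minutes:seconds and that both runs processed the same 817 druids; the helper name is hypothetical.

```python
def to_seconds(mm_ss: str) -> int:
    """Convert a 'minutes:seconds' string to total seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


DRUIDS = 817  # druid count reported for the first run; assumed equal for the second

first_run = to_seconds("22:25")   # 5 active tasks
second_run = to_seconds("6:41")   # 10 active tasks, on Stanford's network

speedup = first_run / second_run
print(f"first run:  {first_run} s, {first_run / DRUIDS:.2f} s per druid")
print(f"second run: {second_run} s, {second_run / DRUIDS:.2f} s per druid")
print(f"speedup:    {speedup:.2f}x")
```

Under those assumptions the second run is roughly 3.35 times faster (1345 s versus 401 s). Doubling the task limit alone would be expected to give about a 2x gain at best, so the network change may account for part of the improvement, though two runs are not enough to separate the two effects.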