sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish airflow
Apache License 2.0

Harvest a list of IIIF items #531

Open jacobthill opened 1 month ago

jacobthill commented 1 month ago

Archive.org now supports IIIF, but there are few, if any, collections we would want to harvest in full. There are many objects that we would want to harvest, but doing so would require manually reviewing lists of items and compiling a list of item-level IIIF manifests. This might be solved by the work @aaron-collier is currently doing to harvest batched IIIF collections for AUB.

This will be critical for building some important browse categories where we need to select specific items that are only available in archive.org.

This would also solve https://github.com/sul-dlss/dlme-airflow/issues/540 and https://github.com/sul-dlss/dlme-airflow/issues/541, and would enable us to add several Stanford collections that don't have collection-level IIIF manifests.

aaron-collier commented 1 week ago

@jacobthill I'm a bit confused on this one. How will we know which objects to harvest if they aren't in a collection manifest? The work for AUB is based on the collection manifest offering the list of objects to query, so it doesn't sound like that will be a good solution here. Thanks.

jacobthill commented 1 week ago

I will provide a list of items in the catalog. Here is an example with IIIF: https://github.com/sul-dlss/dlme-airflow/pull/571
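
To make the idea concrete, here is a minimal sketch of harvesting from an explicit list of item-level manifest URLs instead of a collection manifest. It is an illustration only, not the code in the linked PR: `harvest_manifests`, `fetch_json`, and the record fields are hypothetical names, and the URLs in the usage note are placeholders.

```python
import json
from typing import Callable, Iterable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download and parse one JSON document (the default fetcher)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def harvest_manifests(
    manifest_urls: Iterable[str],
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield one flat record per item-level IIIF manifest URL.

    The list of URLs replaces the collection manifest as the source
    of "which objects to harvest".
    """
    for url in manifest_urls:
        manifest = fetch(url)
        yield {
            # IIIF Presentation 3 uses "id"; version 2 uses "@id".
            "id": manifest.get("id") or manifest.get("@id") or url,
            "label": manifest.get("label"),
            "source_url": url,
        }
```

Usage would look something like `records = list(harvest_manifests(["https://example.org/item-1/manifest.json", "https://example.org/item-2/manifest.json"]))`, with the list of URLs coming from the catalog entry. Passing `fetch` as a parameter is just a convenience so the function can be exercised without network access.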