sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish airflow
Apache License 2.0

Penn Museum urls no longer resolve #532

Open jacobthill opened 2 days ago

jacobthill commented 2 days ago

The Penn Museum pulled down the metadata that we used to harvest. There is a "download data" button on the About page of their website, but it covers the whole collection, and we are only interested in the Near Eastern and Egyptian sections. This metadata is also missing thumbnail URLs. The collections have a lot of records with no images. We only want to load records with images, so we need a way to filter out the ones without images.
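A rough sketch of the kind of filtering this would need, assuming the download is a CSV. The column names `curatorial_section` and `image_url`, the section labels, and the sample rows are all hypothetical placeholders, not the actual Penn Museum export schema, so they would need to be adjusted against the real file:

```python
import csv
import io

# Hypothetical section labels; the real export may spell these differently.
WANTED_SECTIONS = {"Near Eastern", "Egyptian"}


def filter_records(rows, section_field="curatorial_section",
                   image_field="image_url", wanted_sections=WANTED_SECTIONS):
    """Keep rows that belong to a wanted section and have a non-empty image value.

    The field names are assumptions for illustration only.
    """
    kept = []
    for row in rows:
        section = (row.get(section_field) or "").strip()
        image = (row.get(image_field) or "").strip()
        if section in wanted_sections and image:
            kept.append(row)
    return kept


# Made-up sample data standing in for the full-collection download.
SAMPLE_CSV = """object_id,curatorial_section,image_url
1,Egyptian,https://example.org/1.jpg
2,Egyptian,
3,American,https://example.org/3.jpg
4,Near Eastern,https://example.org/4.jpg
"""

records = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
filtered = filter_records(records)
print([r["object_id"] for r in filtered])  # ['1', '4']
```

Record 2 is dropped because it has no image, and record 3 because it is outside the two sections. In the pipeline itself the same two conditions would presumably live in the catalog filters rather than in a standalone script.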

jacobthill commented 2 days ago

Wayne doesn't have a contact at Penn. I emailed Elizabeth Waraksa to see if she does.

jacobthill commented 7 hours ago

I have updated the catalog in this PR with the new URL and the now-necessary filters, but it failed locally. I need to try it in dev once the PR is merged and document the issue here.