sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL pipeline and to establish Airflow.
Apache License 2.0

Harvest Michigan metadata in airflow #530

Open jacobthill opened 1 month ago

jacobthill commented 1 month ago

Michigan provides a list of records that has to be downloaded manually here. The old harvest script is here. We need to grab the catalog_url from each record in that list and harvest its metadata. Let's discuss whether Airflow is a good fit for this or whether we should come up with a manual solution. Either way, it has to be solved: Michigan is one of the few remaining collections whose traject config is written to transform XML data, and we want to stop supporting XML. We can't delete all of the XML macros until Michigan is migrated.
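A minimal sketch of the harvest step described above. It assumes the manually downloaded list is a CSV export with a header row containing a catalog_url column, and that each URL returns the record's metadata directly. Both assumptions are unverified, and the function names (`read_catalog_urls`, `fetch_url`, `harvest`) are hypothetical, not part of dlme-airflow. Because it uses only the standard library, the same code could run as a one-off script or be wrapped in an Airflow task, so it does not settle the Airflow-versus-manual question.

```python
import csv
import io
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple


def read_catalog_urls(csv_text: str, column: str = "catalog_url") -> List[str]:
    """Return unique, non-empty catalog URLs from a CSV export, in file order.

    Hypothetical: assumes the Michigan list is a CSV with a `catalog_url` header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"missing column: {column}")
    seen = set()
    urls: List[str] = []
    for row in reader:
        url = (row.get(column) or "").strip()
        if url and url not in seen:  # skip blanks and duplicate rows
            seen.add(url)
            urls.append(url)
    return urls


def fetch_url(url: str, timeout: float = 30.0) -> bytes:
    """Download one record's metadata using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def harvest(
    urls: Iterable[str], fetch: Callable[[str], bytes] = fetch_url
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Fetch metadata for every URL; record failures instead of aborting the run."""
    results: Dict[str, bytes] = {}
    failures: Dict[str, str] = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except OSError as exc:  # URLError, HTTPError and timeouts subclass OSError
            failures[url] = str(exc)
    return results, failures
```

Passing `fetch` as a parameter keeps the parsing and error handling testable without network access, and the returned `failures` dict makes it easy to re-run only the records that did not download.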