sul-dlss / dlme-airflow

This is a new repository to capture the work related to the DLME ETL Pipeline and establish Airflow
Apache License 2.0

Schedule all OAI, IIIF, and CSV collections to run regularly #129

Closed: jacobthill closed this issue 2 years ago

jacobthill commented 2 years ago

All OAI, IIIF, and CSV collections should be scheduled to run monthly and send an email to jtim@stanford.edu.
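A minimal sketch of how that request could be expressed follows. It makes several assumptions: the catalog entries and their `type` labels are hypothetical (the real dlme-airflow catalog format is not shown in this issue), and the email is treated as a failure notification, since the issue doesn't say when it should be sent. The keys mirror arguments Airflow's `DAG` accepts (`dag_id`, `schedule_interval`, `catchup`, `default_args`), and `email`/`email_on_failure`/`email_on_retry` are standard operator arguments. The sketch only builds the keyword arguments, so it runs without Airflow installed.

```python
from typing import Any, Dict, List

# Hypothetical catalog entries; the real dlme-airflow catalog format is an assumption here.
CATALOG: List[Dict[str, str]] = [
    {"provider": "provider_a", "collection": "manuscripts", "type": "oai"},
    {"provider": "provider_b", "collection": "maps", "type": "iiif"},
    {"provider": "provider_c", "collection": "coins", "type": "csv"},
    {"provider": "provider_d", "collection": "photos", "type": "json"},
]

SCHEDULED_TYPES = {"oai", "iiif", "csv"}
NOTIFY = ["jtim@stanford.edu"]
# Midnight on the first day of every month (equivalent to Airflow's "@monthly" preset).
MONTHLY_CRON = "0 0 1 * *"


def dag_kwargs(entry: Dict[str, str]) -> Dict[str, Any]:
    """Build keyword arguments for one collection's DAG (Airflow itself is not imported)."""
    return {
        "dag_id": f"{entry['provider']}_{entry['collection']}",
        "schedule_interval": MONTHLY_CRON,
        "catchup": False,  # do not backfill months that were missed
        "default_args": {
            "email": NOTIFY,
            "email_on_failure": True,  # assumption: notify when a task fails
            "email_on_retry": False,
        },
    }


def monthly_dags(catalog: List[Dict[str, str]]) -> List[Dict[str, Any]]:
    """Return DAG kwargs for every OAI, IIIF, and CSV collection in the catalog."""
    return [dag_kwargs(e) for e in catalog if e["type"] in SCHEDULED_TYPES]
```

In a DAG file, each dictionary could be unpacked into `DAG(**kwargs)` once a `start_date` is also supplied. On Airflow 2.4 and later, the `schedule` argument supersedes `schedule_interval`.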

jacobthill commented 2 years ago

During the work cycle, any provider whose harvest passes when manually triggered can be automated to run daily at midnight in the provider's local time zone. All of these will fail further down the line, at transform or reporting, but this will be a good way to see how we're doing, gather some data on how resilient the harvest task is for each provider, and probably surface other issues we will run into once this is turned on in production.

I'm not sure whether we can automate the DAG to start at different tasks, but if we can, then once harvest passes 7 times in a row we can start skipping it and begin the DAG at transform. Once transform passes 7 times in a row, we can skip it and move on to the next task, and so on. This way we don't keep hitting the data provider's API once we've built up confidence that the harvest tasks are resilient. When all tasks have passed 7 times in a row, we can automate them to run once a month at midnight (I can configure this in the catalog and manage it once these values are configurable).
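The "skip a task once it has passed 7 times in a row" rule can be sketched in plain Python. This is a minimal sketch under stated assumptions: the task names, the per-task pass/fail history, and where that history would be stored are all hypothetical, and the sketch does not show how a DAG would actually begin at a later task. For the "midnight in the location of the provider" part, Airflow's time zone documentation describes evaluating a cron schedule in the time zone of a timezone-aware `start_date`, so each provider's DAG could be given a `start_date` in that provider's zone.

```python
from typing import Dict, List, Optional

# Hypothetical task order; the actual dlme-airflow task ids may differ.
PIPELINE = ["harvest", "transform", "report"]
REQUIRED_STREAK = 7
DAILY_CRON = "0 0 * * *"    # midnight every day
MONTHLY_CRON = "0 0 1 * *"  # midnight on the first day of each month


def consecutive_passes(history: List[bool]) -> int:
    """Count the passing runs at the end of a task's history (ordered oldest first)."""
    count = 0
    for passed in reversed(history):
        if not passed:
            break
        count += 1
    return count


def first_task_to_run(history_by_task: Dict[str, List[bool]]) -> Optional[str]:
    """Return the earliest task that has not yet passed REQUIRED_STREAK times in a row.

    Returns None once every task in PIPELINE has met the streak.
    """
    for task in PIPELINE:
        if consecutive_passes(history_by_task.get(task, [])) < REQUIRED_STREAK:
            return task
    return None


def schedule_for(history_by_task: Dict[str, List[bool]]) -> str:
    """Run daily while any task is still unproven, then drop to a monthly schedule."""
    return DAILY_CRON if first_task_to_run(history_by_task) is not None else MONTHLY_CRON
```

Because a skipped task's history stops growing, a task stays "proven" once it reaches the streak; in this sketch a later downstream failure does not reset upstream streaks. The switch to monthly is automated here only for illustration; the comment above proposes making that change manually in the catalog.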

aaron-collier commented 2 years ago

Closed by https://github.com/sul-dlss/dlme-airflow/pull/238