sul-dlss / libsys-airflow

Airflow DAGS for migrating and managing ILS data into FOLIO along with other LibSys workflows
Apache License 2.0

Upload Instance UUIDs for Data Export should Trigger Selection DAG #1042

Closed jermnelson closed 2 months ago

jermnelson commented 3 months ago

Currently, the data_export_upload_view creates the instanceIds file in the correct directory, but the vendor selection DAGs then have no way to query FOLIO for the MARC records or to apply any transformations to those records in the MARC files.

We should add a TriggerDagRunOperator for the vendor selection DAG that passes a new param containing the file path to the instanceIds CSV file. If this file path is present, each vendor selection DAG should use a branch operator to skip the fetch_folio_record_ids and save_ids_to_file tasks and go directly to the fetch_marc_records (or equivalent) task.
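A minimal sketch of the branch decision described above, written as plain Python so it runs without Airflow installed. The param name `instance_ids_file` and the function name `choose_branch` are hypothetical and not taken from the repository; the task ids are the ones named in this issue. In Airflow, a callable like this would typically be handed to a BranchPythonOperator, with the file path supplied through the `conf` argument of TriggerDagRunOperator.

```python
from typing import Any, Mapping

# Task ids as named in the issue text.
FETCH_IDS_TASK = "fetch_folio_record_ids"
FETCH_MARC_TASK = "fetch_marc_records"

# Hypothetical param name; the real DAG may use a different key.
FILE_PARAM = "instance_ids_file"


def choose_branch(params: Mapping[str, Any]) -> str:
    """Return the task_id the branch should follow.

    If an uploaded instanceIds file path is present, go straight to the
    MARC fetch; otherwise start with the normal FOLIO id query.
    """
    if params.get(FILE_PARAM):
        return FETCH_MARC_TASK
    return FETCH_IDS_TASK


if __name__ == "__main__":
    print(choose_branch({FILE_PARAM: "/tmp/example/instanceids.csv"}))
    print(choose_branch({}))
```

Treating an empty or missing value the same way keeps scheduled runs, which pass no file path, on the existing fetch_folio_record_ids path.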

shelleydoljack commented 3 months ago

This didn't seem to work. I uploaded a CSV file for gobi new and the DAG run skipped all the tasks. Link to the DAG run: https://sul-libsys-airflow-dev.stanford.edu/dags/select_gobi_records/grid?dag_run_id=manual__2024-06-07T21%3A03%3A35.243540%2B00%3A00&tab=graph

It should have gone from check_record_ids to fetch_marc_records_from_folio, right?

shelleydoljack commented 3 months ago

I think the downstream task is getting skipped because of the default trigger rule, all_success. From https://airflow.apache.org/docs/apache-airflow/2.8.3/core-concepts/dags.html#trigger-rules:

It’s important to be aware of the interaction between trigger rules and skipped tasks, especially tasks that are skipped as part of a branching operation. You almost never want to use all_success or all_failed downstream of a branching operation. Skipped tasks will cascade through trigger rules all_success and all_failed, and cause them to skip as well.
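To make the cascade concrete, here is a small pure-Python model of how a skip propagates to the join task. It is a simplified illustration, not Airflow's scheduler: it only handles the success and skipped states, and the DAG shape (check_record_ids branching either to fetch_folio_record_ids then save_ids_to_file, or directly to fetch_marc_records_from_folio) is assumed from the task names mentioned in this thread.

```python
SUCCESS = "success"
SKIPPED = "skipped"


def resolve(rule: str, upstream_states: list[str]) -> str:
    """Simplified trigger-rule evaluation (no failure states modelled)."""
    if rule == "all_success":
        # Any skipped upstream cascades the skip to this task.
        if SKIPPED in upstream_states:
            return SKIPPED
        return SUCCESS
    if rule == "none_failed_min_one_success":
        # Runs as long as at least one upstream succeeded.
        if SUCCESS in upstream_states:
            return SUCCESS
        return SKIPPED
    raise ValueError(f"unsupported rule in this sketch: {rule}")


def simulate(join_rule: str) -> dict[str, str]:
    """Model a run where the branch picks the direct path to the MARC fetch."""
    states = {"check_record_ids": SUCCESS}
    # The branch did not choose this task, so it is skipped.
    states["fetch_folio_record_ids"] = SKIPPED
    states["save_ids_to_file"] = resolve(
        "all_success", [states["fetch_folio_record_ids"]]
    )
    # The join task has both the branch task and save_ids_to_file upstream.
    states["fetch_marc_records_from_folio"] = resolve(
        join_rule, [states["check_record_ids"], states["save_ids_to_file"]]
    )
    return states


if __name__ == "__main__":
    for rule in ("all_success", "none_failed_min_one_success"):
        print(rule, simulate(rule)["fetch_marc_records_from_folio"])
```

Under this model the join task is skipped with all_success and runs with none_failed_min_one_success, which in Airflow would correspond to setting `trigger_rule="none_failed_min_one_success"` on that task. Whether that is the fix the project adopted is not stated in this thread.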