hybedot / dvd-rental-pipeline


Why do you need different DAGs to extract and load different tables? #1

Open lawal-hash opened 2 months ago

lawal-hash commented 2 months ago

We can use task groups and the dynamic task mapping API in Airflow. This gives us a single DAG in which each table's extract-load task runs in parallel, which achieves the same objective as what you implemented. I also believe this approach would better enforce DRY.

hybedot commented 1 month ago

Having a separate extract-and-load DAG per table makes ingestion of the tables easier to manage. It also allows tables to have different ingestion schedules, since business requirements may call for loading some tables hourly, others daily, and others weekly.

Running every extract and load in parallel, especially when you have a lot of tables, may not be practical in a production environment, because managing your compute resources becomes difficult.
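To make the scheduling point concrete, here is a rough sketch in plain Python of a per-table configuration from which one DAG per table could be generated. The table names, cadences, and the `build_dag_specs` helper are illustrative assumptions, not the code in this repository; `@hourly`, `@daily`, and `@weekly` are Airflow's standard schedule presets:

```python
# Hypothetical table-to-schedule mapping; names and cadences are examples only.
TABLE_SCHEDULES = {
    "payment": "@hourly",
    "rental": "@hourly",
    "customer": "@daily",
    "film": "@weekly",
}


def build_dag_specs(table_schedules):
    """Describe one extract-load DAG per table, each with its own schedule.

    Each spec could be passed to a DAG factory, so the extract and load
    logic is still written once while every table keeps its own cadence.
    """
    return [
        {
            "dag_id": f"extract_load_{table}",
            "schedule": schedule,
            "table": table,
        }
        for table, schedule in sorted(table_schedules.items())
    ]


if __name__ == "__main__":
    for spec in build_dag_specs(TABLE_SCHEDULES):
        print(spec)
```

Because each table gets its own DAG, a single table can be paused, rescheduled, or backfilled without touching the others.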