Closed pandemicsyn closed 2 years ago
Apologies for logging this as a Draft PR rather than an issue, but I wanted to link the code change as well in case it helps clarify.
@pandemicsyn I was wondering this as well during the demo. If they're meant to run in order like Taylor described, i.e. splitting a single command into tasks, then I'd expect them to have dependencies in Airflow like your second screenshot. Otherwise, for `tap-csv target-postgres dbt:run`,
which gets split into the yaml below, you could have `dbt run` start at the same time as, or even before, the EL completes:
```yaml
jobs:
- name: g-to-p-job
  tasks:
  - tap-gitlab target-postgres
  - dbt:run
```
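If the generated DAG leaves those tasks unlinked, Airflow is free to schedule them concurrently. A minimal stdlib-only sketch of the pairwise chaining the DAG generator would need (the function name is illustrative, not Meltano's actual API):

```python
def chain_edges(task_ids):
    """Pair each task with its successor, producing the upstream -> downstream
    edges that would force sequential execution in Airflow."""
    return list(zip(task_ids, task_ids[1:]))

# For the g-to-p-job above this yields a single edge: the EL task
# must complete before dbt:run starts.
edges = chain_edges(["tap-gitlab target-postgres", "dbt:run"])
```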
@pandemicsyn great call out, and I'd made a note to follow up about this. Yes, I would expect them to be sequential like you have them in the second picture. I think we'd eventually support the parallel scenario by having nested arrays.
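For reference, a hypothetical sketch of what the nested-array syntax might look like — this is not implemented, and assumes an inner array would denote a group of tasks allowed to run in parallel:

```yaml
jobs:
- name: parallel-example
  tasks:
  - [tap-gitlab target-postgres, tap-csv target-postgres]  # hypothetical: parallel EL
  - dbt:run                                                # runs after both complete
```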
Perfect, sounds like we're all in agreement! I'm gonna go ahead and merge this :shipit:
After the convo and confusion yesterday about tasks, schedules, jobs, and Airflow, I wanted to make sure that we're actually generating and representing the tasks in Airflow correctly, and I'm second-guessing a bit what we have.
Today we create one DAG with independent tasks, as implemented in: https://github.com/meltano/files-airflow/pull/18
So given a yaml like:
That yields something like:
Are we sure we don't want the tasks to be linked, e.g. task 1 depends on task 0 as its upstream, like in the example below?
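For illustration, a stdlib-only sketch that mimics Airflow's `>>` bitshift syntax for declaring exactly that upstream/downstream link (the `Task` class here is a stand-in, not Airflow's real operator class, and the task ids are made up):

```python
class Task:
    """Minimal stand-in for an Airflow operator, supporting `>>` chaining."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.downstream = []

    def __rshift__(self, other):
        # Record that `other` must run after `self`, and return `other`
        # so chains like a >> b >> c compose left to right.
        self.downstream.append(other)
        return other


el = Task("el-task")    # e.g. the tap/target step
dbt = Task("dbt-run")   # e.g. the dbt:run step
el >> dbt               # dbt-run now waits for el-task to finish
```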