Data-aware Scheduling and Dataset concept added to Airflow
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
This release of Airflow introduces the concept of Datasets, and with it a new way of scheduling DAGs:
data-aware scheduling.

This allows DAG runs to be created automatically as a result of a task "producing" a dataset. In some ways
this can be thought of as the inverse of ``TriggerDagRunOperator``: instead of the producing DAG
controlling which DAGs get created, the consuming DAGs can "listen" for changes.

A dataset is identified by a URI:

.. code-block:: python

    from airflow import Dataset

    # The URI doesn't have to be absolute
    dataset = Dataset(uri='my-dataset')
    # Or you can use a scheme to show where it lives.
    dataset2 = Dataset(uri='s3://bucket/prefix')

To create a DAG that runs whenever a Dataset is updated, use the new ``schedule`` parameter (see below) and
pass a list of one or more Datasets:

.. code-block:: python

    from airflow import DAG

    # `dataset` is the Dataset defined in the previous example
    with DAG(dag_id='dataset-consumer', schedule=[dataset]):
        ...

And to mark a task as producing a dataset, pass the dataset(s) to the ``outlets`` attribute.

If you have the producer and consumer in different files you do not need to use the same Dataset object:
two ``Dataset()``\s created with the same URI are equal.
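
The relationship between producers, consumers, and URI equality can be illustrated with a small toy model. This is only a sketch in plain Python and not Airflow's implementation: ``ToyDataset``, ``ToyScheduler``, ``register_consumer`` and ``task_succeeded`` are hypothetical names invented for this illustration. It also assumes each consumer listens on a single dataset and triggers on every update, which sidesteps how Airflow treats a consumer scheduled on several datasets.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a dataset: equality and hashing use the URI only."""

    uri: str


class ToyScheduler:
    """Hypothetical registry mapping each dataset to the consumers listening on it."""

    def __init__(self):
        self._consumers = defaultdict(list)
        self.runs = []  # names of the consumer runs created so far

    def register_consumer(self, name, schedule):
        # Loosely mirrors passing a list of datasets as a DAG's schedule.
        for dataset in schedule:
            self._consumers[dataset].append(name)

    def task_succeeded(self, outlets):
        # Loosely mirrors a producing task finishing with these outlets:
        # every consumer listening on an updated dataset gets a run.
        for dataset in outlets:
            self.runs.extend(self._consumers[dataset])


scheduler = ToyScheduler()

# Consumer side (imagine one file): listen on a dataset built from a URI.
scheduler.register_consumer(
    "dataset-consumer", schedule=[ToyDataset("s3://bucket/prefix")]
)

# Producer side (imagine another file): a separate object with the same URI.
scheduler.task_succeeded(outlets=[ToyDataset("s3://bucket/prefix")])

# An update to a dataset nobody listens on creates no run.
scheduler.task_succeeded(outlets=[ToyDataset("my-dataset")])
```

The producer never names its consumers; it only reports which datasets it updated. Because the two ``ToyDataset`` objects compare equal by URI, the consumer is matched even though the producer built its own object.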
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/waqasbhatti/cdips-pipeline/network/alerts).
Bumps apache-airflow from 2.3.4 to 2.4.0.
Release notes
Sourced from apache-airflow's releases.
... (truncated)
Changelog
Sourced from apache-airflow's changelog.
... (truncated)
Commits
- `0bcdba0` Update Release Note for 2.4.0
- `a166fb8` Move the deserialization of custom XCom Backend to 2.4.0 (#26392)
- `38d3d4f` Require dag_id arg for dags list-runs (#26357)
- `0967259` Clear autoregistered DAGs if there are any import errors (#26398)
- `62322ef` Add min attrs version (#26408)
- `c7ea01d` Better validation of Dataset URI during dag parse (#26389)
- `3871f00` Fix UI redirect (#26409)
- `1ea86de` Work around pyupgrade edge cases (#26384)
- `3e5397e` Add more to the ignore list for non-core changes (#26355)
- `5a0a8f1` Fix pre-commit for checking revision heads map (#26373)