kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

`Kedro-Airflow` configuration #229

Closed sbrugman closed 1 year ago

sbrugman commented 1 year ago

Description

When converting Kedro pipelines to Airflow DAGs, configuration differs per DAG. There is currently no out-of-the-box way to automatically provide parameters such as schedule_interval or owner per pipeline. This would be useful when the DAGs are generated and deployed in a DevOps pipeline without manual intervention.

Happy to contribute this feature once there is consensus on an implementation.

Possible Implementation

Preferably, configuration such as conf/base/airflow.yml or conf/base/airflow/[PIPELINE].yml is passed on to the template rendering as kwargs. The benefit is that the configuration is in one place, and it's consistent within the Kedro framework.
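A minimal sketch of that idea, assuming a hypothetical airflow.yml layout with a shared "default" section and one section per pipeline. The keys, the `resolve_dag_config` and `render_dag` helpers, and the use of `string.Template` as a stand-in for the plugin's real DAG template are all assumptions made for illustration, not the plugin's actual API:

```python
from string import Template

# Hypothetical contents of conf/base/airflow.yml after loading,
# keyed by pipeline name, with a shared "default" section (assumed layout).
AIRFLOW_CONF = {
    "default": {"owner": "airflow", "schedule_interval": "@once"},
    "data_science": {"owner": "ds-team", "schedule_interval": "@daily"},
}

# Stand-in for the DAG template; the plugin's own template is not shown here.
DAG_TEMPLATE = Template(
    'DAG("$dag_name", schedule_interval="$schedule_interval", '
    'default_args={"owner": "$owner"})'
)


def resolve_dag_config(conf, pipeline_name):
    """Merge the shared defaults with the pipeline-specific section."""
    return {**conf.get("default", {}), **conf.get(pipeline_name, {})}


def render_dag(conf, pipeline_name):
    """Pass the resolved configuration to the template as keyword arguments."""
    kwargs = resolve_dag_config(conf, pipeline_name)
    return DAG_TEMPLATE.substitute(dag_name=pipeline_name, **kwargs)


if __name__ == "__main__":
    print(render_dag(AIRFLOW_CONF, "data_science"))
    print(render_dag(AIRFLOW_CONF, "reporting"))  # falls back to the defaults
```

A pipeline without its own section simply inherits the defaults, so only the pipelines that differ need an entry.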

Do any of the other kedro [command] CLI commands have access to the Kedro config?

Another implementation is to add an argument to kedro airflow create that supports passing parameters (e.g. --param key=value). The user is then responsible for passing the parameters.
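For illustration, parsing such a repeated flag could look roughly like this. The `--param` flag is only the proposal from this issue, and `parse_params` is a hypothetical helper built on the standard-library `argparse`, not the plugin's CLI code:

```python
import argparse


def parse_params(argv):
    """Collect repeated --param KEY=VALUE flags into a dictionary."""
    parser = argparse.ArgumentParser(prog="kedro airflow create")
    parser.add_argument(
        "--param",
        action="append",  # allow the flag to be given several times
        default=[],
        metavar="KEY=VALUE",
    )
    args = parser.parse_args(argv)

    params = {}
    for item in args.param:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = item.partition("=")
        if not sep or not key:
            parser.error(f"expected KEY=VALUE, got {item!r}")
        params[key] = value
    return params


if __name__ == "__main__":
    print(parse_params(["--param", "owner=ds-team", "--param", "schedule_interval=@daily"]))
```

All values arrive as strings, so any type conversion (for example of a retry count) would still be up to the caller.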

Possible Alternatives

Alternatively, the user generates templates for each pipeline. This requires no modification of the plugin, but puts a lot of burden on the user.

noklam commented 1 year ago

I have a few questions @sbrugman

  1. Is it possible to implement this in a way that we don't need to keep updating the list of arguments like schedule_interval, owner?
  2. It shouldn't be hard to read config from airflow.yml. This can be done, but it is fundamentally the same as passing it via the CLI.

    Another implementation is to add an argument to kedro airflow create that supports passing parameters (e.g. --param key=value). The user is then responsible for passing the parameters.

This is the same as 2. I think 1 is the more important question here; how we pass or parse the arguments is trivial.

sbrugman commented 1 year ago

Am I understanding your question correctly: if the user adds an additional parameter, can we make it so that the template does not have to be updated?

The default_args and DAG arguments can be generated from a dictionary in the config. The template then only needs modification when the user adds a parameter that is neither a DAG argument nor shared by every node (default_args).

The arguments that are already in the template file would be configured dynamically. The list itself and its defaults will stay the same. Changing the values is up to the user.
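As a rough sketch of generating the arguments from a dictionary, the DAG call could be rendered from whatever keys the config contains, so no fixed list of argument names has to be maintained. The helper names and the example keys are hypothetical, and the real template may be structured differently:

```python
def render_kwargs(kwargs):
    """Render a dict as Python keyword-argument source, e.g. a=1, b='x'."""
    return ", ".join(f"{key}={value!r}" for key, value in kwargs.items())


def render_dag_call(dag_name, config):
    """Build the source of a DAG(...) call from a config dictionary.

    Every key except "default_args" becomes a DAG argument; the
    "default_args" mapping is passed through for all nodes.
    """
    dag_kwargs = {k: v for k, v in config.items() if k != "default_args"}
    default_args = config.get("default_args", {})

    parts = [repr(dag_name)]
    if dag_kwargs:
        parts.append(render_kwargs(dag_kwargs))
    parts.append(f"default_args={default_args!r}")
    return f"DAG({', '.join(parts)})"


if __name__ == "__main__":
    example = {
        "schedule_interval": "@daily",
        "catchup": False,
        "default_args": {"owner": "ds-team", "retries": 1},
    }
    print(render_dag_call("data_science", example))
```

Adding a new DAG argument or a new default_args entry then only requires a config change, which is the case the template would no longer need to be edited for.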

There is a small difference in functionality between reading from the Kedro config (e.g. airflow.yml) and passing explicit parameters: Kedro offers multiple config patterns. (I would prefer reading from the Kedro config.)

noklam commented 1 year ago

@sbrugman in that case I think this is a good improvement.

Regarding config, it should be quite easy to do: with the after_context_created hook you can access the Kedro config.

The CLI argument should override config value if provided.
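The precedence could be sketched like this, assuming both sources end up as plain dictionaries and that an option not given on the command line is represented as None. `merge_settings` is a hypothetical helper, not part of the plugin:

```python
def merge_settings(config_values, cli_values):
    """Return config values overridden by any CLI values that were provided."""
    merged = dict(config_values)  # copy so the loaded config stays untouched
    # Options that were not passed on the command line (None) do not override.
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged


if __name__ == "__main__":
    from_config = {"owner": "airflow", "schedule_interval": "@once"}
    from_cli = {"owner": None, "schedule_interval": "@daily"}
    print(merge_settings(from_config, from_cli))
```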

merelcht commented 1 year ago

Closed by https://github.com/kedro-org/kedro-plugins/pull/233