opendatahub-io / opendatahub-community


Elyra now part of ODH, but Airflow optional support needs to be there #45

Open shalberd opened 1 year ago

shalberd commented 1 year ago

Airflow was made an optional tier-2 part of ODH in the summer of 2022.

https://github.com/opendatahub-io-contrib/airflow-on-openshift

Recently, Elyra became a part of ODH via overlay. Even more recently, Elyra itself was taken over by Red Hat (from IBM).

https://github.com/opendatahub-io/notebooks/pull/58#issuecomment-1562378131

Since ODH's top-tier focus is Kubeflow Pipelines, ODH wants Elyra to support Kubeflow Pipelines only.

Elyra has long supported Airflow in several ways:

Airflow-specific operators

https://medium.com/ibm-data-ai/getting-started-with-apache-airflow-operators-in-elyra-aae882f80c4a

Generic pipelines

https://medium.com/ibm-data-ai/automate-your-machine-learning-workflow-tasks-using-elyra-and-apache-airflow-adf297adc455

Airflow 2.x support is still lacking but will come. Some tweaks are needed, e.g. for rendering generic pipelines to DAGs, since libraries have changed :-)

So it would be bad if the pipeline editor and runtime support for Airflow were removed. At least allow for optionally enabling it via Configmap or ENV variable, based on this

Background:

We plan to use both: data science pipelines / Kubeflow Pipelines for pure ML development and Airflow for more of an ETL / data engineering set of tasks.

LaVLaS commented 1 year ago

"So it would be bad if the pipeline editor and runtime support for Airflow were removed."

This statement should be clarified to show that runtime support for Airflow was not removed from the Elyra package in the ODH Elyra notebook images that are built and supported as part of ODH Core. We only restrict the Elyra PipelinesProcessor to kfp (Data Science Pipelines) since that is what ODH supports.

There is no official support for Airflow in ODH as the integration is currently an ODH Contrib component (https://github.com/opendatahub-io-contrib/airflow-on-openshift) with no guarantee that the deployment works.

"At least allow for optionally enabling it via Configmap or ENV variable, based on this"

Although ODH does not officially support Airflow, you can still build a custom notebook image that has the Airflow pipelines processor enabled and import it into the ODH Dashboard. Based on the offline comment by @harshad16, you can build an Elyra notebook image with the Airflow pipelines processor by modifying jupyter_elyra_config.py and building the notebook image.
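As a rough illustration of what such a modification could look like, here is a minimal sketch of a jupyter_elyra_config.py that picks the enabled runtimes from an environment variable. It assumes the restriction is applied through Elyra's `PipelineProcessorRegistry.runtimes` option with the runtime names `kfp` and `airflow`; the variable name `ELYRA_PIPELINE_RUNTIMES` and the helper `runtimes_from_env` are hypothetical and not part of ODH or Elyra. Check the config file actually shipped in the notebook image before relying on this.

```python
import os

# Runtime names this sketch is willing to enable (assumed to match the
# values accepted by Elyra's PipelineProcessorRegistry.runtimes option).
KNOWN_RUNTIMES = ("kfp", "airflow")


def runtimes_from_env(env=None, var="ELYRA_PIPELINE_RUNTIMES", default=("kfp",)):
    """Return the list of pipeline runtimes to enable.

    `var` is a hypothetical environment variable holding a comma-separated
    list such as "kfp,airflow". Unknown names are ignored; if nothing valid
    is requested, the default (kfp only) is returned.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    requested = [name.strip().lower() for name in raw.split(",") if name.strip()]
    selected = [name for name in requested if name in KNOWN_RUNTIMES]
    # Drop duplicates while keeping the requested order.
    selected = list(dict.fromkeys(selected))
    return selected or list(default)


# Jupyter injects get_config() when it loads a config file. The guard keeps
# this file importable outside of Jupyter, e.g. for a quick check.
try:
    c = get_config()  # noqa: F821 - provided by Jupyter's config loader
except NameError:
    c = None

if c is not None:
    c.PipelineProcessorRegistry.runtimes = runtimes_from_env()
```

With this in place, a custom image started with `ELYRA_PIPELINE_RUNTIMES=kfp,airflow` would enable both processors, while an unset variable keeps the kfp-only behaviour. The variable could in turn be populated from a ConfigMap in the workbench deployment.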

If the Elyra Airflow notebook image works with the deployment of airflow-on-openshift in odh-contrib, then you could submit a PR for review to odh-contrib/workbench-images.

shalberd commented 1 year ago

@LaVLaS @harshad16 Airflow itself is no problem. For now, I have started talking to the Red Hat folks about what it takes to run it successfully as a whole (never use the mucked-up Postgres image that comes with it; use a decent way of running Postgres, such as Crunchy Postgres via OLM), among other things.

https://github.com/opendatahub-io-contrib/airflow-on-openshift/issues/7#issuecomment-1599585068