opendatahub-io-contrib / workbench-images

Various custom Workbenches and Runtimes for Open Data Hub and OpenShift Data Science
MIT License
35 stars 23 forks

Elyra workbench with Airflow support #49

Open mamurak opened 8 months ago

mamurak commented 8 months ago

The current ODH/RHOAI workbenches come with Elyra-KFP for out-of-the-box integration with Data Science Pipelines. Upstream Elyra supports Airflow as an additional backend. I'm proposing a new community workbench image for the explicit purpose of developing and submitting Airflow pipelines through Elyra, with the option of integrating Github or Gitlab based git servers.

shalberd commented 8 months ago

Hi @mamurak, good to hear from you again. Yeah, I think we have to explicitly enable the gitlab option during pip install, correct? At least that is what I did in my custom image build. And yes, I had to manually modify the list of runtimes in the array to enable airflow again. By the way, Elyra 4.x will bring Airflow 2.x support for generic pipelines and remove Airflow 1.x support. I've been working on that with a lot of good people who have great input. Elyra overall will get more attention again: @LaVLaS told me that once we can show that Elyra is up to snuff regarding current Airflow, it might even make it back into the official notebook images (aside from contrib) some time in the future.

shalberd commented 3 months ago

@mamurak @harshad16 I am working on this and making good progress; I will open a PR soon, probably in 2-3 weeks. The idea is elyra[gitlab] only, without all the kfp fluff in here, i.e. in https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/ides/1-jupyter/files/utils/jupyter_elyra_config.py#L11

instead of

c.PipelineProcessorRegistry.runtimes = ['kfp']

in the airflow snippet

c.PipelineProcessorRegistry.runtimes = ['airflow']

or even

c.PipelineProcessorRegistry.runtimes = ['kfp','local']

and in the Pipfile, Pipfile.lock and requirements-jupyter.txt

https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/bundles/1-minimal/py39/Pipfile#L18

instead of

"elyra[kfp-tekton]" = "~=3.15.0"

for that airflow case then

"elyra[gitlab]" = "~=3.15.0"

I've been using a locally built 3.16-dev wheel file because of changes for Airflow 2 support, but this shows the general direction.

I will also propose what I found out about making Elyra work without kfp (got that working) as a PR of my own in Elyra (not by @akchinSTC, though his initial work was invaluable to me): https://github.com/elyra-ai/elyra/pull/3144
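The runtime switch described above could also be driven from the environment, so one config file serves both the kfp and the airflow image. This is only a rough sketch, not what the repository does: the variable name ELYRA_RUNTIMES and the helper parse_runtimes are made up for illustration, and the set of valid names is simply the three that appear in this thread.

```python
import os

# Runtime names that appear in this thread; treat the set as an assumption.
KNOWN_RUNTIMES = ("kfp", "airflow", "local")


def parse_runtimes(raw, default=("airflow",)):
    """Turn a comma-separated string such as "airflow,local" into a list."""
    names = [part.strip().lower() for part in (raw or "").split(",")]
    names = [name for name in names if name]
    if not names:
        return list(default)
    unknown = [name for name in names if name not in KNOWN_RUNTIMES]
    if unknown:
        raise ValueError(f"unknown Elyra runtime(s): {unknown}")
    # Drop duplicates while keeping the order that was requested.
    return list(dict.fromkeys(names))


# ELYRA_RUNTIMES is a hypothetical variable name, not something Elyra reads.
runtimes = parse_runtimes(os.environ.get("ELYRA_RUNTIMES"))

# Inside a real Jupyter config file, get_config is injected by Jupyter;
# outside of one this branch is skipped so the sketch still runs standalone.
if "get_config" in globals():
    c = get_config()  # noqa: F821
    c.PipelineProcessorRegistry.runtimes = runtimes
```

With nothing set, the sketch falls back to ['airflow']; setting the hypothetical variable to "airflow,local" would enable both.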

koep commented 3 months ago

@thomas-gremm :clinking_glasses:

koep commented 1 month ago

hi @shalberd , I was wondering if you could share an update on your PR?

shalberd commented 1 month ago

August 15, still some loose ends to tie up, but gettin' there. See my communication in Slack. Y'all are not the only ones who think it is worth it having Airflow / Airflow 2 support in Workbench and runtime Images. Key is bootstrapper.py in https://github.com/opendatahub-io-contrib/workbench-images/blob/main/snippets/ides/5-runtime/files/utils/bootstrapper.py, among other things.

We have disabled Pipelines in the Open Data Hub Dashboard and do everything with a, for now, custom Elyra wheel file that supports Airflow 2.x DAG code rendering for generic pipeline nodes. I can confirm this works nicely: https://github.com/elyra-ai/elyra/pull/3167/files I just want to also add all the other Airflow 2 aspects (parsing the Airflow 2.x wheel file for operators via AST, see https://codedamn.com/news/python/python-abstract-syntax-trees-ast-manipulating-code-core, and assembling the package catalog Elyra GUI fields correctly, see https://github.com/elyra-ai/elyra/pull/3208). For now, with my custom-built wheel file, I am just not using the package catalog functions ... The goal is to get all this into Elyra 4, with Airflow 2.x support and no more Airflow 1.x. Tracker: https://github.com/elyra-ai/elyra/issues/3165
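For anyone wondering what "parsing for operators via AST" can look like, here is a minimal sketch using only Python's standard ast module. It is not the code from the Elyra PRs: the sample source, the class ExampleHttpOperator and the helper find_operators are invented for illustration, and the "base class name ends in Operator" rule is a simple heuristic.

```python
import ast

# Made-up operator source standing in for a module taken from an Airflow
# provider wheel; the class and its parameters are illustrative only.
SAMPLE_SOURCE = '''
class BaseOperator:
    pass

class ExampleHttpOperator(BaseOperator):
    def __init__(self, endpoint, method="POST", *, retries=3, **kwargs):
        super().__init__(**kwargs)
'''


def base_name(node):
    """Return the rightmost name of a base-class expression, or None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return node.attr
    return None


def literal_default(node):
    """Return a constant default value; None if absent or not a literal."""
    return node.value if isinstance(node, ast.Constant) else None


def find_operators(source):
    """Map operator class names to their __init__ parameters and defaults."""
    operators = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Heuristic: any direct base whose name ends in "Operator".
        if not any((base_name(b) or "").endswith("Operator") for b in node.bases):
            continue
        params = {}
        for item in node.body:
            if isinstance(item, ast.FunctionDef) and item.name == "__init__":
                args = item.args
                positional = args.posonlyargs + args.args
                # Defaults align with the tail of the positional list.
                pad = [None] * (len(positional) - len(args.defaults))
                for arg, default in zip(positional, pad + list(args.defaults)):
                    if arg.arg != "self":
                        params[arg.arg] = literal_default(default)
                for arg, default in zip(args.kwonlyargs, args.kw_defaults):
                    params[arg.arg] = literal_default(default)
        operators[node.name] = params
    return operators


print(find_operators(SAMPLE_SOURCE))
```

The source is never imported or executed, only parsed, which is why this approach works on a wheel's files without installing Airflow. A real catalog would also need type hints and docstrings to fill the GUI fields, which this sketch leaves out.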

About "with the option of integrating Github or Gitlab based git servers.": yeah, we have Gitlab, too. So I added the gitlab extra for elyra to the Pipfile, in my case with a file reference to the custom wheel :-) elyra = {extras = ["gitlab"], file = "elyra-3.16.0.dev0-py3-none-any.whl"}

I will first make a PR here around August 14 disregarding the custom build and specialities for Airflow 2 support. Focusing on multi-runtime support in interactive-image-builder and some more tweaks here. Later, expect full Airflow 2 support in Elyra by October. Anyone who can help with how to parse AST style code in https://github.com/elyra-ai/elyra/pull/3208 is welcome to comment.