opendatahub-io / ai-edge

ODH integration with AI at the Edge use cases
Apache License 2.0

Port Stefan's Tekton pipeline to DataScience pipelines #20

Closed israel-hdez closed 1 year ago

israel-hdez commented 1 year ago

Description

Stefan's pipeline uses Tekton, but the goal is to use DataScience pipelines, which are OpenDataHub's pipeline mechanism.

DataScience pipelines are an abstraction layer on top of Tekton, so it shouldn't be hard to port a Tekton pipeline to a DataScience pipeline.
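For reference, a rough sketch of what the ported pipeline could look like with the KFP SDK that Data Science Pipelines exposes, compiled with the kfp-tekton compiler. The step names, images, and parameters below are placeholders, not the actual pipeline from this repo:

```python
from kfp import dsl
from kfp_tekton.compiler import TektonCompiler


@dsl.pipeline(
    name="containerize-model",
    description="Fetch an already-trained model and build a container image around it",
)
def containerize_model(model_uri: str = "s3://example-bucket/model"):
    # Each step is a ContainerOp; the kfp-tekton compiler turns it into a Tekton task.
    fetch = dsl.ContainerOp(
        name="fetch-model",
        image="quay.io/example/model-fetcher:latest",  # placeholder image
        arguments=["--model-uri", model_uri],
    )
    build = dsl.ContainerOp(
        name="build-image",
        image="quay.io/example/image-builder:latest",  # placeholder image
        arguments=["--context-dir", "/workspace"],
    )
    build.after(fetch)  # build the image only after the model has been fetched


if __name__ == "__main__":
    # Produces Tekton YAML that the Data Science Pipelines backend can run.
    TektonCompiler().compile(containerize_model, "containerize-model.yaml")
```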

User Story

References

A/C

TBD

grdryn commented 1 year ago

@israel-hdez is it also correct to say that Elyra is an abstraction layer on top of DataScience pipelines? My (limited) understanding from @LaVLaS is that we should be converting it to Elyra, which will create the data science pipeline, which will create the Tekton pipeline. Does that sound right?

israel-hdez commented 1 year ago

@israel-hdez is it also correct to say that Elyra is an abstraction layer on top of DataScience pipelines?

Yes. I'm not an expert, but this is what was mentioned. So, I would agree.

My (limited) understanding from @LaVLaS is that we should be converting it to Elyra, which will create the data science pipeline, which will create the Tekton pipeline. Does that sound right?

Sounds right, but I think this point will need some clarification.

After reading article [1], Elyra's audience seems to be Jupyter Notebook users. It seems to be more suited for the E2E workflow: load train data -> train model -> test model -> save model to storage -> containerize model -> deploy model.

Stefan's example just takes an already-trained model and containerizes it.

With this in mind, the goal of #2 is not clear to me:

As I've said, I'm not an expert on the pipelines part, so I don't know if what I've written really makes sense :-)

[1] https://developer.ibm.com/articles/create-ai-pipelines-using-elyra-and-kubeflow-pipelines/

grdryn commented 1 year ago

As I've said, I'm not an expert on the pipelines part, so I don't know if what I've written really makes sense :-)

This applies to everything that I say too, so we're in the same boat :)

That's a very useful article that you've linked, and it makes some things a little clearer to me.

You're right that Elyra is targeted at Jupyter Notebook users. What personas would be considered Jupyter Notebook users? Would it just be data scientists, or also MLOps engineers?

I wonder how common the e2e flow that you describe is, from data prep all the way to deployment :thinking: In my thinking, maybe this should be split into some smaller pipelines? For example, I wouldn't expect to want to run every step again just to deploy a pre-existing model to a new target location/device, right?

How about this as a possible split:

Whether the latter two here (which, in my understanding, are the important parts for us right now) should be DS pipelines or Elyra pipelines, I'm not sure :thinking:

@LaVLaS @piotrpdev any thoughts on any of this?

israel-hdez commented 1 year ago

How about this as a possible split:

It makes sense to me. And I also agree that the latter two are the goals for the PoC, and that it is fuzzy whether those should be DS or Elyra pipelines.

LaVLaS commented 1 year ago

@israel-hdez is it also correct to say that Elyra is an abstraction layer on top of DataScience pipelines?

That is correct. Elyra abstracts pipelines in a JupyterLab interface; DSP abstracts Tekton to a Python SDK. Ideally, any default Tekton object (Pipeline, PipelineRun, Task) should be supported all the way up the abstraction layers. If not, then it is a bug or a feature request.

You're right that Elyra is targeted at Jupyter Notebook users. What personas would be considered Jupyter Notebook users? Would it just be data scientists, or also MLOps engineers?

Those are the key personas for Workbench (Jupyter notebook) users, based on industry defaults. Elyra allows those users to stay within the safety of the JupyterLab interface, but Elyra is not a hard requirement.

I wonder how common the e2e flow that you describe is, from data prep all the way to deployment :thinking: In my thinking, maybe this should be split into some smaller pipelines? For example, I wouldn't expect to want to run every step again just to deploy a pre-existing model to a new target location/device, right?

Like @israel-hdez mentioned, the current example model is exported from AzureML, so we are highlighting the integration of AzureML into the ODH --> MLOps pipeline. We don't necessarily need to rely on Elyra, since the first part (load train data -> ... -> save model to storage) is already handled for us.

piotrpdev commented 1 year ago

KFP/KFP-Tekton does not support running already existing Tasks/ClusterTasks and probably won't in the future. This puts us in a predicament, because it means we cannot use buildah to build the container, the openshift client to interact with the cluster, etc.

I was told the following in Slack:

@pplaczek I believe DSP is not a good fit for your use case. DSP / the Kubeflow Pipelines engine uses Tekton as a backend, but it's not based on Tekton concepts. KFP comes with its own domain specific language that does not cover Tekton. Data Science Pipelines is more centered around custom ML tasks defined either through the KFP SDK or Elyra. If you already have Tekton pipelines and want to reuse existing Tekton tasks, I would recommend sticking with Tekton/OpenShift Pipelines. My 2 cents. -- Max Murakami

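For context, this is roughly what a task defined "the DSP way" looks like, i.e. a custom step written against the KFP SDK rather than a reference to an existing Tekton Task/ClusterTask. This is only a sketch; the function body and base image are illustrative:

```python
from kfp.components import create_component_from_func


def fetch_model(model_uri: str) -> str:
    """Illustrative body only: a real component would pull the model from storage."""
    print(f"Pretending to download {model_uri}")
    return model_uri


# KFP wraps the Python function in a container and exposes it as a pipeline step.
fetch_model_op = create_component_from_func(
    fetch_model,
    base_image="registry.access.redhat.com/ubi8/python-39",  # assumed base image
)
```
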
@LaVLaS To proceed, I see the following options:

| Approach | Pros | Cons |
| --- | --- | --- |
| Modify the YAML generated by the Elyra/KFP-Tekton SDK to use the existing Tekton Tasks/ClusterTasks, e.g. using `sed` | Get to use existing TektonHub tasks | Not supported by DSP and probably won't be in the future; requires text processing; a little janky |
| Rewrite the necessary ClusterTasks using Pipeline Components | Supported by DSP and Elyra | Will take some time to do; have to update these when the underlying image and/or API has breaking changes; could be complicated when storage and networking get involved |
| Use OpenShift Pipelines directly without DSP | We have Stefan's code, which already achieves what we need; using Tekton directly gives us more features and flexibility | Have to come up with an easy way for the MLOps Engineer to use OSP, e.g. using some kind of web console or triggering an OSP pipeline using a KFP pipeline (see below) |
| Some mixed approach, e.g. using a ResourceOp to run a DSP pipeline and OSP pipeline in conjunction | If done well, could be the best of all worlds | Mixing different pipeline abstractions could be confusing and/or not work |
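
To make the mixed option a bit more concrete, a rough sketch of triggering the existing Tekton/OSP pipeline from a KFP pipeline via `dsl.ResourceOp`. The PipelineRun manifest, pipeline name, and parameters are hypothetical, and I haven't verified this against the DSP backend:

```python
from kfp import dsl

# Hypothetical PipelineRun pointing at the existing Tekton pipeline
# (names and parameters are placeholders).
pipeline_run = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "PipelineRun",
    "metadata": {"generateName": "containerize-model-"},
    "spec": {
        "pipelineRef": {"name": "existing-containerize-pipeline"},
        "params": [{"name": "model-uri", "value": "s3://example-bucket/model"}],
    },
}


@dsl.pipeline(name="trigger-osp-pipeline")
def trigger_osp_pipeline():
    # ResourceOp asks the backend to create the Kubernetes resource,
    # effectively handing the real work off to OpenShift Pipelines.
    dsl.ResourceOp(
        name="create-pipelinerun",
        k8s_resource=pipeline_run,
        action="create",
    )
```
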
LaVLaS commented 1 year ago

@piotrpdev Thank you for the breakdown of options. "Use OpenShift Pipelines directly without DSP" is the best option, given that we already have the pipeline to build the image and ODH is still discussing a pathway for delivering an Intelligent Application into production (or, in our case, to the Edge).