kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.5k stars 1.57k forks

Argo events for triggering pipelines #651

Open swiftdiaries opened 5 years ago

swiftdiaries commented 5 years ago

It'd be great if we could trigger pipelines automatically in response to events.

Use Case 1: When a model is uploaded to an object store -> trigger a step (pipeline) to deploy.
Use Case 2: When data arrives at a local volume / external storage -> trigger a pipeline to train.

This is related to https://github.com/kubeflow/pipelines/issues/604.

I'd love to see this feature and help out in the implementation with some PRs as well (if it's on the roadmap)

paveldournov commented 5 years ago

@swiftdiaries - yes, this feature is on the roadmap. Let's collaborate on the design.

/assign @vicaire

swiftdiaries commented 5 years ago

Awesome! Looking forward to this :)

vicaire commented 5 years ago

I will follow up on this thread as soon as we start tackling this. Thanks.

vicaire commented 5 years ago

@swiftdiaries

It's a bit short but I provided an outline of how we plan to support event-driven pipelines here: https://docs.google.com/document/d/1O5n02SzMYmLH0cMkykxHWWWe7eMzaP1vk7Y3fBbLoD8/edit#heading=h.mhe3tnle0c9o

(See event-driven pipelines and data-driven pipelines)

In a nutshell:

  • We will have a metadata store storing info about the data generated by a workflow (metadata).
  • Events can also be stored in that metadata store from various sources (webhook, pub/sub, etc.) using a piece of infrastructure decoupled from the rest of the system.
  • An event-driven CRD will let users specify a workflow to execute each time new data of a particular type is added to the metadata store.

WDYT?

swiftdiaries commented 5 years ago

Sorry for the late reply.

The overall idea is sound. I found an interesting thread on kubeflow-discuss about how Argo Events is integrated with Argo Workflows at GitHub.

Also, what is the status of this? If there are tasks to be done, I'm happy to work together on this one.

vicaire commented 5 years ago

@swiftdiaries,

The metadatastore is currently being designed with collaboration from the KF community.

We could start by looking at the best way to integrate Argo events with KFP for common use cases. Adding the "help wanted" flag. Contributions/Proposals are welcome.

vicaire commented 5 years ago

Note, resolving this issue should enable support for continuous online learning, as requested in https://github.com/kubeflow/pipelines/issues/1053

animeshsingh commented 4 years ago

Do we need to make it specific to Argo Events? Can it be designed in a generic way to support something like Knative Eventing? @vicaire please include us if there are any behind-the-scenes design discussions going on at this end.

VaibhavPage commented 4 years ago

@jingzhang36 Is this feature being actively worked on?

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 4 years ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

jondoering commented 3 years ago

Any updates on this feature?

Bobgy commented 3 years ago

/reopen
looks like someone cares

no one is working on this.

I am curious what makes it different from using KFP SDK triggered by the event

k8s-ci-robot commented 3 years ago

@Bobgy: Reopened this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/651#issuecomment-669215207):

> /reopen
> looks like someone cares
>
> no one is working on this.
>
> I am curious what makes it different from using KFP SDK triggered by the event

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

marcjimz commented 3 years ago

+1 on this as an issue. When data lands on a specific volume, an event should be triggered. Should this logic live in KFP?

Secondly, when an event is created, we would need a listener service to trigger the corresponding KFP pipeline. Is this sufficient?

> @swiftdiaries
>
> It's a bit short but I provided an outline of how we plan to support event-driven pipelines here: https://docs.google.com/document/d/1O5n02SzMYmLH0cMkykxHWWWe7eMzaP1vk7Y3fBbLoD8/edit#heading=h.mhe3tnle0c9o
>
> (See event-driven pipelines and data-driven pipelines)
>
> In a nutshell:
>
> • We will have a metadata store storing info about the data generated by a workflow (metadata).
> • Events can also be stored in that metadata store from various sources (webhook, pub/sub, etc.) using a piece of infrastructure decoupled from the rest of the system.
> • An event-driven CRD will let users specify a workflow to execute each time new data of a particular type is added to the metadata store.
>
> WDYT?

+1 on this, would like to see both the event trigger and data trigger configuration make it to KFP. Is Argo events the only solution here or should we use something more generic to Kubeflow?
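A minimal sketch of the listener service idea above, assuming a hypothetical event schema with a `type` field and a hypothetical mapping from event types to pipelines. The actual trigger call is left as a comment since the KFP endpoint is deployment-specific:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from event types to pipeline names; adjust to your setup.
EVENT_TO_PIPELINE = {
    "model.uploaded": "deploy-model",
    "data.arrived": "train-model",
}


def pipeline_for_event(event):
    """Return the pipeline name registered for this event type, if any."""
    return EVENT_TO_PIPELINE.get(event.get("type"))


class EventHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        event = json.loads(body or b"{}")
        pipeline = pipeline_for_event(event)
        if pipeline:
            # In a real service you would trigger the run here, e.g. with the
            # KFP SDK: kfp.Client(host=...).create_run_from_pipeline_package(...)
            print(f"would trigger pipeline: {pipeline}")
            self.send_response(202)
        else:
            self.send_response(204)
        self.end_headers()


if __name__ == "__main__":
    HTTPServer(("", 8080), EventHandler).serve_forever()
```

This keeps the event-to-pipeline routing outside KFP itself, which is roughly what an Argo Events sensor would do as well.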

imagr-pat commented 3 years ago

+1 for this issue.

We would like to be able to trigger pipeline runs from GCP Pub/Sub events.

Bobgy commented 3 years ago

@imagr-pat for GCP Pub/Sub events, it's possible to add a Cloud Function that listens to the topic and runs a KFP client. Does that work for you?
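A sketch of that workaround, assuming a Pub/Sub-triggered Cloud Function whose message payload is JSON, a compiled pipeline file bundled with the function, and a placeholder KFP endpoint:

```python
import base64
import json


def decode_pubsub_event(event):
    """Decode the base64-encoded JSON payload of a Pub/Sub-triggered function event."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def trigger_pipeline(event, context):
    """Cloud Function entry point for a Pub/Sub trigger (hypothetical names)."""
    msg = decode_pubsub_event(event)
    import kfp  # imported lazily so the module loads without kfp installed

    client = kfp.Client(host="https://<your-kfp-endpoint>")  # placeholder endpoint
    client.create_run_from_pipeline_package(
        "train.yaml",  # compiled pipeline bundled with the function
        arguments={"data_path": msg.get("path", "")},
    )
```

The downside, as later comments note, is that every cloud provider needs its own glue function; an in-KFP trigger mechanism would remove that.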

codebeard1 commented 3 years ago

Plus 1 for me on this issue as well.

Ideally I would like to see native Kafka support for event-based triggering of Kubeflow pipelines. That way we wouldn't have to use something external like NiFi or Airflow to trigger pipelines on an event. This is all to ensure better native support for online learning, which is event-driven: mini-batches of training data constantly flow into the pipelines to re-train and re-deploy a model.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

maganaluis commented 3 years ago

hold

albertshx commented 3 years ago

Looking forward to seeing this feature so we don't need an AWS Lambda or Cloud Function to chain relevant pipelines. A big thank you!

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jeanphilippelingrand commented 3 years ago

Our team would like to integrate with an SQS queue. The use case is the following: we would have a data pipeline on Airflow and an ML pipeline on Kubeflow. The integration would allow us to run the ML pipeline once the data pipeline completes.
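That handoff could be sketched as a small poller using boto3 long polling (an assumption; the message schema, queue URL, pipeline file, and KFP endpoint are hypothetical):

```python
import json


def parse_completion_message(body):
    """Extract the fields the ML pipeline needs from an Airflow completion message."""
    msg = json.loads(body)
    return {"dag_id": msg.get("dag_id"), "data_path": msg.get("data_path")}


def poll_and_trigger(queue_url):
    # boto3 and kfp are assumed installed; names below are placeholders.
    import boto3
    import kfp

    sqs = boto3.client("sqs")
    client = kfp.Client(host="https://<your-kfp-endpoint>")
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
        for m in resp.get("Messages", []):
            params = parse_completion_message(m["Body"])
            client.create_run_from_pipeline_package(
                "ml_pipeline.yaml", arguments=params)
            # Delete only after the run was submitted successfully.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```

Deleting the message only after submission gives at-least-once delivery; the pipeline itself should be idempotent on its inputs.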

midhun1998 commented 2 years ago

+1 on this issue. This could partially solve the CD use case too: we could have an Argo workflow that does CD for us, triggered by a GitHub webhook.
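Whatever receives that webhook should verify GitHub's `X-Hub-Signature-256` header (an HMAC-SHA256 of the raw payload, hex-encoded and prefixed with `sha256=`) before triggering anything. A stdlib-only check:

```python
import hashlib
import hmac


def verify_github_signature(secret, payload, signature_header):
    """Check a GitHub webhook's X-Hub-Signature-256 header against the raw payload.

    secret and payload are bytes; signature_header is the header value,
    e.g. "sha256=<hexdigest>".
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking the signature through timing differences.
    return hmac.compare_digest(expected, signature_header)
```

Only after this check passes would the handler kick off the CD workflow (or KFP run).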

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

xin-hao-awx commented 2 years ago

@chensun has this been designed? Or are you open to a community design proposal?

WaterKnight1998 commented 2 years ago

Any news on this?

I am looking for event/data driven pipelines that get triggered when new data arrives

droctothorpe commented 2 years ago

+1

satriawadhipurusa commented 1 year ago

+1

rsr23 commented 1 year ago

+1

magdalenakuhn commented 1 year ago

Any update on this? We're also quite interested in this! A current workaround is to use an AWS Lambda function or Google Cloud Function, as described here https://amygdala.github.io/gcp_blog/ml/kfp/mlops/tfdv/gcf/2021/02/26/kfp_tfdv_event_triggered.html#event-triggered-pipeline-runs, that simply executes kfp.Client().run_pipeline().

charlesmelby commented 1 year ago

another request for updates on this issue!

titoeb commented 9 months ago

+1

github-actions[bot] commented 6 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.