kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[feature] CRD for Recurring Run #6001

Open munagekar opened 3 years ago

munagekar commented 3 years ago

Feature Area

/area backend

What feature would you like to see?

A CRD for recurring runs, so that recurring runs can be deleted with kubectl. Currently, recurring runs can only be created and deleted with the SDK or the Kubeflow GUI; there is no state for recurring runs on the cluster.

Kubernetes has CronJobs, and pipeline runs are similarly tracked with Workflow resources. I would like a similar CRD for recurring runs. Argo has CronWorkflow and a CronWorkflow status; could these be used instead?

What is the use case or pain point?

It is easier for DevOps engineers to track recurring runs with kubectl and see their status.

Is there a workaround currently?

Use the SDK or GUI.



zijianjoy commented 3 years ago

@munagekar Contributions to create a RecurringRun CRD for this feature are welcome!

NikeNano commented 3 years ago

/assign I will take a look at this.

NikeNano commented 3 years ago

To get some more background: what difference do you expect from this CRD compared to the scheduledworkflow CRD that Kubeflow Pipelines uses today for recurring runs? @munagekar

munagekar commented 3 years ago

@NikeNano Thank you for looking into this.

I was not aware of the scheduledworkflow CRD when I created the issue. This CRD largely solves the issue.

However, Kubeflow Pipelines stores scheduled workflows in a DB instead of relying on etcd. This can lead to inconsistency when creating and deleting recurring runs with kubectl: the scheduled workflows shown in the UI do not match those on the cluster.

REF: https://github.com/kubeflow/pipelines/issues/4862
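
The mismatch described above can be illustrated with a minimal sketch. It assumes the recurring-run names on the cluster and in the DB have already been fetched into plain Python lists; the function name `find_drift` and the sample names are hypothetical and not part of KFP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    only_in_cluster: frozenset  # e.g. created with kubectl, unknown to the DB/UI
    only_in_db: frozenset       # e.g. deleted with kubectl, still listed in the UI


def find_drift(cluster_names, db_names):
    """Compare recurring-run names seen on the cluster with those in the DB."""
    cluster = frozenset(cluster_names)
    db = frozenset(db_names)
    return Drift(only_in_cluster=cluster - db, only_in_db=db - cluster)


# Hypothetical data: "nightly-train" was removed with kubectl but is still in the DB,
# and "adhoc-backfill" was applied with kubectl without going through the KFP API.
drift = find_drift(
    cluster_names=["hourly-eval", "adhoc-backfill"],
    db_names=["hourly-eval", "nightly-train"],
)
print(sorted(drift.only_in_cluster))
print(sorted(drift.only_in_db))
```

Any non-empty set in the result corresponds to a recurring run that the UI and the cluster disagree about.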

NikeNano commented 3 years ago

Hmm, do you mean that etcd should replace the MySQL database? I think in that case it would be completely separated from Kubeflow Pipelines and all the tooling around it.

As I understand it, the CRDs will be stored in etcd since they are custom resources? But I think I might be missing your point, could you elaborate @munagekar

UPDATE: I forgot to check your ref @munagekar, will do it.

NikeNano commented 3 years ago

After reading up and checking the code I definitely see your point @munagekar. I will continue to dig into what is required to actually get rid of MySQL for recurring runs.

I think we can continue the discussion from https://github.com/kubeflow/pipelines/issues/4862#issuecomment-906323221 here. What was the original reason for using MySQL? It seems to hold the same information as the CRD when I check now @Bobgy

NikeNano commented 3 years ago

I will write a design doc during the week.

Bobgy commented 3 years ago

To explain a bit of history, KFP decided to use mysql as source of truth instead of Kubernetes API, because many KFP features rely heavily on

However, it seems reasonable to assume the number of enabled recurring run configs should be at a size that can be handled by Kubernetes API. The tooling for gitops + kubernetes API seems like a natural fit for managing recurring runs. That's why I think https://github.com/kubeflow/pipelines/issues/4862#issuecomment-906090718 is a brilliant idea.

It's still worth investigating whether the Kubernetes API's capabilities for filtering, ordering and pagination are enough for the recurring run API or not. That's what I'd like to see in the design doc. Even if something is still missing, we may now consider a solution that takes the Kubernetes API as source of truth, so that new recurring run custom resources can be auto-synced to the KFP DB.
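
As a rough illustration of that question: Kubernetes list calls support label and field selectors and chunked results (`limit` / `continue`), but they do not sort by an arbitrary field on the server, so ordering would have to happen in the client or in a synced store. The sketch below is an assumption-laden example, not KFP code: items are plain dicts standing in for custom resources, and the page token is a simple integer offset rather than a Kubernetes `continue` token.

```python
def list_recurring_runs(items, name_contains="", sort_key="created_at",
                        descending=False, page_size=2, page_token=0):
    """Filter, order, and paginate already-fetched items on the client side.

    Returns the requested page and the offset of the next page (or None
    when there are no more items).
    """
    matched = [i for i in items if name_contains in i["name"]]
    ordered = sorted(matched, key=lambda i: i[sort_key], reverse=descending)
    page = ordered[page_token:page_token + page_size]
    next_offset = page_token + page_size
    next_token = next_offset if next_offset < len(ordered) else None
    return page, next_token


# Hypothetical recurring runs; "created_at" is a stand-in for a creation timestamp.
runs = [
    {"name": "train-nightly", "created_at": 3},
    {"name": "eval-hourly", "created_at": 1},
    {"name": "train-weekly", "created_at": 2},
]
page, token = list_recurring_runs(runs, name_contains="train", page_size=1)
print([r["name"] for r in page], token)
```

This only works while the full list fits comfortably in memory, which matches the assumption above that the number of enabled recurring runs stays small.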

Bobgy commented 3 years ago

cc @NikeNano, what I mentioned in https://github.com/kubeflow/pipelines/pull/6207#issuecomment-906807494 is exactly meant for this use case. If users can manually author recurring run specs and apply them to the cluster using gitops, then it's more user-friendly to let them specify pipeline name + default pipeline version, or pipeline name + version name, rather than IDs.
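
A minimal sketch of that name-based lookup, assuming an in-memory catalog in place of the real pipeline store. The catalog layout, the IDs, and the `resolve_version_id` helper are all hypothetical and only illustrate the two addressing modes (pipeline name with its default version, or pipeline name plus version name).

```python
class PipelineNotFound(KeyError):
    """Raised when a pipeline name or version name cannot be resolved."""


# Hypothetical catalog: pipeline name -> default version name and version IDs.
CATALOG = {
    "churn-model": {
        "default_version": "v2",
        "versions": {"v1": "ver-id-001", "v2": "ver-id-002"},
    },
}


def resolve_version_id(catalog, pipeline_name, version_name=None):
    """Map a pipeline name (and optional version name) to a version ID."""
    try:
        pipeline = catalog[pipeline_name]
    except KeyError:
        raise PipelineNotFound(pipeline_name) from None
    # Fall back to the pipeline's default version when none is given.
    chosen = version_name or pipeline["default_version"]
    try:
        return pipeline["versions"][chosen]
    except KeyError:
        raise PipelineNotFound(f"{pipeline_name}@{chosen}") from None


print(resolve_version_id(CATALOG, "churn-model"))
print(resolve_version_id(CATALOG, "churn-model", "v1"))
```

With a resolver like this, a hand-written spec kept in git would not need to change when IDs differ between clusters.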

NikeNano commented 3 years ago

I have started to work on the design doc. I will update it over the next few days, but I'm leaving it here in case you would like to add something already: https://docs.google.com/document/d/1En7UCME3PabqPwaJZSk0GSx61B8BPdkhW_kW9t7Xdkg/edit?usp=sharing

NikeNano commented 3 years ago

Will update it during the weekend; I have a better understanding of the persistence agent now.

NikeNano commented 2 years ago

Would be great to get some feedback from you guys @Bobgy and @munagekar

munagekar commented 2 years ago

@NikeNano I went through the design doc. The proposed changes sound good.

As you mentioned in your design doc, if advanced filtering for scheduled jobs is a requirement, it might make sense to replicate changes from Kubernetes to MySQL; otherwise, relying on the Kubernetes API makes sense and should be faster to implement.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

moey920 commented 1 year ago

Hello, you are always very helpful. Thank you for your efforts.

Has the ability to control scheduledworkflows with kubectl been developed? It's hard to keep track of the exact progress.

When scheduledworkflows are deleted with kubectl using the command below, they are not deleted in the Kubeflow web UI, but the resources are deleted and the pipeline no longer runs on its schedule. I want the web UI to reflect the deletion. What should I do?

kubectl delete scheduledworkflows --all

viktorsobol commented 1 year ago

Hi guys, @Bobgy @NikeNano

Thank you for the thoughtful discussion. Is there any progress on this issue?

thesuperzapper commented 1 year ago

@chensun @connor-mccarthy I think this is a good idea for KFP: symmetrically syncing between the ScheduledWorkflow CRD and the MySQL backend, because right now, if you PATCH the CRD (or CREATE one without using the KFP API), you end up in an out-of-sync state.

This feature will also allow people to use git-ops tools like ArgoCD to manage which pipelines are scheduled in each namespace.
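
One direction of such a sync can be sketched as follows. This is an illustration under stated assumptions, not the KFP implementation: the cluster is treated as the source of truth, both sides are represented as plain dicts mapping a recurring-run name to its spec, and `plan_sync` is a hypothetical helper that only computes which DB operations would be needed.

```python
def plan_sync(cluster_specs, db_specs):
    """Return the DB operations needed to mirror the cluster state.

    Both arguments map a recurring-run name to its spec (any comparable value).
    A resource that was patched on the cluster shows up as an "update".
    """
    actions = []
    for name, spec in sorted(cluster_specs.items()):
        if name not in db_specs:
            actions.append(("create", name))
        elif db_specs[name] != spec:
            actions.append(("update", name))
    # Anything left only in the DB was removed from the cluster.
    for name in sorted(db_specs.keys() - cluster_specs.keys()):
        actions.append(("delete", name))
    return actions


# Hypothetical state: "a" was patched on the cluster, "b" was created with
# kubectl, and "c" was deleted with kubectl.
cluster = {"a": {"cron": "0 * * * *"}, "b": {"cron": "0 0 * * *"}}
db = {"a": {"cron": "*/5 * * * *"}, "c": {"cron": "0 12 * * *"}}
print(plan_sync(cluster, db))
```

A symmetric sync would also need the reverse direction plus a rule for conflicting edits, which this sketch deliberately leaves out.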

zijianjoy commented 1 year ago

Following up on this feature request: I think we should consider the scope of Kubernetes Resource Management (KRM) support in KFP in general. I suspect that if we implement RecurringRun CRD today, we will receive similar requests for Run/Pipeline and other objects in the future.

Again, contributions on this CRD-for-KFP topic are welcome!

Reference: We are currently using a mock API group and CRD to represent KFP objects: https://github.com/kubeflow/pipelines/blob/f626629f79c833b159336fe9963d44b77071c14f/backend/src/apiserver/common/const.go#L18-L20.

viktorsobol commented 1 year ago

Thank you, @thesuperzapper and @zijianjoy. Regarding contribution, I see that a design proposal has already been provided: https://docs.google.com/document/d/1En7UCME3PabqPwaJZSk0GSx61B8BPdkhW_kW9t7Xdkg/edit?pli=1#heading=h.rebcfzcla50x

Does it still require an agreement?

NikeNano commented 1 year ago

> Hi guys, @Bobgy @NikeNano
>
> Thank you for the thoughtful discussion. Is there any progress on this issue?

Hey, I sadly don't have time to work on this, so feel free to pick it up. It would be great to see this used.

thesuperzapper commented 1 year ago

@zijianjoy @NikeNano this issue is related to supporting updates on Scheduled Runs (which is not currently supported), see my comment in that issue for reference:

thesuperzapper commented 1 year ago

Hey everyone, I have released a reference repository that demonstrates using GitOps to manage the schedules and definitions of Kubeflow Pipelines.

You can find it at deployKF/kubeflow-pipelines-gitops

It has limitations caused by Kubeflow Pipelines not allowing updates to "recurring runs" (only deleting and recreating), but I think the solution it uses will work for most production use cases.

github-actions[bot] commented 3 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thesuperzapper commented 3 months ago

This is still very critical and has not been resolved as of Kubeflow Pipelines 2.2.0.

Creating this CRD will require us to support updating recurring runs via the REST API first, so if you have ideas about that, please discuss them on:

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

thesuperzapper commented 1 month ago

This is still needed.