Closed: inc0 closed this issue 1 year ago
I'm not sure this is needed. Currently, KF Pipelines uses the Argo Workflow CRD without changes. Pipelines does not extend it; there are no extra pipeline-specific fields.
If we decide to replace Argo, then we'll create a new CRD.
Not everyone uses Python, and it requires learning a whole new API and DSL.
I do not think KF Pipelines requires you to do that. The Pipelines Python SDK just allows some people to write

    from kfp.components import load_component
    from kfp.dsl import pipeline

    preprocess = load_component(...)
    train = load_component(...)

    @pipeline(name='mlapp')
    def mlapp(train_set):
        train(preprocess(train_set).output)
instead of writing the YAML manually.
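For context, a minimal sketch of how such a function is typically turned into the Argo workflow YAML, assuming the KFP v1 SDK (the output file name here is arbitrary):

```python
import kfp

# Compile the pipeline function above into an Argo Workflow YAML package.
kfp.compiler.Compiler().compile(mlapp, 'mlapp.yaml')
```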
So, if I submit an Argo workflow, will it be picked up by Pipelines immediately? How, for example, will it save metrics?
Hi @inc0, having a CRD for pipeline is being considered. We are planning to implement this in multiple steps (outlined further down in this thread).
How, for example, will it save metrics?
To provide metrics, the workflow task must have an output artifact called 'mlpipeline-metrics'.
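For illustration, a minimal sketch of what a step might write for that artifact, assuming the KFP v1 metrics JSON format and the conventional /mlpipeline-metrics.json path:

```python
import json

# Sketch: a pipeline step writes its metrics to a JSON file; the Argo
# workflow task then exposes this file as an output artifact named
# "mlpipeline-metrics" so the KFP UI can pick it up.
metrics = {
    "metrics": [
        {"name": "accuracy-score", "numberValue": 0.92, "format": "PERCENTAGE"},
    ]
}
with open("/mlpipeline-metrics.json", "w") as f:
    json.dump(metrics, f)
```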
So, if I submit an Argo workflow, will it be picked up by Pipelines immediately?
You have to submit the workflow against the pipelines API.
You can use either the Python client (kfp.Client(...).run_pipeline(...)) or the CLI:
https://github.com/kubeflow/pipelines/tree/master/backend/src/cmd/ml
Note that this is not considered a supported mode of operation. It may break in the future.
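For illustration, a sketch of submitting a hand-written Argo Workflow YAML through the KFP API server with the v1 Python client; the host URL, experiment name, and file name are placeholders:

```python
import kfp

# Connect to the KFP API server (host is a placeholder for your deployment).
client = kfp.Client(host="http://ml-pipeline.kubeflow:8888")

# Runs are grouped under experiments, so create (or reuse) one first.
experiment = client.create_experiment(name="argo-workflow-test")

# Submit the raw Argo Workflow YAML as the pipeline package.
run = client.run_pipeline(
    experiment_id=experiment.id,
    job_name="raw-argo-workflow-run",
    pipeline_package_path="workflow.yaml",
)
```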
@Ark-kun, having a CRD for pipeline is something that we are considering. Let's please keep this open.
Adding to this, having a Pipelines CRD would also provide a path for multi-user pipelines, as Kubernetes CRDs have built-in authentication and authorization via the API Server, like any other Kubernetes Object. As such, maybe there is some overlap with https://github.com/kubeflow/pipelines/issues/1223
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
/lifecycle frozen
I think this is something we'd want to consider for the long term.
Chiming in here; there is more background in this Slack thread.
Our use case at Zillow is to be able to deploy monitoring alongside scheduled pipelines. We use Datadog internally and have created a K8s operator for creating Datadog Monitors (essentially alerts triggered by metrics crossing thresholds); it simply reconciles the state of the resources with the Datadog API.
We would like to be able to use a standard kubectl apply (or better, kubectl apply -k with kustomize) to deploy a ScheduledWorkflow CRD resource (see these samples) alongside these custom DatadogMonitor CRD resources. This is an extensible pattern, and in the future we are planning to produce a Datadog Dashboards operator so we could dynamically create dashboards on a per-ScheduledWorkflow basis (useful for defining and monitoring SLOs, for instance).
This would also allow us to unify our CI/CD pipeline with KFServing. Essentially we have the same pattern where we generate a set of resource manifests using kustomize; in that case it's an InferenceService plus a set of DatadogMonitors. Since our underlying core K8s team already has CI/CD pipelines for running kubectl apply -k, this would let us align wholly with the rest of our company instead of maintaining custom CI/CD pipelines atop the kfp CLI/SDK tooling (the public interfaces KFP currently exposes), reducing maintenance overheads!
@alexlatchford for clarification, does the use case only apply to ScheduledWorkflow?
It sounds to me like one-time pipeline runs do not need a CRD interface.
I think we'd ideally prefer to just use the same CI/CD pipeline regardless, so I'd imagine we'd use ScheduledWorkflow in this mode just to unify the deployment process.
Is this something that is still possible? It would be nice to have pipeline CRDs so we could integrate pipelines with GitOps without losing all the UI capabilities.
- First, we will create a pipeline spec that combines an Argo workflow + additional data needed for ML pipelines.
- Initially, this spec will be processed by the pipeline API server and turned into an Argo workflow.
- Later on, we could turn this pipeline spec into a standalone CRD.
- The long-term expectation is that the pipeline CRD will let us combine multiple orchestration CRDs useful for ML (Argo workflow, HP tuning, etc.) and let users specify additional, optional ML metadata.
@vicaire as I understand steps 1 + 2 have been completed; are there still plans to introduce a standalone CRD? Having to rely on the Python SDK and submit files to the Kubeflow API instead of the Kubernetes API makes Kubeflow a really hard sell. In our case, dedicated CI/CD workflows need to be developed, and we can't rely on any of the tooling (e.g. helm-secrets) that works with virtually anything else deployed onto Kubernetes.
Currently, there's no plan to make pipelines a CRD. In fact, we are moving to make pipelines platform-agnostic.
I have the same use case as kujon, rubenaranamorera, and alexlatchford. We deploy things using a Flux-based GitOps workflow. The lack of an option to declaratively define Kubeflow pipelines as Kubernetes resource objects that can be kubectl apply'ed is a pain, and seems like a departure from K8s norms. It also seems inconsistent with other KF components like KServe, where you have InferenceService resource objects, etc.
Currently, to use Pipelines you need to run the Python SDK, which alone generates the Argo workflow underneath. I think this is very limiting because:
- not everyone uses Python, and it requires learning a whole new API and DSL
I propose creating a new CRD that would effectively be an Argo Workflow with additional options. For example:
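A purely hypothetical sketch of what such a resource might look like, expressed as a Python dict applied with the official kubernetes client; the group, version, kind, and extra spec fields are invented for illustration only:

```python
from kubernetes import client, config

# Hypothetical "Pipeline" resource: an embedded Argo Workflow spec plus
# pipeline-specific options. The apiVersion/kind and extra fields are
# invented for illustration; no such CRD exists in KFP today.
pipeline_resource = {
    "apiVersion": "pipelines.kubeflow.org/v1alpha1",  # hypothetical group/version
    "kind": "Pipeline",                                # hypothetical kind
    "metadata": {"name": "mlapp", "namespace": "kubeflow"},
    "spec": {
        "description": "Example ML pipeline",          # pipeline-specific extra
        "workflow": {                                   # plain Argo Workflow spec
            "entrypoint": "train",
            "templates": [
                {
                    "name": "train",
                    "container": {
                        "image": "my-train-image:latest",
                        "command": ["python", "train.py"],
                    },
                },
            ],
        },
    },
}

# Applied like any other Kubernetes object, so kubectl, GitOps tooling,
# and API-server RBAC all work out of the box.
config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="pipelines.kubeflow.org",
    version="v1alpha1",
    namespace="kubeflow",
    plural="pipelines",
    body=pipeline_resource,
)
```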
This would make the transition to pipelines much easier, as Operators are already a well-known pattern and handle a lot of things for us, including RBAC, multitenancy, API auth, etc.