kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[backend] Argo Workflows is Using Legacy Pod Patches #9110

Closed TobiasGoerke closed 5 months ago

TobiasGoerke commented 1 year ago

When executing any pipeline, the following lines are logged in the wait container:

# ...
time="2023-04-06T07:58:47.339Z" level=info msg="Create workflowtaskresults 403"
time="2023-04-06T07:58:47.339Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argoproj.github.io/argo-workflows/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:kubeflow-user-example-com:default-editor\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"kubeflow-user-example-com\""
time="2023-04-06T07:58:47.356Z" level=info msg="Patch pods 200"
time="2023-04-06T07:58:47.362Z" level=info msg="Killing sidecars []"
time="2023-04-06T07:58:47.362Z" level=info msg="Alloc=7530 TotalAlloc=12844 Sys=24530 NumGC=4 Goroutines=12"
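To check whether a given wait container hit this fallback, the log can be scanned for the warning. The following is a minimal sketch, not part of the original report: it assumes the logfmt-style `key="value"` layout shown above, and the helper names (`parse_logfmt`, `find_fallback_warnings`) are made up for illustration. The sample text is shortened from the log lines above.

```python
import re

# Substring of the warning message shown in the wait container log.
FALLBACK_MARKER = "falling back to legacy/insecure pod patch"

# key=value pairs; values are either bare tokens or double-quoted strings
# that may contain backslash-escaped quotes.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Split one logfmt-style line into a dict of fields."""
    fields = {}
    for key, raw in _PAIR.findall(line):
        if raw.startswith('"'):
            # Drop the surrounding quotes and unescape inner quotes.
            raw = raw[1:-1].replace('\\"', '"')
        fields[key] = raw
    return fields


def find_fallback_warnings(log_text):
    """Return the parsed fields of every line reporting the legacy fallback."""
    hits = []
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level") == "warning" and FALLBACK_MARKER in fields.get("msg", ""):
            hits.append(fields)
    return hits


# Shortened sample based on the log excerpt above.
SAMPLE = r'''time="2023-04-06T07:58:47.339Z" level=info msg="Create workflowtaskresults 403"
time="2023-04-06T07:58:47.339Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:kubeflow-user-example-com:default-editor\" cannot create resource \"workflowtaskresults\""
time="2023-04-06T07:58:47.356Z" level=info msg="Patch pods 200"'''

for hit in find_fallback_warnings(SAMPLE):
    print(hit["time"], hit["error"])
```

The `error` field names the service account that lacks the permission, which is the account whose role needs the extra rule.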

Besides Argo considering this method legacy and insecure, we've had issues with pipeline steps getting stuck or not being displayed as succeeded in the UI. Whether this happened depended on the files written to the OutputPath, because the resulting pod annotation was malformed.

Fixing this RBAC issue is easily done by adding the resource workflowtaskresults to the aggregate-to-kubeflow-pipelines-edit ClusterRole.
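As a rough illustration of that change, the sketch below appends the rule to a ClusterRole manifest held as a Python dict. It is an assumption-laden sketch, not the actual Kubeflow manifest: `example_role` is a hypothetical, trimmed-down stand-in for the real role, and the helper name `add_task_result_rule` is made up. The `create` and `patch` verbs follow the executor RBAC rule described on the Argo Workflows page linked in the warning.

```python
import copy
import json

# Rule the Argo executor needs in order to write WorkflowTaskResults
# (verbs as listed in the Argo Workflows RBAC documentation).
TASK_RESULT_RULE = {
    "apiGroups": ["argoproj.io"],
    "resources": ["workflowtaskresults"],
    "verbs": ["create", "patch"],
}


def add_task_result_rule(cluster_role):
    """Return a copy of the ClusterRole manifest with the rule appended once."""
    patched = copy.deepcopy(cluster_role)
    rules = patched.get("rules") or []  # "rules" may be missing or null
    if TASK_RESULT_RULE not in rules:
        rules.append(copy.deepcopy(TASK_RESULT_RULE))
    patched["rules"] = rules
    return patched


# Hypothetical stand-in: the real role carries more rules and labels.
example_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "aggregate-to-kubeflow-pipelines-edit"},
    "rules": [],
}

patched_role = add_task_result_rule(example_role)
print(json.dumps(patched_role, indent=2))
```

The helper is idempotent, so running it against an already patched manifest leaves the rules unchanged.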

However, this will cause Argo Workflows to stop writing the annotation workflows.argoproj.io/outputs to the pipeline's pods, which several Kubeflow components rely on, e.g.:

zijianjoy commented 1 year ago

Assigning @Linchin to fix this issue by adding the RBAC permission to aggregate-to-kubeflow-pipelines-edit.

cc @gkcalat

Sharathmk99 commented 1 year ago

I did face the same problem in Kubeflow 1.7.0. Thank you @zijianjoy for looking into it.

Linchin commented 1 year ago

Hi @TobiasGoerke, thank you so much for reporting the issue! I found the place to add the permission, which seems to be associated with the permissions given to the service account in a namespace. Could you please share a minimal pipeline that reproduces the original issue, i.e., "pipeline steps being stuck / not being displayed as succeeded in the UI"? Thank you!

TobiasGoerke commented 1 year ago

Glad to hear it! Unfortunately, I'm no longer able to reproduce the stuck pipeline issue. I do recall the output file's content being stored in the pod annotation workflows.argoproj.io/outputs, though. The pipeline must have looked similar to this:

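# Attention: this example doesn't reproduce the issue
import kfp
import kfp.dsl as dsl
from kfp.components import OutputPath, create_component_from_func

# Placeholders, not part of the original report; adjust to your environment.
BASE_IMAGE = "python:3.9"  # assumed base image
EXPERIMENT_NAME = "Default"  # assumed experiment name
NAMESPACE = "kubeflow-user-example-com"  # namespace seen in the log above
client = kfp.Client()  # assumes the client can reach the KFP API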

def example_step(test_path: OutputPath()):
    import pickle
    import numpy as np

    arr = [np.array([1, 2, 3]), np.array([1, 2, 3]), np.array([1, 2, 3])]

    with open(test_path, "wb") as f:
        pickle.dump(arr, f)

    print("Finished")

example_op = create_component_from_func(
    example_step, base_image=BASE_IMAGE, packages_to_install=["numpy"]
)

@dsl.pipeline(name="Test Pipeline")
def pipeline():
    example_op()

plain_pipeline_result = client.create_run_from_pipeline_func(
    pipeline, arguments={}, experiment_name=EXPERIMENT_NAME, namespace=NAMESPACE
)

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

proluoo commented 11 months ago

I ran into the same problem. How can it be resolved?

TobiasGoerke commented 11 months ago

Our users redesigned their pipelines and I lost track of the issue as stuck pipelines haven't resurfaced for us. If you're currently having this problem, I'm sure providing a reproducible example would lead to this issue getting fixed. Cheers!

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.