Created any pipeline that fails. I created this simple pipeline. Uploaded from the UI "Upload pipeline". Start a run of this pipeline with fail = true. The run fails as expected. Try to click the "Retry" button in the UI, the workflow will be stuck at pending, no retry will be done.
import os
from kfp import dsl
from kfp.dsl import Dataset, Model, Input, Output
from typing import Optional, List
@dsl.component(base_image="python:3.9")
def say_hello(name: str, number: int, opt_int: Optional[int]) -> List[str]:
hello_text = f'Hello, {name}! number={number} opt_int={opt_int}'
print(hello_text)
return ["hello", name, str(number), str(opt_int)]
#return hello_text
@dsl.component(base_image="python:3.9")
def say_hello_list(list_hello: List[str]) -> str:
hello_text = f'Hello list_hello={list_hello}'
print(hello_text)
return hello_text
@dsl.component(base_image="python:3.9", packages_to_install=['pandas==1.3.5', "numpy==1.*"])
def create_dataset(iris_dataset: Output[Dataset], fail: bool):
import pandas as pd
import random
r = random.random()
#print("Failing at random < 0.5. r=", r)
#if r < 0.5:
# raise ValueError(f"Failing at random! r={r}")
if fail:
raise ValueError("fail=true, failing!")
df = pd.DataFrame({"name": ["a", "b", "c"], "age": [11, 12, 13]})
with open(iris_dataset.path, 'w') as f:
df.to_csv(f)
@dsl.component(base_image="python:3.9", packages_to_install=['pandas==1.3.5', "numpy==1.*"])
def print_dataset(ds: Input[Dataset], a: int):
print(f"hello a={a}")
print(ds)
@dsl.pipeline
def hello_pipeline(recipient: str, number: int, opt_int: Optional[int], fail: bool) -> str:
hello_task = say_hello(name=recipient, number=number, opt_int=opt_int)
hello_list_task = say_hello_list(list_hello=hello_task.output)
create_dataset_task = create_dataset(fail=fail)
create_dataset_task.set_caching_options(False)
print_dataset(ds=create_dataset_task.output, a=44)
return hello_list_task.output
from kfp import compiler
compiler.Compiler().compile(hello_pipeline, os.path.basename(__file__).replace(".py", ".yaml"))
Expected result
Retry should work.
By the way, I also deployed before kubeflow pipelines 2.0.3, and retry did work as expected. So it must be something that broke in the recent releases.
Environment
How did you deploy Kubeflow Pipelines (KFP)? Standalone kubeflow pipelines using the AWS installation instructions here: https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/aws
KFP version: 2.3.0
KFP SDK version: 2.9.0
Steps to reproduce
Created any pipeline that fails. I created this simple pipeline. Uploaded from the UI "Upload pipeline". Start a run of this pipeline with
fail = true
. The run fails as expected. Try to click the "Retry" button in the UI, the workflow will be stuck at pending, no retry will be done.Expected result
Retry should work. By the way, I also deployed before kubeflow pipelines 2.0.3, and retry did work as expected. So it must be something that broke in the recent releases.
Materials and Reference
Impacted by this bug? Give it a 👍.