[backend] Retry not working in 2.3.0

Environment

How did you deploy Kubeflow Pipelines (KFP)? Standalone Kubeflow Pipelines, following the AWS installation instructions at https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/aws
KFP version: 2.3.0
KFP SDK version: 2.9.0

Steps to reproduce

Create any pipeline that fails; I used the simple pipeline below. Upload it with "Upload pipeline" in the UI, then start a run with fail = true. The run fails as expected. Click the "Retry" button in the UI: the workflow gets stuck in Pending and no retry is performed.

import os
from kfp import dsl
from kfp.dsl import Dataset, Model, Input, Output
from typing import Optional, List

@dsl.component(base_image="python:3.9")
def say_hello(name: str, number: int, opt_int: Optional[int]) -> List[str]:
    hello_text = f'Hello, {name}! number={number} opt_int={opt_int}'
    print(hello_text)
    return ["hello", name, str(number), str(opt_int)]
    #return hello_text

@dsl.component(base_image="python:3.9")
def say_hello_list(list_hello: List[str]) -> str:
    hello_text = f'Hello list_hello={list_hello}'
    print(hello_text)
    return hello_text

@dsl.component(base_image="python:3.9", packages_to_install=['pandas==1.3.5', "numpy==1.*"])
def create_dataset(iris_dataset: Output[Dataset], fail: bool):
    import pandas as pd
    import random
    r = random.random()
    # print("Failing at random < 0.5. r=", r)
    # if r < 0.5:
    #     raise ValueError(f"Failing at random! r={r}")
    if fail:
        raise ValueError("fail=true, failing!")
    df = pd.DataFrame({"name": ["a", "b", "c"], "age": [11, 12, 13]})
    with open(iris_dataset.path, 'w') as f:
        df.to_csv(f)

@dsl.component(base_image="python:3.9", packages_to_install=['pandas==1.3.5', "numpy==1.*"])
def print_dataset(ds: Input[Dataset], a: int):
    print(f"hello a={a}")
    print(ds)

@dsl.pipeline
def hello_pipeline(recipient: str, number: int, opt_int: Optional[int], fail: bool) -> str:
    hello_task = say_hello(name=recipient, number=number, opt_int=opt_int)
    hello_list_task = say_hello_list(list_hello=hello_task.output)
    create_dataset_task = create_dataset(fail=fail)
    create_dataset_task.set_caching_options(False)
    print_dataset(ds=create_dataset_task.output, a=44)
    return hello_list_task.output

from kfp import compiler
compiler.Compiler().compile(hello_pipeline, os.path.basename(__file__).replace(".py", ".yaml"))
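
To check whether the problem is in the UI or in the backend, the retry step above can also be triggered without the UI. The following is a minimal sketch, not a verified fix. It assumes the KFP API is reachable at `http://localhost:8080` (for example through a port-forward to the `ml-pipeline-ui` service), that no authentication is required, and that the v2beta1 REST API exposes the retry action as `POST /apis/v2beta1/runs/{run_id}:retry` and a `state` field on the run resource. The helper names are hypothetical.

```python
import json
import sys
import urllib.request

# Assumption: the KFP API is reachable here, e.g. via a port-forward to the
# ml-pipeline-ui service. Adjust for your own deployment.
KFP_HOST = "http://localhost:8080"


def run_url(run_id: str, host: str = KFP_HOST) -> str:
    """Build the URL of a run resource in the v2beta1 REST API."""
    return f"{host.rstrip('/')}/apis/v2beta1/runs/{run_id}"


def retry_url(run_id: str, host: str = KFP_HOST) -> str:
    """Build the URL of the retry action for a run (assumed ':retry' suffix)."""
    return run_url(run_id, host) + ":retry"


def retry_run(run_id: str, host: str = KFP_HOST) -> None:
    """Ask the backend to retry a failed run. Raises on an HTTP error."""
    req = urllib.request.Request(
        retry_url(run_id, host),
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()


def get_run_state(run_id: str, host: str = KFP_HOST) -> str:
    """Return the run's 'state' field, or an empty string if it is absent."""
    with urllib.request.urlopen(run_url(run_id, host)) as resp:
        return json.load(resp).get("state", "")


if __name__ == "__main__" and len(sys.argv) > 1:
    failed_run_id = sys.argv[1]  # ID of the failed run, copied from the UI
    retry_run(failed_run_id)
    print("state after retry request:", get_run_state(failed_run_id))
```

If the state stays at pending after this call as well, the UI can likely be ruled out and the backend logs are the next place to look.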

Expected result

Retry should work. I previously deployed Kubeflow Pipelines 2.0.3, where retry worked as expected, so this looks like a regression introduced somewhere between 2.0.3 and 2.3.0.

Materials and Reference

Impacted by this bug? Give it a 👍.

[backend] Retry not working in 2.3.0 #11329
