kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[bug] Step in UI was completed, but didn't run at all #10623

Closed asaff1 closed 6 months ago

asaff1 commented 6 months ago

Environment

AWS EKS, Kubeflow 1.0.4

Steps to reproduce

Sometimes Kubeflow shows a pipeline step as having run when in fact it did not run at all. Retrying does not help in that case. See the screenshot: [image]

The step that did not run is the "train" step, directly above the failed "reports" step. The "reports" step failed because it relies on the outputs of the "train" step.

It is also odd that, for this "train" step, Kubeflow does not show "results were taken from cache", while the other steps do show that message even though they actually ran in full. What cache was used here? How can I debug this? I don't want any caching for my runs.

This happens randomly and I don't know why. If I clone the entire run, the clone may succeed, but that also seems random. Where can I check why the step didn't run?
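One way to check which steps were actually executed and which were served from the cache is to inspect the labels on the run's step pods. The sketch below is only illustrative: it assumes the Kubeflow Pipelines cache server marks reused steps with the pod label `pipelines.kubeflow.org/reused_from_cache: "true"`, which should be verified against the deployed version. The function name `classify_pods` and the sample pod names are hypothetical.

```python
# Assumption: the KFP cache server sets this label on pods whose results
# were reused from cache. Verify the label name against your deployment.
REUSED_FROM_CACHE_LABEL = "pipelines.kubeflow.org/reused_from_cache"


def classify_pods(pod_list):
    """Split pods from a `kubectl get pods -o json` document by cache status.

    Returns a tuple (reused, executed) of pod-name lists.
    """
    reused, executed = [], []
    for pod in pod_list.get("items", []):
        metadata = pod.get("metadata", {})
        labels = metadata.get("labels") or {}  # labels may be missing or null
        name = metadata.get("name", "<unnamed>")
        if labels.get(REUSED_FROM_CACHE_LABEL) == "true":
            reused.append(name)
        else:
            executed.append(name)
    return reused, executed


# Hypothetical sample input, shaped like the output of `kubectl get pods -o json`.
sample = {
    "items": [
        {"metadata": {"name": "run-abc-preprocess",
                      "labels": {REUSED_FROM_CACHE_LABEL: "true"}}},
        {"metadata": {"name": "run-abc-reports", "labels": {}}},
    ]
}

reused, executed = classify_pods(sample)
print("reused from cache:", reused)
print("executed:", executed)
```

In practice the input would come from something like `kubectl get pods -n <namespace> -o json`, loaded with `json.load`. A step that the UI shows as completed but that appears in neither list never had a pod created for it, which narrows the search to the workflow controller rather than the step's own container.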

Expected result

The step should not be shown as if it had run.


rimolive commented 6 months ago

Hello! In an effort to better separate issues from questions and troubleshooting, we added a Discussions tab to the repo. Please move your topic there.

/close

google-oss-prow[bot] commented 6 months ago

@rimolive: Closing this issue.

asaff1 commented 6 months ago

@rimolive The Discussions tab is empty; no one is active there. Can someone here help with issues like this? I am really stuck with these pipelines not running, and I don't know which Kubeflow component I need to look at.

rimolive commented 6 months ago

@asaff1 I'll reply here, but we should keep interactions in the Discussions tab. It was only recently enabled, as per https://github.com/kubeflow/pipelines/pull/10557.

Since you posted your question over the weekend, people will probably answer during their working hours; contributors are either volunteers or are paid by their companies to maintain Kubeflow.

Finally, let's keep the discussion in the Discussions tab.