kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[bug] Kubeflow workflow runs are in "Pending Execution" state #10901

Open sureshmol opened 4 months ago

sureshmol commented 4 months ago

Environment

git clone https://github.com/kubeflow/manifests.git
cd manifests
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 10; done

Kubeflow version: 1.8
Kubernetes version: 1.27

Steps to reproduce

After installing Kubeflow, all the pods are in the Running state:

NAME                                                    READY   STATUS    RESTARTS       AGE
admission-webhook-deployment-699d5c848b-222gr           1/1     Running   1 (11d ago)    70d
cache-server-5b6f65b559-rk7dl                           2/2     Running   8 (9d ago)     24d
centraldashboard-6b554569f9-chww4                       2/2     Running   0              70d
jupyter-web-app-deployment-8f4945994-7x6lv              2/2     Running   6 (11d ago)    70d
katib-controller-57b7bf8bfb-7zr5f                       1/1     Running   0              24d
katib-db-manager-5b6c7c77f7-djjmk                       1/1     Running   4 (11d ago)    70d
katib-mysql-77b9495867-cflps                            1/1     Running   0              70d
katib-ui-b6f59c479-5mtvc                                2/2     Running   2 (11d ago)    70d
kserve-controller-manager-785787684f-v48zv              2/2     Running   720 (3d ago)   70d
kserve-models-web-app-6557dbf457-nq9md                  2/2     Running   7 (11d ago)    70d
kubeflow-pipelines-profile-controller-6476b6cb9-p2q22   1/1     Running   0              24d
metacontroller-0                                        1/1     Running   1 (11d ago)    70d
metadata-envoy-deployment-78755fbcf5-jb4x4              1/1     Running   1 (11d ago)    43d
metadata-grpc-deployment-5644fb9768-72mq6               2/2     Running   12 (49d ago)   70d
metadata-writer-7b6c47cbdb-f9nrr                        2/2     Running   40 (9d ago)    70d
minio-55464b6ddb-ppwdd                                  2/2     Running   0              11d
ml-pipeline-5f9dddb4bb-fvzmm                            2/2     Running   6 (70d ago)    70d
ml-pipeline-persistenceagent-5854fc6785-48p24           2/2     Running   0              109m
ml-pipeline-scheduledworkflow-647b8f6db9-r4scn          2/2     Running   2 (11d ago)    70d
ml-pipeline-ui-785796db48-m98vg                         2/2     Running   2 (11d ago)    70d
ml-pipeline-viewer-crd-5d7b895d6d-nzj5s                 2/2     Running   4 (11d ago)    24d
ml-pipeline-visualizationserver-849c9844f-kjljj         2/2     Running   0              70d
mysql-7d8b8ff4f4-wghjv                                  2/2     Running   0              11d
notebook-controller-deployment-77584d69f-2g7w6          2/2     Running   2 (70d ago)    70d
profiles-deployment-785c5669f6-v892j                    3/3     Running   25 (11d ago)   70d
pvcviewer-controller-manager-569cd76d57-c2wbv           3/3     Running   710 (3d ago)   70d
tensorboard-controller-deployment-579b5d8c9d-87v7n      3/3     Running   2 (70d ago)    70d
tensorboards-web-app-deployment-b68bd88d5-cnf6j         2/2     Running   0              70d
training-operator-77dc7667fc-tqz4l                      1/1     Running   1 (49d ago)    70d
volumes-web-app-deployment-7986fc8f74-ghsrn             2/2     Running   6 (11d ago)    70d
workflow-controller-85d5498fd4-cv4nz                    2/2     Running   1 (137m ago)   137m
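
Every pod in the listing is Running, but the RESTARTS column is uneven: kserve-controller-manager and pvcviewer-controller-manager show 720 and 710 restarts. Whether that is related to the status problem is not established anywhere in this thread. As a minimal sketch, a filter like the one below would pull such pods out of the listing. The helper name `high_restarts` and the default threshold of 100 are assumptions made for illustration.

```shell
# Print the name and restart count of pods whose RESTARTS value is at or
# above a threshold (default 100). Reads `kubectl get pods` output on stdin.
# `high_restarts` is a hypothetical helper name, not part of kubectl.
high_restarts() {
  # NR > 1 skips the header row; $4 is the numeric part of RESTARTS,
  # because "(3d ago)" lands in the following whitespace-separated fields.
  awk -v min="${1:-100}" 'NR > 1 && $4 + 0 >= min { print $1, $4 }'
}
```

It could be used as `kubectl get pods -n kubeflow | high_restarts 100`, assuming the components run in the `kubeflow` namespace.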

I created a pipeline from the UI using a YAML file, and the pipeline was created successfully. See the image of the pipeline below:

image

When I create a run for this pipeline, the run goes into the "Pending Execution" state, and when I open the run, the graph does not load. See the images below:

image image

But when I check the workflow status using kubectl, the workflows have started, run, and completed successfully. Please see the image below:

image
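
The kubectl check described above survives only as a screenshot, so the exact command used is not known. A rough equivalent, assuming the runs are backed by Argo `Workflow` objects in the profile namespace where the run was created, might look like this. The helper name `workflow_phases` is hypothetical, and the namespace must be supplied by the caller.

```shell
# List Argo Workflow objects in a namespace with their phase and finish time.
# `workflow_phases` is a hypothetical helper name; $1 is the namespace the run
# was created in (an assumption: usually the user's profile namespace).
workflow_phases() {
  kubectl get workflows.argoproj.io -n "$1" \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,FINISHED:.status.finishedAt
}
```

If a workflow shows `Succeeded` here while the UI still reports "Pending Execution", the mismatch is in how the status reaches the UI rather than in the workflow execution itself.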

Expected result

I should be seeing the actual status of the workflows in the UI.

Can you please help me with this bug? I have been unable to figure out a solution.

Impacted by this bug? Give it a 👍.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 1 month ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

imranrazakhan commented 1 month ago

@sureshmol Were you able to fix this issue? I am facing the same problem.

imranrazakhan commented 1 month ago

/reopen

google-oss-prow[bot] commented 1 month ago

@imranrazakhan: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubeflow/pipelines/issues/10901#issuecomment-2366896989):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

google-oss-prow[bot] commented 1 month ago

@sureshmoligi: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to [this](https://github.com/kubeflow/pipelines/issues/10901#issuecomment-2372936857):

> @imranrazakhan
> No, I didn't find a fix yet.
> /reopen

sureshmol commented 1 month ago

@imranrazakhan No, I didn't find the fix yet.
/reopen

google-oss-prow[bot] commented 1 month ago

@sureshmol: Reopened this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/10901#issuecomment-2372955997):

> @imranrazakhan
> No, I didn't find the fix yet.
> /reopen

RuiClaro commented 1 week ago

I also have this issue.

Kubernetes version: 1.29
Kubeflow version: 1.9.0

You can see that the run finished successfully, but it stays stuck in the "Pending Execution" state:

image

The same happens with all runs, even the failed ones:

image

alexd2580 commented 15 minutes ago

I also have the same issue. How can this be debugged?

KF: 1.8.16, k8s: 1.29.8, on Azure
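
No fix has been posted in this thread, so what follows is only a possible starting point for debugging, not a known solution. As far as I understand it, the persistence agent is the component that reports workflow state back to the Pipelines API server, so its logs, together with those of the API server and the workflow controller, seem like the first place to look. The sketch below assumes the deployment names implied by the pod listing in the original report and the `kubeflow` namespace. The helper name `kfp_status_logs` is hypothetical.

```shell
# Print recent logs from the deployments that appear to be involved in
# reporting run status. `kfp_status_logs` is a hypothetical helper name.
# $1: namespace (assumed default: kubeflow), $2: lines per container (default 200).
kfp_status_logs() {
  for deploy in ml-pipeline-persistenceagent ml-pipeline workflow-controller; do
    echo "=== $deploy ==="
    # --all-containers avoids guessing container names inside sidecar-injected pods.
    kubectl logs "deployment/$deploy" -n "${1:-kubeflow}" \
      --all-containers --tail="${2:-200}"
  done
}
```

Running `kfp_status_logs kubeflow 500 | grep -i error` would be one way to narrow the output. Errors in the persistence agent log around the time a run finished would be the most relevant lines to post here.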