kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0
3.61k stars 1.63k forks

[backend] Pipeline v2 Advanced Example Not completing #10458

Closed Boes-man closed 7 months ago

Boes-man commented 9 months ago

I don't have permissions to move the issue here.

Boes-man commented 9 months ago

Hello, I am trying to run this Kubeflow example pipeline (Build a more advanced ML pipeline). I modified it by removing the endpoint and adding a compiler step. I can now upload the pipeline and run it, but the run does not complete. I found some error messages related to RBAC and "files not found", but I am not sure whether they are related or how to fix them. Thanks.

iris-train-pl.py.txt iris-train-pl.yaml.txt

Screenshot 2024-02-08 at 1 21 46 pm Screenshot 2024-02-08 at 1 23 08 pm Screenshot 2024-02-08 at 1 22 59 pm

`Error message from pods

}
I0208 00:43:52.024138 20 main.go:118] input ContainerSpec:{
"args": [
"--executor_input",
"{{$}}",
"--function_to_execute",
"train_model"
],
"command": [
"sh",
"-c",
"\nif ! [ -x \"$(command -v pip)\" ]; then\n python3 -m ensurepip || python3 -m ensurepip --user || apt-get install python3-pip\nfi\n\nPIP_DISABLE_PIP_VERSION_CHECK=1 python3 -m pip i
"sh",
"-ec",
"program_path=$(mktemp -d)\n\nprintf \"%s\" \"$0\" \u003e \"$program_path/ephemeral_component.py\"\n_KFP_RUNTIME=true python3 -m kfp.dsl.executor_main --component
"\nimport kfp\nfrom kfp import dsl\nfrom kfp.dsl import \nfrom typing import \n\ndef train_model(\n normalized_iris_dataset: Input[Dataset],\n model: Output[Model],\n n_neighb
],
"image": "python:3.7"
}
I0208 00:43:52.024455 20 cache.go:139] Cannot detect ml-pipeline in the same namespace, default to ml-pipeline.kubeflow:8887 as KFP endpoint.
I0208 00:43:52.024469 20 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow:8887
I0208 00:43:52.057353 20 client.go:251] Pipeline Context: id:18 name:"iris-training-pipeline" type_id:11 create_time_since_epoch:1707347981672 last_update_time_since_epoch:1707347981672
I0208 00:43:52.096094 20 client.go:259] Pipeline Run Context: id:22 name:"7142d5a6-0b61-4e1f-9c63-ac17d7c2ae67" type_id:12 custom_properties:{key:"namespace" value:{string_value:"kubefl
I0208 00:43:52.257403 20 driver.go:241] parent DAG: id:75 type_id:13 last_known_state:RUNNING custom_properties:{key:"display_name" value:{string_value:"for-loop-1"}} custom_properties:
I0208 00:43:52.258490 20 driver.go:771] parent DAG input parameters map[pipelinechannel--neighbors-loop-item:number_value:3]
F0208 00:43:52.258574 20 main.go:76] KFP driver: driver.Container(pipelineName=iris-training-pipeline, runID=7142d5a6-0b61-4e1f-9c63-ac17d7c2ae67, task="train-model", component="comp-tr
time="2024-02-08T00:43:52.960Z" level=info msg="sub-process exited" argo=true error=""
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
`
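One possible reading of the log above: the line starting with `F` is a fatal klog entry from the KFP driver, and the three Argo "cannot save parameter" errors follow immediately after the sub-process exits. That ordering would be consistent with the driver dying before it wrote its output files, which would make the "no such file or directory" messages a symptom rather than the cause. The sketch below separates the two kinds of lines so the fatal entry stands out. The function name `triage` and both regular expressions are illustrative assumptions, not part of KFP or Argo, and the sample lines are copied from the log with truncated text left truncated.

```python
import re

# Lines copied from the pod log in this post (truncated text left as-is).
LOG = '''\
I0208 00:43:52.258490 20 driver.go:771] parent DAG input parameters map[pipelinechannel--neighbors-loop-item:number_value:3]
F0208 00:43:52.258574 20 main.go:76] KFP driver: driver.Container(pipelineName=iris-training-pipeline, runID=7142d5a6-0b61-4e1f-9c63-ac17d7c2ae67, task="train-model", component="comp-tr
time="2024-02-08T00:43:52.960Z" level=info msg="sub-process exited" argo=true error=""
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-02-08T00:43:52.960Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
'''

# klog-style header: severity letter, MMDD, time, thread id, file:line, then "] message".
KLOG = re.compile(r'^([IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ (\S+:\d+)\] (.*)$')
# logrus-style key=value pairs as emitted by the Argo executor.
LOGRUS = re.compile(r'level=(\w+) msg="([^"]*)"')


def triage(text):
    """Return (fatal klog entries, Argo error messages) found in the log text."""
    fatal, errors = [], []
    for line in text.splitlines():
        m = KLOG.match(line)
        if m:
            if m.group(1) == "F":
                fatal.append((m.group(2), m.group(3)))
            continue
        m = LOGRUS.search(line)
        if m and m.group(1) == "error":
            errors.append(m.group(2))
    return fatal, errors


fatal, errors = triage(LOG)
for source, message in fatal:
    print("FATAL at", source, "->", message)
for message in errors:
    print("argo error:", message)
```

Because the fatal message is cut off at the terminal width, the actual reason for the driver failure is not visible here. Capturing the full, untruncated line from the driver pod would be the next step.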

List of images (not sure how to find the KF version?), with the number of containers using each image:

44 docker.io/istio/proxyv2:1.17.5
1 docker.io/kubeflowkatib/katib-controller:v0.16.0-rc.1
1 docker.io/kubeflowkatib/katib-db-manager:v0.16.0-rc.1
1 docker.io/kubeflowkatib/katib-ui:v0.16.0-rc.1
1 docker.io/kubeflownotebookswg/centraldashboard:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/jupyter-web-app:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/kfam:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/notebook-controller:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/poddefaults-webhook:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/profile-controller:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/pvcviewer-controller:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/tensorboard-controller:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/tensorboards-web-app:v1.8.0-rc.0
1 docker.io/kubeflownotebookswg/volumes-web-app:v1.8.0-rc.0
1 docker.io/metacontrollerio/metacontroller:v2.0.4
2 gcr.io/kubebuilder/kube-rbac-proxy:v0.13.1
1 gcr.io/kubebuilder/kube-rbac-proxy:v0.8.0
1 gcr.io/ml-pipeline/api-server:2.0.1
1 gcr.io/ml-pipeline/cache-server:2.0.1
1 gcr.io/ml-pipeline/frontend:2.0.1
1 gcr.io/ml-pipeline/metadata-envoy:2.0.1
1 gcr.io/ml-pipeline/metadata-writer:2.0.1
1 gcr.io/ml-pipeline/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance
1 gcr.io/ml-pipeline/mysql:8.0.26
1 gcr.io/ml-pipeline/persistenceagent:2.0.1
1 gcr.io/ml-pipeline/scheduledworkflow:2.0.1
1 gcr.io/ml-pipeline/viewer-crd-controller:2.0.1
1 gcr.io/ml-pipeline/visualization-server:2.0.1
1 gcr.io/ml-pipeline/workflow-controller:v3.3.10-license-compliance
1 gcr.io/tfx-oss-public/ml_metadata_store_server:1.5.0
1 kserve/kserve-controller:v0.11.0
1 kserve/models-web-app:v0.10.0
1 kubeflow/training-operator:v1-855e096
1 mysql:8.0.29
1 python:3.7
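On the question of finding the version: the image tags in the list above are probably the most direct hint. A rough sketch, assuming the tag on `gcr.io/ml-pipeline/api-server` tracks the deployed KFP backend release and the tag on the central dashboard image tracks the Kubeflow platform release (both are assumptions, and a dev install from the main branch may not match any tagged release exactly). The helper name `image_tags` is hypothetical, and the sample is a subset of the list above in "count image" form.

```python
# Subset of the image list from this post, one "count image" pair per line.
IMAGES = """\
1 gcr.io/ml-pipeline/api-server:2.0.1
1 gcr.io/ml-pipeline/cache-server:2.0.1
1 gcr.io/ml-pipeline/frontend:2.0.1
1 gcr.io/ml-pipeline/workflow-controller:v3.3.10-license-compliance
1 docker.io/kubeflownotebookswg/centraldashboard:v1.8.0-rc.0
1 python:3.7
"""


def image_tags(text):
    """Map each image repository to its tag for every 'count repo:tag' line."""
    tags = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        # Split on the last colon so registry hosts with dots stay intact.
        repo, _, tag = parts[1].rpartition(":")
        if repo:
            tags[repo] = tag
    return tags


tags = image_tags(IMAGES)
kfp_backend = tags.get("gcr.io/ml-pipeline/api-server")
dashboard = tags.get("docker.io/kubeflownotebookswg/centraldashboard")
print("KFP backend image tag:", kfp_backend)
print("Central dashboard image tag:", dashboard)
```

Read this way, the list points to a 2.0.1 KFP backend alongside Kubeflow 1.8.0-rc.0 components, which may be useful context when comparing against the SDK version used to compile the pipeline.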

Boes-man commented 9 months ago

@juliusvonkohout

rimolive commented 8 months ago

What is the KFP version you deployed?

Boes-man commented 8 months ago

Hi @rimolive, thanks for checking in. I am not sure; that is actually a follow-on question I have :) I just git-cloned the main branch and then used the manifest example install process. The UI shows "dev local" (I don't have a cluster up right now, but it is something like that). I did dump the list of images at the end of my original post. Hope that helps. Thanks.

rimolive commented 8 months ago

I recommend following the installation documentation at https://www.kubeflow.org/docs/components/pipelines/v2/installation/quickstart/. Applying manifests from the main branch is intended for development only and is not recommended for production or testing.

rimolive commented 8 months ago

Is there anything else you need about this issue?

Boes-man commented 7 months ago

Hi rimolive. I think we can close this for now. I have not been able to look at it further. Thanks.

rimolive commented 7 months ago

Sure, no worries!

/close

google-oss-prow[bot] commented 7 months ago

@rimolive: Closing this issue.

In response to [this](https://github.com/kubeflow/pipelines/issues/10458#issuecomment-2044475502):

> Sure, no worries!
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.