kubeflow / pipelines

Machine Learning Pipelines for Kubeflow
https://www.kubeflow.org/docs/components/pipelines/
Apache License 2.0

[backend] Output on GKE giving unclear error related to "container not found" #7676

Closed: aronchick closed this issue 5 months ago

aronchick commented 2 years ago

Environment

Steps to reproduce

Upload the following YAML as a pipeline and run it: https://gist.github.com/aronchick/25d6fca71df0ef86846c40bec5cbc2c3

Expected result

The pipeline runs successfully.

Actual result

The step ends in the Error state with this message: failed to save outputs: Error response from daemon: No such container: 77f84d932adba6d4b35769fbba02bfab4b48cf2128985f3c105d3744978abf9f

(The container ID changes on every run and does not appear anywhere in the YAML.)


Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

aronchick commented 2 years ago

To be clear, this may be specific to GKE: I have an EKS deployment where the pipeline seems to run fine. The error is very confusing regardless.

aronchick commented 2 years ago

For what it's worth, I solved it by changing how I deployed Kubeflow from the manifests. You can read more about it here: https://github.com/SAME-Project/same-project/issues/126

However, I think this is still a bug in KFP. The error message should make it clearer what is going wrong; a container that cannot be found can't be the real cause.

aronchick commented 2 years ago

It appears this error has been touched on in https://github.com/kubeflow/pipelines/issues/1471, but I'd argue the core issue is that the error message is not helpful. We should work on that.

zijianjoy commented 2 years ago

This is probably because the Emissary executor is not being used on a recent version of Kubernetes.

More generally, I'm looking for advice on what error message should be shown instead of the current one. Contributions or discussion on how to provide a proper error message are welcome.
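As one possible answer to the request above, here is a minimal sketch of a clearer message: recognize the raw "No such container" error and append an actionable hint. The function name `clarify_step_error` and the hint wording are hypothetical, not anything KFP or Argo emits; the hint assumes (per the comment above) that the likely cause is an executor other than Emissary, such as the docker executor, which probably cannot see containers started by the node's container runtime.

```python
import re

# The raw message from this issue; the trailing container ID differs on every run.
_NO_SUCH_CONTAINER = re.compile(
    r"failed to save outputs: Error response from daemon: "
    r"No such container: (?P<cid>[0-9a-f]+)"
)

# Hypothetical wording for a friendlier message (not emitted by KFP or Argo today).
_HINT = (
    "Hint: the workflow executor could not find container {cid} while saving step "
    "outputs. This usually means the configured executor (for example the docker "
    "executor) cannot see the containers created by the node's container runtime. "
    "Check which executor is configured and consider switching to emissary."
)


def clarify_step_error(message: str) -> str:
    """Append an actionable hint to the known-confusing error; pass others through."""
    match = _NO_SUCH_CONTAINER.search(message)
    if match is None:
        return message  # Not the error from this issue: leave it unchanged.
    # Show the short 12-character form of the container ID, as docker CLIs do.
    return message + "\n" + _HINT.format(cid=match.group("cid")[:12])


if __name__ == "__main__":
    raw = (
        "failed to save outputs: Error response from daemon: No such container: "
        "77f84d932adba6d4b35769fbba02bfab4b48cf2128985f3c105d3744978abf9f"
    )
    print(clarify_step_error(raw))
```

Keeping the original message intact and only appending a hint means nothing that users already search for disappears from the logs.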

aronchick commented 2 years ago

It's genuinely unclear what the error is. If Emissary has not been set up correctly, can we detect that? Could we output that as the error?

zijianjoy commented 2 years ago

Follow the instructions at https://www.kubeflow.org/docs/components/pipelines/installation/choose-executor/ to check which workflow executor is configured.
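The check described above could also be automated. The sketch below assumes the executor is set through the `containerRuntimeExecutor` key of the Argo `workflow-controller-configmap` ConfigMap in the `kubeflow` namespace, and that its JSON is obtained with `kubectl get configmap workflow-controller-configmap -n kubeflow -o json`; newer Argo versions that only support Emissary may not read this key at all. The helper names `configured_executor` and `executor_warning` are hypothetical.

```python
import json
from typing import Optional


def configured_executor(configmap_json: str) -> Optional[str]:
    """Return the executor named in the workflow-controller ConfigMap, or None.

    `configmap_json` is assumed to be the output of:
        kubectl get configmap workflow-controller-configmap -n kubeflow -o json
    """
    data = json.loads(configmap_json).get("data") or {}
    value = data.get("containerRuntimeExecutor")
    return value.strip() if value else None


def executor_warning(configmap_json: str) -> Optional[str]:
    """Hypothetical check: return a warning unless emissary is explicitly configured."""
    executor = configured_executor(configmap_json)
    if executor == "emissary":
        return None
    shown = executor if executor else "not set (Argo uses its version-specific default)"
    return f"containerRuntimeExecutor is {shown}; consider the emissary executor."


if __name__ == "__main__":
    sample = json.dumps({"data": {"containerRuntimeExecutor": "docker"}})
    print(executor_warning(sample))
```

Parsing the JSON rather than calling the cluster directly keeps the check testable offline; in practice the string would come from the `kubectl` command above.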

github-actions[bot] commented 5 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions[bot] commented 5 months ago

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.