Open hbelmiro opened 6 months ago
/assign @hbelmiro
Hi @hbelmiro, any update on this?
I upgraded my company's pipelines to make them compliant with KFP v2, and they are throwing these errors:
time="2024-06-07T17:29:06.435Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-06-07T17:29:06.436Z" level=error msg="cannot save parameter /tmp/outputs/condition" argo=true error="open /tmp/outputs/condition: no such file or directory"
Hi @leanaha. I still haven't had time to work on it. Feel free to send a PR if you know how to fix it. I can help with the review.
/unassign @hbelmiro
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Still relevant
/lifecycle frozen
/remove-lifecycle stale
(Potential solution; may not be relevant.)
We had a similar issue in our cluster, which is based on Rancher Kubernetes Engine 2. The problem was not Kubeflow Pipelines itself, but that the pipeline container could not communicate with the ml-pipeline controller because of network policies.
We applied something like this to the given Kubeflow profile namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-ml-pipeline-controller
  namespace: profile-namespace
spec:
  policyTypes:
  - Egress
  egress:
  - ports:
    - port: 8887
      protocol: TCP
    to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kubeflow
    - podSelector:
        matchLabels:
          app: ml-pipeline
          app.kubernetes.io/name: kubeflow-pipelines
This may not be fine-grained enough, but you get the idea.
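To check whether egress to the ml-pipeline controller is actually blocked before and after applying a policy like the one above, a small TCP probe can help. This is only a minimal sketch: `can_reach` is a hypothetical helper name, and it tests nothing more than whether a TCP connection to the given host and port can be opened.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Run from a pod in the profile namespace, something like `can_reach("ml-pipeline.kubeflow.svc.cluster.local", 8887)` should return `True` once the policy allows the traffic. The service DNS name here is an assumption about a default install; only port 8887 comes from the policy above, so adjust the host to match your cluster.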
Running a recurring run of the say-hello example pipeline:
Without the NetworkPolicy:
time="2024-08-16T10:38:06.360Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-08-16T10:38:06.360Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-08-16T10:38:06.360Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-08-16T10:38:06.360Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true
Error: exit status 1
With the NetworkPolicy:
time="2024-08-16T10:39:46.856Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-08-16T10:39:46.856Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-08-16T10:39:46.856Z" level=info msg="/tmp/outputs/cached-decision -> /var/run/argo/outputs/parameters//tmp/outputs/cached-decision" argo=true
time="2024-08-16T10:39:46.856Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true
Hope this solves the issue for others.
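The two log excerpts above can also be compared mechanically, which makes it easier to see which output parameters the policy actually affected. The following is a rough sketch, assuming the Argo executor log format shown in this thread; `failed_parameters` and `newly_saved` are hypothetical helper names.

```python
import re

# Matches executor lines such as:
#   level=error msg="cannot save parameter /tmp/outputs/cached-decision" ...
FAILED_PARAM = re.compile(
    r'level=error msg="cannot save parameter (?P<path>[^"]+)"'
)


def failed_parameters(log_text: str) -> set[str]:
    """Return the output-parameter paths that Argo reported it could not save."""
    return {m.group("path") for m in FAILED_PARAM.finditer(log_text)}


def newly_saved(before: str, after: str) -> set[str]:
    """Return paths that failed in `before` but no longer fail in `after`."""
    return failed_parameters(before) - failed_parameters(after)
```

Applied to the logs above, the only path that stops failing with the NetworkPolicy is `/tmp/outputs/cached-decision`; `/tmp/outputs/pod-spec-patch` is still reported as missing in both runs.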
/assign
When running a simple V2 pipeline more than once, the following errors happen:
Pipeline sample:
This is related to https://github.com/kubeflow/pipelines/issues/9678#issuecomment-2071361425.
Impacted by this bug? Give it a 👍.