bj8sk opened this issue 1 year ago
Thanks for raising this issue. @bj8sk is it possible to try the steps above with the 1.27 client version? We made some improvements to the wait command in 1.27.
Thank you, I tried that but got the same result. I enabled trace-level logging and can see that after a few seconds we get the JSON response with the completed state, but kubectl still waits, re-issuing a request like this every five minutes or so (when I set --timeout=1800): https://server:443/apis/batch/v1/namespaces/my-namespace/jobs?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dmy-job&resourceVersion=190601200&timeoutSeconds=552&watch=true
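For anyone wanting to reproduce that tracing, kubectl's -v flag raises log verbosity; at -v=8 it prints each HTTP request kubectl makes, including the repeated watch requests described above (the job name and namespace here are placeholders):

```sh
# Show HTTP-level detail for every request this wait issues.
kubectl wait job/my-job -n my-namespace \
  --for=condition=Failed --timeout=1800s -v=8
```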
/triage accepted
/assign @sreeram-venkitesh
@bj8sk I tried reproducing the issue in the following manner and kubectl wait didn't hang for me. My kubectl client version is 1.29. Can you try reproducing the issue with the latest version and check if it still happens? I had initially tried using your pi-with-ttl Job, in which case the Job was reaching Completed instead of Failed. Please let me know if the issue persists. Here are the details of how I tried reproducing the issue.
❯ kubectl version
Client Version: v1.29.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.4-eks-8cb36c9
The YAML for the Job I used to meet the Failed condition:
apiVersion: batch/v1
kind: Job
metadata:
  name: sreerams-failing-job
  namespace: sreeram-dev
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
        - name: fail-container
          image: busybox
          command: ["/bin/sh", "-c"]
          args: ["exit 1"]
      restartPolicy: Never
  backoffLimit: 0
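The Job would have been created with something along these lines (the filename is an assumption):

```sh
# Apply the manifest above; the namespace comes from metadata.namespace.
kubectl apply -f sreerams-failing-job.yaml
```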
Running kubectl get to check on the Job and the Pod:
❯ k get jobs -n sreeram-dev
NAME COMPLETIONS DURATION AGE
sreerams-failing-job 0/1 81s 81s
❯ k get pods -n sreeram-dev
NAME READY STATUS RESTARTS AGE
sreerams-failing-job-wj57m 0/1 Error 0 87s
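As a side note, one way to see exactly which condition kubectl wait is matching on is to dump the Job's status conditions (a sketch using the Job above):

```sh
# Print the Job's status.conditions array; a failed Job should carry
# an entry with type=Failed and status=True.
kubectl get job sreerams-failing-job -n sreeram-dev \
  -o jsonpath='{.status.conditions}'
```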
Here's what I used to wait for the Job's failure:
❯ k wait --for=condition=Failed job/sreerams-failing-job -n sreeram-dev --timeout=300s
job.batch/sreerams-failing-job condition met
What happened: Started a Job and want to wait for its end result, either Complete or Failed, but waiting for Failed hangs, even after the Job is deleted by Kubernetes. Following the suggestion in the SO post https://stackoverflow.com/a/60286538, I start two kubectl waits. The Job has ttlSecondsAfterFinished: 300 and backoffLimit: 0.
The wait for the Complete condition works and returns, but even though the Job is deleted after about 5 minutes, the wait for the Failed condition still hangs on for the full 30 minutes: kubectl wait job/my-job --for=condition=Failed --timeout=1800s
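For context, a minimal sketch of the two-wait pattern from that SO answer (job name and timeout are placeholders; wait -n requires bash 4.3+):

```sh
#!/usr/bin/env bash
# Wait for either terminal condition of the Job in parallel.
kubectl wait job/my-job --for=condition=Complete --timeout=1800s &
kubectl wait job/my-job --for=condition=Failed   --timeout=1800s &

# Return as soon as the first background wait exits.
wait -n
echo "first wait exited with code $?"

# Kill the leftover wait; the reported bug is that without this,
# the losing wait never exits on its own.
kill $(jobs -p) 2>/dev/null
```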
What you expected to happen: If the Job has completed, the wait for Failed should not keep waiting; it should return a non-zero exit code to indicate that the Failed condition can no longer be met.
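As a possible stopgap (not a fix for the bug itself), since the TTL controller deletes the Job after ttlSecondsAfterFinished, one could additionally wait for the deletion; kubectl wait supports --for=delete:

```sh
# Returns once the Job object is gone, which with ttlSecondsAfterFinished
# set happens shortly after the Job reaches either terminal state.
kubectl wait job/my-job --for=delete --timeout=600s
```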
How to reproduce it (as minimally and precisely as possible):
Environment:
kubectl version: