dkarlovi closed this issue 1 year ago.
This issue is currently awaiting triage.
If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
/sig cli
/transfer kubectl
/triage accepted
We would definitely like to implement a non-zero exit code for a timeout/failure; we will also investigate whether timeouts due to inactivity can be ignored.
Hey, would this be a good issue for a new contributor? If so, I'd like to give it a shot.
So I actually think the root of the error here is that we have an --ignore-errors flag, but it defaults to true all the time. I haven't checked beyond quickly skimming the logs code, but if we look here we see the definition of the flag itself, and I'm pretty sure it defaults to true, meaning the only way to not ignore errors would be to pass --ignore-errors=false, I believe. It needs more investigation, but it should hopefully be relatively straightforward if you want to give it a shot.
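If that were the case, the error-ignoring behavior would have to be turned off explicitly, roughly like this (the pod name here is just a placeholder):

$ kubectl logs -f mypod --ignore-errors=false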
@mpuckett159 sure, I'll give it a try, thanks!
I'll look into changing the default to false instead of true.
/assign
@mpuckett159 do you still think it's worthwhile to do some research into whether it's possible to work around the timeouts?
Personally, yes; however, it will likely be very low on the priority list, and honestly it will be very difficult to diagnose the specific root cause because of the sheer number of "things" that could be generating the disconnection.
@mpuckett159 sounds good. Maybe we can reword this issue to focus on the exit code and add a new one to focus on the timeout, which may or may not be fixed in the future.
Hey @mpuckett159, I had a small query: should I open the PR with the fix in this repo or in the main k8s one?
Yes, all PRs for fixing code in kubectl should be opened in the main kubernetes/kubernetes repo, and code changes should be done in the staging directory. Also be sure to sign the CLA. You can find more details on contributing and your first contribution here.
Thanks @mpuckett159 for letting me know! I've opened a PR with the fix.
Excellent I'll take a look.
By the way, I don't think --ignore-errors actually defaults to true, unless I am reading this wrong:
$ kubectl logs --help | grep -A1 ignore-errors
--ignore-errors=false:
If watching / following pod logs, allow for any errors that occur to be non-fatal
I'm pretty sure cobra uses whatever the current value of the variable is as the default, and o.IgnoreLogErrors appears to be an uninitialized bool when this runs:
cmd.Flags().BoolVar(&o.IgnoreLogErrors, "ignore-errors", o.IgnoreLogErrors, "If watching / following pod logs, allow for any errors that occur to be non-fatal")
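A minimal sketch of that behavior (not kubectl's actual code; it just exercises spf13/pflag, which cobra wraps) showing why an uninitialized bool yields a default of false:

package main

import (
	"fmt"

	"github.com/spf13/pflag"
)

func main() {
	var ignoreLogErrors bool // zero value of a bool is false
	fs := pflag.NewFlagSet("logs", pflag.ContinueOnError)
	// The third argument (the variable's current value) becomes the flag's default.
	fs.BoolVar(&ignoreLogErrors, "ignore-errors", ignoreLogErrors, "If watching / following pod logs, allow for any errors that occur to be non-fatal")
	_ = fs.Parse([]string{})     // no flags passed on the command line
	fmt.Println(ignoreLogErrors) // prints "false", matching the --help output above
}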
@dkarlovi, can you provide more details on this?
I am trying to reproduce this but am unable to. I am using my own cluster and not Azure AKS, so maybe there is something provider-specific. I am also using a newer client and server, so that could be a factor too.
Here is what I've tried:
$ kubectl run bb --image=busybox --command -- /bin/sh -c "date; sleep 3600; date"
$ kubectl logs -f bb
Fri May 12 23:20:56 UTC 2023
... and it ran for over 15 minutes, at which point I decided to try to make it exit, so I ran kubectl edit pod bb and changed the image to busybox:1.35; my kubectl logs command then exited with exit code 0. It terminated normally because the pod stopped (restarted).
Now, if I force the request timeout to something short, I get a timeout error and a nonzero exit code...
$ kubectl logs -f bb --request-timeout=5
Fri May 12 23:37:28 UTC 2023
error: net/http: request canceled (Client.Timeout or context cancellation while reading body)
$ echo $?
1
...but I don't think this is what this issue is referring to.
Any additional info would be welcome.
@brianpursley maybe something in a newer release fixed it in the meantime. Let me retry and update this issue; I'll get back to you here on Monday. Thanks!
@brianpursley you're right, I can't reproduce it any longer either; it streamed the empty log for 1 hour correctly. Closing here, thanks for looking into this!
BTW, if anyone ends up finding this issue: kubectl 1.25.0 did have this problem for me, but 1.29 did not, everything else being equal. It was fixed at some point after 1.25.0 but before 1.29.0; I don't know how.
What happened?
With a pod that is silent for a while, say 10 minutes, the log stream eventually gets disconnected.
What did you expect to happen?
The logs are streamed indefinitely.
Alternatively, if a timeout is the cause of the interruption, exit with non-zero status.
How can we reproduce it (as minimally and precisely as possible)?
Run a pod which does something like the example below and watch its logs with kubectl logs -f.
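For instance, a minimal sketch adapted from the busybox command tried elsewhere in this thread (the pod name and sleep duration are arbitrary):

$ kubectl run bb --image=busybox --command -- /bin/sh -c "date; sleep 3600; date"
$ kubectl logs -f bb
$ echo $?    # reported to be 0 when the stream is cut off by inactivity; the expectation is a non-zero status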
Anything else we need to know?
No response
Kubernetes version
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, ...) and versions (if applicable)