kubernetes / kubectl

Issue tracker and mirror of kubectl code

kubectl follow logs stops after a few seconds if there is a lot of data on stdout #1548

Open avneradania1 opened 9 months ago

avneradania1 commented 9 months ago

What happened?

A lot of running jobs generate a massive amount of log output to stdout. Steps:

  1. Run a K8s Job that writes a large number of lines to stdout.
  2. `kubectl logs -f -n` starts tailing the pod output until it randomly gets stuck. Note: it stays stuck until the pod exits with success or failure.

What did you expect to happen?

The expected result is to see all of the pod's stdout on screen. This is not happening.

How can we reproduce it (as minimally and precisely as possible)?

  1. Start a pod that generates a massive amount of stdout (a minimal example workload is sketched after these steps).
  2. Run:

    kubectl logs -f -n

  3. After a while (the duration is random), it stops showing output.
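
For illustration, here is a minimal sketch of a workload that floods stdout. It assumes a small Go program run inside the Job's container; any fast log generator (for example a shell loop) produces the same load.

```go
// floodlogs.go: an illustrative workload that writes log lines to stdout as
// fast as possible, to reproduce the high-volume scenario described above.
package main

import (
	"bufio"
	"fmt"
	"os"
)

func main() {
	w := bufio.NewWriter(os.Stdout)
	defer w.Flush()
	for i := 0; ; i++ {
		// Emit one numbered line per iteration, with a payload long enough
		// to inflate the log volume quickly.
		fmt.Fprintf(w, "line %d: some reasonably long payload to inflate the log volume\n", i)
	}
}
```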

Anything else we need to know?

No response

Kubernetes version

```console
$ kubectl version
Client Version: v1.28.1
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.27.7
```

Cloud provider

OS version

```console
# On Linux:
$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

$ uname -a
Linux 3.10.0-1160.99.1.el7.x86_64 #1 SMP Wed Sep 13 14:19:20 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
```

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

k8s-ci-robot commented 9 months ago

There are no sig labels on this issue. Please add an appropriate label by using one of the following commands:

Please see the group list for a listing of the SIGs, working groups, and committees available.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
k8s-ci-robot commented 9 months ago

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tamilselvan1102 commented 9 months ago

/sig kubectl

k8s-ci-robot commented 9 months ago

@tamilselvan1102: The label(s) sig/kubectl cannot be applied, because the repository doesn't have them.

In response to [this](https://github.com/kubernetes/kubectl/issues/1548):

> /sig kubectl

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
tamilselvan1102 commented 9 months ago

/transfer kubectl

k8s-ci-robot commented 9 months ago

@tamilselvan1102: You must be an org member to transfer this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/1548):

> /transfer kubectl

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
HirazawaUi commented 9 months ago

/transfer kubectl

mpuckett159 commented 9 months ago

When logs are tailed, kubectl uses the watcher utility, which relies on Go channels, in this case a buffered Go channel. As the links below show, the default buffer size is 100, and it is not currently configurable. Normally this would be fine because incoming logs would be sent out as fast as they arrive, but because you are generating so much data, the logs cannot exit the channel buffer as fast as they flow into it. The resulting back-pressure causes sends on the channel to block, which is what would make the output appear to hang. A minimal illustration of that blocking behavior is sketched after the links below.

How the log function spawns multiple watchers

The default buffer size of the channel used by the watchers
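
A minimal sketch of that mechanism (not kubectl's actual watcher code): with a buffered channel of the default size, a producer that outruns its consumer fills the buffer, after which every send blocks until the consumer drains an item.

```go
// Illustration of back-pressure on a buffered Go channel: once the buffer is
// full, the producer's send blocks until the slow consumer takes a value.
package main

import (
	"fmt"
	"time"
)

func main() {
	lines := make(chan string, 100) // same default buffer size as noted above

	// Fast producer: emits lines as quickly as it can.
	go func() {
		for i := 0; ; i++ {
			lines <- fmt.Sprintf("log line %d", i) // blocks whenever the buffer is full
		}
	}()

	// Slow consumer: drains one line every 100ms, so the buffer fills almost
	// immediately and the producer spends most of its time blocked.
	for line := range lines {
		time.Sleep(100 * time.Millisecond)
		fmt.Println(line)
	}
}
```

Note that a blocked send only pauses the sending goroutine until the consumer catches up, so on its own it slows output rather than stopping it permanently.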

If you run the command causing the issue with the -v=9 flag, that may give us a clearer indication that this is what is happening. Could you do that and post the resulting logs (not the pod logs being tailed, but the diagnostic output produced by the verbosity flag)?

Ritikaa96 commented 9 months ago

/sig cli

avneradania1 commented 9 months ago

@mpuckett159, thanks for the detailed response. I'm guessing you're asking for the beginning of the -v=9 output; if so, a file with the details is attached.

kubectl-v9.pdf

Pawan-Bishnoi commented 6 months ago

I would like to help here.

Do we want to provide an option to increase the buffer size?
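
If that direction is pursued, here is a purely hypothetical sketch of what a configurable buffer size could look like; the environment variable name and helper below are invented for illustration and do not exist in kubectl today.

```go
// Hypothetical illustration only: choosing the log channel's buffer size from
// an (invented) environment variable instead of the hard-coded default of 100.
package main

import (
	"fmt"
	"os"
	"strconv"
)

const defaultLogBufferSize = 100 // the current hard-coded default discussed above

// logBufferSize is an invented helper; kubectl exposes no such knob today.
func logBufferSize() int {
	if v := os.Getenv("KUBECTL_LOG_BUFFER_SIZE"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			return n
		}
	}
	return defaultLogBufferSize
}

func main() {
	lines := make(chan string, logBufferSize())
	fmt.Println("log channel buffer size:", cap(lines))
}
```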

k8s-triage-robot commented 3 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

Pawan-Bishnoi commented 3 months ago

not stale

HirazawaUi commented 3 months ago

/remove-lifecycle stale

Pawan-Bishnoi commented 2 months ago

@mpuckett159 do the logs shared above confirm your theory?

I can see that DefaultConsumeRequest uses client-go's stream function, as sketched below.

I don't see the watch package used in this code path. What am I missing? 🤔
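
For reference, a sketch of that direct-stream path: pod logs read through client-go's request Stream and copied straight to stdout, with no watch package involved. This is an illustration under my own assumptions, not kubectl's actual DefaultConsumeRequest implementation; the kubeconfig handling, namespace, and pod name are placeholders.

```go
// Illustrative only: follow a pod's logs via client-go's Stream, the same
// general path (GetLogs -> Stream -> copy to stdout) referenced above.
package main

import (
	"context"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig handling; kubectl resolves this differently.
	config, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// GetLogs returns a *rest.Request; Stream opens the response body as an
	// io.ReadCloser carrying the followed log stream.
	req := clientset.CoreV1().Pods("default").GetLogs("my-pod", &corev1.PodLogOptions{Follow: true})
	stream, err := req.Stream(context.TODO())
	if err != nil {
		panic(err)
	}
	defer stream.Close()

	// Copy log bytes to stdout until the stream ends or errors.
	if _, err := io.Copy(os.Stdout, stream); err != nil {
		panic(err)
	}
}
```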

Pawan-Bishnoi commented 2 months ago

I am surprised that more people aren't complaining about this 😄 This blocks me daily and I would love to help solve whatever limitation we are running into here.

avneradania commented 2 months ago

Agreed, it looks like people do not look at stdout directly anymore, but at log servers such as Loki, ELK, etc.

ardaguclu commented 2 months ago

I spent some time on this today and it was hard to reproduce. I think checking the kubelet and API server logs when this happens could shed light on the issue. I'm not sure this issue is definitely happening in kubectl, and we need to eliminate the other components first.

ardaguclu commented 2 months ago

Another question: does this also happen without tailing (-f)?

Pawan-Bishnoi commented 1 month ago

I suspect the kubelet too. From what I could see, it wasn't happening in kubectl. I will dig some more on my next attempt.