Closed · amarshall closed this 1 year ago
@amarshall: This issue is currently awaiting triage.
SIG CLI takes the lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label. Org members can add the triage/accepted label by writing /triage accepted in a comment.
This may be fixed in the upcoming 1.23 release or by https://github.com/kubernetes/kubernetes/pull/105723.
You might be able to work around it temporarily by raising the ulimit on your Mac.
@ash2k this might still be an issue here? https://github.com/kubernetes/kubectl/blob/a450ebd59c1e8917df23d37c6f05d8e16c3746aa/pkg/cmd/delete/delete.go#L392
/assign atiratree
@eddiezane I'm not sure, sorry.
@amarshall this seems like a duplicate of https://github.com/kubernetes/kubernetes/issues/91913. Can you please try again with the latest kubectl?
Side note: I have also tested this with plain Kubernetes (not aws-iam) and have not observed this issue, as only one connection is opened.
Great reproduction example in https://github.com/kubernetes/kubectl/issues/1152
TY, I managed to reproduce #1152 on AWS EKS (1.21) with the latest kubectl. Vanilla Kubernetes works fine.
I am seeing multiple open connections (lsof output):
kubectl 430171 user 18u IPv4 2968415 0t0 TCP p1:49786->ec2-IP.eu-central-1.compute.amazonaws.com:https (ESTABLISHED)
kubectl 430171 user 20u IPv4 2968418 0t0 TCP p1:40788->ec2-IP.eu-central-1.compute.amazonaws.com:https (ESTABLISHED)
kubectl 430171 user 22u IPv4 2974513 0t0 TCP p1:49790->ec2-IP.eu-central-1.compute.amazonaws.com:https (ESTABLISHED)
After reaching the ulimit -n open-files limit, I get connection errors:
Unable to connect to the server: dial tcp: lookup 123456789ABCDEF.gr7.eu-central-1.eks.amazonaws.com: too many open files
Unable to connect to the server: dial tcp: lookup 123456789ABCDEF.gr7.eu-central-1.eks.amazonaws.com: too many open files
Unable to connect to the server: dial tcp: lookup 123456789ABCDEF.gr7.eu-central-1.eks.amazonaws.com: too many open files
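The leak mechanism is easy to demonstrate with the Go standard library alone (a sketch independent of kubectl/client-go — nothing here is kubectl's actual code): every http.Transport owns its own keep-alive pool, so constructing a fresh transport per request piles up idle TCP connections instead of reusing one.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connsOpened performs n GETs against a throwaway server and reports how
// many TCP connections the server accepted. If freshTransport is true, a
// brand-new http.Transport is used for every request (the leak in
// miniature); otherwise one client is reused for all requests.
func connsOpened(n int, freshTransport bool) int64 {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	shared := &http.Client{Transport: &http.Transport{}}
	for i := 0; i < n; i++ {
		c := shared
		if freshTransport {
			// Each new Transport has its own idle-connection pool, so it
			// must dial afresh and the previous connection lingers idle.
			c = &http.Client{Transport: &http.Transport{}}
		}
		resp, err := c.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain so the connection can be pooled
		resp.Body.Close()
	}
	return atomic.LoadInt64(&opened)
}

func main() {
	fmt.Println("shared transport:", connsOpened(5, false), "connection(s)")           // 1
	fmt.Println("fresh transport per request:", connsOpened(5, true), "connection(s)") // 5
}
```

With an exec credential plugin in the picture, every one of those extra transports also corresponds to an extra authenticated dial, which is why the lsof output above shows the count climbing per resource.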
I am planning to work on a fix soon.
This bug applies to all commands (create/apply/delete/describe) that use the Exec credentials plugin.
Description of the current code flow and behaviour for kubectl delete:
- when running delete, a client is obtained and initialized for the command
- for each deletion the new client in the resource info is used, and a new client is also created for each deletion
- a new TransportConfig is created for each client, which is then amended with the credentials Exec provider
- the Authenticator is cached, so there are no duplicate queries to the executable until the token expires (even when used over multiple clients)
- tlsCache is used for transports if a custom transport is not specified: https://github.com/kubernetes/kubernetes/blob/267272efe0725e14b3c2c7bc6fa3dd64a922a6a7/staging/src/k8s.io/client-go/transport/transport.go#L50
- but the cache is bypassed when c.TLS.GetCert or c.Dial is set (both are nil with a normal kubeconfig): https://github.com/kubernetes/kubernetes/blob/267272efe0725e14b3c2c7bc6fa3dd64a922a6a7/staging/src/k8s.io/client-go/transport/cache.go#L136

^ I think the fix should be just to enable the TLS caching for this Exec plugin use case: GetCert should return consistently even when the server requests a client certificate, and the Authenticator should close the connections when a new cert appears, so neither of these should affect the tlsCacheKey or change the transport returned from the cache.
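The caching idea can be sketched with a toy keyed transport cache (names and key derivation here are illustrative; client-go's real tlsCache hashes the TLS config and currently refuses to cache when hooks like GetCert or Dial are set):

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// transportCache memoizes transports by a key derived from the
// connection-relevant config, so repeated client construction with the
// same config shares one transport, and therefore one connection pool.
// The key is a plain string here purely for illustration.
type transportCache struct {
	mu         sync.Mutex
	transports map[string]*http.Transport
}

func newTransportCache() *transportCache {
	return &transportCache{transports: map[string]*http.Transport{}}
}

// get returns the cached transport for key, creating it on first use.
func (c *transportCache) get(key string) *http.Transport {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.transports[key]; ok {
		return t
	}
	t := &http.Transport{}
	c.transports[key] = t
	return t
}

func main() {
	cache := newTransportCache()
	a := cache.get("cluster-a")
	b := cache.get("cluster-a")
	c := cache.get("cluster-b")
	fmt.Println("same config shares one transport:", a == b)  // true
	fmt.Println("different config gets its own:", a != c)     // true
}
```

The hard part, as the comment above notes, is deciding when two configs with function-valued fields (GetCert, Dial) are "the same" for keying purposes.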
I posted a PR with a fix: https://github.com/kubernetes/kubernetes/pull/108274
Until that gets accepted, the following workaround can be used:
TOKEN="$(AWS_PROFILE=my-profile aws --region my-region eks get-token --cluster-name my-cluster | jq -r '.status.token')"
kubectl delete pod -l mypod=foo --token="$TOKEN"
> I think the fix should be just to enable the TLS caching for this Exec plugin use case:

This approach was discussed in https://github.com/kubernetes/kubernetes/pull/108274#discussion_r812046486 and turned out not to be viable. I will try to refactor kubectl generically and use a similar approach to https://github.com/kubernetes/kubernetes/pull/105490.
@atiratree I think you've got the right idea about kubectl needing to reuse a single transport across all requests.
It looks like anything that uses the builder will have this problem, where a new client is created for each invocation of Visit().
https://github.com/kubernetes/kubernetes/blob/0b8d725f5a04178caf09cd802305c4b8370db65e/staging/src/k8s.io/cli-runtime/pkg/resource/visitor.go#L430
It looks like getClient() will always return a new client, and there is currently no way to tell the builder to use a specific client instance.
https://github.com/kubernetes/kubernetes/blob/0b8d725f5a04178caf09cd802305c4b8370db65e/staging/src/k8s.io/cli-runtime/pkg/resource/builder.go#L925-L945
(Well, the new client is actually created deep inside the call stack, but that is where it begins.)
I wonder if a good approach would be to add a WithClient(client *http.Client) function to the builder. Is this along the lines of what you were thinking? Is there another way to tell the builder to use a specific client instead of creating its own?
@brianpursley Thanks, I had to go down to the ConfigFlags to also support the DiscoveryClient. Please see https://github.com/kubernetes/kubernetes/pull/108459 for the implementation.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
kubectl should definitely stop making an infinite number of clients, but until it does, https://github.com/kubernetes/kubernetes/pull/112017 should prevent connection leaks.
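One general way such leaks are bounded, sketched with the standard library (this illustrates the mechanism only; see the PR itself for the actual client-go change): once a transport will no longer be reused, closing its idle connections releases the sockets immediately instead of waiting for the idle timeout, so the next request on that transport must dial anew.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// demoCloseIdle makes two sequential GETs on one transport and returns how
// many TCP connections the server accepted. With closeBetween, the pooled
// socket is dropped via CloseIdleConnections after each request, so the
// second request dials a fresh connection instead of reusing the first.
func demoCloseIdle(closeBetween bool) int64 {
	var opened int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	t := &http.Transport{}
	client := &http.Client{Transport: t}
	for i := 0; i < 2; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body) // drain so the conn returns to the pool
		resp.Body.Close()
		if closeBetween {
			t.CloseIdleConnections() // release the pooled socket right away
		}
	}
	return atomic.LoadInt64(&opened)
}

func main() {
	fmt.Println("keep-alive reuse:", demoCloseIdle(false), "connection(s)")       // 1
	fmt.Println("close idle between:", demoCloseIdle(true), "connection(s)")      // 2
}
```

The trade-off is the one the demo shows: closing idle connections prevents sockets from piling up, at the cost of re-dialing (and re-handshaking) on the next request.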
This fix is merged and backported. The next release of kubectl should include the fix.
/lifecycle stale
/lifecycle rotten
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What happened:
kubectl delete leaks network connections when deleting multiple resources, causing warnings or errors
What you expected to happen:
kubectl to not leak network connections
How to reproduce it (as minimally and precisely as possible):
kubectl delete pod -l mypod=foo
Receive warning (Linux):
Receive error (macOS):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
- OS (e.g. from cat /etc/os-release): Linux and macOS