kubernetes / kubectl

Issue tracker and mirror of kubectl code

Error: forwarding ports: error upgrading connection: error dialing backend: - Azure Kubernetes Service #587

Closed — AndrisPM closed this issue 4 years ago

AndrisPM commented 5 years ago

Hi Team, we have upgraded our Kubernetes Service cluster on Azure to the latest version, 1.12.4. After that we suddenly noticed that pods and nodes can no longer communicate with each other by private IP:

kubectl get pods -o wide -n kube-system -l component=kube-proxy
NAME               READY     STATUS    RESTARTS   AGE       IP           NODE
kube-proxy-bfhbw   1/1       Running   2          16h       10.0.4.4     aks-agentpool-16086733-1
kube-proxy-d7fj9   1/1       Running   2          16h       10.0.4.35    aks-agentpool-16086733-0
kube-proxy-j24th   1/1       Running   2          16h       10.0.4.97    aks-agentpool-16086733-3
kube-proxy-x7ffx   1/1       Running   2          16h       10.0.4.128   aks-agentpool-16086733-4

As you can see, the node aks-agentpool-16086733-0 has private IP 10.0.4.35. When we try to check logs of pods running on this node, we get this error: Get https://aks-agentpool-16086733-0:10250/containerLogs/emw-sit/nginx-sit-deploy-864b7d7588-bw966/nginx-sit?tailLines=5000&timestamps=true: dial tcp 10.0.4.35:10250: i/o timeout

We have Tiller (Helm) on this node as well, and if we try to connect to Tiller we get this error from the client PC:

shmits-imac:~ andris.shmits01$ helm version
Client: &version.Version{SemVer:"v2.12.3", GitCommit:"eecf22f77df5f65c823aacd2dbd30ae6c65f186e", GitTreeState:"clean"}
Error: forwarding ports: error upgrading connection: error dialing backend: dial tcp 10.0.4.35:10250: i/o timeout

Does anybody have any idea why the pods and nodes lost connectivity over their private IPs?
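For reference, 10250 is the kubelet's port. A quick way to check whether that port is reachable from inside the cluster (a rough sketch; nicolaka/netshoot is just a convenient image with curl in it, and the IP is the node IP from the output above):

kubectl run netcheck --rm -it --restart=Never --image=nicolaka/netshoot -- \
  curl -sk -m 5 -o /dev/null -w '%{http_code}\n' https://10.0.4.35:10250/
# Any HTTP status (401/403/404) means the kubelet port is reachable from pods;
# 000 or a timeout points at an NSG, firewall, or routing problem on the node subnet.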

ramv76 commented 5 years ago

I am getting a similar error too, did you get any resolution to this issue? Thank you.

yzargari commented 5 years ago

+1

psinger commented 5 years ago

Same issue here, any solutions?

giacomopastore commented 5 years ago

On Oracle Cloud with K8s v1.12.6 I've got this error:

Error: forwarding ports: error upgrading connection: error dialing backend: EOF

jwenz723 commented 5 years ago

Same issue, running on AWS EKS v1.11, deployed using terraform.

neoKushan commented 5 years ago

I have just encountered this issue myself. I take it nobody has any clue how to resolve it?

Using AKS.

yue9944882 commented 5 years ago

Sounds like a per-cloud-provider issue; try contacting your provider's support?

psinger commented 5 years ago

I think it had something to do with outbound access being denied by an NSG, in my case on Azure. To test it, try allowing all outbound connections.
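To try that quickly from the CLI, a rough sketch with the Azure CLI (the resource group and NSG names are placeholders for whatever your AKS node NSG is called):

az network nsg rule create \
  --resource-group <node-resource-group> \
  --nsg-name <aks-node-nsg> \
  --name allow-all-outbound-test \
  --priority 100 \
  --direction Outbound \
  --access Allow \
  --protocol '*' \
  --source-address-prefixes '*' \
  --destination-address-prefixes '*' \
  --destination-port-ranges '*'

If port-forward starts working with that rule in place, narrow it back down to the destinations and ports you actually need, and delete the test rule.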

dzsessona commented 5 years ago

Same on AWS EKS, Kubernetes version 1.12, deployed with Terraform.

GeekOnGadgets commented 5 years ago

@jwenz723 I am having the same issue, did you manage to find a solution? I have provisioned the resources using Terraform.

jwenz723 commented 5 years ago

> @jwenz723 I am having the same issue, did you manage to find a solution? I have provisioned the resources using Terraform.

I am having a hard time remembering how I got past this issue, but I believe it ended up being with my AWS security group creation. I had originally followed the guide here to get my EKS cluster built using Terraform. After I completed the steps in that guide I started refactoring some of the names to not include the word demo, and when I did this I accidentally clobbered the security groups so that only one was used rather than two in this code (notice the value is supposed to be different for security_group_id and source_security_group_id):

resource "aws_security_group_rule" "eks-cluster-ingress-node-https" {
  description              = "Allow pods to communicate with the cluster API Server"
  from_port                = 443
  protocol                 = "tcp"
  security_group_id        = "${aws_security_group.eks-cluster.id}"
  source_security_group_id = "${aws_security_group.worker-nodes.id}"
  to_port                  = 443
  type                     = "ingress"
}

Maybe this will help you.
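For completeness (this part is my own sketch, not from the guide): the same two-security-group split matters in the other direction too, i.e. the worker-node group needs an ingress rule letting the cluster security group reach the kubelet port, which is exactly the 10250 this issue keeps timing out on. Something like:

resource "aws_security_group_rule" "eks-node-ingress-cluster-kubelet" {
  description              = "Allow the cluster control plane to reach kubelet on the worker nodes"
  from_port                = 10250
  to_port                  = 10250
  protocol                 = "tcp"
  # note: the two ids reference different security groups again
  security_group_id        = "${aws_security_group.worker-nodes.id}"
  source_security_group_id = "${aws_security_group.eks-cluster.id}"
  type                     = "ingress"
}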

fox-run commented 5 years ago

bump

amandadebler commented 5 years ago

We are experiencing similar symptoms with all clusters created within the past week or so. I thought that it might have something to do with this: https://github.com/Azure/AKS/blob/master/CHANGELOG.md#release-2019-06-18

Except that our clusters use the Azure networking plugin, not Kubenet.

asubmani commented 5 years ago

It looks like a networking issue, except that there is no clarity on what exactly the port forwarding involves. Our Kubernetes setup is protected by a firewall and we had to allow access to gcr.io and storage.googleapis.com just to install Helm.

kubectl and helm are installed on my WSL2 Ubuntu 18. The local firewall is disabled. Tiller is successfully deployed on the cluster.

Is the port forwarding from a K8s node to the pod, or from the machine running the console to the pod?

asubmani commented 5 years ago

I am using AKS 1.14.5, Helm version v2.14.32, using Azure CNI networking.

Most likely this is failing for me because my laptop is currently not joined to the corp network and hence has no connectivity to the pods.

I can deploy stuff using kubectl though.

sssrikumar commented 4 years ago

Had the same issue; stopped the firewall and it's working.
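On a typical Linux node that usually means something like this (a rough sketch; whether the block is firewalld, ufw, or a cloud-level rule depends on your setup):

sudo systemctl stop firewalld    # quick test: does port-forward work with the host firewall down?
sudo systemctl start firewalld   # re-enable it afterwards and open only what you need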

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten

fejta-bot commented 4 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close

k8s-ci-robot commented 4 years ago

@fejta-bot: Closing this issue.

In response to [this](https://github.com/kubernetes/kubectl/issues/587#issuecomment-612789701):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> Send feedback to sig-testing, kubernetes/test-infra and/or [fejta](https://github.com/fejta).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

cifranix commented 4 years ago

> I am getting a similar error too, did you get any resolution to this issue? Thank you.

Was able to resolve this by opening port 10250 on the node it's trying to reach.
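If the block is on the node's own firewall rather than a cloud security group, opening it looks roughly like this (a sketch using firewalld; ufw or an NSG/security-group rule is the equivalent elsewhere):

sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --reload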

marwenbk commented 3 years ago

This issue randomly occurs with the microk8s dashboard-proxy; restarting the server may solve it.
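If rebooting the whole server feels heavy-handed, restarting just microk8s is usually enough (a sketch, assuming a snap-installed microk8s):

microk8s stop
microk8s start
microk8s dashboard-proxy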

MrMYHuang commented 2 years ago

I ran into the same problem after a PC reboot, with rke2 v1.22.5+rke2r1 on Ubuntu 21.10 x86_64.

According to the docs, https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/, port 10250 is used by the kubelet. That suggested restarting my rke2 Kubernetes. After running sudo systemctl start rke2-server, the port-forward problem was fixed.
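If it comes back, a quick way to confirm the kubelet side is up before blaming the network (run on the node itself; a sketch, assuming an rke2 server node):

sudo systemctl status rke2-server   # the kubelet is managed by the rke2-server (or rke2-agent) service
sudo ss -tlnp | grep 10250          # the kubelet should be listening on 10250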

onukwilip commented 1 year ago

I'm kinda new to k8s and container orchestration. I also encountered this issue while I was implementing my hands-on project for the Docker and Kubernetes IBM course on Coursera. All I did was delete the deployment that was running on port 3000, then redeploy it. Hope this helps 🙂
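In kubectl terms that was roughly the following (the deployment and manifest names here are placeholders for whatever yours are called):

kubectl delete deployment <my-deployment>
kubectl apply -f deployment.yaml
kubectl port-forward deployment/<my-deployment> 3000:3000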

piotr-kierklo-looker commented 1 year ago

I had this problem when my local kubectl version was too old compared to the Kubernetes version on the server/cluster.

Just run kubectl version and compare the versions.
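For example (nothing cluster-specific here, just the general skew rule):

kubectl version
# Compare the "Client Version" and "Server Version" lines; kubectl is only supported
# within one minor version (older or newer) of the cluster's API server, so upgrade
# kubectl if the two drift further apart.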