kubernetes / kubectl

Issue tracker and mirror of kubectl code
Apache License 2.0

Port-forward drops connection to pod after first connection #1169

Closed: mkfdoherty closed this issue 2 years ago

mkfdoherty commented 2 years ago

What happened:

When running Kubernetes v1.23.1 on Minikube with kubectl v1.23.2 I experienced the following unexpected behaviour when trying to create a port-forward to a pod running an arbitrary service.

kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:27:51Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}                                                        
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

What we see is that after the first netcat connection closes successfully, we lose the connection to the pod and the port-forward closes:

kubectl port-forward to pod output:

Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
E0125 16:43:20.470080   17437 portforward.go:406] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod 55b25aeaae996c672f7eb762ce083e9b9666acabe96946d47790c167f1949d64, uid : exit status 1: 2022/01/25 15:43:20 socat[5831] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
E0125 16:43:20.470389   17437 portforward.go:234] lost connection to pod

We would expect the connection to stay open, as was the case with Kubernetes before v1.23.0.

What you expected to happen: When running the test against EKS running Kubernetes v1.21.5-eks-bc4871b, we get the port-forward behavior we are used to. The port-forward remains open after the first successful netcat connection.

kubectl version:

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.2", GitCommit:"9d142434e3af351a628bffee3939e64c681afa4d", GitTreeState:"clean", BuildDate:"2022-01-19T17:27:51Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.5-eks-bc4871b", GitCommit:"5236faf39f1b7a7dabea8df12726f25608131aa9", GitTreeState:"clean", BuildDate:"2021-10-29T23:32:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.21) exceeds the supported minor version skew of +/-1

Notice how the kubectl version is v1.23.2 and the server version is v1.21.5-eks-bc4871b. EKS seems to manage version skew on its own somehow.

The output we get after opening multiple connections is what we expect: the connection is not closed after subsequent nc commands (don't be alarmed by the connection refused errors from PostgreSQL; we are not using the right protocol or credentials. We are just testing the port-forward behavior, and this is a simple way to demonstrate the issue).

kubectl port-forward to pod output:

Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
E0125 16:35:32.441184   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:32 socat[45088] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432
E0125 16:35:35.765744   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:35 socat[45202] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432
E0125 16:35:37.129167   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:37 socat[45243] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432

As we can see the port-forward connection lasts for many netcat connections. This is the behavior we expect.

For completeness, this was also tested using Minikube running Kubernetes v1.21.5. The problem still exists if we ignore version skew (i.e. running the newer kubectl against the v1.21.5 cluster), but if we match the kubectl version to the Minikube Kubernetes version (v1.21.5), we again get the expected behavior of port-forwards remaining open past the first connection.

How to reproduce it (as minimally and precisely as possible):

My test is as follows:

  1. Open a port-forward to a pod running a service such as PostgreSQL (kubectl port-forward $POD_WITH_SERVICE 5432:5432)
  2. Open an nc connection on localhost to the local port (nc -v localhost 5432)
  3. We should be able to open nc connections multiple times without the port-forward breaking (the behaviour on Kubernetes before v1.23.0)

Tests were conducted against Kubernetes versions (v1.21.5, v1.22.1 and v1.23.1) on Minikube using minikube start --kubernetes-version=v1.21.5. Using minikube kubectl -- we can match the kubectl version to the Kubernetes version Minikube is using to avoid version skew. The problem I describe only appears when running Kubernetes above v1.23.0.
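
Putting the steps above together as a shell sketch ($POD_WITH_SERVICE is a placeholder for any pod running a TCP service such as PostgreSQL; the version flag is whichever of the tested versions is under test):

# start a cluster at the Kubernetes version under test (v1.21.5, v1.22.1, or v1.23.1)
minikube start --kubernetes-version=v1.23.1

# terminal 1: forward local port 5432 to the pod
# (use "minikube kubectl --" instead of kubectl to match the cluster version and avoid skew)
kubectl port-forward $POD_WITH_SERVICE 5432:5432

# terminal 2: open a netcat connection, close it, and repeat
nc -v localhost 5432

# expected (before v1.23.0): the port-forward keeps running between connections
# observed (v1.23.x):        "lost connection to pod" after the first connection closes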

Anything else we need to know?: Based on the above testing, it would seem that a bug was introduced in kubectl >v1.23.0 which causes port-forwards to close immediately after a successful connection. This is a problem given that the above test expects the old behaviour of long-lived kubectl port-forwards. My assumption is that this is a bug, based on there being no explicit mention of this behavior in CHANGELOG-1.23, so it may be a regression. Could someone please shed light on whether this is a regression or now-expected behavior for reasons unknown to me?


k8s-ci-robot commented 2 years ago

@mkfdoherty: This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.
eddiezane commented 2 years ago

Possibly related to https://github.com/kubernetes/kubernetes/pull/103526. Though that should only have had an effect when the pod dies anyway. We just stopped hiding the broken behavior.

We want to rewrite the port forward for this release as well.

brianpursley commented 2 years ago

@eddiezane Yes, I think this is probably related to the "fix" in https://github.com/kubernetes/kubernetes/pull/103526. I put "fix" in quotes because the fix was to allow port-forward to fail when there is an error instead of getting stuck in an unrecoverable non-failed state that can never process connections again.

@mkfdoherty You mentioned that this is expected behavior:

Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
E0125 16:35:32.441184   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:32 socat[45088] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432
E0125 16:35:35.765744   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:35 socat[45202] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432
E0125 16:35:37.129167   17073 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod b4b99448ef949d8f4a2f7960edf5d25eaf0e3c7b82bb1fcd525c7f30ad2830d7, uid : exit status 1: 2022/01/25 15:35:37 socat[45243] E connect(5, AF=2 127.0.0.1:5432, 16): Connection refused
Handling connection for 5432

But I'm guessing that you continue to get connection refused even though the pod has failed and restarted. It says it is handling the connection, but it fails every time, so it's not really forwarding them. In this case port-forward is still technically running (from a process standpoint on your local machine), but it is never able to forward connections again until you stop and restart it. This was the behavior of kubectl prior to 1.23.0.

@mkfdoherty Can you double-check the kubectl version you were using in both cases? I don't think this problem should depend on the cluster version, which is why I'm asking. It would surprise me if the behavior of port-forward using the same kubectl version were different depending on the cluster version.

Also, can you check whether your pod has restarted while port-forward is running? If that happens, the behavior from kubectl 1.23.0 and later is for the kubectl port-forward command to log an error saying "lost connection to pod" and exit.
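
For example, to check for restarts while the port-forward is running (a plain kubectl query; $POD_WITH_SERVICE is the placeholder from the original report):

kubectl get pod $POD_WITH_SERVICE          # watch the RESTARTS column
kubectl get pod $POD_WITH_SERVICE -o jsonpath='{.status.containerStatuses[0].restartCount}'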


For reference, I tried reproducing using kubectl 1.23.1 with a 1.24.0-alpha cluster and also with a 1.21 cluster (this one an EKS cluster).

I was starting a tcp echo server, like this:

kubectl run tcpecho --image=alpine --restart=Never -- /bin/sh -c "apk add socat && socat -v tcp-listen:8080,fork EXEC:cat"
kubectl port-forward pod/tcpecho 8080

Then connecting like this:

nc -v localhost 8080

Are you able to reproduce the problem using the tcp echo server I mentioned above?

mkfdoherty commented 2 years ago

@eddiezane and @brianpursley I do agree that it sounds like the "fix" in https://github.com/kubernetes/kubernetes/pull/103526, which in general is a fix. But I found the behavior unexpected in this specific scenario, which could be generalised to other cases of opening and closing connections to a service that a pod is running:

  1. A pod running PostgreSQL is up and running.
  2. We create a port-forward to the PostgreSQL pod using kubectl v1.23.3.
  3. We now open a successful psql connection to a database on the PostgreSQL instance via the port-forward.
  4. We close the connection to psql gracefully.
  5. The port-forward is now closed with the following error:
    E0207 08:03:13.969992   13701 portforward.go:406] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod ae5cb9fc17a1a793190887ac6d87bb3bf12e06df55bb03e370480884d2b4d69f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-06fe22d7-1b91-ffc9-2c5b-568ff9137a34": read tcp4 127.0.0.1:59580->127.0.0.1:5432: read: connection reset by peer
    E0207 08:03:13.970197   13701 portforward.go:234] lost connection to pod

    We expected the port-forward to remain open for subsequent psql connections (that we close after each use). This was the case before v1.23.x. I have tested using kubectl v1.22.6 and port-forward does continue to remain open and functional even when we close psql connections although the port-forward does complain of errors (these errors are not unrecoverable).

    Handling connection for 5432
    E0207 09:01:20.124576   14998 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod ae5cb9fc17a1a793190887ac6d87bb3bf12e06df55bb03e370480884d2b4d69f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-06fe22d7-1b91-ffc9-2c5b-568ff9137a34": read tcp4 127.0.0.1:33310->127.0.0.1:5432: read: connection reset by peer

I would not consider this scenario to be an example of a port-forward error occurring that requires the port-forward connection to be closed. Opening a psql connection and closing the connection does not indicate that there is anything wrong with the underlying pod or the port-forward connection. So I would say that in this scenario the port-forward is in a recoverable state and can accept new connections, unlike many other scenarios in which a port-forward may return an error. Can we better distinguish between these different cases of error with port-forwards?

I have run the echo server, which does not cause the port-forward to break when using netcat to connect to it. This does indeed work as you have said. I would hope that this same behaviour would be the case when opening connections using psql. Opening and closing a netcat connection does not close the port-forward, but opening and closing psql connections gracefully does return an error and close the port-forward. The only difference I see is that closing a netcat connection does not cause the port-forward to return a recoverable error, but closing psql gracefully does. Are these scenarios so different? Might we not expect the same behaviour? Or might we consider it problematic to use the port-forward in this way?
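
One way to check whether the two clients really close the TCP connection differently would be to watch for FIN/RST packets on the pod's loopback interface while closing a netcat session and then a psql session (a rough sketch; it assumes tcpdump is available in the pod image, which it usually is not by default):

kubectl exec $POD_WITH_SERVICE -- tcpdump -ni lo 'tcp port 5432 and tcp[tcpflags] & (tcp-rst|tcp-fin) != 0'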

To clarify an error in my reproduction from the original post: my original reproduction method using netcat does not suffice and was the result of a false positive. Minikube has networking issues in a recent update, causing my pods to fail periodically. I understand this to be the case since my pod is running Patroni, which manages PostgreSQL and retries the PostgreSQL process without kubelet being aware of it (a major downside of this design approach). Therefore the port-forward would fail for good reasons without the pod actually being restarted by kubelet, which I think is the intended value behind the https://github.com/kubernetes/kubernetes/pull/103526 PR. I am sorry for realising this after the fact. I really appreciate you taking the time to reproduce my issue. I am using kind and EKS now to avoid the issue I currently experience with networking on Minikube for my use case. So the issue appears to depend only on the kubectl version, as you have proposed.

brianpursley commented 2 years ago

This was the case before v1.23.x. I have tested using kubectl v1.22.6 and port-forward does continue to remain open and functional even when we close psql connections although the port-forward does complain of errors (these errors are not unrecoverable).

Hmm, so maybe there are different types of errors: some unrecoverable, where it makes sense to stop forwarding, but others recoverable, like the example you mentioned.

I’ll have to test with psql and see if I can reproduce that way and see what the difference is. It sounds like you are indeed hitting an issue with the change that was made in kubectl 1.23.0.

If that’s the case, using kubectl <1.23 should be a workaround for now until we can figure out what is going on here and fix it.

brianpursley commented 2 years ago

I'm still trying to reproduce this using kubectl 1.23.3 (and other versions), and still am not able to. There must be something else different about our clusters.

First, just so we're on the same page, here is my latest attempt to reproduce. I create a pod running PostgreSQL, forward port 5432, then connect using psql from my local machine:

Terminal 1:

kubectl run postgres --image=postgres --env=POSTGRES_PASSWORD=hunter2
kubectl port-forward postgres 5432

Terminal 2:

~ $ psql -h localhost -U postgres << EOF
> create table foo (bar integer, baz varchar); 
> insert into foo values(1, 'a'),(2, 'b'); 
> select * from foo; 
> drop table foo;
> EOF
CREATE TABLE
INSERT 0 2
 bar | baz 
-----+-----
   1 | a
   2 | b
(2 rows)

DROP TABLE
~ $ psql -h localhost -U postgres << EOF
> create table foo (bar integer, baz varchar); 
> insert into foo values(1, 'a'),(2, 'b'); 
> select * from foo; 
> drop table foo;
> EOF
CREATE TABLE
INSERT 0 2
 bar | baz 
-----+-----
   1 | a
   2 | b
(2 rows)

DROP TABLE
~ $ psql -h localhost -U postgres -c "SELECT * FROM somethingThatDoesntExist"
ERROR:  relation "somethingthatdoesntexist" does not exist
LINE 1: SELECT * FROM somethingThatDoesntExist
                      ^
~ $ psql -h localhost -U postgres -c "SELECT * FROM somethingThatDoesntExist"
ERROR:  relation "somethingthatdoesntexist" does not exist
LINE 1: SELECT * FROM somethingThatDoesntExist
                      ^
~ $ psql -h localhost -U postgres << EOF
> create table foo (bar integer, baz varchar); 
> insert into foo values(1, 'a'),(2, 'b'); 
> select * from foo; 
> drop table foo;
> EOF
CREATE TABLE
INSERT 0 2
 bar | baz 
-----+-----
   1 | a
   2 | b
(2 rows)

DROP TABLE

After several psql sessions, port forwarding remains running. Even when I issued a command that failed, the port forwarding connection itself remained intact.

@mkfdoherty Are my above commands similar to what you are doing when the problem happens?

Next, I'd like to try to find out if there is some difference in the clusters which is making this behave differently for you than it is for me.

@mkfdoherty Can you post your output of kubectl describe nodes?

I'm wondering if there is a difference in the container runtime or CNI provider.

Thanks for any additional info you can provide. I'm hoping we can get to the bottom of this as I'm sure if you're having this problem, some others will too.

brianpursley commented 2 years ago

I also tried using a Minikube cluster...

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:34:54Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

And EKS as well.

mkfdoherty commented 2 years ago

Thank you so much for your replication attempt. I appreciate you taking the time.

I decided to try to replicate this again using a popular PostgreSQL Helm chart. To my surprise, the behavior is different from that of the PostgreSQL instance I have running with the Patroni replication manager. This likely explains why your reproduction procedure does not match mine. This appeared quite odd to me, given that my PostgreSQL logs and pod appeared completely healthy when closing psql connections, yet only the port-forward failed, which led me to believe that this issue would be standard behavior across psql clients interacting with PostgreSQL servers over kubectl >1.23 port-forwards.

So my original assumption that this was a common recoverable error that could affect psql and other common clients communicating with servers over port-forwards seems to be incorrect.

I am investigating this less typical issue further but at this point the https://github.com/kubernetes/kubernetes/pull/103526 fix seems to work mostly as intended in the common cases I have tested. I would consider this closed unless I find reason to suspect otherwise.

Silvenga commented 2 years ago

Also started happening for us after upgrading to 1.23 from 1.22 - also postgres - only impacting port-forwarding.

I'm a little confused, what's the fix for this?

cconnert commented 2 years ago

I experienced a similar issue while doing port-forwarding to a PGO managed database:

Client Version: v1.24.2
Kustomize Version: v4.5.4
Server Version: v1.23.5

At some point the connection gets lost. Interestingly, the connection is also lost when I exit the local psql. Using kubectl v1.22.0, port-forwarding is rock solid.

nic-6443 commented 2 years ago

I had the same problem. I found that the reason is that the Postgres server sends an RST packet when the client (like psql) disconnects from the server in TLS mode, because it shuts down the sub-process handling the connection without reading the SSL shutdown packet sent by the client. The new logic introduced in https://github.com/kubernetes/kubernetes/pull/103526 causes the port-forward itself to be shut down when an RST packet is read from the server side of the connection established via the port-forward.

If you are okay with using the plaintext protocol, you can use PGSSLMODE=disable to temporarily bypass this issue in the Postgres case.
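
For example, with psql (PGSSLMODE is the standard libpq environment variable; this sends traffic in plaintext, so use it only where that is acceptable):

PGSSLMODE=disable psql -h localhost -p 5432 -U postgres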

panteparak commented 2 years ago

@nic-6443 So is there a fix for this other than falling back to disabling SSL on PostgreSQL? I am experiencing this problem too.

Silvenga commented 2 years ago

FWIW, I'm not sure what I upgraded, but I'm no longer having this issue with DataGrip on the same cluster (so an upgrade to DataGrip, or kubectl, or the cluster backplane).

panteparak commented 2 years ago

FWIW, I'm not sure what I upgraded, but I'm no longer having this issue with DataGrip on the same cluster (so an upgrade to DataGrip, or kubectl, or the cluster backplane).

I see. I will see if cluster upgrading works. (my datagrip and kubectl are up to date)

PKizzle commented 2 years ago

@Silvenga Which versions have you upgraded the cluster, kubectl and DataGrip to?

Silvenga commented 2 years ago

@PKizzle I'm just assuming something changed, since I noticed it stopped happening (disabling SSL for my login is a chore to get done, so I never tried). I recently went through and upgraded all my local software, e.g. Windows, DataGrip, etc. Postgres in the cluster wasn't upgraded.

I do distinctly remember upgrading the Datagrip Postgres driver. This was also when I migrated to use kubelogin after getting a bit too annoyed with the CLI warnings. 😆 But I doubt kubelogin would have impacted anything.

I'm on PTO, I'll check on the cluster's current version when I get back next week. Feel free to poke me if I forget.

panteparak commented 2 years ago

@Silvenga Just a gentle ping on this issue :D

Also, can you confirm that your PostgreSQL is using SSL?

Silvenga commented 2 years ago

K8s:

# kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.4", GitCommit:"e87da0bd6e03ec3fea7933c4b5263d151aafd07c", GitTreeState:"clean", BuildDate:"2021-02-18T16:12:00Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.8", GitCommit:"bd30c9fbc8dc9668d7ab4b0bd4fdab5c929c1ad7", GitTreeState:"clean", BuildDate:"2022-06-21T17:15:16Z", GoVersion:"go1.17.11", Compiler:"gc", Platform:"linux/amd64"}

# kubelogin --version
kubelogin version
git hash: v0.0.14/f345047a580aaaf133b009041963d50b98d8d2e2
Go version: go1.17.11
Build time: 2022-07-07T17:00:54Z

I'm apparently still using the old xenial repositories in my WSL2 instance, where 1.20.4 is latest. I should switch that over at some point...

This cluster (v1.23.8) is located in Azure where the backplane is managed by AKS. All nodes have the latest security patches installed weekly with the K8s version matching the backplane. The cluster is using the standard Azure network driver.

Datagrip:

Datagrip: 2022.2.1
Driver: PostgreSQL JDBC Driver (ver. 42.4.0, JDBC4.2)
SSL: yes

All defaults except:

Run keep-alive query every 10 sec. (FWIW, doesn't actually seem to help)

When executing kubectl port-forward the following output is typical for me (Datagrip is functional in this case):

Forwarding from 127.0.0.1:5432 -> 5432
Forwarding from [::1]:5432 -> 5432
Handling connection for 5432
Handling connection for 5432
E0801 10:27:19.881663     121 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod 1f9a1685499a757ea34f558db57bd7bf29c54118c682e94fcd936cbd754df46d, uid : failed to execute portforward in network namespace "/var/run/netns/cni-7bfd7099-ab38-8f8b-b220-f653bb6013f4": read tcp4 127.0.0.1:50698->127.0.0.1:5432: read: connection reset by peer
E0801 10:28:16.876917     121 portforward.go:400] an error occurred forwarding 5432 -> 5432: error forwarding port 5432 to pod 1f9a1685499a757ea34f558db57bd7bf29c54118c682e94fcd936cbd754df46d, uid : failed to execute portforward in network namespace "/var/run/netns/cni-7bfd7099-ab38-8f8b-b220-f653bb6013f4": read tcp4 127.0.0.1:50496->127.0.0.1:5432: read: connection reset by peer

Where the connection reset by peer error is from Datagrip disconnecting one of its connections.

Previously, the lost connection to pod error would occur nearly instantly after starting the port forward and letting Datagrip connect (I would lose the connection when Datagrip attempted to connect to the local port). It happened so quickly that running any command was fruitless.


Let me know @panteparak if I missed anything. I would really suspect that the driver was the real fix for me. Of course, I can't discount networking changes in Azure by Microsoft (there have been several since Jun 17, my first comment here).

dnnnvx commented 2 years ago

Hey, found this issue since I have the same problem but with argocd while trying to access the web UI:

kubectl port-forward service/argocd-server -n argocd 8080:443
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080

Handling connection for 8080
E0801 18:06:03.794037   26029 portforward.go:406] an error occurred forwarding 8080 -> 8080: error forwarding port 8080 to pod f4f1b30d071d4f15a4132aec5048cb482ef6f58699a32e74f284acd2bc8dd87b, uid : failed to execute portforward in network namespace "/var/run/netns/cni-7924ddc6-f0c5-1383-1d9d-0a011a47b2a7": read tcp4 127.0.0.1:56102->127.0.0.1:8080: read: connection reset by peer
E0801 18:06:03.794507   26029 portforward.go:234] lost connection to pod

The version is:

Client Version: v1.24.3
Kustomize Version: v4.5.4
Server Version: v1.24.3

(The cluster is a local kubeadm setup on 2 Intel NUCs with Debian).

EDIT: Using http instead of https solved the problem 🤧

kieranbenton commented 2 years ago

Same - experiencing this with DBeaver. I am unsure as to why this issue has been closed. Surely just disabling HTTPS is not a long term solution to this problem?

nic-6443 commented 2 years ago

@nic-6443 So is there a fix for this rather than fallback to disabling SSL on postgresql? I am too experiencing this problem.

@panteparak I don't have any good ideas yet. There is also a hack approach (which we currently use in our test environment): create iptables rules in an init container to drop TCP RST packets sent from Postgres.

      initContainers:
      - args:
        - -I
        - OUTPUT
        - -p
        - tcp
        - --tcp-flags
        - RST
        - RST
        - -j
        - DROP
        command:
        - iptables
        image: istio/proxy_init:1.0.2
        imagePullPolicy: IfNotPresent
        name: drop-tcp-rst
        resources: {}
        securityContext:
          allowPrivilegeEscalation: true
          capabilities:
            add:
            - NET_ADMIN
            - NET_RAW
            drop:
            - ALL
          privileged: true
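
For reference, the init container above runs the equivalent of this single command inside the pod's network namespace (hence the NET_ADMIN/NET_RAW capabilities in its securityContext):

iptables -I OUTPUT -p tcp --tcp-flags RST RST -j DROP
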
panteparak commented 2 years ago

@nic-6443 that's a nice hack if anyone is using Istio or any LB with a sidecar proxy. (My cluster does not expose PostgreSQL via Istio, only at the Service level.) I think it could be achieved by doing some nginx TCP proxying with RST packet drops too. I'll see if the idea works and will post my result here.

Jeansen commented 2 years ago

I am having the same issue with postgres and created a new issue with more details in https://github.com/kubernetes/kubernetes/issues/111825.

marcofeltmann commented 2 years ago

Same issue with v1.24 and KinD, which has a slightly more descriptive message at that given line:

E0901 18:58:34.722393 3350394 portforward.go:406] an error occurred forwarding 443 -> 443: error forwarding port 443 to pod 66c6f25ad11bc695928369ebe8b4573ecf7ebbf2567dcd5b0762c476cbcdeecf, uid : failed to execute portforward in network namespace "/var/run/netns/cni-1799908f-62e1-4b8a-4acd-7eac35188133": failed to connect to localhost:443 inside namespace "66c6f25ad11bc695928369ebe8b4573ecf7ebbf2567dcd5b0762c476cbcdeecf", IPv4: dial tcp4 127.0.0.1:443: connect: connection refused IPv6 dial tcp6 [::1]:443: connect: connection refused

firefox-developer-edition states this error:

Secure Connection Failed An error occurred during a connection to localhost. PR_END_OF_FILE_ERROR

which reads like "had trouble handling the SSL stuff".

Version:

clientVersion:
  buildDate: "2022-08-23T15:32:20Z"
  compiler: gc
  gitCommit: 95ee5ab382d64cfe6c28967f36b53970b8374491
  gitTreeState: archive
  gitVersion: v1.24.4
  goVersion: go1.19
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2022-08-11T22:45:16Z"
  compiler: gc
  gitCommit: aef86a93758dc3cb2c658dd9657ab4ad4afc21cb
  gitTreeState: clean
  gitVersion: v1.24.3
  goVersion: go1.18.3
  major: "1"
  minor: "24"
  platform: linux/amd64
mkosterin commented 2 years ago

@marcofeltmann Did you find a solution? I have the same issue. I tried deleting flannel and installing Calico, but got the same result.

marcofeltmann commented 2 years ago

@marcofeltmann Did you find a solution?

No, but since updating to 1.25 for the kindest/node image and the matching kubectl version, the issue hasn't occurred again.

abdennour commented 2 years ago

Experienced the same issue on k8s 1.24 (k3s):

Handling connection for 1344
E0910 23:08:02.388936    5642 portforward.go:400] an error occurred forwarding 1344 -> 1344: error forwarding port 1344 to pod 139e702b76823be74c0319cbca831dca9157e847092dee86cdd6d17dbfe5477d, uid : failed to execute portforward in network namespace "/var/run/netns/cni-6d9a8c0f-1cbd-ab2b-4780-da4535934b11": failed to connect to localhost:1344 inside namespace "139e702b76823be74c0319cbca831dca9157e847092dee86cdd6d17dbfe5477d", IPv4: dial tcp4 127.0.0.1:1344: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found 
Handling connection for 1344
flickerfly commented 2 years ago

@marcofeltmann Thank you for your comment, because it solved my problem. My browser was assuming https instead of http, which resulted in the failed to execute portforward in network namespace error. I just had to specify http:// in the browser and it's happy now.

Spongman commented 2 years ago

I'm having the same issue:

E0921 22:27:13.215667   19820 portforward.go:406] an error occurred forwarding 443 -> 443: error forwarding port 443 to pod 7d50fda57942d420fbf3222be6adf19742e85bc8efb67907cce9ec543218a2db, uid : failed to execute portforward in network namespace "/var/run/netns/cni-02a8f123-fd65-33e5-d995-c6bc73621df5": failed to connect to localhost:443 inside namespace "7d50fda57942d420fbf3222be6adf19742e85bc8efb67907cce9ec543218a2db", IPv4: dial tcp4 127.0.0.1:443: connect: connection refused IPv6 dial tcp6: address localhost: no suitable address found
E0921 22:27:13.225449   19820 portforward.go:234] lost connection to pod

It seems random: sometimes it happens immediately after running kubectl, sometimes after the first connection to the local port, and sometimes it doesn't happen at all.

tbeauvais-imagine commented 2 years ago

Why is this closed? It is still happening!

kieranbenton commented 2 years ago

Agreed, this is a breaking change and is hugely disruptive. Right now we've downgraded until it is resolved.

szaher commented 1 year ago

Same here on Apple Silicon M1 (arm64), facing the exact same issue.

SamYuan1990 commented 1 year ago

May I know why port-forward gives me an IPv6 address? It is still happening as I try to use it in a GitHub Action with kind.

Something that confuses me is:

  1. Where does the IPv6 address come from?
  2. If the system (host VM/instance) only has IPv4 support, does kubectl still give me an IPv6 address inside the pod? (Does that make sense?)
lordzsolt commented 1 year ago

So...... why is this issue closed? And there's no clear resolution in this issue.

wascript3r commented 1 year ago

Still happening using kubectl v1.26.0

CDFN commented 1 year ago

Can confirm this happens to me as well on MacOS Ventura 13.1 arm64 M2. kubectl version --short output:

Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.25.4
luukasmakila commented 1 year ago

Try curling the pod you're port-forwarding. I had similar problems, and when I curled the pod I got this message: curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:8080, which went away after I upgraded LibreSSL with brew, turned off my corporate proxy, and tried again.

Now it works even with the proxies on. I wish I could give a better explanation, but I'm not 100% sure how it got fixed.

VolodymyrFesenko commented 1 year ago

I'm having the same issue

kayn1 commented 1 year ago

I have the same issue... I have tried port-forwarding inside a loop, but this hacky approach does not help either.

matheus-rossi commented 1 year ago

Same here Macos Ventura 13.2 M1 Pro

duclm2609 commented 1 year ago

Still an issue for me. Kubernetes version 1.24 on EKS. It succeeds on the first attempt, then is immediately closed.

raman-babich commented 1 year ago

The same for me, but it is more like random behavior: sometimes it works, sometimes it doesn't. macOS Ventura 13.1, M1 Pro.

TrNgTien commented 1 year ago

The same for me, and I only have the problem when forwarding a service in a namespace other than the default one. macOS Ventura 13.1, M1 Air.

codihuston commented 1 year ago

I saw someone mention the use of http worked for them over https. I had the opposite experience when port forwarding the Argo Workflow UI, so this seems to depend on the back-end application in this scenario. It was resolved by visiting https://localhost:$PORT instead of http.

Interestingly, querying over HTTP using curl doesn't seem to break the port forward, but querying it via Google Chrome Version 109.0.5414.120 (Official Build) (64-bit) does break the connection. I'm using KinD. More info below:

$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

$ kind version
kind v0.14.0 go1.18.2 linux/amd64

$ kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-20T03:35:13Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.24) and server (1.26) exceeds the supported minor version skew of +/-1

$ docker version
Client: Docker Engine - Community
 Cloud integration: v1.0.24
 Version:           20.10.14
 API version:       1.41
 Go version:        go1.16.15
 Git commit:        a224086
 Built:             Thu Mar 24 01:48:21 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Desktop
 Engine:
  Version:          20.10.14
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.15
  Git commit:       87a90dc
  Built:            Thu Mar 24 01:46:14 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.11
  GitCommit:        3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version:          1.0.3
  GitCommit:        v1.0.3-0-gf46b6ba
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ kubectl -n argo port-forward deployment/argo-server 2746:2746
Forwarding from 127.0.0.1:2746 -> 2746
Forwarding from [::1]:2746 -> 2746
Handling connection for 2746
E0203 22:51:56.263246  491761 portforward.go:406] an error occurred forwarding 2746 -> 2746: error forwarding port 2746 to pod 1335bc8a2314bc356293a3e2c1612a28f7a05abb732fed0e5ea25a6017fec30f, uid : failed to execute portforward in network namespace "/var/run/netns/cni-c64eb420-5dd1-8744-c1cb-0c766fed1b52": read tcp4 127.0.0.1:55794->127.0.0.1:2746: read: connection reset by peer
E0203 22:51:56.263627  491761 portforward.go:234] lost connection to pod
bryanck commented 1 year ago

Downgrading to 1.22.17 resolves this issue for me (macOS 13.2.1), which isn't a great solution.

jdsdc commented 1 year ago

Same - experiencing this with DBeaver. I am unsure as to why this issue has been closed. Surely just disabling HTTPS is not a long term solution to this problem?

I had the same problem with disconnections when using port-forward against a Zalando Postgres cluster. I only had the problem when I enabled "Show all databases".

Anyway, I changed sslMode=disable in Connection settings -> Driver properties.
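
For anyone setting this via the JDBC URL instead of the driver-properties dialog, the equivalent parameter (assuming the PostgreSQL JDBC driver) would look something like:

jdbc:postgresql://localhost:5432/postgres?sslmode=disable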


yuryskaletskiy commented 1 year ago

getting this as well. Apple Silicon M1 + latest DataGrip.

cr1cr1 commented 1 year ago

I confirm @jdsdc's solution works (setting sslMode to disable in the driver).

jdsdc commented 1 year ago

getting this as well. Apple Silicon M1 + latest DataGrip.

You have the same settings in DataGrip. Change the connection settings as mentioned in my comment: https://github.com/kubernetes/kubectl/issues/1169#issuecomment-1473620662

marcellodesales commented 1 year ago

DataGrip confirmed