kubernetes / kubectl

Issue tracker and mirror of kubectl code
Apache License 2.0

port-forward dropping the connection for certain type of files #1368

Open dvaldivia opened 1 year ago

dvaldivia commented 1 year ago

We are seeing a problem where kubectl port-forward drops the connection after serving a particular type of file, in this case a video. Strangely enough, when we serve videos from non-static sources (copying a stream of bytes from MinIO, for example) the pipe never breaks. File size makes no difference.

I created a repo with a reproduction case at https://github.com/dvaldivia/gofileserve

Steps to reproduce

1. Deploy the app in your Kubernetes cluster

kubectl apply -f kubernetes/deployment.yaml

2. Port Forward to this service

kubectl port-forward svc/gofileserve 4001:4000

3. On another terminal, try to get Spinning-Cat.mp4

curl http://localhost:4001/assets/media/Spinning-Cat.mp4

You'll see the port-forward break, and no request will work afterwards

➜ kubectl port-forward svc/gofileserve 4001:4000
Forwarding from 127.0.0.1:4001 -> 4000
Forwarding from [::1]:4001 -> 4000
Handling connection for 4001
E0210 13:22:27.776615   61159 portforward.go:379] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:4001->127.0.0.1:52390: write tcp4 127.0.0.1:4001->127.0.0.1:52390: write: broken pipe

If you restart the port-forward and try to get other static files, like hello.json, the pipe never breaks

➜ curl http://localhost:4001/assets/hello.json
{
  "hello": "world"
}

So something breaks inside kubectl port-forward for this particular type of file being returned.

mpuckett159 commented 1 year ago

/triage accepted /assign @brianpursley

brianpursley commented 1 year ago

It looks like curl is the one breaking the pipe.

When I run your curl command, I get this warning and it exits with exit code 23:

$ curl http://localhost:4001/assets/media/Spinning-Cat.mp4
Warning: Binary output can mess up your terminal. Use "--output -" to tell 
Warning: curl to output it to your terminal anyway, or consider "--output 
Warning: <FILE>" to save to a file.
$ echo $?
23

curl gives you a couple of options:

  1. You can use the --output flag with a filename, like curl http://localhost:4001/assets/media/Spinning-Cat.mp4 --output Spinning-Cat.mp4
  2. If you really want the binary output to go to stdout, you can use --output - like this: curl http://localhost:4001/assets/media/Spinning-Cat.mp4 --output -. I'd imagine you use this if you want to pipe the output to another process.

When I use option 1, it works:

$ curl http://localhost:4001/assets/media/Spinning-Cat.mp4 --output Spinning-Cat.mp4
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1427k  100 1427k    0     0   138M      0 --:--:-- --:--:-- --:--:--  154M
$ ls -la Spinning-Cat.mp4 
-rw-rw-r-- 1 bpursley bpursley 1461984 Mar  1 21:43 Spinning-Cat.mp4

I did however notice that once you get the broken pipe, kubectl's port forwarding seems to get into some weird random (?) timeout state until you restart port forwarding. It just doesn't recover well from errors (and it doesn't exit either). If port forward cannot recover from an error, it should exit with a non-zero exit code, not keep running in some semi-operable state.

Example output of this weird timeout state:

$ kubectl port-forward svc/gofileserve 4001:4000
Forwarding from 127.0.0.1:4001 -> 4000
Forwarding from [::1]:4001 -> 4000
Handling connection for 4001
E0301 21:28:52.072825   20184 portforward.go:378] error copying from remote stream to local connection: readfrom tcp4 127.0.0.1:4001->127.0.0.1:54100: write tcp4 127.0.0.1:4001->127.0.0.1:54100: write: broken pipe
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
E0301 21:31:05.071949   20184 portforward.go:346] error creating error stream for port 4001 -> 4000: Timeout occurred
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
E0301 21:31:48.082873   20184 portforward.go:346] error creating error stream for port 4001 -> 4000: Timeout occurred
Handling connection for 4001
E0301 21:32:00.898307   20184 portforward.go:346] error creating error stream for port 4001 -> 4000: Timeout occurred
Handling connection for 4001
Handling connection for 4001
Handling connection for 4001
E0301 21:32:14.506840   20184 portforward.go:346] error creating error stream for port 4001 -> 4000: Timeout occurred
Handling connection for 4001
E0301 21:32:32.954398   20184 portforward.go:346] error creating error stream for port 4001 -> 4000: Timeout occurred
vangie commented 1 year ago

I encountered a similar problem on a Mac M1, with the WebSocket protocol running over the forwarded port. The same service shows no forwarding interruption with the Linux arm64 version of kubectl.

E0310 15:49:30.434931   37797 portforward.go:406] an error occurred forwarding 7070 -> 7070: error forwarding port 7070 to pod 36a7b2dfefae7f1a5efdb5c2d4302f308373e04573ec4cf5fb7ad62f7e8be944, uid : failed to execute portforward in network namespace "/var/run/netns/cni-748379a4-44d1-3802-107a-8bc98f3e1fb4": readfrom tcp4 127.0.0.1:46016->127.0.0.1:7070: write tcp4 127.0.0.1:46016->127.0.0.1:7070: write: broken pipe
E0310 15:49:30.438039   37797 portforward.go:234] lost connection to pod
dvaldivia commented 1 year ago

@brianpursley I don't think it's curl alone; we noticed this happening when our web app attempted to load a video for a background animation, so I think it has more to do with the content being returned than anything else.

andreldm commented 1 year ago

Sorry for not bringing any concrete evidence, but I'm experiencing the same problem while port-forwarding kafka-ui, which is just doing boring REST calls under the hood. Using a Mac M1 as well, if that matters.

mooshmoosh commented 1 year ago

I'm also having this issue with Apache Superset on M1 Mac.

ravindk89 commented 1 year ago

There might be a fix in the pipeline for this, per a comment on https://github.com/containerd/containerd/pull/8418

john-ramsey commented 1 year ago

> I'm also having this issue with Apache Superset on M1 Mac.

Same here - M1 as well FWIW

gusandrioli commented 1 year ago

Same here for Grafana UI on Mac M1

dvaldivia commented 1 year ago

This seems to be an issue happening mainly on Apple Silicon.

dwt commented 1 year ago

Apple Silicon, grafana-ui, but also cockroach-ui triggers this for me. For local clusters via docker-desktop, but also remote x86 clusters.

On kubernetes-cli 1.27.4 this hangs after the first problem; on 1.28.2 it seems to at least recover after such a problem. So still excessive logging, but better behavior?

Do you guys consider this problem solved with that, or is the logging (see collapsed section) something you also want to fix?

Logs:

❯ kubectl port-forward svc/rancher-monitoring-grafana 8080:80
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
E0920 15:38:56.723419   70989 portforward.go:381] error copying from remote stream to local connection: readfrom tcp6 [::1]:8080->[::1]:58214: write tcp6 [::1]:8080->[::1]:58214: write: broken pipe
E0920 15:38:56.723658   70989 portforward.go:394] error copying from local connection to remote stream: read tcp6 [::1]:8080->[::1]:58214: read: connection reset by peer
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
E0920 15:39:04.363633   70989 portforward.go:381] error copying from remote stream to local connection: readfrom tcp6 [::1]:8080->[::1]:58221: write tcp6 [::1]:8080->[::1]:58221: write: broken pipe
Handling connection for 8080
Handling connection for 8080
Handling connection for 8080
E0920 15:39:26.930397   70989 portforward.go:370] error creating forwarding stream for port 8080 -> 8080: Timeout occurred
E0920 15:39:31.194304   70989 portforward.go:370] error creating forwarding stream for port 8080 -> 8080: Timeout occurred
Handling connection for 8080
E0920 15:39:39.843494   70989 portforward.go:347] error creating error stream for port 8080 -> 8080: Timeout occurred
Handling connection for 8080
E0920 15:39:40.720737   70989 portforward.go:370] error creating forwarding stream for port 8080 -> 8080: Timeout occurred
Handling connection for 8080
E0920 15:40:09.845963   70989 portforward.go:347] error creating error stream for port 8080 -> 8080: Timeout occurred
Handling connection for 8080
E0920 15:40:10.730176   70989 portforward.go:370] error creating forwarding stream for port 8080 -> 8080: Timeout occurred
Handling connection for 8080
debarshibasak commented 1 year ago

This issue happens on windows too for us.

hadifarnoud commented 12 months ago

I have the same issue with Postgres and pgAdmin 4, and a keepalive loop like while true ; do nc -vz 127.0.0.1 8080 ; sleep 7 ; done did not help.

allanrogerr commented 6 months ago

Still an issue

ravindk89 commented 6 months ago

https://github.com/containerd/containerd/issues/9875 may be related? It seems like attempts to fix this have been ongoing.

ramondeklein commented 4 months ago

Not 100% sure my issue is the same, but I have created a reproduction scenario that locks up the port forwarding every time, on Apple Silicon, Windows (running WSL2), and AKS (MS Azure) alike. The full repo can be found here: https://github.com/ramondeklein/k8s-portforward-bug.

dvaldivia commented 4 months ago

I think @ramondeklein's repro is also interesting, @brianpursley, as it sheds more light on how the TCP streams seem to be the problem, but I don't see what could have changed in kubectl around the 1.20 release to introduce it.

surendragalwa commented 2 months ago

I have also encountered a similar issue when running load tests against a forwarded port:

E0828 17:56:03.007294   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.007362   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.007439   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.007578   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.008054   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.008098   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.008137   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
E0828 17:56:03.008248   81349 portforward.go:346] error creating error stream for port 8080 -> 8080: Timeout occurred
sufyansidqy commented 2 weeks ago

I encountered a similar issue when trying to port-forward to my pod in GKE. The day before it worked fine, but today the issue occurred. The pod contains a ReactJS web app using port 3000, and I port-forward through a ClusterIP service from local port 3001 to 3000. kubectl reported:

Forwarding from 127.0.0.1:3001 -> 3000
Forwarding from [::1]:3001 -> 3000
Handling connection for 3001
Handling connection for 3001

Then I tried to access the app from the browser at localhost:3001, but kubectl returned the errors below and the port-forward stopped:

E1030 09:21:32.332699   22324 portforward.go:413] an error occurred forwarding 3001 -> 3000: error forwarding port 3000 to pod 709d32b0aa4ff7f9aa2bf6ff22ea89a14e29cdc601257867bdb74567b1074dea, uid : failed to execute portforward in network namespace "/var/run/netns/cni-6069315d-1c77-3d70-818d-f89cef21540d": failed to connect to localhost:3000 inside namespace "709d32b0aa4ff7f9aa2bf6ff22ea89a14e29cdc601257867bdb74567b1074dea", IPv4: dial tcp4 127.0.0.1:3000: connect: connection refused IPv6 dial tcp6 [::1]:3000: connect: cannot assign requested address
E1030 09:21:32.332699   22324 portforward.go:413] an error occurred forwarding 3001 -> 3000: error forwarding port 3000 to pod 709d32b0aa4ff7f9aa2bf6ff22ea89a14e29cdc601257867bdb74567b1074dea, uid : failed to execute portforward in network namespace "/var/run/netns/cni-6069315d-1c77-3d70-818d-f89cef21540d": failed to connect to localhost:3000 inside namespace "709d32b0aa4ff7f9aa2bf6ff22ea89a14e29cdc601257867bdb74567b1074dea", IPv4: dial tcp4 127.0.0.1:3000: connect: connection refused IPv6 dial tcp6 [::1]:3000: connect: cannot assign requested address
error: lost connection to pod

I still haven't figured out what the issue is.

For what it's worth, I have another pod running in GKE where port-forward works just fine.

PhamQuang-512 commented 2 weeks ago

Same here for Argo CD.
