projectcontour / contour

Contour is a Kubernetes ingress controller using Envoy proxy.
https://projectcontour.io
Apache License 2.0

Envoy breaks downloads after 30 seconds #3156

Closed: jaysonsantos closed this issue 3 years ago

jaysonsantos commented 3 years ago

What steps did you take and what happened: If you host a big file behind Envoy (served by nginx, for example), or the client has a slow connection, Envoy kills the connection after around 30 seconds.

What did you expect to happen: The download should complete, however slowly.

Anything else you would like to add: I've set up a basic example at https://github.com/jaysonsantos/contour-envoy-bug (it only overrides the host port to null because I had traefik on my cluster). Since you already have contour configured on your side, I guess you can just install the nginx deployment.


jpeach commented 3 years ago

Sounds like you may need to tweak timeouts

youngnick commented 3 years ago

Thanks for the repo @jaysonsantos, I will see what I can do about getting it set up in a test environment.

However, I agree with @jpeach that you're probably hitting a timeout, but I'm not sure which one it is. Envoy handles timeouts very differently from nginx: it allows more granular control over many of them, and many of them are tuned by default for usage as a sidecar. Does this only occur with the download speed limited to 10k? If you don't limit the speed, does the download work?

Envoy has at least four timeouts that could be affecting this, as far as I can see. There are two different idle timeouts, per stream and per connection, and some overall timeouts (which probably aren't relevant). We try to set good defaults for all of these, but it could be that they don't work well for very slow downloads.
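
For reference, the proxy-wide timeouts mentioned above correspond to the `timeouts` block of the Contour configuration file. This is a minimal sketch, assuming the key names from Contour's timeout-configuration documentation; the values are illustrative only, not recommendations or confirmed defaults.

```yaml
# Fragment of the Contour configuration file (sketch).
# All values below are illustrative assumptions.
timeouts:
  # Overall time allowed for the whole request; "infinity" disables it.
  request-timeout: infinity
  # Idle timeout per connection (no active streams).
  connection-idle-timeout: 60s
  # Idle timeout per stream (no data flowing in either direction).
  stream-idle-timeout: 5m
  # Upper bound on the lifetime of a connection; "infinity" disables it.
  max-connection-duration: infinity
```

These settings apply to every route served by the proxy, so a per-route timeout may be the better fit when only one slow endpoint is affected.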

youngnick commented 3 years ago

Additionally, could you please explain more about how you're using traefik? Is it inline for this request?

jaysonsantos commented 3 years ago

Hey there @youngnick, I based the route on https://projectcontour.io/docs/main/configuration/#timeout-configuration, which says that the request timeout is unlimited if not specified. Should I try specifying a timeout on the route? My first attempt was with Ingress instead of HTTPProxy, but it also breaks the download.

The problem only happens if the download takes more than 20-30 seconds; the limit-rate is just there to ensure that it will not download at full speed, on a loopback for example. Wouldn't these idle timeouts only apply if the connection goes stale?

To rule out other problems, I set up a civo.com account without traefik and put only contour there so no other ingress would interfere, and the problem still occurs. I also tested ingress-nginx and it works fine with the slow downloads. I will set up grafana on that civo account to try to get the metric that envoy reports (I forgot the name, but it was something that said that the upstream dropped the connection). I will also try to get trace logs with only this test running to eliminate other variables.

jaysonsantos commented 3 years ago

I guess that with Ingress I don't have much control over it, but with HTTPProxy, adding a huge timeout on the route seems to fix the problem. Would it be worth mentioning in the docs that you cannot configure the request timeout on Ingress? Or am I mistaken?
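
For readers hitting the same problem, a route-level timeout on an HTTPProxy might look roughly like the sketch below. It assumes the `timeoutPolicy.response` field from the Contour HTTPProxy API; the resource name, hostname, backing service, and the `1h` value are all hypothetical and are not taken from the reproduction repo.

```yaml
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: slow-download            # hypothetical name
spec:
  virtualhost:
    fqdn: downloads.example.com  # hypothetical host
  routes:
    - conditions:
        - prefix: /
      timeoutPolicy:
        # Time allowed for the upstream to finish its response.
        # "1h" is an illustrative value; "infinity" disables the timeout.
        response: 1h
      services:
        - name: nginx            # hypothetical backing service
          port: 80
```

The timeout is set per route, so other routes on the same virtual host keep their defaults.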

jaysonsantos commented 3 years ago

Well, it seems that I missed the part of the docs that shows an annotation for Ingress: https://projectcontour.io/docs/v1.10.0/config/annotations/#contour-specific-ingress-annotations. I guess I got confused by this one: https://projectcontour.io/docs/main/configuration/#timeout-configuration. Thank you anyway!
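
A sketch of the Ingress equivalent, assuming the `projectcontour.io/response-timeout` annotation described on the annotations page linked above. The resource name, hostname, backing service, and the `1h` value are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: slow-download            # hypothetical name
  annotations:
    # Route response timeout for this Ingress.
    # "1h" is illustrative; "infinity" disables the timeout.
    projectcontour.io/response-timeout: "1h"
spec:
  rules:
    - host: downloads.example.com  # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx        # hypothetical backing service
                port:
                  number: 80
```

On clusters older than Kubernetes 1.19 the Ingress `apiVersion` and backend fields differ, but the annotation would be applied the same way.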