bprashanth opened 7 years ago
This sounds good to me. I'll talk to the team and see what they think. We're going live in production very soon, so we'll have to stick with this project for at least the short term. We still have some work planned for feed that we need to do very soon (moving to client-go, using endpoints instead of services); after that we can look into switching to upstream.
FYI, the upstream interface should give you something like a snapshot of urls+services+endpoints from its local cache on each update notification from the apiserver, so you can pick whether you want to go straight to the svc vip, svc dns, svc nodeport, or endpoints.
@bprashanth We're probably going to back away from adding endpoints to nginx. The problem is that an nginx reload is quite problematic when done frequently (as would happen with endpoint updates).
So I think using endpoints with nginx is going to be a bad idea. The problem of persistent connections + endpoint updates needs to be solved at the kube-proxy layer ideally.
I'd expect both sides to follow the TCP protocol? As long as nginx sends a FIN and does the normal close handshake, the client should tear down the connection, keepalive or not, unless there's a stateful NAT in between that disagrees with TCP timeouts?
Maybe you can describe what you see on a tcpdump in a controlled environment (or is this too hard to hit in testing)?
We're using an ELB in front of nginx, and we'll see ELB 504s getting returned while under load when nginx does a reload. It isn't very many though, because ingress updates are pretty rare. As the ELB is a black box I can't see what it's doing.
I will try to duplicate this and capture logs to understand it better.
I think something else may be going wrong, so I'll investigate further. Could be an ELB bug. I think somewhere in the ELB docs it says the ELB needs to close the backend connections. Akamai has a similar restriction.
@bprashanth tcpdump shows the nginx host sending tons of TCP resets to the ELB when a reload occurs. These translate directly into 504s returned from the ELB. I can reproduce it hitting nginx directly with wrk using HTTP/1.1 keepalives; I get connection resets. So far it seems nginx reload isn't as graceful as one would hope. I would expect it to close its connections gracefully, but it appears nginx simply closes the socket when idle rather than waiting for the connection to be terminated on the ELB/client side.
How are you reloading it? (http://nginx.org/en/docs/control.html)
We send a HUP to the master process. I'm pretty sure this is all expected; HTTP/1.1 (RFC 2616) says clients should be able to handle asynchronous (non-graceful) closes. Otherwise nginx workers would have to wait for some grace period to be sure the connections close.
I spoke with AWS support and they suggested I use the new application load balancers instead. I tried one out, and indeed it works fine with nginx reloads - not a single 504. Apparently the ALBs don't reuse persistent connections, instead it looks like they batch multiple requests through a connection then close it. That seems to solve the problem for us - so with ALB I can start updating endpoints with confidence :).
RST is still kind of rude; I get the feeling there's a more obvious fin/fin-wait-timeout option, but I'd have to dig through the nginx config a bit. Great that you have it working. Unfortunately Service.Type=LoadBalancer doesn't set up an ALB, so maybe this is how we should finally manifest Ingress on AWS?
Right now it's an ELB created by Service.Type=LB over the nginx ingress controller. Is setting up an ALB just like setting up an ELB plus one last step to create something like a url map? Or are all the resources in the pipeline completely different (as is the case on GCE)?
ALB is very similar to an ELB it seems, I didn't have to set up any url mapping when I did it through the console. Unfortunately it is a different API so I'll have to modify feed to use it.
If you find a way to change nginx's behaviour please let me know, I spent a lot of time looking at config options and its source code, and it seems to do as I describe - closes socket immediately.
Apparently you can improve things by issuing a USR2 followed by WINCH, which will actually wait for connections to drain, but USR2 doesn't seem to work in containers.
Is ALB similar enough to ELB that we can have the service controller create it instead of the latter based off an annotation on the Service of Type=LoadBalancer?
I think the ALB is layer7 only, so you couldn't do TCP load balancing with it.
I only tested with layer7 ELB, so for me ALB was a drop in replacement.
Hmm, then maybe when you create an Ingress on AWS, if you don't specify ingress.class=nginx, you get an ALB with a catch-all route pointing to an nginx that does the actual proxying based on rules in the map. Ideally creating an Ingress on AWS should just work, no surprises, which apparently isn't the case right now since people are shoving a Service of Type=LB/ELB in front.
Ideally creating an Ingress on AWS should just work, no surprises, which apparently isn't the case right now since people are shoving a Service of Type=LB/ELB in front.
What is the alternative?
An ingress controller that spins up an ALB, like the GCE one? You can also tier them, so you have an ALB -> nginx ingress. Or maybe we can find a way to set up a catch-all ALB instead of an ELB via annotations on the Service (not sure how similar they are in setup).
Just an update:
Update #2 - as ingress doesn't reload very often, and the percentage of dropped requests is quite low, we haven't prioritised haproxy work. Otherwise the nginx proxy works great.
Since this is based on nginx, and we have an nginx backend (https://github.com/kubernetes/ingress/tree/master/controllers/nginx), and we're working on a generic controller with pluggable backends (https://github.com/kubernetes/ingress/blob/master/core/pkg/ingress/doc.go#L19), it seems like a good way to collaborate, for greater justice.
Even if you don't feel like it's worth your time right now, it might make sense to streamline how feed and the standard nginx backend handle nginx specific features.