kubernetes / ingress-nginx

Ingress-NGINX Controller for Kubernetes
https://kubernetes.github.io/ingress-nginx/
Apache License 2.0

Nginx Controller using endpoints instead of Services #257

Closed Nalum closed 6 years ago

Nalum commented 7 years ago

Is there any reason as to why the Nginx controller is set up to get the endpoints that a Service uses rather than using the Service?

When I run updates to a deployment and they get rolled out, some requests are sent to pods that are being terminated, which results in 5XX errors because nginx still thinks they're available. Wouldn't using the Service take care of this issue?

I'm using this image: gcr.io/google_containers/nginx-ingress-controller:0.8.3

I'd be happy to look into changing this so it works with Services rather than Endpoints, if that makes sense to everyone.

yissachar commented 7 years ago

The old docs explained:

The NGINX ingress controller does not use Services to route traffic to the pods. Instead it uses the Endpoints API in order to bypass kube-proxy to allow NGINX features like session affinity and custom load balancing algorithms. It also removes some overhead, such as conntrack entries for iptables DNAT.
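
For illustration only (the Service name, namespace, pod IPs, and port below are hypothetical), this is roughly the kind of Endpoints object the controller watches; the pod IPs under `subsets` are what end up in the NGINX upstream, bypassing the Service's ClusterIP:

```yaml
# Hypothetical Endpoints object for a Service named "testapp".
# The ingress controller reads these pod IPs directly instead of
# proxying through the Service's ClusterIP / kube-proxy.
apiVersion: v1
kind: Endpoints
metadata:
  name: testapp
  namespace: default
subsets:
  - addresses:
      - ip: 10.244.1.12   # pod IPs; removed once a pod fails readiness or terminates
      - ip: 10.244.2.7
    ports:
      - port: 443
        protocol: TCP
```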

Nalum commented 7 years ago

Ah okay.

Thanks for that, I didn't find it in my searches.

Do you know if there are efforts being made to make it more responsive to the readiness and liveness checks?

aledbf commented 7 years ago

Do you know if there are efforts being made to make it more responsive to the readiness and liveness checks?

@Nalum yes. We need to release a final version before changing the current behavior. The idea (we need a POC first) is to use the Kong load balancer code, which allows updating the nginx upstreams without requiring a reload.

Nalum commented 7 years ago

@aledbf anything I can do to help with this?

Nalum commented 7 years ago

Just dropping these comments here:

aledbf Slack 11:59 AM 2017-02-22 @nalum about that, I just need time :slightly_smiling_face: . We need to release 0.9 first. I hope after beta 2 https://github.com/kubernetes/ingress/pull/303 we can release 0.9. After that I will start looking at the Kong load balancer Lua code. Basically this is the code we need to understand and “port”: https://github.com/Mashape/kong/pull/1735

aledbf Slack 12:00 PM 2017-02-22 @nalum please take this just as an idea and POC. We really need to find a way to avoid reloads and handle nginx upstreams better

rikatz commented 7 years ago

OK, I was taking a look at this:

https://github.com/yaoweibin/nginx_upstream_check_module

Maybe we can open a feature request to configure this, but I think this is not the case of this issue.

About using Services instead of pods, we also have to keep in mind that Service IPs are only valid inside a Kubernetes cluster, while pod IPs might be valid in your network (using Calico or some other solution).

Anyway, the ingress gets the upstream IPs from the Service (it connects to the Service, watches it, and checks for pod IP changes in that Service), so it might make sense to health check the upstream pods with the module above instead of using the Service IP :)

@aledbf Is this a case for opening a feature request for the upstream healthcheck module, or should we keep this issue open?

Thanks!

Nalum commented 7 years ago

@rikatz correct me if I'm wrong, but the brief look I took at that repo didn't show it using the Kubernetes health checks. I would think it better to take advantage of those rather than adding additional checks, i.e. have nginx be more responsive to changes in the Service.

aledbf commented 7 years ago

@rikatz adding an external module for this just duplicates the probes feature already provided by kubernetes.

aledbf commented 7 years ago

@rikatz what this issue is really about is how we update the configuration (changes in pods) without reloading nginx.

rikatz commented 7 years ago

Yep, the Kube health check is the best approach. I was just thinking about an alternative :)

I will also take a look at how to change the config without reloading nginx.

rikatz commented 7 years ago

@aledbf What about this module: https://github.com/yzprofile/ngx_http_dyups_module

With the ingress adding/removing upstreams via HTTP requests to it? This is the same approach that NGINX Plus uses to add/remove upstreams without reloading/killing the nginx process.

aledbf commented 7 years ago

@rikatz I tried that module about a year ago (without success). Maybe we need to check again.

rikatz commented 7 years ago

@aledbf I did some quick checks here and it worked 'fine'.

But the following situations still 'reload' nginx:

I couldn't verify whether NGINX gets reloaded for each upstream change or not.

So, is this upstream module still applicable to the whole problem? I can start writing a PoC of this approach (it includes changing the nginx.tmpl, the core, and a lot of other things), but if you think it's still a valuable change, we can do this :D

Thanks

arikkfir commented 6 years ago

Hi @chrismoos, would it be possible to also support this annotation in the Nginx configmap? We have quite a few Ingress resources and would love to set it in a centralized place instead of in each one.

aledbf commented 6 years ago

Closing. It is possible to choose between endpoints or services using a flag.

mbugeia commented 6 years ago

@aledbf Hi, can you tell me which flag I should use to enable service-upstream by default? I can't find it in the documentation or with -h.

jordanjennings commented 6 years ago

@mbugeia It's ingress.kubernetes.io/service-upstream and you can find that on this page: https://github.com/kubernetes/ingress-nginx/blob/master/configuration.md#service-upstream

montanaflynn commented 6 years ago

@jordanjennings that link is broken now, here's a working link:

https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/annotations.md#service-upstream

jesseshieh commented 6 years ago

@Nalum @aledbf sorry to comment on a closed issue, but I was wondering what the status of using "the kong load balancer code that allows the update of the nginx upstreams without requiring a reload" is. Was there any progress made on that at all?

aledbf commented 6 years ago

Was there any progress made on that at all?

No, sorry. One of the reasons is that adding Lua again implies it will only work on some platforms.

rikatz commented 6 years ago

This might be a stupid question, but are we 'HUP'ing the process? (http://nginx.org/en/docs/control.html)

I've also seen that we could send a 'USR2' signal (https://www.digitalocean.com/community/tutorials/how-to-upgrade-nginx-in-place-without-dropping-client-connections), but this might overload NGINX with stale connections, right?

aledbf commented 6 years ago

@rikatz the USR2 procedure creates several new problems:

mcfedr commented 6 years ago

This is likely improved by using https://github.com/kubernetes/ingress-nginx/pull/2174

jwatte commented 5 years ago

@montanaflynn that link is 404, here's the now-working link: https://github.com/kubernetes/ingress-nginx/blob/master/docs/user-guide/nginx-configuration/annotations.md#service-upstream (and that's very likely to rot, too -- google for nginx annotation service-upstream to find it whenever you next need this)

michaelajr commented 5 years ago

When I use the annotation I still see all the pod IPs as endpoints. Shouldn't I see the Service's ClusterIP?

I see this:

Host  Path  Backends
----  ----  --------
*     *     testapp:443 (192.168.103.2:443,192.168.203.66:443,192.168.5.3:443)

doubleyewdee commented 5 years ago

@michaelajr make sure you're using nginx.ingress.kubernetes.io/service-upstream -- the shorter form no longer works for nginx-specific ingress settings by default.
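
As a minimal sketch of what that looks like in practice (host, service name, and port are hypothetical; this uses the current networking.k8s.io/v1 Ingress API, which postdates some of the comments above):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: testapp
  annotations:
    # route to the Service's ClusterIP instead of the individual pod endpoints
    nginx.ingress.kubernetes.io/service-upstream: "true"
spec:
  ingressClassName: nginx
  rules:
    - host: testapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: testapp
                port:
                  number: 443
```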

robbie-demuth commented 4 years ago

Does anyone know why the docs indicate that sticky sessions won't work if the service-upstream annotation is used? Wouldn't the session affinity configuration on the Kubernetes Service itself be honored in that case?

EDIT

It looks like this is because Kubernetes session affinity is based on client IPs. By default, NGINX ingress obscures client IPs because the Service it creates sets externalTrafficPolicy to Cluster. It looks like setting it to Local instead might resolve the issue, but I'm not quite sure I understand the implications, particularly regarding imbalanced traffic spreading.

https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
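
As a rough sketch of the combination being discussed (the Service names, labels, and ports are hypothetical, and the traffic-imbalance trade-off mentioned above still applies): the controller's Service preserves the client source IP with externalTrafficPolicy: Local, and the backend Service keys affinity on that client IP.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller   # the controller's own Service (hypothetical name)
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local     # preserve the client source IP; traffic only reaches nodes running a controller pod
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: https
      port: 443
      targetPort: 443
---
apiVersion: v1
kind: Service
metadata:
  name: testapp                    # backend Service (hypothetical)
spec:
  sessionAffinity: ClientIP        # kube-proxy keys affinity on the (now preserved) client IP
  selector:
    app: testapp
  ports:
    - port: 443
      targetPort: 443
```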

zhangyuxuan1992 commented 4 years ago

@michaelajr make sure you're using nginx.ingress.kubernetes.io/service-upstream -- the shorter form no longer works for nginx-specific ingress settings by default.

I also use nginx.ingress.kubernetes.io/service-upstream: 'true' but still see all the pod IPs and ports instead of the ClusterIP.

/*.* *:8423 (192.168.102.25:8423,192.168.149.225:8423)

VengefulAncient commented 3 years ago

I also use nginx.ingress.kubernetes.io/service-upstream: 'true' but still see all the pod IPs and ports instead of the ClusterIP.

/*.* *:8423 (192.168.102.25:8423,192.168.149.225:8423)

Same here and I would really like to understand whether the directive is actually working. Right now it seems to me that it isn't.

cayla commented 2 years ago

For anyone finding this page years later: you can confirm the change is working by looking at the ingress-nginx logs for traffic to the ingress in question. By default, the logging shows the backend IP that handled the request; it's either an endpoint IP or the Service IP, depending on the configuration.

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/log-format/
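
If you want the upstream address to stand out, one option (a sketch only; the ConfigMap name and the log format string are illustrative) is to set a custom log-format-upstream in the controller's ConfigMap and check whether $upstream_addr shows individual pod IPs or the Service's ClusterIP:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # must match the ConfigMap the controller reads (hypothetical name)
  namespace: ingress-nginx
data:
  # $upstream_addr shows where nginx actually sent the request:
  # a pod IP:port by default, or the Service's ClusterIP:port with service-upstream enabled
  log-format-upstream: '$remote_addr - [$time_local] "$request" $status upstream=$upstream_addr'
```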

narqo commented 1 year ago

Just so people coming to this issue can consider it:

We tried rolling out the service-upstream = true setting to the nginx-ingress running in an EKS cluster and observed a huge imbalance in traffic handling across the individual pods of a deployment. I believe this was to be expected, since kube-proxy in iptables mode load-balances randomly and we spread the deployments' pods across the nodes and the AZs.

We've decided to roll back to the default service-upstream = false, as shown in the screenshot below (no details on the screenshot, due to internal policies).

Screenshot 2022-11-22 at 11 25 30
longwuyuan commented 1 year ago

@narqo thanks for the update

snigdhasambitak commented 1 year ago

@narqo did you get it working with service-upstream = true? Could you please specify the version of nginx ingress you are using and your k8s cluster details? It doesn't seem to work on 1.22.

narqo commented 1 year ago

did you get it working with service-upstream = true?

Yes, it worked in our tests (AWS EKS 1.21, ingress-nginx v1.3.1, deployed and configured via Helm). We confirmed that the change (and its rollback) took effect by observing the changes in the nginx controller's access logs, where we log the details of the request's upstream.

sherifabdlnaby commented 1 year ago

Just so people coming to this issue can consider it:

We tried rolling out the service-upstream = true setting to the nginx-ingress running in an EKS cluster and observed a huge imbalance in traffic handling across the individual pods of a deployment. I believe this was to be expected, since kube-proxy in iptables mode load-balances randomly and we spread the deployments' pods across the nodes and the AZs.

We've decided to roll back to the default service-upstream = false, as shown in the screenshot below (no details on the screenshot, due to internal policies).

Screenshot 2022-11-22 at 11 25 30

Interesting, would using Topology Aware Hints with Topology Spread Constraints (for the Nginx pods as well as the deployment) help?
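
A rough sketch of that combination (names, labels, image, and replica count are hypothetical; the hints annotation requires Kubernetes 1.23+ and was later superseded by service.kubernetes.io/topology-mode):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: testapp
  annotations:
    service.kubernetes.io/topology-aware-hints: "auto"   # lets kube-proxy prefer same-zone endpoints
spec:
  selector:
    app: testapp
  ports:
    - port: 443
      targetPort: 443
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testapp
spec:
  replicas: 6
  selector:
    matchLabels:
      app: testapp
  template:
    metadata:
      labels:
        app: testapp
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone   # spread pods evenly across AZs so per-zone hints stay balanced
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: testapp
      containers:
        - name: testapp
          image: testapp:latest   # hypothetical image
          ports:
            - containerPort: 443
```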

taliastocks commented 1 year ago

Just so people coming to this issue can consider it:

We tried rolling out the service-upstream = true setting to the nginx-ingress running in an EKS cluster and observed a huge imbalance in traffic handling across the individual pods of a deployment. I believe this was to be expected, since kube-proxy in iptables mode load-balances randomly and we spread the deployments' pods across the nodes and the AZs.

We've decided to roll back to the default service-upstream = false, as shown in the screenshot below (no details on the screenshot, due to internal policies).

Screenshot 2022-11-22 at 11 25 30

Were you able to have any luck using preStop lifecycle hooks on your service pods to delay long enough for nginx to pick up the endpoint configuration change? Were you able to adjust any nginx settings, like proxy_next_upstream, to avoid failed requests during a service upgrade?

I've been playing around with these myself with no luck (the service backend is gRPC, so we would lose per-call load balancing if we enabled service-upstream=true, similar to what you show in your chart).
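
For reference, the preStop approach being discussed is roughly this (a sketch only; the sleep length, names, image, and port are hypothetical), giving the controller time to drop the terminating pod from its endpoint list before the pod stops accepting traffic:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: testapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: testapp
  template:
    metadata:
      labels:
        app: testapp
    spec:
      terminationGracePeriodSeconds: 60   # must exceed the preStop sleep plus shutdown time
      containers:
        - name: testapp
          image: testapp:latest           # hypothetical image
          ports:
            - containerPort: 50051        # gRPC port (hypothetical)
          lifecycle:
            preStop:
              exec:
                # keep serving while the endpoint removal propagates to the ingress controller
                command: ["sleep", "30"]
```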

narqo commented 1 year ago

Were you able to have any luck using preStop lifecycle hooks on your service pods to delay long enough for nginx to pick up the endpoint configuration change?

This is exactly the issue we were looking at when we experimented with service-upstream=true. A preStop hook for every deployment (initially raised in https://github.com/kubernetes/ingress-nginx/issues/7330) isn't a scalable solution for a dynamic production environment with hundreds of µ-services behind the ingress. We are looking for better alternatives.

taliastocks commented 1 year ago

Were you able to have any luck using preStop lifecycle hooks on your service pods to delay long enough for nginx to pick up the endpoint configuration change?

This is exactly the issue we were looking at when we experimented with service-upstream=true. A preStop hook for every deployment (initially raised in #7330) isn't a scalable solution for a dynamic production environment with hundreds of µ-services behind the ingress. We are looking for better alternatives.

Honestly, I would even be happy with something like a 5-10 second preStop hook, but even a 30 second preStop delay + 30 seconds of connection draining was not consistently long enough to prevent seeing DEADLINE_EXCEEDED.

The next thing I'm planning to try is setting nginx.ingress.kubernetes.io/proxy-connect-timeout: "1" and relying on the proxy_next_upstream behavior. It's not ideal because it would still add a second of latency to some requests, but that's not a complete deal-breaker for our use case.
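
A sketch of the annotations that approach would involve (the values, host, service name, and port are guesses for illustration; proxy-next-upstream takes the same keywords as the nginx proxy_next_upstream directive):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: testapp
  annotations:
    # fail fast when a dead endpoint is still in the upstream list...
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "1"
    # ...and retry the request against another endpoint instead of returning an error
    nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502 http_503"
    nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "3"
spec:
  ingressClassName: nginx
  rules:
    - host: testapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: testapp
                port:
                  number: 50051
```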

taliastocks commented 1 year ago

Honestly, I would even be happy with something like a 5-10 second preStop hook, but even a 30 second preStop delay + 30 seconds of connection draining was not consistently long enough to prevent seeing DEADLINE_EXCEEDED.

The next thing I'm planning to try is setting nginx.ingress.kubernetes.io/proxy-connect-timeout: "1" and relying on the proxy_next_upstream behavior. It's not ideal because it would still add a second of latency to some requests, but that's not a complete deal-breaker for our use case.

Welp, turns out the actual problem I was running into was related to new pods coming up slowly rather than old pods terminating. Apologies for misidentifying the issue!

dcodix commented 9 months ago

Just so people coming to this issue can consider it:

We tried rolling out the service-upstream = true setting to the nginx-ingress running in an EKS cluster and observed a huge imbalance in traffic handling across the individual pods of a deployment. I believe this was to be expected, since kube-proxy in iptables mode load-balances randomly and we spread the deployments' pods across the nodes and the AZs.

We've decided to roll back to the default service-upstream = false, as shown in the screenshot below (no details on the screenshot, due to internal policies).

Screenshot 2022-11-22 at 11 25 30

We observed the same behaviour. This seems to happen because nginx uses "long lived" (keepalive) connections, so it makes just one connection to the Service IP, which DNATs to one pod IP, and then all requests go through it up to a maximum number. That number seems to be defined by upstream-keepalive-requests, which I believe defaults to 10k requests, so only after 10k requests will a new TCP connection be established. This lowers the number of TCP connections by a factor of roughly 10k, so if you have 1k requests per minute, you will not establish a new connection for 10 minutes (unless it idles).

kube-proxy (iptables) does a good job at simulating round-robin, but only with a big enough volume. If you actually make 20k TCP connections via iptables, it will most likely look balanced, but if you make only 2, chances are it will not balance.

We were considering whether it was worth keeping service-upstream: true but lowering upstream-keepalive-requests so more TCP connections are made, but so far we have not, because we did not have time to test the impact of this change on performance and resources in general.
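
The tuning being described would be a ConfigMap change along these lines (a sketch only; the ConfigMap name is hypothetical, the value is purely illustrative, and the performance trade-off is untested, as noted above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # must match the ConfigMap the controller reads (hypothetical name)
  namespace: ingress-nginx
data:
  # default is 10000; a lower value forces new upstream TCP connections more often,
  # giving kube-proxy's random DNAT more chances to even out pod selection
  upstream-keepalive-requests: "100"
```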