nginxinc / nginx-loadbalancer-kubernetes

A Kubernetes Controller to synchronize NGINX+ Resources with Kubernetes Ingress Resources
Apache License 2.0

Make upstreams route only to nodes known to contain ingress pods #160

Open 4141done opened 10 months ago

4141done commented 10 months ago

Is your feature request related to a problem? Please describe

Currently NLK populates the NGINX upstream block with the host/port of every node in the cluster. If a request lands on a node that does not have an NGINX Ingress Controller pod, kube-proxy forwards it to one that does. However, since we already know which nodes host ingress pods, it would be better to route the request to one of those nodes in the first place.
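For illustration, a minimal sketch of the kind of upstream block this describes (the IPs and NodePort are hypothetical, not NLK's actual output): every cluster node gets a server entry, whether or not it hosts an ingress pod.

```nginx
# Hypothetical NLK-managed upstream: one server entry per cluster node,
# all on the same NodePort, regardless of where the ingress pods run.
upstream nginx-ingress {
    zone nginx-ingress 256k;
    least_time last_byte;
    server 10.0.1.10:31245;   # hosts an ingress pod
    server 10.0.1.11:31245;   # no ingress pod; kube-proxy must re-route
    server 10.0.1.12:31245;   # no ingress pod; kube-proxy must re-route
}
```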

Describe the solution you'd like

When NLK receives an update about a new cluster/controller configuration, it should compare the node list against the locations of the ingress pods and send only the matching upstreams to NGINX Plus. That means the upstream list should change whenever the set of nodes hosting ingress pods changes (a filtering sketch follows below).

If no nodes contain ingress pods, it is probably best to send all nodes and dispatch an event about the problem. That way we at least keep sending traffic to the cluster and don't drop it if a bug or race condition trips up the ingress-pod assignment logic.
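A minimal sketch of that filtering, assuming client-go and a hypothetical pod label selector (NLK's actual types and internals will differ):

```go
package synchronizer

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodesWithIngressPods returns the subset of nodes running at least one
// ingress pod. The label selector is an assumption for illustration only.
func nodesWithIngressPods(ctx context.Context, client kubernetes.Interface, nodes []corev1.Node) ([]corev1.Node, error) {
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(ctx, metav1.ListOptions{
		LabelSelector: "app.kubernetes.io/name=nginx-ingress", // assumption
	})
	if err != nil {
		return nil, err
	}

	// Record which nodes host at least one running ingress pod.
	hosting := make(map[string]bool)
	for _, p := range pods.Items {
		if p.Status.Phase == corev1.PodRunning {
			hosting[p.Spec.NodeName] = true
		}
	}

	filtered := make([]corev1.Node, 0, len(nodes))
	for _, n := range nodes {
		if hosting[n.Name] {
			filtered = append(filtered, n)
		}
	}

	// Fallback from the proposal: if no node hosts an ingress pod, return
	// the full node list so traffic keeps flowing, and let the caller
	// dispatch an event about the problem.
	if len(filtered) == 0 {
		return nodes, nil
	}
	return filtered, nil
}
```

The empty-result fallback implements the "send all nodes and dispatch an event" behavior described above, so a bug in pod discovery degrades to today's behavior rather than dropping traffic.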

Describe alternatives you've considered

The current operation works fine because kube-proxy picks up the slack. I don't see any other way of achieving this.

Additional context

oguzhancelikarslan commented 6 months ago

@4141done

> Currently NLK populates the NGINX upstream block with the host/port of every node in the cluster. If a request lands on a node that does not have an NGINX Ingress Controller pod, kube-proxy forwards it to one that does.

In my cluster, kube-proxy does not pass the traffic as expected.

brianehlert commented 6 months ago

Can you please elaborate? How are you deploying your ingress controller service? Or exposing your ingress deployment?

oguzhancelikarslan commented 5 months ago

I solved my problem by changing externalTrafficPolicy from Local to Cluster. I think we should add this to the documentation: if someone doesn't change this to Cluster, NLK will not work as expected.

I deployed NGINX Ingress through Helm, where Local is the default configuration.

When this value is set to Local and traffic is directed to a node with no NGINX Ingress Controller pod, kube-proxy will not forward that traffic to a node where one does exist (see the manifest sketch below). @ciroque

@brianehlert @4141done
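For reference, a sketch of the Service change being described; the name, namespace, selector, and ports are illustrative, not the Helm chart's actual values:

```yaml
# Hypothetical NodePort Service for the ingress controller.
# With externalTrafficPolicy: Local, a node with no local ingress pod drops
# NodePort traffic; Cluster lets kube-proxy forward it to another node.
apiVersion: v1
kind: Service
metadata:
  name: nginx-ingress
  namespace: nginx-ingress
spec:
  type: NodePort
  externalTrafficPolicy: Cluster   # chart default was Local
  selector:
    app.kubernetes.io/name: nginx-ingress   # assumption
  ports:
    - name: http
      port: 80
      targetPort: 80
```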

ciroque commented 5 months ago

> I solved my problem by changing externalTrafficPolicy from Local to Cluster. I think we should add this to the documentation: if someone doesn't change this to Cluster, NLK will not work as expected.
>
> I deployed NGINX Ingress through Helm, where Local is the default configuration.
>
> When this value is set to Local and traffic is directed to a node with no NGINX Ingress Controller pod, kube-proxy will not forward that traffic to a node where one does exist.

Nice find. Thoughts on where in the documentation this should go?

chrisakker commented 2 months ago

This is exactly how NodePort in Kubernetes works: it opens the same high-numbered port on every K8s node. You can't change that unless you install a non-standard CNI or make serious modifications to how Kubernetes networking works, which would likely result in a non-standard architecture.

You don't know which Kubernetes node the scheduler will choose for the Ingress Controller; again, changing that means modifying deployment options and moving toward a static configuration, which is also not recommended unless you really know what you are doing. NLK is not aware of the NIC deployment details. It is intentionally designed to watch the nginx-ingress Service object, which represents one or more NIC pods/endpoints.

You don't know how someone will configure their external traffic policy, so it's not defined as a prerequisite. Some use Cluster, some use Local. We could add a note that Cluster is required.

However, that aside, this is addressed outside the cluster by the NGINX Plus LB server. Using the `least_time last_byte` LB algorithm in the upstream block on the external NGINX Plus server tells NGINX Plus to prefer the fastest Kubernetes NodePort, and it will heavily favor the fastest node. The greater the delta in the NodePorts' response times, the more the fastest worker node is favored (you can watch this in real time in the Plus dashboard). This is why `least_time last_byte` is an essential setting for the NLK upstream block.

If you deploy more than one ingress controller replica, or even a DaemonSet, you might have two workers faster than the others, and NGINX Plus will send the majority of requests to the two fastest workers where NIC is running, which is also visible in the Plus dashboard. This algorithm is very dynamic and responds well to the ever-changing environment inside the cluster. Try a load test and experiment, and you will see it for yourself. Not all clusters, workers, ingresses, services, and pods are the same, so I would argue an adaptive LB algorithm is necessary if your goal is the quickest response time for every single request.
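As a sketch of that external NGINX Plus configuration (names, ports, and file paths are illustrative), the key line is the `least_time last_byte` directive in the NLK-managed upstream:

```nginx
# Illustrative NGINX Plus config on the external LB server.
# NLK keeps the server list in sync; least_time last_byte steers traffic
# toward the nodes answering fastest, typically those running NIC pods.
upstream nginx-ingress {
    zone nginx-ingress 256k;   # shared memory zone, required for the Plus API
    least_time last_byte;      # adaptive: prefer the fastest full response
    state /var/lib/nginx/state/nginx-ingress.state;  # persist NLK's dynamic updates
    # server entries are added and removed by NLK via the NGINX Plus API
}

server {
    listen 80;
    location / {
        proxy_pass http://nginx-ingress;
    }
}
```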