kubernetes-sigs / gateway-api

Repository for the next iteration of composite service (e.g. Ingress) and load balancing APIs.
https://gateway-api.sigs.k8s.io
Apache License 2.0

Support for ExternalTrafficPolicy #451

Closed akutz closed 2 years ago

akutz commented 3 years ago

What would you like to be added:

It would be nice if the Service API spec supported some version of the externalTrafficPolicy field from the existing services spec:

    // externalTrafficPolicy denotes if this Service desires to route external
    // traffic to node-local or cluster-wide endpoints. "Local" preserves the
    // client source IP and avoids a second hop for LoadBalancer and Nodeport
    // type services, but risks potentially imbalanced traffic spreading.
    // "Cluster" obscures the client source IP and may cause a second hop to
    // another node, but should have good overall load-spreading.
    // +optional
    ExternalTrafficPolicy ServiceExternalTrafficPolicyType `json:"externalTrafficPolicy,omitempty" protobuf:"bytes,11,opt,name=externalTrafficPolicy"`

Why is this needed:

Because the existing Service resource supports it, and it makes sense to implement this in the Service APIs as well. The presence of a health check port (#97) would allow a back-end load balancer platform to determine whether or not an endpoint is healthy, i.e. whether a pod is running on a given node in the Kubernetes cluster.

To understand how the load balancer would interact with this health check, here is an example:

  1. Create a Kind configuration file that specifies a cluster with one control plane node and two worker nodes:

    cat <<EOF >kind.conf
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
    - role: worker
    - role: worker
    EOF
  2. Create a new Kind cluster:

    kind create cluster --config kind.conf --kubeconfig kube.conf
  3. Persist the IP address of each node in the cluster to a file:

    kubectl --kubeconfig=kube.conf get nodes -ojsonpath='{range .items[*]}{.status.addresses[0].address}{"\n"}{end}' >"node-addrs.txt"
  4. Deploy Nginx to the new cluster and provide a service for the application:

    cat <<EOF | kubectl --kubeconfig=kube.conf apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: default
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx
    spec:
      selector:
        app: nginx
      ports:
      - port: 80
      externalTrafficPolicy: Local
      type: LoadBalancer
    EOF
  5. Get the port used to perform a health check on the service:

    while [ -z "${PORT}" ]; do \
      export PORT="$(kubectl --kubeconfig=kube.conf get service nginx -ojsonpath='{.spec.healthCheckNodePort}')"; \
    done
  6. Query the service's health port to determine the node on which the pod is running:

    for ip in $(cat "node-addrs.txt"); do \
      echo "checking node ${ip}" && \
      docker run --rm photon curl -sS "http://${ip}:${PORT}" && \
      printf '\n\n'; \
    done

    This will result in output similar to the following:

    checking node 172.17.0.4
    {
        "service": {
            "namespace": "default",
            "name": "nginx"
        },
        "localEndpoints": 0
    }
    
    checking node 172.17.0.2
    {
        "service": {
            "namespace": "default",
            "name": "nginx"
        },
        "localEndpoints": 1
    }
    
    checking node 172.17.0.3
    {
        "service": {
            "namespace": "default",
            "name": "nginx"
        },
        "localEndpoints": 0
    }

In the above example, the load balancer platform would be able to set up an L7 health check and determine that only node 172.17.0.2 is valid, as that is where the pod is currently running.
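
Building on the loop in step 6, here is a minimal sketch of how an external controller could narrow the node list down to healthy backends. It assumes, as the output above suggests, that kube-proxy's health check endpoint reports a successful HTTP status only when localEndpoints is non-zero; the healthy-node-addrs.txt file name is purely illustrative:

    # Keep only the nodes whose health check endpoint reports local endpoints
    # for the service; curl -f makes the command fail on HTTP error statuses.
    rm -f "healthy-node-addrs.txt"
    for ip in $(cat "node-addrs.txt"); do \
      if docker run --rm photon curl -fsS -o /dev/null "http://${ip}:${PORT}"; then \
        echo "${ip}" >>"healthy-node-addrs.txt"; \
      fi; \
    done

A real load balancer platform would poll this endpoint continuously rather than run a one-shot script, but the decision logic is the same.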

There are a few alternatives to supporting this:

  1. Dynamically update route endpoints -- this is not desired because there may be a delay between a pod moving to another node and a controller updating the routes associated with a Gateway resource's listener, not to mention updating the backend load balancer platform to point to the new endpoints for that listener.

  2. Add all cluster node IP addresses -- this would work, but it's not nearly as efficient as supporting a distinct health check port, since the load balancer would have to infer whether or not an endpoint is healthy from dropped traffic, whereas a proper health check detects this right away.

Related to #97

maplain commented 3 years ago

@akutz see https://github.com/kubernetes-sigs/service-apis/issues/196 for service level configuration related discussion.

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

robscott commented 3 years ago

/remove-lifecycle stale

fejta-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale

k8s-triage-robot commented 3 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  - After 90d of inactivity, lifecycle/stale is applied
  - After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  - After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  - Mark this issue or PR as fresh with /remove-lifecycle rotten
  - Close this issue or PR with /close
  - Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

hbagdi commented 3 years ago

/lifecycle frozen

shaneutt commented 2 years ago

While grooming we saw that this one has been open for a long time without anyone with a strong use case to champion it. We're going to close this as we don't expect anyone is ready to drive it forward, but if you still want this feature and have a strong use case we will be happy to reconsider and re-open it.

jorhett commented 1 year ago

Please re-open this. Not acting on this forces us to patch the Service after every deployment, which is an anti-pattern, as documented here: https://istio.io/latest/docs/tasks/security/authorization/authz-ingress/#network
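
For reference, the imperative workaround referenced above amounts to something like the following sketch; the Service name and namespace are placeholders:

    # Re-apply the traffic policy after every deployment (the anti-pattern
    # described above); <gateway-service> and <namespace> are placeholders.
    kubectl patch service <gateway-service> -n <namespace> \
      --type merge \
      -p '{"spec":{"externalTrafficPolicy":"Local"}}'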

The Kubernetes model is to be declarative. Hack-patching imperatively after every apply is definitely not. Even using an admission controller to do it is WTF hacking. I don't see why this change is so objectionable.

#196 mentions this issue as being unsatisfactory, but it was closed without solving this problem.

youngnick commented 1 year ago

@jorhett, this issue is super old and we're not handling these concepts in the same way at all any more. Because of that, this issue won't be reopened as it stands.

However, if you would like us to handle a use case, I suggest opening an issue describing the outcome you want, and we can talk more there.

jorhett commented 1 year ago

Can you explain what you're looking for? Because "allow me to set externalTrafficPolicy: local" is pretty clear and unambiguous. We require the original client IP, and this is the only way to get it.

In fact, every word in this issue description is still valid AFAICT, because he's describing the exact same problem. I don't see how whatever you changed makes this request invalid, as it's still not possible now.

youngnick commented 1 year ago

Okay, your requirement is not externalTrafficPolicy: local then, it's "I want to be able to get the original client IP", presumably for layer 4 connections, since it's completely possible for HTTP connections using X-Forwarded-For or similar mechanisms.

I agree, that is not currently possible, but trying to reopen a three year old issue that proposes a solution, not a problem, is not a productive way of engaging with the community.

The API is no longer called the "Services API" for a reason - because we decided to make the constructs more general than only Kubernetes Services. It's called the Gateway API now because we intended there to be implementations that don't use a Service for describing how Layer 4 traffic gets into a cluster. That's what Gateways and Listeners are for.

Now, is there space for a GEP (a Gateway Enhancement Proposal) that describes how a Gateway or Listener can request behavior similar to externalTrafficPolicy: Local for a LoadBalancer Service? Absolutely.
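
Purely as an illustration of what such a GEP might ask for (this is not part of the Gateway API; the trafficPolicy field below is hypothetical), the intent could be expressed roughly like this:

    # Hypothetical sketch only: "trafficPolicy" is NOT a real Gateway API field.
    apiVersion: gateway.networking.k8s.io/v1beta1
    kind: Gateway
    metadata:
      name: example-gateway
    spec:
      gatewayClassName: example-class
      listeners:
      - name: tcp
        protocol: TCP
        port: 8080
      # Hypothetical: ask the implementation to route only to node-local
      # endpoints so the client source IP is preserved, similar to
      # externalTrafficPolicy: Local on a LoadBalancer Service.
      trafficPolicy: Local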

But the way that the spec will be defined won't be what's listed in the initial comment here, because the constructs have totally changed in the three years since this issue was logged.

As I said earlier, if you really need this feature, logging a new issue is the way to go.