milosmns opened this issue 2 years ago
As stated in this comment:

> Klipper-lb does not change the source IP. It supports `externalTrafficPolicy: Local`.
I guess you're using k3s as your Kubernetes distribution and probably Traefik as your cluster router (the default router for k3s).
If you're using Traefik, here's how you get the original client IP address (for other routers, the settings may be a bit different, but the same logic still applies):
1. Set `externalTrafficPolicy` to `Local`. If you use the Traefik helm chart you can set the values to:

   ```yaml
   service:
     spec:
       externalTrafficPolicy: Local
   ```
2. If you have multiple nodes, make sure that your router is running on the node where you send the traffic to. Let's say you have the domain `example.com` and it points to your cluster node with the IP 123.4.5.67 (e.g. via a DNS A record). Then you only have to make sure that your router (the Traefik instance) is running on this node. In the Traefik helm chart you can achieve that with the `nodeAffinity` config, for example:

   ```yaml
   affinity:
     nodeAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
         - weight: 10
           preference:
             matchExpressions:
               - key: kubernetes.io/hostname
                 operator: In
                 values: [ "node2" ]
   ```
   You could even have multiple nodes in your `nodeAffinity` list with different weights (the higher the weight, the more likely the pod is scheduled on that node), e.g.:

   ```yaml
   affinity:
     nodeAffinity:
       preferredDuringSchedulingIgnoredDuringExecution:
         - weight: 20
           preference:
             matchExpressions:
               - key: kubernetes.io/hostname
                 operator: In
                 values: [ "node1" ]
         - weight: 10
           preference:
             matchExpressions:
               - key: kubernetes.io/hostname
                 operator: In
                 values: [ "node2" ]
   ```
   Just replace the node name in the `values` array and adjust the weight to your needs. To get the `kubernetes.io/hostname` label for each of your nodes you can run this command (in most cases the node name and `kubernetes.io/hostname` label are identical):

   ```shell
   kubectl get nodes -o custom-columns="NAME:.metadata.name,LABEL (kubernetes.io/hostname):{.metadata.labels.kubernetes\.io/hostname}"
   ```
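A quick way to check which node the Traefik pod actually landed on (a sketch; it assumes the `app.kubernetes.io/name=traefik` label set by the Helm chart and the `kube-system` namespace used by k3s):

```shell
# Show the Traefik pod together with the node it was scheduled on
kubectl get pods -n kube-system -l app.kubernetes.io/name=traefik -o wide
```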
For more details have a look at this article: K3S Thing: Make Traefik Forward Real Client IP. The only problem with that article is that it only offers a `DaemonSet` as the solution (instead of the default deployment of kind `Deployment`), which prevents you from using Traefik to generate SSL certificates (ACME certificate resolvers in Traefik are only available in `Deployment` mode).
As per the guide's 3rd step, I disabled Traefik and was using the Nginx Ingress Controller.
Ah sorry, I didn't read that guide. But you can still use my answer to fix your problem. Just make sure that `externalTrafficPolicy` is set to `Local` (as documented here for nginx) and that your `nodeAffinity` is set as described in my comment above (here's how you set the affinity in the nginx helm chart).
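For reference, a minimal sketch of what that could look like in the ingress-nginx Helm chart values (key names taken from the chart's `controller` section; replace the hostname with your own node):

```yaml
controller:
  service:
    externalTrafficPolicy: Local
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values: [ "node2" ]
```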
I am running one node. I set it to `Local` but I'm still getting the `svclb` IP. I think it's a bug?
@Taymindis Where and how did you set the externalTrafficPolicy to Local? Because if you do it at runtime you have to restart Traefik. And how do you check it? With the whoami container from containous?
> @Taymindis Where and how did you set the externalTrafficPolicy to Local? Because if you do it at runtime you have to restart Traefik. And how do you check it? With the whoami container from containous?
Hi @mamiu, I am running bare-metal k3s without `traefik` on an Ubuntu VM.
Here are my steps. I set `externalTrafficPolicy` to `Local` in the ingress-nginx Helm values:
```yaml
service:
  enabled: true
  # -- If enabled is adding an appProtocol option for Kubernetes service. An appProtocol field replacing annotations that were
  # using for setting a backend protocol. Here is an example for AWS: service.beta.kubernetes.io/aws-load-balancer-backend-protocol: http
  # It allows choosing the protocol for each backend specified in the Kubernetes service.
  # See the following GitHub issue for more details about the purpose: https://github.com/kubernetes/kubernetes/issues/40244
  # Will be ignored for Kubernetes versions older than 1.20
  ##
  appProtocol: true
  annotations: {}
  labels: {}
  # clusterIP: ""
  # -- List of IP addresses at which the controller services are available
  ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
  ##
  externalIPs: []
  # loadBalancerIP: ""
  loadBalancerSourceRanges: []
  enableHttp: true
  enableHttps: true
  ## Set external traffic policy to: "Local" to preserve source IP on providers supporting it.
  ## Ref: https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-typeloadbalancer
  externalTrafficPolicy: "Local"
```
And I have an app pod which echoes the client IP back when we hit a specific URL.
Please note that I am not using `traefik`, I am using `kubernetes/ingress-nginx`.
> Klipper-lb does not change the source IP. It supports `externalTrafficPolicy: Local`.
Sorry, but that is simply not true. From the klipper-lb `entry` script:

```shell
iptables -t nat -I POSTROUTING -d ${dest_ip}/32 -p ${DEST_PROTO} -j MASQUERADE
```

This configures iptables to do exactly that: MASQUERADE rewrites the source address of forwarded packets.
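If you want to verify this on a running cluster, one option (a sketch; the pod and container names are illustrative, and the rules live where the `entry` script runs, i.e. inside the svclb container) is:

```shell
# List the NAT rules programmed by klipper-lb's entry script.
# Find the real pod name first: kubectl get pods -n kube-system | grep svclb
kubectl exec -n kube-system svclb-traefik-xxxxx -c lb-tcp-80 -- iptables -t nat -S POSTROUTING
```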
The whole thing about getting the router to run on the same node as the end service indeed resolves this, but it rather defeats the purpose of load balancing...
> The whole thing about getting the router to run on the same node as the end service indeed resolves this, but it rather defeats the purpose of load balancing...
@jeroenrnl I 100% agree with that! But I haven't found a good alternative solution yet.
> And I have an app pod which echoes the client IP back when we hit a specific URL.
@Taymindis Your router (in your case nginx) will get the correct client IP address, but it then has to translate it (SNAT) so that your app pod (which echoes the client IP address) sends its responses back through the router. The router can't forward the request with your client IP as the source address, because then your app would try to respond to the client IP directly, skipping the extra step back through the router, and that's not how this kind of networking works (for more details watch this YouTube video). To solve this issue, load balancers, routers, reverse proxies, etc. use a special HTTP header called `X-Forwarded-For`. Most routers (including ingress-nginx, see here) support this HTTP header, and many applications treat the value passed in that header as the client IP address (or at least have an option to enable that).
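To make that concrete (addresses are made up, and the exact header set depends on the router's configuration), the request your app receives from the router would then look something like:

```
GET / HTTP/1.1
Host: example.com
X-Forwarded-For: 203.0.113.7
X-Real-IP: 203.0.113.7
```

Your app should read the client address from `X-Forwarded-For` (or `X-Real-IP`) instead of the TCP source address.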
The PROXY protocol as defined on haproxy.org provides a solution to this issue. It makes it possible to keep proper load balancing while preserving access to the client IP, and it isn't specific to HTTP.
It requires that Klipper LB and the backend of the service be compatible with that protocol. Traefik and nginx already support it, amongst many others.
Maybe this could get implemented in Klipper LB behind a togglable flag? Although it would require a much more complex setup than the current iptables rules, which cannot add the required header.
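For context, the receiving side is the easy part. In Traefik, for example, accepting PROXY protocol on an entry point is a small piece of static configuration like the sketch below (the CIDR is illustrative); the missing piece is that klipper-lb would also have to *send* the PROXY header, which plain iptables NAT rules cannot do:

```yaml
# Traefik static configuration (sketch): accept PROXY protocol from trusted sources
entryPoints:
  web:
    address: ":80"
    proxyProtocol:
      trustedIPs:
        - "10.42.0.0/16"   # illustrative: the range the load balancer traffic arrives from
```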
@mamiu Is there a way to achieve your solution in an HA (high availability) k3s setup? I'm talking about this: https://rancher.com/docs/k3s/latest/en/installation/ha-embedded/
Would it mean that the masters have Traefik running on them with `externalTrafficPolicy: Local`? I'm unsure how to achieve this.
@sandys Yes, it's definitely possible. But it only works if the Traefik instance runs on the node where you send the traffic to, as explained in my comment up here.
@sandys @mamiu
I have been looking into this for 3 weeks, and I observed the behavior described by mamiu:
the client IP is preserved only if the traffic arrives at the klipper instance that is on the same node as the Traefik service.
This is a major problem for a load balancer, especially with the default Helm configuration: it creates one instance of the Traefik service and as many klipper services as there are nodes.
Perhaps a solution would be to have a Traefik instance on each node, with each klipper instance pointing to the Traefik pod on the same node.
However, this is beyond my capabilities at the moment; I am not comfortable with Helm.
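For anyone who wants to try that idea: in the Traefik Helm chart, running one Traefik pod per node is roughly a values change like the sketch below (key names may differ slightly between chart versions), with the caveat mentioned earlier that DaemonSet mode rules out Traefik's built-in ACME certificate resolvers:

```yaml
# Traefik Helm chart values (sketch): one Traefik pod on every node
deployment:
  kind: DaemonSet
# Optionally bind Traefik directly to the host ports so traffic doesn't go through klipper-lb
ports:
  web:
    hostPort: 80
  websecure:
    hostPort: 443
```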
Anyone still dealing with this?
I am, but I am at my wits' end. I'm running a fairly simple, well-documented setup (https://github.com/MikaelElkiaer/flux-twr). It is rootless. I cannot get the proper client IP in my Traefik requests.
> As stated in this comment: Klipper-lb does not change the source IP. It supports `externalTrafficPolicy: Local`.
>
> [... the rest of that answer is quoted in full above: set `externalTrafficPolicy: Local`, pin Traefik to the target node via `nodeAffinity`, and note the `DaemonSet` vs. `Deployment` caveat for ACME ...]
I am unable to replicate this behavior on my multi-node k3s cluster. I have set up Traefik's affinity to the correct node, can confirm that it is scheduled on the right node, and have `externalTrafficPolicy` set correctly, but the Traefik access logs don't show the real IP.
I also have Pi-hole DNS running on a `LoadBalancer` on port 53, and can confirm that it is also unable to see the real IP when it gets requests. Even if it did work, having to schedule all of my pods on a single node for the sake of proper logging feels like it defeats the purpose of having a load balancer in the first place.
Is there a way to get this working? It's entirely possible I am missing something here.
@dakota-marshall I'd recommend not using klipper (the default load balancer of k3s), but instead having a Traefik instance running on each node that listens for external traffic. I know this will prevent you from using Traefik's Let's Encrypt integration, but if you want that you can just use Switchboard.
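If you go that route, ServiceLB (klipper) can be switched off at the k3s level, for example via the k3s config file (a sketch; the equivalent `--disable servicelb` server flag works too):

```yaml
# /etc/rancher/k3s/config.yaml (sketch)
disable:
  - servicelb
```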
@mamiu Thanks for the info! In that case I'll look at switching over to that and using MetalLB for my other services that need a `LoadBalancer`.
I have the same issue. I'm using `k3s`, I have disabled `traefik` (but not `service-lb`) and I'm using the NGINX Ingress Controller with the default `service-lb` (`klipper-lb`). Here is my controller configuration:
"controller": {
"kind": "DaemonSet",
"allowSnippetAnnotations": True,
"service": {
"externalTrafficPolicy": "Local",
},
"config": {
"enable-real-ip": True,
"use-forwarded-headers": True,
"compute-full-forwarded-for": True,
"use-proxy-protocol": True,
"proxy-add-original-uri-header": True,
"forwarded-for-header": "proxy_protocol",
"real-ip-header": "proxy_protocol",
},
},
```
kubectl logs -n apps-dev ingress-nginx-controller
" while reading PROXY protocol, client: 10.42.0.10, server: 0.0.0.0:80
2024/01/09 19:28:10 [error] 98#98: *2465 broken header: "GET / HTTP/1.1
Host: example.com
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/119.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
Accept-Language: en-CA,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive
Upgrade-Insecure-Requests: 1
" while reading PROXY protocol, client: 10.42.0.10, server: 0.0.0.0:80
```
kubectl describe services -n apps-dev ingress-nginx-dev-c34ab985-controlle
Name: ingress-nginx-dev-c34ab985-controller
Namespace: apps-dev
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx-dev-c34ab985
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.4
helm.sh/chart=ingress-nginx-4.8.3
Annotations: meta.helm.sh/release-name: ingress-nginx-dev-c34ab985
meta.helm.sh/release-namespace: apps-dev
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-dev-c34ab985,app.kubernetes.io/name=ingress-nginx
Type: LoadBalancer
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.172.22
IPs: 10.43.172.22
LoadBalancer Ingress: 100.100.100.100
Port: http 80/TCP
TargetPort: http/TCP
NodePort: http 30861/TCP
Endpoints: 10.42.0.11:80
Port: https 443/TCP
TargetPort: https/TCP
NodePort: https 31713/TCP
Endpoints: 10.42.0.11:443
Session Affinity: None
External Traffic Policy: Local
HealthCheck NodePort: 30448
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal EnsuringLoadBalancer 113s service-controller Ensuring load balancer
Normal AppliedDaemonSet 113s Applied LoadBalancer DaemonSet kube-system/svclb-ingress-nginx-dev-c34ab985-controller-8873439e
Normal UpdatedLoadBalancer 83s Updated LoadBalancer with new IPs: [] -> [100.100.100.100]
Name: ingress-nginx-dev-c34ab985-controller-admission
Namespace: apps-dev
Labels: app.kubernetes.io/component=controller
app.kubernetes.io/instance=ingress-nginx-dev-c34ab985
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=ingress-nginx
app.kubernetes.io/part-of=ingress-nginx
app.kubernetes.io/version=1.9.4
helm.sh/chart=ingress-nginx-4.8.3
Annotations: meta.helm.sh/release-name: ingress-nginx-dev-c34ab985
meta.helm.sh/release-namespace: apps-dev
Selector: app.kubernetes.io/component=controller,app.kubernetes.io/instance=ingress-nginx-dev-c34ab985,app.kubernetes.io/name=ingress-nginx
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.43.38.162
IPs: 10.43.38.162
Port: https-webhook 443/TCP
TargetPort: webhook/TCP
Endpoints: 10.42.0.11:8443
Session Affinity: None
Events: <none>
```
```
kubectl describe pods -n kube-system svclb-ingress-nginx-dev-c34ab985-controller-8873439e-xv9tx
Name: svclb-ingress-nginx-dev-c34ab985-controller-8873439e-xv9tx
Namespace: kube-system
Priority: 0
Service Account: svclb
Node: ip-10-10-1-110/10.10.1.110
Start Time: Tue, 09 Jan 2024 11:22:59 -0800
Labels: app=svclb-ingress-nginx-dev-c34ab985-controller-8873439e
controller-revision-hash=78c594c45
pod-template-generation=1
svccontroller.k3s.cattle.io/svcname=ingress-nginx-dev-c34ab985-controller
svccontroller.k3s.cattle.io/svcnamespace=apps-dev
Annotations: <none>
Status: Running
IP: 10.42.0.10
IPs:
IP: 10.42.0.10
Controlled By: DaemonSet/svclb-ingress-nginx-dev-c34ab985-controller-8873439e
Containers:
lb-tcp-80:
Container ID: containerd://a4aaa9c9e86a1bd738d3fda1615953d95d3c269b3059eab568b2a9a236dca0a3
Image: rancher/klipper-lb:v0.4.4
Image ID: docker.io/rancher/klipper-lb@sha256:d6780e97ac25454b56f88410b236d52572518040f11d0db5c6baaac0d2fcf860
Port: 80/TCP
Host Port: 80/TCP
State: Running
Started: Tue, 09 Jan 2024 11:23:03 -0800
Ready: True
Restart Count: 0
Environment:
SRC_PORT: 80
SRC_RANGES: 0.0.0.0/0
DEST_PROTO: TCP
DEST_PORT: 30861
DEST_IPS: (v1:status.hostIP)
Mounts: <none>
lb-tcp-443:
Container ID: containerd://0ff2179299dac6def1471b680ae8c37ed352c94c0c5c5afccf4aee69c1c89f0b
Image: rancher/klipper-lb:v0.4.4
Image ID: docker.io/rancher/klipper-lb@sha256:d6780e97ac25454b56f88410b236d52572518040f11d0db5c6baaac0d2fcf860
Port: 443/TCP
Host Port: 443/TCP
State: Running
Started: Tue, 09 Jan 2024 11:23:04 -0800
Ready: True
Restart Count: 0
Environment:
SRC_PORT: 443
SRC_RANGES: 0.0.0.0/0
DEST_PROTO: TCP
DEST_PORT: 31713
DEST_IPS: (v1:status.hostIP)
Mounts: <none>
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes: <none>
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule op=Exists
node-role.kubernetes.io/master:NoSchedule op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/pid-pressure:NoSchedule op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m53s default-scheduler Successfully assigned kube-system/svclb-ingress-nginx-dev-c34ab985-controller-8873439e-xv9tx to ip-10-10-1-110
Normal Pulling 8m52s kubelet Pulling image "rancher/klipper-lb:v0.4.4"
Normal Pulled 8m49s kubelet Successfully pulled image "rancher/klipper-lb:v0.4.4" in 3.335s (3.335s including waiting)
Normal Created 8m49s kubelet Created container lb-tcp-80
Normal Started 8m49s kubelet Started container lb-tcp-80
Normal Pulled 8m49s kubelet Container image "rancher/klipper-lb:v0.4.4" already present on machine
Normal Created 8m48s kubelet Created container lb-tcp-443
Normal Started 8m48s kubelet Started container lb-tcp-443
```
```
kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
apps-dev ingress-nginx-dev-c34ab985-controller-zh48d 1/1 Running 0 36m 10.42.0.11 ip-10-10-1-110 <none> <none>
kube-system coredns-6799fbcd5-6r48p 1/1 Running 0 37m 10.42.0.2 ip-10-10-1-110 <none> <none>
kube-system metrics-server-67c658944b-mrdwm 1/1 Running 0 37m 10.42.0.3 ip-10-10-1-110 <none> <none>
kube-system svclb-ingress-nginx-dev-c34ab985-controller-8873439e-xv9tx 2/2 Running 0 36m 10.42.0.10 ip-10-10-1-110 <none> <none>
```
Useful notes.
I am also running into this. Sad to see that this is still unaddressed after this long time, given it is such an elemental feature of a LB.
> I am also running into this. Sad to see that this is still unaddressed after this long time, given it is such an elemental feature of a LB.
Would this help? https://kubernetes.io/blog/2023/12/18/kubernetes-1-29-feature-loadbalancer-ip-mode-alpha/
> I am also running into this. Sad to see that this is still unaddressed after this long time, given it is such an elemental feature of a LB.
>
> Would this help? https://kubernetes.io/blog/2023/12/18/kubernetes-1-29-feature-loadbalancer-ip-mode-alpha/
I have the same issue on my single-node k3s. On my machine, the svclb-traefik pod uses its own IP address before sending packets to traefik, thus `X-Forwarded-For` is always filled with the IP of the svclb-traefik pod. I found a possible reason in the k3s documentation:

> When the ServiceLB Pod runs on a node that has an external IP configured, the node's external IP is populated into the Service's status.loadBalancer.ingress address list with ipMode: VIP. Otherwise, the node's internal IP is used.

So it seems like it's impossible to manually set the `ipMode` to `Proxy`?
Hey, to avoid copy-pasting the same question, here's the StackOverflow link.
Basically I want my pods to get the original client IP address... or at least have the `X-Forwarded-For` header, in a worst-case scenario. I used this guide to set up my cluster. As I said there, I'm happy to share more details to get this sorted out.