@growse I think the next step here would be identifying the rule that is performing the SNAT / MASQ and going from there.
Assuming it's being done in iptables, you could run the following to see which rules are being hit:
iptables-save -c | grep MASQ
iptables-save -c | grep SNAT
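Since the traffic in question here is IPv6, the same check against the IPv6 tables (and against the legacy backend, if the host has both) would look something like:
ip6tables-save -c | grep MASQ
ip6tables-save -c | grep SNAT
ip6tables-legacy-save -c | grep -E 'MASQ|SNAT'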
I feel like I may have led people on a wild goose chase here. The output from iptables-legacy-save
led me to see that I had a slightly battered daemonset of ip-masq-agent
running, and a version of it that also masqueraded IPv6 traffic.
This was the culprit. Removing this daemonset and restarting the nodes now gives me a service that behaves as expected.
My bad!
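For anyone else landing here: assuming the agent was deployed under its usual name in the kube-system namespace (the daemonset name and namespace may differ per cluster), the cleanup is roughly:
kubectl -n kube-system get daemonset ip-masq-agent
kubectl -n kube-system delete daemonset ip-masq-agent
followed by rebooting the nodes so that the MASQUERADE rules the agent had already programmed are cleared.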
(Originally posted to the k8s project at https://github.com/kubernetes/kubernetes/issues/102527. In short, @uablrek was unable to reproduce and suspected that it wasn't a k8s issue but more likely a Calico issue.)
Expected Behavior
Traffic arriving at a pod through a LoadBalancer IPv6 SingleStack k8s service has its source IP preserved and visible to the pod.
Current Behavior
The source IP is currently re-written to be the cluster IP address of one of the cluster services. With an example nginx deployment behind a service configured with a loadBalancerIP of 2001:8b0:c8f:e8b1:beef:f00d::11, curling that IP address and observing the nginx access log shows that it sees a connection from fd5a:1111:1111::ffa1.
fd5a:1111:1111::f7ff is the cluster IP of a completely different service, and @uablrek noticed that it was the first IP address reported on the kube-ipvs0 interface on the node running the pod.
Setting the same thing up with IPv4, using a loadBalancerIP of 192.168.254.11, the source IP address of the client machine doing the curl is preserved. Here, 192.168.2.111 is the IP address of the client machine running curl.
The behaviour on the IPv6 service is what I'd expect if I set externalTrafficPolicy=Cluster. In fact, re-configuring the IPv6 service with externalTrafficPolicy=Cluster results in the exact same behaviour as externalTrafficPolicy=Local, whereas on IPv4, setting externalTrafficPolicy=Cluster behaves as expected - the client IP seen by the pod is that of the node.
It feels like some SNATting is going on here erroneously, and it's not obvious why this problem only affects IPv6 services.
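For reference, reproducing the observation above is essentially a curl from the off-cluster client against the load balancer IP plus a look at what nginx logs; a sketch, assuming the deployment name from the manifest below:
# from the off-cluster client machine (192.168.2.111)
curl -g "http://[2001:8b0:c8f:e8b1:beef:f00d::11]/"
# on the cluster, check which source address nginx logged
kubectl logs deployment/test-service-source-ip
# on the node running the pod, list the addresses assigned to the IPVS dummy interface
ip -6 addr show dev kube-ipvs0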
Possible Solution
I'm not really sure what the root cause is here, so I can't really suggest a solution.
Steps to Reproduce (for bugs)
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: test-service-source-ip
  labels:
    k8s-app: test-service-source-ip
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: test-service-source-ip
  template:
    metadata:
      labels:
        k8s-app: test-service-source-ip
    spec:
      containers:
        # the container spec is truncated here; a minimal nginx container as an assumed stand-in
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
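The Service side of the reproduction isn't shown above; a sketch of what the IPv6 case described would look like, using the loadBalancerIP, SingleStack/IPv6 family and externalTrafficPolicy quoted in this report (the service name and port are placeholders):
apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: test-service-source-ip
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  ipFamilyPolicy: SingleStack
  ipFamilies:
    - IPv6
  loadBalancerIP: "2001:8b0:c8f:e8b1:beef:f00d::11"
  selector:
    k8s-app: test-service-source-ip
  ports:
    - port: 80
      targetPort: 80
      protocol: TCP
Swapping loadBalancerIP for 192.168.254.11 and ipFamilies for IPv4 gives the IPv4 variant that behaves correctly above.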
Context
I have some applications (e.g. DNS servers) that need to be exposed via an IPv6-only service and that also do IP-based ACLs. Hiding the client IP from the pod effectively makes it impossible to do any ACLs, or to make any functional decisions based on the client IP.
Your Environment
Calico configured with CALICO_IPV6POOL_CIDR set explicitly to the IPv6 pod subnet below, and also configured to advertise the serviceLoadBalancerIPs over BGP to the local off-cluster router.
Local (client) network: 192.168.2.0/24 & 2001:8b0:c8f:e8b::/64
LoadBalancer IP ranges: 192.168.254.0/24 & 2001:8b0:c8f:e8b1:beef:f00d::/116
Pod CIDRs: 10.244.0.0/16 & fd5a:5555:5555::/48
Service CIDRs: 10.96.0.0/12 & fd5a:1111:1111::/112
Kernel: 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
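The BGP advertisement of the load balancer ranges mentioned above is done through Calico's BGPConfiguration resource; with the ranges from this report it would look roughly like this (a sketch, not the exact manifest in use):
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  serviceLoadBalancerIPs:
    - cidr: 192.168.254.0/24
    - cidr: "2001:8b0:c8f:e8b1:beef:f00d::/116"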