Kubernetes API Service is not accessible by pods (bpf)

mtb-xt commented 1 year ago

Hi everyone, I have a very strange issue (I'm not even 100% sure it's calico's fault) with my cluster's API service address 10.96.0.1. No pods can reach that address, and the error I'm getting is 'operation not permitted'.

I'm running a k0s cluster with manually configured Calico in eBPF mode. I have 3 control-plane servers and 3 workers (in libvirt virtual machines, with bridged networking).

I installed Calico using the operator, according to the manual. Everything worked fine while I had 1 control-plane node and 1 worker. After adding more control-plane nodes to the cluster, I started having this issue.

For example, with coredns -

[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/kubernetes: pkg/mod/k8s.io/client-go@v0.27.4/tools/cache/reflector.go:231: failed to list *v1.EndpointSlice: Get "https://10.96.0.1:443/apis/discovery.k8s.io/v1/endpointslices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: connect: operation not permitted

I'm not 100% sure, but the issue seems to have started after I added an external load balancer in front of the API. k0s requires an external load balancer in front of a highly available control plane, so I run a keepalived virtual IP plus an haproxy load balancer on every control-plane node. This virtual IP is configured both in k0s and in the tigera-operator kubernetes-services-endpoint ConfigMap, like so:

apiVersion: v1
data:
  KUBERNETES_SERVICE_HOST: 172.21.35.10
  KUBERNETES_SERVICE_PORT: "6443"
kind: ConfigMap
metadata:
  name: kubernetes-services-endpoint
  namespace: tigera-operator

I've tried to debug this myself, and what I found is that the service IP has no 'backend' in the BPF NAT table:

[hawara@phoenix ~]$ kubectl exec -n calico-system calico-node-hxjzp  -c calico-node   -- calico-node -bpf nat dump
W1023 07:39:34.266002   16902 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
10.96.0.1 port 443 proto 6 id 2 count 0 local 0
10.101.221.237 port 9898 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
172.29.39.100 port 9898 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
192.168.41.0 port 32648 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
172.29.39.100 port 9999 proto 6 id 8 count 2 local 1
    8:0 192.168.41.9:9999
    8:1 192.168.22.160:9999
10.98.112.143 port 443 proto 6 id 0 count 2 local 1
    0:0 192.168.41.63:5443
    0:1 192.168.22.147:5443
10.111.92.186 port 443 proto 6 id 5 count 0 local 0
172.21.35.22 port 32648 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
172.21.35.22 port 30131 proto 6 id 8 count 2 local 1
    8:0 192.168.41.9:9999
    8:1 192.168.22.160:9999
10.97.243.18 port 5473 proto 6 id 1 count 2 local 0
    1:0 172.21.35.21:5473
    1:1 172.21.35.23:5473
10.101.221.237 port 9999 proto 6 id 8 count 2 local 1
    8:0 192.168.41.9:9999
    8:1 192.168.22.160:9999
255.255.255.255 port 30131 proto 6 id 8 count 2 local 1
    8:0 192.168.41.9:9999
    8:1 192.168.22.160:9999
255.255.255.255 port 32648 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
10.96.0.10 port 53 proto 6 id 7 count 0 local 0
10.96.0.10 port 53 proto 17 id 3 count 0 local 0
10.96.0.10 port 9153 proto 6 id 4 count 0 local 0
192.168.41.0 port 30131 proto 6 id 8 count 2 local 1
    8:0 192.168.41.9:9999
    8:1 192.168.22.160:9999

Not sure what to do next, or how to debug this further.
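
For longer dumps, the frontends without backends can be picked out mechanically. The following is a minimal sketch, assuming the `calico-node -bpf nat dump` output keeps the line layout shown above (frontend lines of the form `ADDR port N proto N id N count N local N`, with indented backend lines underneath). The function name and the sample text are illustrative only and not part of Calico.

```python
import re

# Matches frontend lines such as
# "10.96.0.1 port 443 proto 6 id 2 count 0 local 0".
# The layout is assumed from the dump excerpt in this thread.
FRONTEND_RE = re.compile(
    r"^(?P<addr>\d+\.\d+\.\d+\.\d+) port (?P<port>\d+) proto (?P<proto>\d+) "
    r"id (?P<id>\d+) count (?P<count>\d+) local (?P<local>\d+)$"
)


def frontends_without_backends(dump_text):
    """Return (addr, port, proto) for every frontend whose backend count is 0."""
    empty = []
    for line in dump_text.splitlines():
        match = FRONTEND_RE.match(line.strip())
        # Backend lines ("6:0 192.168.41.9:9898") and log warnings do not match.
        if match and int(match.group("count")) == 0:
            empty.append(
                (match.group("addr"), int(match.group("port")), int(match.group("proto")))
            )
    return empty


# A few lines copied from the dump above.
SAMPLE = """\
10.96.0.1 port 443 proto 6 id 2 count 0 local 0
10.101.221.237 port 9898 proto 6 id 6 count 2 local 1
    6:0 192.168.41.9:9898
    6:1 192.168.22.160:9898
10.96.0.10 port 53 proto 17 id 3 count 0 local 0
"""

if __name__ == "__main__":
    for addr, port, proto in frontends_without_backends(SAMPLE):
        print(f"{addr}:{port} proto {proto} has no backends")
```

In practice the text would come from the `kubectl exec ... calico-node -bpf nat dump` command shown earlier, saved to a file or piped in, instead of `SAMPLE`.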

Expected Behavior

Kubernetes service API is accessible.

Current Behavior

Pods can't talk to the API server internally - dial tcp 10.96.0.1:443: connect: operation not permitted

Possible Solution

Really no idea what causes it :(

Steps to Reproduce (for bugs)

Install k0s without CNI, add CNI manually via manifest to install tigera operator.
Configure HA controlplane
Observe pods not being able to connect to the API

Context

Your Environment

Calico version - v3.26.1
Orchestrator version (e.g. kubernetes, mesos, rkt): k0s 1.28.2
Operating System and version: Ubuntu 22.04 LTS with 6.2 kernel
Calico Installation CRD and additional configs - please see this gist

Thank you, and please let me know if you need any more details.

tomastigera commented 1 year ago

What implements the 172.21.35.10 keepalived virtual IP? Where does it point, and how? What does it translate to when a client uses it?

> the service IP, there is no 'backend'

If you list kubernetes endpoints, what do you get?

> Pods can't talk to the API server internally - dial tcp 10.96.0.1:443: connect: operation not permitted

Definitely the result of not having any backends.

mtb-xt commented 1 year ago

> What implements the 172.21.35.10 keepalived virtual IP? Where does it point, and how? What does it translate to when a client uses it?

A keepalived daemon runs on each Kubernetes control-plane node, and one of the control-plane nodes holds the 172.21.35.10 IP. haproxy listens on this IP on all control-plane nodes and forwards TCP to one of the controllers.

That IP is used in k0s as the external address; I also use it in my kubeconfig file (and it works).

The virtual IP resolves to the MAC address of one of the control-plane nodes:

hawara@phoenix:~$ arp -an
? (172.21.35.10) at 52:54:00:56:d5:06 [ether] on enp1s0
? (172.21.35.13) at 52:54:00:56:d5:06 [ether] on enp1s0
? (172.21.35.11) at 52:54:00:be:58:d0 [ether] on enp1s0

[hawara@phoenix ~]$ kubectl cluster-info
Kubernetes control plane is running at https://172.21.35.10:6443
CoreDNS is running at https://172.21.35.10:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

keepalived conf

root@sentinel-cp:/etc/keepalived# cat keepalived.conf
#
# Ansible managed
#

vrrp_instance VI_1 {
  state MASTER
  interface enp1s0
  virtual_router_id 52
  priority 255
  authentication {
    auth_type PASS
    auth_pass <bogus>
  }
  virtual_ipaddress {
    172.21.35.10/23
  }
}

> the service IP, there is no 'backend'

> If you list kubernetes endpoints, what do you get?

[hawara@phoenix ~]$ k get endpoints kubernetes
NAME         ENDPOINTS           AGE
kubernetes   172.21.35.10:6443   37d

[hawara@phoenix ~]$ k get endpointslices.discovery.k8s.io
NAME         ADDRESSTYPE   PORTS   ENDPOINTS      AGE
kubernetes   IPv4          6443    172.21.35.10   37d

> > Pods can't talk to the API server internally - dial tcp 10.96.0.1:443: connect: operation not permitted
>
> Definitely the result of not having any backends.

Even if I try to manually specify each controlplane IP in `kubernetes` service endpoint, there are still no backends in bpf nat.

mtb-xt commented 1 year ago

I've enabled debug logging in Felix, and it seems it really skips the endpoints of the kubernetes service:

2023-10-25 04:43:10.974 [DEBUG][519655] felix/syncer.go 549: Applying service service=default/kubernetes:https
2023-10-25 04:43:10.974 [DEBUG][519655] felix/syncer.go 865: bpf map writing NATKey{Proto:6 Addr:10.96.0.1 Port:443 SrcAddr:0.0.0.0/0}:NATValue{ID:2,Count:0,LocalCount:0,AffinityTimeout:0,Flags:{}}
2023-10-25 04:43:10.974 [DEBUG][519655] felix/syncer.go 371: applied a service default/kubernetes:https update: sinfo={id:2 count:0 localCount:0 svc:0xc000001800}

mtb-xt commented 1 year ago

@tomastigera I think I've found why this happens - the default kubernetes Endpoints object has the label endpointslice.kubernetes.io/skip-mirror: "true"

When this label is set, for some reason Felix doesn't detect the endpoint IP. Look at the debug logs:

felix/syncer.go 528: Applying new state, 
{map[calico-apiserver/calico-api:apiserver:10.98.112.143:443/TCP calico-system/calico-typha:calico-typha:10.97.243.18:5473/TCP default/kubernetes:https:10.96.0.1:443/TCP kube-system/kube-dns:dns:10.96.0.10:53/UDP kube-system/kube-dns:dns-tcp:10.96.0.10:53/TCP kube-system/kube-dns:metrics:10.96.0.10:9153/TCP kube-system/metrics-server:https:10.111.92.186:443/TCP test/podinfo:grpc:10.101.221.237:9999/TCP test/podinfo:http:10.101.221.237:9898/TCP] 
map[calico-apiserver/calico-api:apiserver:[192.168.22.147:5443 192.168.41.63:5443] calico-system/calico-typha:calico-typha:[172.21.35.21:5473 172.21.35.23:5473] kube-system/kube-dns:dns:[192.168.114.121:53 192.168.22.164:53] kube-system/kube-dns:dns-tcp:[192.168.114.121:53 192.168.22.164:53] kube-system/kube-dns:metrics:[192.168.114.121:9153 192.168.22.164:9153] kube-system/metrics-server:https:[192.168.22.151:10250] test/podinfo:grpc:[192.168.114.124:9999 192.168.22.167:9999 192.168.41.9:9999] test/podinfo:http:[192.168.114.124:9898 192.168.22.167:9898 192.168.41.9:9898]] }

The second map, the one with the endpoint IPs, doesn't contain default/kubernetes, even though the Endpoints object itself has the IP:

[hawara@phoenix secrent-repo]$ k get endpoints -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    creationTimestamp: "2023-09-17T11:02:15Z"
    labels:
      endpointslice.kubernetes.io/skip-mirror: "true"
    name: kubernetes
    namespace: default
    resourceVersion: "13585030"
    uid: 065a580d-7c2a-41dd-a84a-1c5e7d7b9c66
  subsets:
  - addresses:
    - ip: 172.21.35.10
    ports:
    - name: https
      port: 6443
      protocol: TCP
kind: List
metadata:
  resourceVersion: ""
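
To check for this label across all Endpoints objects at once, a short script can scan the JSON form of the same listing. This is only a sketch: it assumes the input has the structure of `kubectl get endpoints -A -o json` (a `List` with `items`), and the function name and sample data are made up for illustration.

```python
import json
import sys

SKIP_MIRROR_LABEL = "endpointslice.kubernetes.io/skip-mirror"


def endpoints_skipping_mirror(endpoints_list):
    """Return "namespace/name" for every Endpoints item labelled skip-mirror: "true"."""
    skipped = []
    for item in endpoints_list.get("items", []):
        meta = item.get("metadata", {})
        labels = meta.get("labels") or {}  # labels may be absent or null
        if labels.get(SKIP_MIRROR_LABEL) == "true":
            skipped.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return skipped


# Illustrative input shaped like the YAML above, plus one unlabelled object.
SAMPLE = {
    "kind": "List",
    "items": [
        {
            "kind": "Endpoints",
            "metadata": {
                "name": "kubernetes",
                "namespace": "default",
                "labels": {SKIP_MIRROR_LABEL: "true"},
            },
        },
        {"kind": "Endpoints", "metadata": {"name": "podinfo", "namespace": "test"}},
    ],
}

if __name__ == "__main__":
    # Reads `kubectl get endpoints -A -o json` from stdin when piped,
    # otherwise falls back to the sample.
    data = SAMPLE if sys.stdin.isatty() else json.load(sys.stdin)
    for name in endpoints_skipping_mirror(data):
        print(f"{name} is not mirrored to an EndpointSlice")
```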

If I manually create the EndpointSlice, the service then appears in Felix's debug endpoint map:

2023-10-25 05:41:04.595 [DEBUG][527820] felix/syncer.go 528: Applying new state, {map[calico-apiserver/calico-api:apiserver:10.98.112.143:443/TCP calico-system/calico-typha:calico-typha:10.97.243.18:5473/TCP default/kubernetes:https:10.96.0.1:443/TCP kube-system/kube-dns:dns:10.96.0.10:53/UDP kube-system/kube-dns:dns-tcp:10.96.0.10:53/TCP kube-system/kube-dns:metrics:10.96.0.10:9153/TCP kube-system/metrics-server:https:10.111.92.186:443/TCP test/podinfo:grpc:10.101.221.237:9999/TCP test/podinfo:http:10.101.221.237:9898/TCP] map[calico-apiserver/calico-api:apiserver:[192.168.22.147:5443 192.168.41.63:5443] calico-system/calico-typha:calico-typha:[172.21.35.21:5473 172.21.35.23:5473] default/kubernetes:https:[172.21.35.10:6443] kube-system/kube-dns:dns:[192.168.114.121:53 192.168.22.164:53] kube-system/kube-dns:dns-tcp:[192.168.114.121:53 192.168.22.164:53] kube-system/kube-dns:metrics:[192.168.114.121:9153 192.168.22.164:9153] kube-system/metrics-server:https:[192.168.22.151:10250] test/podinfo:grpc:[192.168.114.124:9999 192.168.22.167:9999 192.168.41.9:9999] test/podinfo:http:[192.168.114.124:9898 192.168.22.167:9898 192.168.41.9:9898]] }
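
Comparing the two maps in these "Applying new state" lines by eye is error-prone, so here is a rough sketch that lists the services present in the first map but absent from the second. It assumes the debug line looks exactly like the excerpts in this thread (`ns/name:port:IP:PORT/PROTO` entries in the first map, `ns/name:port:[...]` entries in the second); the layout may differ between Felix versions, and the regular expressions and function name are illustrative only.

```python
import re

# Entry in the first map, e.g. "default/kubernetes:https:10.96.0.1:443/TCP".
# The port name is optional in case a service port is unnamed.
SERVICE_RE = re.compile(
    r"([\w.-]+/[\w.-]+(?::[\w.-]+)?):\d+\.\d+\.\d+\.\d+:\d+/(?:TCP|UDP|SCTP)"
)
# Entry in the second map, e.g. "default/kubernetes:https:[172.21.35.10:6443]".
ENDPOINT_RE = re.compile(r"([\w.-]+/[\w.-]+(?::[\w.-]+)?):\[")


def services_missing_endpoints(log_line):
    """Return service port names that appear in the service map only."""
    services = set(SERVICE_RE.findall(log_line))
    with_endpoints = set(ENDPOINT_RE.findall(log_line))
    return sorted(services - with_endpoints)


# Shortened version of the first log line quoted in this thread.
SAMPLE = (
    "felix/syncer.go 528: Applying new state, "
    "{map[calico-system/calico-typha:calico-typha:10.97.243.18:5473/TCP "
    "default/kubernetes:https:10.96.0.1:443/TCP "
    "test/podinfo:http:10.101.221.237:9898/TCP] "
    "map[calico-system/calico-typha:calico-typha:[172.21.35.21:5473 172.21.35.23:5473] "
    "test/podinfo:http:[192.168.41.9:9898]] }"
)

if __name__ == "__main__":
    for name in services_missing_endpoints(SAMPLE):
        print(f"{name}: no endpoints in Felix state")
```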

But this is not a solution: if I'm not mistaken, the label that disables mirroring to EndpointSlices is set by the kube-apiserver by default. I've also checked the Felix config parameter bpfKubeProxyEndpointSlicesEnabled: I have it set to false, even though it's false by default. It doesn't matter whether it's set or not; the IP doesn't appear in the endpoint list unless an EndpointSlice is present.

As a workaround, I'll create the endpointslice manually, but I still need your help, thank you.
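
The thread does not show the manifest that was actually used for this workaround. As a rough sketch of what a manually created EndpointSlice for the `kubernetes` service could look like, the script below builds one and prints it as JSON, which `kubectl apply -f -` accepts. The object name `kubernetes-manual` and the helper function are hypothetical; the address and port are the ones from the Endpoints object above, and the `kubernetes.io/service-name` label is what ties the slice to the service.

```python
import json


def kubernetes_endpointslice(address, port, name="kubernetes-manual"):
    """Build an EndpointSlice manifest for the default/kubernetes service.

    `name` is a hypothetical object name chosen for this example.
    """
    return {
        "apiVersion": "discovery.k8s.io/v1",
        "kind": "EndpointSlice",
        "metadata": {
            "name": name,
            "namespace": "default",
            # Associates the slice with the Service named "kubernetes".
            "labels": {"kubernetes.io/service-name": "kubernetes"},
        },
        "addressType": "IPv4",
        # The port name matches the service port name ("https").
        "ports": [{"name": "https", "port": port, "protocol": "TCP"}],
        "endpoints": [{"addresses": [address]}],
    }


if __name__ == "__main__":
    print(json.dumps(kubernetes_endpointslice("172.21.35.10", 6443), indent=2))
```

Piping the output into `kubectl apply -f -` would create the object. Whether the apiserver later rewrites or removes a slice like this depends on the cluster setup, so treat it as a stopgap, in line with the comment above.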

tomastigera commented 1 year ago

Hey @mtb-xt great debugging!

Afaict, Endpoints processing in kube-proxy is deprecated (see for instance here), and we also deprecated using Endpoints in our kube-proxy. We use the k8s-provided frontend package. The option you mentioned above, bpfKubeProxyEndpointSlicesEnabled, is a cleanup omission; it does not do anything and we will remove it.

That said, I am not sure there is any other way than to configure your system so that it does not set the label endpointslice.kubernetes.io/skip-mirror: "true".

I wonder how this behaves with vanilla Kubernetes, that is, the iptables dataplane and the k8s-provided kube-proxy. Was there any patch in k0s that would keep this functionality around?

mtb-xt commented 1 year ago

I don't understand this anymore. I've recreated my cluster and set the externalAddress, and lo and behold, the Endpoints object DOES NOT contain that label.

I did look at what happens inside k0s and inside Kubernetes itself. It looks like there is a condition under which the apiserver sets that label - https://github.com/kubernetes/kubernetes/blob/fd5c40611257c694d2338960976726344e2b45e5/pkg/controlplane/reconcilers/instancecount.go#L82

And k0s does nothing specific - it just creates these endpoints. Maybe, somehow, sometimes, that line in the apiserver adds the label. I'm not sure, but I think the issue can be closed - at least there will be some info for a Google search.

zachfi commented 3 months ago

Does this mean that if the label is present it should be removed? I'm seeing similar behavior I believe. Calico is exporting the kubernetes service address, but not importing it.

tomastigera commented 3 months ago

> Calico is exporting the kubernetes service address, but not importing it.

What do you mean by that?

zachfi commented 2 months ago

I was wrong and the situation resolved itself. Ignore me. I was concerned that nodes were not importing the service address for kubernetes itself over BGP. Looking at the routing table confirmed this, but when I checked back later things had started working. Does Calico accept all route advertisements if there are no filters in place?
