liorfranko opened this issue 4 years ago
Does anyone have any issues/successes when working with Calico security policy and NodeLocal DNSCache? I know that NodeLocal DNSCache makes changes to iptables; maybe there is a conflict with Calico?
@liorfranko I'm not super familiar with the node-local DNS cache. How is it deployed? Is it a host-networked pod? What requests specifically are you seeing being blocked?
You could also try removing your policy or creating an "allow all" policy to make sure it's policy that is blocking the requests and not something else.
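A minimal sketch of such a temporary "allow all" policy might look like this (the name and the low order value are just placeholders for debugging, not something to leave in place):

apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: temp-debug-allow-all
spec:
  # Low order so it is evaluated before your other policies.
  order: 0
  ingress:
  - action: Allow
  egress:
  - action: Allow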
Alongside the regular kube-dns SVC, there is a DaemonSet with a DNS pod on each node. During the deployment of NodeLocal DNSCache, these pods manipulate the nodes' iptables and "hijack" the DNS queries. Each pod then either responds from a local cache or queries the kube-dns SVC on behalf of the client pod.
I know that hijacking the DNS queries is a security concern, but it's an official K8s feature.
@liorfranko can you find the exact rule that is blocking the traffic? e.g., with iptables-save -c to view which rules are / are not getting hit?
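For example (a rough sketch; the chain name in the grep is just illustrative, yours will differ):

# Dump counters, run a failing DNS lookup from a pod, then dump again and
# compare the [packets:bytes] counters to see which rules the traffic hits.
iptables-save -c | grep 'cali-po-'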
The cluster is enforced using Calico GlobalNetworkPolicy.
Could you also share the GNP that you created? It's possible that this is "working as expected" if your policy selects the local DNS pods and doesn't allow the necessary traffic.
This is the configuration that works:
[root@raor-kmb01 liorf]# kubectl -n kube-system describe svc kube-dns
Name: kube-dns
Namespace: kube-system
Labels: addonmanager.kubernetes.io/mode=EnsureExists
k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
security_role=dc
Annotations: kubectl.kubernetes.io/last-applied-configuration:
{"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"EnsureExists","k8s-app":"kub...
Selector: k8s-app=kube-dns
Type: ClusterIP
IP: 10.48.60.4
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.87.209.76:53,10.87.212.38:53,10.87.212.62:53 + 2 more...
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.87.209.76:53,10.87.212.38:53,10.87.212.62:53 + 2 more...
Session Affinity: None
Events: <none>
[root@raor-kmb01 liorf]#
[root@raor-kmb01 liorf]# kubectl -n kube-system get pods -o wide -l security_role=dc
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84b58dd875-k8p2s 1/1 Running 0 28d 10.87.209.76 rapr-knb403 <none> <none>
coredns-84b58dd875-sd5vv 1/1 Running 0 28d 10.87.221.7 rapr-knb402 <none> <none>
coredns-84b58dd875-xr8pb 1/1 Running 0 28d 10.87.212.62 rapr-knb404 <none> <none>
coredns-84b58dd875-z2vnx 1/1 Running 0 28d 10.87.221.23 rapr-knb402 <none> <none>
coredns-84b58dd875-zkss4 1/1 Running 0 28d 10.87.212.38 rapr-knb404 <none> <none>
[root@raor-kmb01 liorf]#
[root@raor-kmb01 liorf]# calicoctl get gnp allow-cluster-dns-egress -o yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  creationTimestamp: "2020-04-06T15:49:01Z"
  name: allow-cluster-dns-egress
  resourceVersion: "289083783"
  uid: 2275f35a-781e-11ea-8d3f-6c96cfdd9a83
spec:
  egress:
  - action: Allow
    destination:
      ports:
      - 53
      selector: security_role == 'dc'
    protocol: UDP
    source: {}
  - action: Allow
    destination:
      ports:
      - 53
      selector: security_role == 'dc'
    protocol: TCP
    source: {}
  order: 110
  types:
  - Egress
[root@raor-kmb01 liorf]# calicoctl get gnp allow-cluster-dns-ingress -o yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  creationTimestamp: "2020-04-06T15:49:01Z"
  name: allow-cluster-dns-ingress
  resourceVersion: "289083784"
  uid: 2282ced2-781e-11ea-8d3f-6c96cfdd9a83
spec:
  ingress:
  - action: Allow
    destination:
      ports:
      - 53
      selector: security_role == 'dc'
    protocol: UDP
    source: {}
  - action: Allow
    destination:
      ports:
      - 53
      selector: security_role == 'dc'
    protocol: TCP
    source: {}
  order: 120
  selector: security_role == 'dc'
  types:
  - Ingress
Here are the iptables rules:
:cali-po-_5m-r2tA7lULiAKgDJYp - [0:0]
[0:0] -A cali-po-_5m-r2tA7lULiAKgDJYp -p udp -m comment --comment "cali:TlFil7fjAMGcv5Q5" -m set --match-set cali40s:8ossrPQLjDgMAY-ksqdt5w_ dst -m multiport --dports 53 -j MARK --set-xmark 0x10000/0x10000
[0:0] -A cali-po-_5m-r2tA7lULiAKgDJYp -m comment --comment "cali:gIhEgQMSynKo-7bZ" -m mark --mark 0x10000/0x10000 -j RETURN
[0:0] -A cali-po-_5m-r2tA7lULiAKgDJYp -p tcp -m comment --comment "cali:cHcDw9oz_M13WRhm" -m set --match-set cali40s:8ossrPQLjDgMAY-ksqdt5w_ dst -m multiport --dports 53 -j MARK --set-xmark 0x10000/0x10000
[0:0] -A cali-po-_5m-r2tA7lULiAKgDJYp -m comment --comment "cali:ByOGEfuwNe7-t3Rh" -m mark --mark 0x10000/0x10000 -j RETURN
This works perfectly before NodeLocal DNSCache.
Adding NodeLocal DNSCache just adds a cache pod on each node:
[root@raor-kmb01 liorf]# kubectl -n kube-system get pods -o wide -l security_role=dc
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84b58dd875-k8p2s 1/1 Running 0 28d 10.87.209.76 rapr-knb403 <none> <none>
coredns-84b58dd875-sd5vv 1/1 Running 0 28d 10.87.221.7 rapr-knb402 <none> <none>
coredns-84b58dd875-xr8pb 1/1 Running 0 28d 10.87.212.62 rapr-knb404 <none> <none>
coredns-84b58dd875-z2vnx 1/1 Running 0 28d 10.87.221.23 rapr-knb402 <none> <none>
coredns-84b58dd875-zkss4 1/1 Running 0 28d 10.87.212.38 rapr-knb404 <none> <none>
node-local-dns-57z67 1/1 Running 0 18m 10.48.56.10 rapr-knb402 <none> <none>
node-local-dns-76mqt 1/1 Running 0 18m 10.48.57.11 raor-kmb02 <none> <none>
node-local-dns-crv7z 1/1 Running 0 18m 10.48.56.9 rapr-knb401 <none> <none>
node-local-dns-gkz4p 1/1 Running 0 18m 10.48.56.11 rapr-knb403 <none> <none>
node-local-dns-p86bs 1/1 Running 0 18m 10.48.57.12 raor-kmb01 <none> <none>
node-local-dns-pr47t 1/1 Running 0 18m 10.48.57.10 raor-kmb03 <none> <none>
node-local-dns-rhxsp 1/1 Running 0 18m 10.48.56.8 rapr-knb400 <none> <none>
node-local-dns-w5srg 1/1 Running 0 18m 10.48.56.12 rapr-knb404 <none> <none>
Now each node-local-dns pod "hijacks" the DNS requests and either responds from its cache or forwards them to the kube-dns SVC and relays the response.
The blocked traffic is from the application pods to the kube-dns SVC IP.
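A quick way to reproduce this from any workload pod (the pod name is a placeholder):

kubectl exec -it <some-app-pod> -- nslookup kubernetes.default.svc.cluster.local
# With NodeLocal DNSCache enabled and the policies above in place, this
# times out, because the query is intercepted by the node-local cache.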
Same issue happened to me. But I have no GlobalNetworkPolicy.
No, I am sorry, my mistake. I changed the kube-proxy mode last week and did not change the config of node-local DNS.
Sorry
I think there are some incompatibilities with the node-local DNS cache's use of NOTRACK iptables rules, so keeping this issue open to track making Calico work with node-local DNS.
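For reference, one quick way to see whether node-local-dns has installed NOTRACK rules on a node is something like the following (a sketch; the exact rule text depends on the node-local-dns version and listen address):

iptables-save -t raw | grep -i notrack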
I have a GlobalNetworkPolicy and the problem happens to me as well. These are the policies I have used:
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: ingress-k8s-masters
spec:
  selector: has(node-role.kubernetes.io/master)
  # This rule allows ingress to the Kubernetes API server.
  ingress:
  - action: Allow
    protocol: TCP
    destination:
      ports:
      # kube API server
      - 6443
      - 8443
      - 53 # DNS
      - 9100 # prometheus-node-exporter
      # metrics-server
      - 443
      - 9443
  - action: Allow
    protocol: UDP
    destination:
      ports:
      - 53 # DNS
  - action: Allow
    destination:
      nets:
      - 127.0.0.1/32
  - action: Allow
    protocol: TCP
    source:
      selector: has(node-role.kubernetes.io/master)
    destination:
      ports:
      - 2380
      - 10250
Reported to k8s: https://github.com/kubernetes/kubernetes/issues/98758
@kfirfer we're missing part of the puzzle here: are you using host endpoints, and what policies do you have in place?
@fasaxc
Yes, I'm using automatic host endpoints:
vostro@dev101:~/code/kfirfer/helm-charts$ calicoctl get heps -owide
NAME NODE INTERFACE IPS PROFILES
nuc01-auto-hep nuc01 * 192.168.200.101,172.16.207.64 projectcalico-default-allow
nuc02-auto-hep nuc02 * 192.168.200.102,172.16.137.128 projectcalico-default-allow
nuc03-auto-hep nuc03 * 192.168.200.103,172.16.206.194 projectcalico-default-allow
I don't have egress GlobalNetworkPolicy rules. This is my GlobalNetworkPolicy:
vostro@dev101:~/code/kfirfer/helm-charts$ kubectl get globalnetworkpolicies.crd.projectcalico.org default.ingress-k8s-masters -o yaml
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  annotations:
    projectcalico.org/metadata: '{"uid":"e6cfb3f2-ccbb-4302-9c1a-c49d11b3d22f","creationTimestamp":"2021-02-09T22:57:57Z"}'
  creationTimestamp: "2021-02-09T22:57:58Z"
  generation: 1
  managedFields:
  - apiVersion: crd.projectcalico.org/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:projectcalico.org/metadata: {}
      f:spec:
        .: {}
        f:ingress: {}
        f:selector: {}
        f:types: {}
    manager: Go-http-client
    operation: Update
    time: "2021-02-09T22:57:58Z"
  name: default.ingress-k8s-masters
  resourceVersion: "10777643"
  uid: e6cfb3f2-ccbb-4302-9c1a-c49d11b3d22f
spec:
  ingress:
  - action: Allow
    destination:
      ports:
      - 6443
      - 8443
      - 53
      - 9100
      - 443
      - 9443
      - 7472
    protocol: TCP
    source: {}
  - action: Allow
    destination:
      ports:
      - 53
    protocol: UDP
    source: {}
  - action: Allow
    destination:
      nets:
      - 127.0.0.1/32
    source: {}
  - action: Allow
    destination:
      ports:
      - 2380
      - 10249
      - 10250
      - 10251
    protocol: TCP
    source:
      selector: has(node-role.kubernetes.io/master)
  selector: has(node-role.kubernetes.io/master)
  types:
  - Ingress
I ran into the same issue. It appears to me to be a problem because nodelocaldns does some unusual things with its network, in particular using a "link local" IP address instead of the normal pod IP address. This was actually causing two separate issues: DNS queries from workload pods destined to the link-local address were being dropped, and the nodelocaldns health check (which uses the link-local address as both source and destination) was being dropped as well.
Logging of the dropped packets showed that in both cases the link local IP address was showing up as the destination (and in the case of the health check, the source as well). To work around this, I added the following to a GlobalNetworkPolicy with default selector:
- action: Allow
  metadata:
    annotations:
      traffic: nodelocaldns UDP DNS
  protocol: UDP
  destination:
    ports: [53]
    nets: [169.254.25.10/32]
- action: Allow
  metadata:
    annotations:
      traffic: nodelocaldns TCP DNS
  protocol: TCP
  destination:
    ports: [53]
    nets: [169.254.25.10/32]
- action: Allow
  metadata:
    annotations:
      traffic: nodelocaldns internal/health check
  destination:
    nets: [169.254.25.10/32]
  source:
    nets: [169.254.25.10/32]
169.254.25.10 is the link-local IP being used by nodelocaldns in my installation (and I believe the default).
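To confirm which address is in use on a given node, you can look at the dummy interface that node-local-dns creates (the interface name nodelocaldns is what the upstream manifests use, so treat it as an assumption for your install):

ip addr show dev nodelocaldns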
I realized as I looked further into Calico that this is a problem at least in part because the nodelocaldns pod runs with hostNetwork: true. This prevents Calico from generating pod-specific firewall rules for it.
I came up with another, possibly better solution for this issue: set FELIX_CHAININSERTMODE to Append. Nodelocaldns installs its own iptables rules, but they get hidden by Calico when it keeps its rules first in the filter table's INPUT chain.
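If you'd rather configure this through the Calico datastore than through an environment variable on calico-node, the equivalent FelixConfiguration field is chainInsertMode; a minimal sketch (assuming your cluster uses the usual "default" FelixConfiguration):

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  # Append Calico's rules instead of inserting them at the top of the chains.
  chainInsertMode: Append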
NOTE: this does not fix the problem with the health check. My preference there is some way of allowing all traffic on the loopback interface to pass.
For what it's worth, policy that attempts to apply to DNS traffic needs to work differently with node local DNS.
IIRC from the last time I looked at this:
After enabling the NodeLocal DNSCache feature, requests from pods to the kube-dns SVC are getting blocked by Calico policy. The NodeLocal DNS pods are deployed with exactly the same labels as the coredns pods.