rancher / rancher

Complete container management platform
http://rancher.com
Apache License 2.0

Unable to provision K8s 1.19 cluster with firewalld enabled when using Calico, Canal, or Flannel #28840

Open aaronyeeski opened 4 years ago

aaronyeeski commented 4 years ago

What kind of request is this (question/bug/enhancement/feature request): Bug

Steps to reproduce (fewest steps possible): In Rancher, update the rke-metadata-config setting to the following:

{
  "refresh-interval-minutes": "1440",
  "url": "https://raw.githubusercontent.com/Oats87/kontainer-driver-metadata/k8s-1-19-v2.5/data/data.json"
}

Refresh Kubernetes metadata

Prepare 1 node with Oracle Linux 7.7 by following the documentation: https://rancher.com/docs/rancher/v2.x/en/installation/options/firewall/ (Or use this AMI which has ports opened: ami-06dd5f94499093e3d)

In Rancher, add a v1.19.1-rancher1-1 custom cluster with 1 node running all roles.

Result: Cluster is stuck provisioning (see attached screenshot).

Error in kubelet:

E0910 20:10:19.442169   25373 pod_workers.go:191] Error syncing pod c658b6ca-fbff-42e0-a3aa-f79e02f26a2c ("cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"), skipping: failed to "StartContainer" for "cluster-register" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-register pod=cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"
I0910 20:10:31.441910   25373 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 23ea099c02a5e01f00adc5e7ae2cc098b1dd3ff3fa2f4f9005574c38b8b8fe78
E0910 20:10:31.442379   25373 pod_workers.go:191] Error syncing pod c658b6ca-fbff-42e0-a3aa-f79e02f26a2c ("cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"), skipping: failed to "StartContainer" for "cluster-register" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-register pod=cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"
I0910 20:10:44.441995   25373 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 23ea099c02a5e01f00adc5e7ae2cc098b1dd3ff3fa2f4f9005574c38b8b8fe78
E0910 20:10:44.442565   25373 pod_workers.go:191] Error syncing pod c658b6ca-fbff-42e0-a3aa-f79e02f26a2c ("cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"), skipping: failed to "StartContainer" for "cluster-register" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-register pod=cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"
I0910 20:10:58.442061   25373 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 23ea099c02a5e01f00adc5e7ae2cc098b1dd3ff3fa2f4f9005574c38b8b8fe78
E0910 20:10:58.442582   25373 pod_workers.go:191] Error syncing pod c658b6ca-fbff-42e0-a3aa-f79e02f26a2c ("cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"), skipping: failed to "StartContainer" for "cluster-register" with CrashLoopBackOff: "back-off 5m0s restarting failed container=cluster-register pod=cattle-cluster-agent-987c678c-77rkl_cattle-system(c658b6ca-fbff-42e0-a3aa-f79e02f26a2c)"

Other details that may be helpful:


gz#14269

sangeethah commented 4 years ago

The error reported for the cluster is similar to https://github.com/rancher/rancher/issues/28836, but in this case we are not able to recover from the error.

aaronyeeski commented 4 years ago

This error is due to changes in the port requirements needed in firewalld. Additional ports are needed for the upgraded versions of the network providers (Canal, Flannel, etc.).

This Oracle Linux image works with k8s v1.18.x on rancher/rancher:master-head (commit 198ec5b) and rancher/rancher:v2.4.8. The image also works with k8s 1.19.x when firewalld is disabled.

Oats87 commented 4 years ago

This is an issue in any EL7 operating system where firewalld is running. This includes RHEL 7.x and CentOS 7.

The root cause appears to be a change in where Calico places the "Policy explicitly accepted packet" rule. In Calico 3.16.x, this rule is placed at the end of the FORWARD chain, i.e.

-A FORWARD -m comment --comment "cali:S93hcgKJrXEqnTfs" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT

whereas on earlier versions of Calico, it was on the cali-FORWARD chain:

-A cali-FORWARD -m comment --comment "cali:MH9kMp5aNICL-Olv" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT

This change was implemented for https://github.com/projectcalico/felix/pull/2424, which addresses an issue orthogonal to the one at hand.

When this rule was appended to the cali-FORWARD chain, traffic would automatically be accepted, and things "worked". Now that the "Policy explicitly accepted packet" rule is at the end of the FORWARD chain, a firewalld-inserted rule can blackhole traffic:

-A FORWARD -j REJECT --reject-with icmp-host-prohibited

which is inserted before the rest of the chain i.e.:

-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -s 10.42.0.0/16 -j ACCEPT
-A FORWARD -d 10.42.0.0/16 -j ACCEPT
-A FORWARD -m comment --comment "cali:S93hcgKJrXEqnTfs" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT

and thus the traffic never reaches the final ACCEPT rule; it is rejected first.

Further investigation will need to be done to determine the best course of action to mitigate this, as Calico is advertised as "not working well with firewalld" in the first place.
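
A quick way to check whether a node is affected (a diagnostic sketch; exact rule positions will vary by environment) is to list the FORWARD chain and inspect the ordering:

# Print the filter table's FORWARD chain with rule positions.
sudo iptables -L FORWARD -n --line-numbers
# The node is affected if the "REJECT ... icmp-host-prohibited" rule
# appears before the Calico "Policy explicitly accepted packet." ACCEPT rule.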

Oats87 commented 4 years ago

One thing to note: the firewalld rules in our documentation never appear to have worked for clusters using only Flannel, for similar reasons.

dennisschroer commented 4 years ago

We also have this problem using iptables. After upgrading to Rancher 2.5.1 and Kubernetes 1.19.1, DNS resolution and the overlay network stopped working. Stopping iptables on all nodes fixed the issue. We are using a custom RKE cluster on CentOS machines.

Versions: Rancher: 2.5.1 Kubernetes: 1.19.2 iptables v1.4.21

cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

uname -r
3.10.0-1127.13.1.el7.x86_64

Some logs

bash-4.2$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = 7b85fbe9b4156cf64b768261d4ea6e3b
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:56868->192.168.3.221:53: read: no route to host
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:41408->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:46917->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:47695->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:57134->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:57679->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:52947->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:36609->192.168.3.220:53: read: no route to host

Also, the test in https://rancher.com/docs/rancher/v2.x/en/troubleshooting/networking/#check-if-overlay-network-is-functioning-correctly showed that none of the nodes could reach any of the other nodes.

Oats87 commented 4 years ago

> We also have this problem using iptables. After upgrading to Rancher 2.5.1 and Kubernetes 1.19.1, DNS resolution and the overlay network stopped working. Stopping iptables on all nodes fixed the issue. [...]

This is more or less expected behavior, and comes with using Calico v3.16.x+.

syndr commented 4 years ago

@Oats87 -- forgive me for not being that familiar with Calico yet, but can you point me in the direction of info regarding Calico 3.16.x and that being expected behaviour?

I have firewalls turned off on all my nodes because of this, and it makes me kind of uncomfortable. I would like more info on what works in the security sense.

jzandbergen commented 3 years ago

I'm not sure if this is related, but we ran into issues as well with CentOS 7 and iptables enabled. Our problem seemed to be that Calico did not properly set up the routes: some nodes had a missing route to the services network 10.43.0.0/16. So I manually set the route with ip route add 10.43.0.0/16 dev flannel.1.
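
A minimal sketch for restoring that route idempotently (assuming the default RKE service CIDR 10.43.0.0/16 and a flannel.1 VXLAN interface):

# Add the service-network route only if it is not already present.
ip route show 10.43.0.0/16 | grep -q . || sudo ip route add 10.43.0.0/16 dev flannel.1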

jonathon2nd commented 3 years ago

I have run into this issue as well. While looking into it I saw that this was updated: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/node-requirements/

Some distributions of Linux derived from RHEL, including Oracle Linux, may have default firewall rules that block communication with Helm. We recommend disabling firewalld. For Kubernetes 1.19, firewalld must be turned off.

Is there some explanation of why this cannot be fixed, or is there something else that needs to be used to ensure security on the host? Sorry if I am missing something.

jonathon2nd commented 3 years ago

I was able to get a 1.19 cluster working. I use Terraform with the xenorchestra provider on our on-prem pool. I created a new VM with the intention of turning it into a template. I switched from CentOS 7 to CentOS 8 and then switched from firewalld to ufw. After making sure everything was good to go, I stopped the VM and turned it into a template.

Then in my terraform plan (is it called a plan?) I set this as part of the cloud-init portion:

#cloud-config
runcmd:
...
 - sudo ip route add 10.43.0.0/16 dev flannel.1
 - sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait
 - sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.5.2 ...
...

After this I got a working k8s 1.19 cluster with centos 8 hosts. Hope this helps someone until the issue is resolved.

syndr commented 3 years ago

If it works with iptables/ufw when you add the route, I'll take a look at that. Using firewalld would be nice, but if I have to configure an alternative, that's still better than no firewall at all.

Keep hoping I can find more info on why flannel (I assume) and firewalld don't get along with each other.

jonathon2nd commented 3 years ago

OK, I am sure that the steps I am taking do fix this problem.

During my setup of the cluster I forgot a couple of things, so I made the changes and restarted the hosts, and the failure started happening again. The route was missing. I figured out that runcmd only runs on first-time VM startup. After I added the following to my terraform plan

bootcmd:
 - sudo ip route add 10.43.0.0/16 dev flannel.1
 - sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait

and redid the cluster, it seems to work every time now. This happens to me every so often: https://github.com/rancher/rancher/issues/28768, which caused me to think I was still having problems, but as they said in that thread, if you wait long enough (20-30 minutes) things will start working.

andrew-landsverk-win commented 3 years ago

This is an ongoing issue, with the route missing and the firewalld errors on CentOS 7 clusters. Is there any new guidance for this issue? Currently the "best" workaround seems to be to disable firewalld.

jonathon2nd commented 3 years ago

For me it is not an option to deploy a k8s cluster without a host firewall. My 'solution' so far:

dnf install ufw -y
systemctl disable --now firewalld.service
<add the ports you need to ufw, example>
sudo ufw allow 30000:32767/tcp
systemctl enable --now ufw.service

These commands are also needed; it would be better to write them into the network scripts, but I have not gotten to that. They will not persist after a reboot (one way to persist them is sketched at the end of this comment).

sudo ip route add 10.43.0.0/16 dev flannel.1
sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait

This allows me to create a k8s cluster using CentOS 8.3 (fully up to date) and k8s 1.19.6. There seems to be a fundamental issue with firewalld because of iptables. I don't understand why switching from firewalld to ufw works around this. I even set up a k3s cluster to see if that would work with firewalld, and it would not.

I have started testing out this cluster, adding storage classes (for this cluster, portworx), metallb, and other workloads. So far it is looking good.

Take a look at this as well: https://docs.docker.com/network/iptables/ It should help you with CentOS 7 as well.
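
One way to persist the route across reboots (a sketch only: the unit name is made up, the CIDR assumes RKE defaults, and flannel.1 only exists once the CNI has started, so the ordering below is approximate) is a small systemd oneshot unit:

# /etc/systemd/system/k8s-overlay-fixup.service (hypothetical unit name)
[Unit]
Description=Restore flannel service route after boot
After=network-online.target docker.service
Wants=network-online.target

[Service]
Type=oneshot
# Add the route only if it is missing.
ExecStart=/bin/sh -c 'ip route show 10.43.0.0/16 | grep -q . || ip route add 10.43.0.0/16 dev flannel.1'

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now k8s-overlay-fixup.service. The iptables -C command above only checks that the NAT rule exists (it adds nothing), so it is omitted here.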

finnzi commented 3 years ago

I had the same issue on CentOS 7 nodes (fully patched as of 8.1.2021) that I was installing a new RKE cluster on. If I enable masquerade on my default firewalld zone (sudo firewall-cmd --add-masquerade --permanent; sudo firewall-cmd --reload), my overlay network tests work as expected.
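
Spelled out, with a check that the setting took effect:

sudo firewall-cmd --add-masquerade --permanent
sudo firewall-cmd --reload
# Should print "yes" once masquerading is active in the default zone.
sudo firewall-cmd --query-masquerade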

I have not added any route or anything else.

Hope this helps.

jonathon2nd commented 3 years ago

@finnzi This does look like it fixes the issue. I am able to deploy a cluster and deploy an app to do a quick test. I will redeploy the intended cluster with this change and start to build it out.

finnzi commented 3 years ago

This might be a bad idea... the support team at Rancher pointed out that this might masquerade the IP addresses of a container talking to another container... so you might want to apply this 'workaround' carefully and wait for an official solution ;-)

andrew-landsverk-win commented 3 years ago

Has anyone tried running 1.19 with iptables instead of firewalld?

jonathon2nd commented 3 years ago

@andrew-landsverk-win I tried that, and I was not able to get a better/working result. @finnzi Do you happen to have a link to Rancher stating that? Would help me with research into the problem more. So far I have not observed any issues with the cluster, though nothing is cutover from prod yet. Still testing.

finnzi commented 3 years ago

> @finnzi Do you happen to have a link to Rancher stating that?

Sorry - don't have any public reply - however they state on the setup page for Rancher that with Kubernetes 1.19 firewalld should be disabled: https://rancher.com/docs/rancher/v2.x/en/installation/requirements/

finnzi commented 3 years ago

> This might be a bad idea... the support team at Rancher pointed out that this might masquerade the IP addresses of a container talking to another container [...]

Just FYI - I was working with network policies the other day and ended up spending a lot of time debugging - my "workaround" NAT'ed all pod-to-pod communications with the node address the source pod was coming from (if the pods were not running on the same node). Be warned ;-)

sv5d commented 3 years ago

We were facing the problem described by @Oats87 above.

As we are not (yet) allowed to disable firewalld, we set up a workaround. As explained by @Oats87, the issue happens in the iptables FORWARD chain, where firewalld rules conflict with rules set by Calico:

> sudo iptables -L FORWARD -v --line-numbers
Chain FORWARD (policy DROP 0 packets, 0 bytes)
num   pkts bytes target     prot opt in     out     source               destination
1      299 16801 cali-FORWARD  all  --  any    any     anywhere             anywhere             /* cali:wUHhoiAYhphO9Mso */
2      360 20430 KUBE-FORWARD  all  --  any    any     anywhere             anywhere             /* kubernetes forwarding rules */
3      360 20430 KUBE-SERVICES  all  --  any    any     anywhere             anywhere             ctstate NEW /* kubernetes service portals */
4      360 20430 DOCKER-USER  all  --  any    any     anywhere             anywhere
5      360 20430 DOCKER-ISOLATION-STAGE-1  all  --  any    any     anywhere             anywhere
6        0     0 ACCEPT     all  --  any    docker0  anywhere             anywhere             ctstate RELATED,ESTABLISHED
7        0     0 DOCKER     all  --  any    docker0  anywhere             anywhere
8        0     0 ACCEPT     all  --  docker0 !docker0  anywhere             anywhere
9        0     0 ACCEPT     all  --  docker0 docker0  anywhere             anywhere
10       0     0 ACCEPT     all  --  any    any     anywhere             anywhere             ctstate RELATED,ESTABLISHED
11       0     0 ACCEPT     all  --  lo     any     anywhere             anywhere
12     360 20430 FORWARD_direct  all  --  any    any     anywhere             anywhere
13     360 20430 FORWARD_IN_ZONES_SOURCE  all  --  any    any     anywhere             anywhere
14     360 20430 FORWARD_IN_ZONES  all  --  any    any     anywhere             anywhere
15     360 20430 FORWARD_OUT_ZONES_SOURCE  all  --  any    any     anywhere             anywhere
16     360 20430 FORWARD_OUT_ZONES  all  --  any    any     anywhere             anywhere
17       0     0 DROP       all  --  any    any     anywhere             anywhere             ctstate INVALID
18     360 20430 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited
19       0     0 ACCEPT     all  --  any    any     10.42.0.0/16         anywhere
20       0     0 ACCEPT     all  --  any    any     anywhere             10.42.0.0/16
21       0     0 ACCEPT     all  --  any    any     anywhere             anywhere             /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000

The problem is that rule 18 rejects all traffic, so rules 19 to 21 are never reached. As a consequence, traffic coming from/going to 10.42.0.0/16 (i.e. the pod address space) is not routed by the node. Instead rule 18 is applied, which rejects the traffic, causing clients to get a "no route to host" response (reject-with icmp-host-prohibited).

Our workaround is to replicate the rules added by Calico to the built-in FORWARD chain (19 to 21) in firewalld's FORWARD_direct chain, which is traversed thanks to rule 12. This is done permanently with the following firewalld commands:

# Replicate rule 19 to FORWARD_direct#1
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 0 -s 10.42.0.0/16 -j ACCEPT
# Replicate rule 20 to FORWARD_direct#2
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 1 -d 10.42.0.0/16 -j ACCEPT
# Replicate rule 21 to FORWARD_direct#3
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 2 -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT

# Reload the configuration
sudo firewall-cmd --reload

Now when we list the content of the FORWARD_direct chain, we should see the following result:

> sudo iptables -L FORWARD_direct -nv --line-numbers
Chain FORWARD_direct (1 references)
num   pkts bytes target     prot opt in     out     source               destination
1        4   225 ACCEPT     all  --  *      *       10.42.0.0/16         0.0.0.0/0
2        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            10.42.0.0/16
3        0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000

This way Kubernetes packets are correctly routed, and firewalld still rejects other packets that do not match these rules.


sagarvelankar commented 3 years ago

Hi,

I solved my problem on CentOS 8 by creating a new firewalld zone for Kubernetes pods and setting its target to ACCEPT. Firewalld will then accept packets going into the POD SUBNET CIDR (ingress zone) and also packets coming out of the POD SUBNET CIDR (egress zone).

Commands:

firewall-cmd --permanent --delete-zone=kubernetes_pods
firewall-cmd --permanent --new-zone=kubernetes_pods
firewall-cmd --permanent --zone=kubernetes_pods --set-target=ACCEPT
firewall-cmd --permanent --zone=kubernetes_pods --add-source=<POD SUBNET CIDR>
firewall-cmd --reload
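
To verify the zone afterwards (a small check; note that the --delete-zone above merely clears any previous attempt and errors harmlessly if the zone does not exist yet):

# Show the zone's target and attached sources.
firewall-cmd --permanent --info-zone=kubernetes_pods
# Confirm the zone is active for the pod subnet.
firewall-cmd --get-active-zones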

From the firewall-cmd man page:

       --permanent [--zone=zone] --set-target=target
           Set the target of a permanent zone.  target is one of: default, ACCEPT, DROP, REJECT

           default is similar to REJECT, but has special meaning in the following scenarios:

            1. ICMP explicitly allowed

               At the end of the zone's ruleset ICMP packets are explicitly allowed.

            2. forwarded packets follow the target of the egress zone

               In the case of forwarded packets, if the ingress zone uses default then whether or not the packet will be
               allowed is determined by the egress zone.

               For a forwarded packet that ingresses zoneA and egresses zoneB:

               ·   if zoneA's target is ACCEPT, DROP, or REJECT then the packet is accepted, dropped, or rejected
                   respectively.

               ·   if zoneA's target is default, then the packet is accepted, dropped, or rejected based on zoneB's target. If
                   zoneB's target is also default, then the packet will be rejected by firewalld's catchall reject.

            3. Zone drifting from source-based zone to interface-based zone

               This only applies if AllowZoneDrifting is enabled. See firewalld.conf(5).

               If a packet ingresses a source-based zone with a target of default, it may still enter an interface-based zone
               (including the default zone).

Versions:

firewall-cmd --version
0.8.2

uname -r
4.18.0-240.10.1.el8_3.x86_64

cat /etc/redhat-release
CentOS Linux release 8.3.2011

kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}

To see what is getting rejected by firewalld, use the commands below:

firewall-cmd --set-log-denied=all
firewall-cmd --reload
dmesg | grep -i reject

bjetal commented 3 years ago

In my setup (built with Kubespray using Kubernetes 1.20 and Calico 3.16.8), the primary problem appears to be these two rules (the last two rules in the FORWARD chain; the first set up by firewalld, the second from Calico):

REJECT     all  --  0.0.0.0/0            0.0.0.0/0            reject-with icmp-host-prohibited
ACCEPT     all  --  0.0.0.0/0            0.0.0.0/0            /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000

Adding the same ACCEPT rule before the REJECT rule allows it to work.
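
For example (a non-persistent sketch; the mark value comes from the rules above, and a firewalld reload or reboot will undo it):

# Insert the "policy accepted" ACCEPT at the top of FORWARD,
# ahead of firewalld's REJECT rule.
sudo iptables -I FORWARD 1 -m mark --mark 0x10000/0x10000 -j ACCEPT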

The problem I see with the solution proposed by @sagarvelankar is that it could result in packets being accepted that should be rejected according to the network policy setup.

This is with Calico using "Insert" as its FELIX_CHAININSERTMODE setting, so I am not clear on why the rule is put at the end rather than earlier.

mohag commented 3 years ago

For Calico (e.g. an RKE cluster for Rancher with mostly defaults), these rules work (they allow all traffic to/from the Calico interfaces):

firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT

jonathon2nd commented 3 years ago

OK, after a bit of running around and having a working cluster with UFW and no other trickery, we found that Calico policies provide everything we need to enforce security. I had just not dug into it enough.

What we did: create the cluster with multi-interface workers, setting the interfaces according to https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/custom-nodes/agent-options/#ip-address-options. This cluster has NO FIREWALL, no firewalld or ufw. Once the cluster comes up, we then set up Calico policies to protect the public interface. I got help from someone on the team with a much better understanding of networks and security to derive these policies. This is for k8s v1.20, CentOS 8.3, and Docker v20.10.6.

So yes, it is possible to have a secure production cluster without a host firewall on the workers. This went against all my instincts.

This may seem obvious to some, but I hope this can help others. I wonder if anyone has any insights/objections to this approach?

finnzi commented 3 years ago

> OK, after a bit of running around and having a working cluster with UFW and no other trickery, we found that Calico policies provide everything we need to enforce security. [...] So yes, it is possible to have a secure production cluster without a host firewall on the workers.

This is probably the "correct" way of going forward.

Do you mind sharing the Calico policies?
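
For anyone looking for a starting point while waiting for those, a minimal sketch of Calico host endpoint protection (illustrative only: the node name, interface, IP, and allowed ports are assumptions, and host endpoints default-deny other traffic apart from Calico's failsafe ports):

# Hypothetical example, not the poster's actual policies.
apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: worker1-public
  labels:
    role: public
spec:
  node: worker1
  interfaceName: eth0
  expectedIPs:
    - 203.0.113.10
---
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: public-ingress
spec:
  selector: role == 'public'
  order: 10
  ingress:
    - action: Allow
      protocol: TCP
      destination:
        ports: [80, 443]

These would be applied with calicoctl apply -f; Calico's default failsafe rules keep SSH and the control-plane ports reachable.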

zamog commented 3 years ago

Changing the firewalld zone target to default can be a workaround (calico issue comment).

papanito commented 3 years ago

Disabling the firewall causes our setup to be non-CIS-compliant, as CIS requires a host firewall to be enabled.

So what we tried is creating a separate firewalld zone with a policy of ACCEPT for the pod CIDR.

dennisschroer commented 3 years ago

Inspired by earlier comments, we added the following saltstack state (https://docs.saltproject.io/en/latest/ref/states/all/salt.states.iptables.html#salt.states.iptables.delete):

iptables-FORWARD-remove-REJECT:
  iptables.delete:
    - chain: FORWARD
    - table: filter
    - jump: REJECT
    - reject-with: icmp-host-prohibited
    - save: False

I don't know the exact iptables command, but removing this line from the FORWARD chain seems to fix the issue. I hope it helps someone.

Edit: The command seems to be /usr/sbin/iptables --wait -t filter -D FORWARD --jump REJECT --reject-with icmp-host-prohibited. Use at your own risk.

scaronni commented 2 years ago

> For Calico (e.g. an RKE cluster for Rancher with mostly defaults), these rules work:
> firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT [...]

Since direct rules are now deprecated, does anyone have an idea how to convert this set of direct rules (which worked perfectly for us) to a firewalld policy (https://firewalld.org/documentation/man-pages/firewalld.policies.html)?

We're attempting to translate the same to a firewalld policy that can be applied to el8+ systems with nftables.
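
An untested sketch of what such a policy might look like (the policy name is made up, firewalld 0.9+ with policy support is assumed, and it matches on the pod CIDR rather than the cali+ interfaces, since zones cannot take interface wildcards):

# Accept forwarded traffic to/from the pod network between any zones,
# replacing the deprecated direct rules.
firewall-cmd --permanent --new-policy calicoForward
firewall-cmd --permanent --policy calicoForward --add-ingress-zone ANY
firewall-cmd --permanent --policy calicoForward --add-egress-zone ANY
firewall-cmd --permanent --policy calicoForward --add-rich-rule='rule family=ipv4 source address=10.42.0.0/16 accept'
firewall-cmd --permanent --policy calicoForward --add-rich-rule='rule family=ipv4 destination address=10.42.0.0/16 accept'
firewall-cmd --reload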

scaronni commented 2 years ago

Or a rich rule (https://firewalld.org/documentation/man-pages/firewalld.richlanguage.html), for that matter.

scaronni commented 2 years ago

> firewall-cmd --permanent --new-zone=kubernetes_pods
> firewall-cmd --permanent --zone=kubernetes_pods --set-target=ACCEPT [...]

Switching to this works as well, and it's easier to translate if you use linux-system-roles.firewall or similar things on Red Hat based systems. This is what we have in ansible:

firewall:
  - zone: kubernetes-pods
    state: present
    permanent: true
  - zone: kubernetes-pods
    target: ACCEPT
    source: "{{ rancher_overlay_network_cidr }}"
    state: enabled
    permanent: true

Works for both iptables (el7) and nftables (el8+/fedora).

gunther788 commented 2 years ago

> Switching to this works as well, and it's easier to translate if you use linux-system-roles.firewall or similar things on Red Hat based systems. [...] Works for both iptables (el7) and nftables (el8+/fedora).

As one replaces el7 nodes with el8/9 in a cluster, one by one, this last bit about using the same configuration everywhere becomes all the more vital.

Side note: before we resolved it this way, we had suspected that Felix might be detecting the backend improperly. See FELIX_IPTABLESBACKEND with Legacy, NFT, Auto at https://projectcalico.docs.tigera.io/archive/v3.19/reference/felix/configuration. Our first attempt of adding FelixConfiguration resources per node (node.<nodename>) didn't do anything, as the environment variables have higher precedence, but luckily changing FELIX_LOGSEVERITYSCREEN to Info revealed Felix merrily acting on nft where appropriate. What really mattered was a reboot to clean out all the various attempts of the migration.

Long story short: enable some verbosity on the calico-node daemonset via the environment variables, and make sure to reboot the node when making significant changes.
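
A sketch of doing that (assuming the daemonset is named calico-node in kube-system; on RKE/Canal clusters the name may differ):

# Raise Felix's screen log level, then watch which backend it picks.
kubectl -n kube-system set env daemonset/calico-node FELIX_LOGSEVERITYSCREEN=Info
kubectl -n kube-system logs -l k8s-app=calico-node --tail=100 | grep -i backend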

Sunshine-JamesHu commented 2 years ago

If you are not a heavy firewalld user, you can try my approach: replace firewalld with ufw. After I made the switch, the whole cluster ran normally.

yum install epel-release -y

systemctl stop firewalld
systemctl disable firewalld

yum install ufw -y
systemctl enable --now ufw.service

ufw default allow
ufw --force enable

ufw allow 22/tcp

ufw allow 10250/tcp
ufw allow 443/tcp
ufw allow 80/tcp
ufw allow 6443/tcp
ufw allow 8472/udp

ufw allow 6379/tcp
ufw allow 30000:49999/tcp
ufw allow 30000:49999/udp
ufw allow 9099/tcp
ufw allow 10254/tcp
ufw allow 19796/tcp
ufw allow 9796/tcp
ufw allow 4789/udp
ufw allow 6783:6784/udp
ufw allow 6783/tcp
ufw allow 2376/tcp
ufw allow 2379:2380/tcp

ufw default deny
ufw reload

systemctl restart docker

ufw status