aaronyeeski opened this issue 4 years ago
The error reported for the cluster is similar to https://github.com/rancher/rancher/issues/28836, but in this case we are not able to recover from this error.
This error is due to changes in the port requirements for firewalld: additional ports are needed for upgraded versions of the network providers (Canal, Flannel, etc.).
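For reference, a minimal sketch of the kind of firewalld commands this implies on an all-roles node (the port list here is an assumption based on the Rancher port-requirements docs for Canal/Flannel, and later comments in this thread show that opening ports alone is not sufficient once Calico 3.16+ is in use):
firewall-cmd --permanent --add-port=6443/tcp        # kube-apiserver
firewall-cmd --permanent --add-port=2379-2380/tcp   # etcd
firewall-cmd --permanent --add-port=10250/tcp       # kubelet
firewall-cmd --permanent --add-port=8472/udp        # Canal/Flannel VXLAN overlay
firewall-cmd --permanent --add-port=9099/tcp        # Canal/Flannel health checks
firewall-cmd --permanent --add-port=30000-32767/tcp # NodePort services
firewall-cmd --reload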
This Oracle Linux image works with k8s v1.18.x on rancher/rancher:master-head version 198ec5b and rancher/rancher:v2.4.8.
This image will work with k8s 1.19.x when firewalld is disabled.
This is an issue on any EL7 operating system where firewalld is running. This includes RHEL 7.x and CentOS 7.
The root cause appears to be a change in where Calico places the "Policy explicitly accepted packet" rule. In Calico 3.16.x, this rule is placed at the end of the FORWARD chain, i.e.
-A FORWARD -m comment --comment "cali:S93hcgKJrXEqnTfs" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT
whereas on earlier versions of Calico, it was on the cali-FORWARD chain:
-A cali-FORWARD -m comment --comment "cali:MH9kMp5aNICL-Olv" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT
This change was implemented for https://github.com/projectcalico/felix/pull/2424, which is an issue orthogonal to the one at hand.
When this rule was appended to the cali-FORWARD chain, traffic would automatically be accepted, and things "worked". Now that the "Policy explicitly accepted packet" rule is at the end of the FORWARD chain, there is a firewalld-inserted rule that can blackhole traffic:
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
which is inserted before the rest of the chain, i.e.:
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -s 10.42.0.0/16 -j ACCEPT
-A FORWARD -d 10.42.0.0/16 -j ACCEPT
-A FORWARD -m comment --comment "cali:S93hcgKJrXEqnTfs" -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT
and thus the traffic never reaches the final ACCEPT rule; it is rejected instead.
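A quick way to confirm this ordering on an affected node (a sketch; run as root) is to list the FORWARD chain with rule numbers and check whether the firewalld REJECT rule sits above Calico's mark-based ACCEPT rule:
# the REJECT line appearing before the "cali:..." ACCEPT line indicates the blackhole
iptables -L FORWARD -n -v --line-numbers | grep -E 'REJECT|cali:|10\.42\.'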
Further investigation will need to be done to determine the best course of action to mitigate this, as Calico is advertised as "not working well with firewalld" in the first place.
One thing to note is that our documentation for firewalld rules to add never appears to have worked with clusters using only Flannel, for similar reasons.
We also have this problem using iptables. After upgrading to Rancher 2.5.1 and Kubernetes 1.19.1, DNS resolution and the overlay network stopped working. Stopping iptables on all nodes fixed the issue. We are using a custom RKE cluster on CentOS machines.
Versions: Rancher 2.5.1, Kubernetes 1.19.2, iptables v1.4.21
cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
uname -r
3.10.0-1127.13.1.el7.x86_64
Some logs
bash-4.2$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = 7b85fbe9b4156cf64b768261d4ea6e3b
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:56868->192.168.3.221:53: read: no route to host
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:41408->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:46917->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:47695->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:57134->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:57679->192.168.3.221:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:52947->192.168.3.220:53: i/o timeout
[ERROR] plugin/errors: 2 8992485821289819923.6920671835383681148. HINFO: read udp 10.42.3.126:36609->192.168.3.220:53: read: no route to host
Also, the test in https://rancher.com/docs/rancher/v2.x/en/troubleshooting/networking/#check-if-overlay-network-is-functioning-correctly showed that none of the nodes could reach any of the other nodes.
This is more or less expected behavior, and comes with using Calico v3.16.x+
@Oats87 -- forgive me for not being that familiar with Calico yet, but can you point me in the direction of info regarding Calico 3.16.x and that being expected behaviour?
I have firewalls turned off on all my nodes because of this, and it makes me kind of uncomfortable. I would like more info on what works in the security sense.
I'm not sure if this is related, but we ran into issues as well with CentOS 7 and iptables enabled. Our problem seemed to be that calico did not properly set up the routes. Some nodes had a missing route to the services network 10.43.0.0/16, so I manually set the route with ip route add 10.43.0.0/16 dev flannel.1.
I have run into this issue as well. While looking into it I see that this was updated: https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/node-requirements/
Some distributions of Linux derived from RHEL, including Oracle Linux, may have default firewall rules that block communication with Helm. We recommend disabling firewalld. For Kubernetes 1.19, firewalld must be turned off.
Is there some explanation on why this can not be fixed, or if there is something else that needs to be used to ensure security on the host? Sorry if I am missing something.
I was able to get a 1.19 cluster working. I use terraform with the xenorchestra provider on our on-prem pool. I created a new VM with the intention of turning it into a template. I switched from CentOS 7 to CentOS 8 and then switched from firewalld to ufw. After making sure everything was good to go, I stopped the VM and turned it into a template.
Then in my terraform plan (is it called a plan?) I set this as part of the cloud-init portion:
#cloud-config
runcmd:
...
- sudo ip route add 10.43.0.0/16 dev flannel.1
- sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait
- sudo docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run rancher/rancher-agent:v2.5.2 ...
...
After this I got a working k8s 1.19 cluster with centos 8 hosts. Hope this helps someone until the issue is resolved.
If it works with iptables/ufw when you add the route, I'll take a look at that. Using firewalld would be nice, but if I have to configure an alternative that's better than just no firewall.
Keep hoping I can find more info on why flannel (I assume) and firewalld don't get along with each other.
OK, I am sure now that the steps I am taking do fix this problem.
During my setup of the cluster I forgot a couple of things, so I made the changes and restarted the hosts, and the failure started happening again. The route was missing. I figured out that runcmd only runs on first-time VM startup.
After I added the following to my terraform plan
bootcmd:
- sudo ip route add 10.43.0.0/16 dev flannel.1
- sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait
And after redoing the cluster, it seems to work every time now. This still happens to me every so often: https://github.com/rancher/rancher/issues/28768, which made me think I was still having problems, but as they said in that thread, if you wait long enough (20-30 minutes) things start working.
This is an ongoing issue, with the missing route and the firewalld errors on CentOS 7 clusters. Is there any new guidance for this issue? Currently the "best" workaround seems to be to disable firewalld.
For me it is not an option to deploy a k8s cluster without a host firewall. My 'solution' so far:
dnf install ufw -y
systemctl disable --now firewalld.service
<add the ports you need to ufw, example>
sudo ufw allow 30000:32767/tcp
systemctl enable --now ufw.service
These commands are also needed. It would be better to write these into the network scripts, but I have not gotten to it; they will not persist after a reboot (one way to persist them is sketched after the commands).
sudo ip route add 10.43.0.0/16 dev flannel.1
sudo /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait
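A minimal sketch of one way to persist these two commands across reboots, assuming a systemd host; the unit name rke-pod-routes.service and the wait-for-flannel.1 loop are illustrative, not something Rancher ships:
# flannel.1 only exists once the CNI is up, so wait for it before adding the route;
# the iptables -C only checks for the rule, mirroring the commands above
cat <<'EOF' > /etc/systemd/system/rke-pod-routes.service
[Unit]
Description=Re-add service network route after reboot (workaround)
After=network-online.target docker.service
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/bin/bash -c 'until ip link show flannel.1 >/dev/null 2>&1; do sleep 5; done; ip route replace 10.43.0.0/16 dev flannel.1; /sbin/iptables -t nat -C POSTROUTING -s 10.42.0.0/16 -d 10.42.0.0/16 -j RETURN --wait'
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload && systemctl enable --now rke-pod-routes.service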
This allows me to create a k8s cluster using CentOS 8.3, fully up to date, with k8s 1.19.6. There seems to be a fundamental issue with firewalld because of iptables. I don't understand why switching from firewalld to ufw works around this. I even set up a k3s cluster to see if that would work with firewalld, and it did not.
I have started testing out this cluster, adding storage classes (for this cluster, portworx), metallb, and other workloads. So far it is looking good.
Take a look at this as well: https://docs.docker.com/network/iptables/ This should help you with centos 7 as well.
I had the same issue on CentOS 7 nodes (fully patched as of 8.1.2021) that I was installing a new RKE cluster on. If I enable masquerade on my default firewalld zone (sudo firewall-cmd --add-masquerade --permanent;sudo firewall-cmd --reload) my overlay network tests work as expected.
I have not added any route or anything else.
Hope this helps.
@finnzi This does look like it fixes the issue. I am able to deploy a cluster and deploy an app to do a quick test. I will redeploy the intended cluster with this change and start to build it out.
This might be a bad idea....the support team at Rancher pointed out that this might masquerade the IP addresses of a container talking to another container....so you might want to apply this 'workaround' carefully and wait for an official solution ;-)
Has anyone tried running 1.19 with iptables instead of firewalld?
@andrew-landsverk-win I tried that, and I was not able to get a better/working result. @finnzi Do you happen to have a link to Rancher stating that? Would help me with research into the problem more. So far I have not observed any issues with the cluster, though nothing is cutover from prod yet. Still testing.
Sorry - don't have any public reply - however they state on the setup page for Rancher that with Kubernetes 1.19 firewalld should be disabled: https://rancher.com/docs/rancher/v2.x/en/installation/requirements/
This might be a bad idea....the support team at Rancher pointed out that this might masquerade the IP addresses of a container talking to another container....so you might want to apply this 'workaround' carefully and wait for an official solution ;-)
Just FYI - was working with network policies the other day and ended up spending a lot of time debugging - my "workaround" NAT'ed all pod-to-pod communications with the node address the src pod was coming from (if the pods were not running on the same node). Be warned ;-)
We were facing the problem described by @Oats87 above.
As we are not (yet) allowed to disable firewalld, we set up a workaround. As explained by @Oats87, the issue happens in iptables' FORWARD chain, where firewalld rules conflict with rules set by calico:
> sudo iptables -L FORWARD -v --line-numbers
Chain FORWARD (policy DROP 0 packets, 0 bytes)
num pkts bytes target prot opt in out source destination
1 299 16801 cali-FORWARD all -- any any anywhere anywhere /* cali:wUHhoiAYhphO9Mso */
2 360 20430 KUBE-FORWARD all -- any any anywhere anywhere /* kubernetes forwarding rules */
3 360 20430 KUBE-SERVICES all -- any any anywhere anywhere ctstate NEW /* kubernetes service portals */
4 360 20430 DOCKER-USER all -- any any anywhere anywhere
5 360 20430 DOCKER-ISOLATION-STAGE-1 all -- any any anywhere anywhere
6 0 0 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
7 0 0 DOCKER all -- any docker0 anywhere anywhere
8 0 0 ACCEPT all -- docker0 !docker0 anywhere anywhere
9 0 0 ACCEPT all -- docker0 docker0 anywhere anywhere
10 0 0 ACCEPT all -- any any anywhere anywhere ctstate RELATED,ESTABLISHED
11 0 0 ACCEPT all -- lo any anywhere anywhere
12 360 20430 FORWARD_direct all -- any any anywhere anywhere
13 360 20430 FORWARD_IN_ZONES_SOURCE all -- any any anywhere anywhere
14 360 20430 FORWARD_IN_ZONES all -- any any anywhere anywhere
15 360 20430 FORWARD_OUT_ZONES_SOURCE all -- any any anywhere anywhere
16 360 20430 FORWARD_OUT_ZONES all -- any any anywhere anywhere
17 0 0 DROP all -- any any anywhere anywhere ctstate INVALID
18 360 20430 REJECT all -- any any anywhere anywhere reject-with icmp-host-prohibited
19 0 0 ACCEPT all -- any any 10.42.0.0/16 anywhere
20 0 0 ACCEPT all -- any any anywhere 10.42.0.0/16
21 0 0 ACCEPT all -- any any anywhere anywhere /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000
The problem is that rule 18 rejects all traffic so that rules 19 to 21 are never reached. As a consequence traffic coming from/going to 10.42.0.0/16 (i.e. the node/pod address space) is not routed by the node. Instead rule 18 is applied which rejects the traffic, causing clients to get a "no route to host" response (reject-with icmp-host-prohibited).
Our workaround is to replicate the rules added by calico to iptables' builtin FORWARD chain (19 to 21) in firewalld's FORWARD_direct chain, which is traversed thanks to rule 12. This is done permanently with the following firewalld commands:
# Replicate rule 19 to FORWARD_direct#1
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 0 -s 10.42.0.0/16 -j ACCEPT
# Replicate rule 20 to FORWARD_direct#2
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 1 -d 10.42.0.0/16 -j ACCEPT
# Replicate rule 21 to FORWARD_direct#3
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD_direct 2 -m comment --comment "Policy explicitly accepted packet." -m mark --mark 0x10000/0x10000 -j ACCEPT
# Reload the configuration
sudo firewall-cmd --reload
Now when we list the content of the FORWARD_direct chain, we should have the following result:
> sudo iptables -L FORWARD_direct -nv --line-numbers
Chain FORWARD_direct (1 references)
num pkts bytes target prot opt in out source destination
1 4 225 ACCEPT all -- * * 10.42.0.0/16 0.0.0.0/0
2 0 0 ACCEPT all -- * * 0.0.0.0/0 10.42.0.0/16
3 0 0 ACCEPT all -- * * 0.0.0.0/0 0.0.0.0/0 /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000
This way kubernetes packets are correctly routed and firewalld is still rejecting other packets not matching these rules.
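To verify the overlay/DNS path after applying the workaround, a quick generic check is to run a throwaway pod and resolve an in-cluster name (on a default RKE service CIDR this should return 10.43.0.1 for the kubernetes service):
kubectl run -it --rm dnstest --image=busybox:1.28 --restart=Never -- nslookup kubernetes.default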
Important: calico's original rules are left untouched in the FORWARD chain (they are never reached), so we can still compare them to the replicated rules to check whether calico changed them.
Hi,
I solved my problem in CentOS 8 by creating a new firewalld zone for kubernetes pods and setting its target to ACCEPT. So, firewalld will accept packets going into POD SUBNET CIDR (ingress zone) and also packets coming out of POD SUBNET CIDR (egress zone)
Commands :
firewall-cmd --permanent --delete-zone=kubernetes_pods
firewall-cmd --permanent --new-zone=kubernetes_pods
firewall-cmd --permanent --zone=kubernetes_pods --set-target=ACCEPT
firewall-cmd --permanent --zone=kubernetes_pods --add-source=<POD SUBNET CIDR>
firewall-cmd --reload
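To confirm the new zone is active and carrying the pod CIDR as a source, a quick check (using the zone name created above):
firewall-cmd --get-active-zones
firewall-cmd --info-zone=kubernetes_pods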
From the firewall-cmd man page:
--permanent [--zone=zone] --set-target=target
Set the target of a permanent zone. target is one of: default, ACCEPT, DROP, REJECT
default is similar to REJECT, but has special meaning in the following scenarios:
1. ICMP explicitly allowed
At the end of the zone's ruleset ICMP packets are explicitly allowed.
2. forwarded packets follow the target of the egress zone
In the case of forwarded packets, if the ingress zone uses default then whether or not the packet will be
allowed is determined by the egress zone.
For a forwarded packet that ingresses zoneA and egresses zoneB:
· if zoneA's target is ACCEPT, DROP, or REJECT then the packet is accepted, dropped, or rejected
respectively.
· if zoneA's target is default, then the packet is accepted, dropped, or rejected based on zoneB's target. If
zoneB's target is also default, then the packet will be rejected by firewalld's catchall reject.
3. Zone drifting from source-based zone to interface-based zone
This only applies if AllowZoneDrifting is enabled. See firewalld.conf(5).
If a packet ingresses a source-based zone with a target of default, it may still enter an interface-based zone
(including the default zone).
Versions :
firewall-cmd --version
0.8.2
uname -r
4.18.0-240.10.1.el8_3.x86_64
cat /etc/redhat-release
CentOS Linux release 8.3.2011
kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:28:09Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-13T13:20:00Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
To see what is getting rejected by firewalld, use the below commands
firewall-cmd --set-log-denied=all
firewall-cmd --reload
dmesg | grep -i reject
In my setup (built with Kubespray, using Kubernetes 1.20 and Calico 3.16.8), the primary problem appears to be these two rules (the last two rules in the FORWARD chain; the first set up by firewalld, the second by Calico):
REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited
ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 /* cali:S93hcgKJrXEqnTfs */ /* Policy explicitly accepted packet. */ mark match 0x10000/0x10000
Adding the same ACCEPT rule before the REJECT rule allows it to work.
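A sketch of doing that insertion by hand (rule positions differ per node, so look up the REJECT rule's line number first; this does not persist across reboots or firewalld reloads):
# find the line number of the firewalld REJECT rule in the FORWARD chain
iptables -L FORWARD -n --line-numbers | grep icmp-host-prohibited
# insert Calico's mark-based ACCEPT just above it (replace 18 with the number found)
iptables -I FORWARD 18 -m mark --mark 0x10000/0x10000 -m comment --comment "Policy explicitly accepted packet." -j ACCEPT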
The problem I see with the solution proposed by @sagarvelankar is that it could result in packets that should be rejected according to network policy setup being accepted.
This is with calico using "Insert" as its FELIX_CHAININSERTMODE setting, so I am not clear why the rule is put at the end rather than earlier.
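A sketch for checking what Felix is actually configured with (the daemonset name calico-node is an assumption; on RKE/Canal clusters it may be canal, and calicoctl may not be installed):
# look for an explicit env override on the daemonset
kubectl -n kube-system get daemonset calico-node -o yaml | grep -A1 FELIX_CHAININSERTMODE
# or inspect the FelixConfiguration resource, if calicoctl is available
calicoctl get felixconfiguration default -o yaml | grep -i chaininsertmode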
For Calico (e.g. an RKE cluster for Rancher with mostly defaults), these rules work (they allow all traffic to/from the Calico interfaces):
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
OK, after a bit of running around and having a working cluster with UFW and no other trickery, we found that Calico policies provide everything we need to enforce security. I had just not dug into it enough.
What we did: create a cluster with multi-interface workers, setting the interfaces according to https://rancher.com/docs/rancher/v2.x/en/cluster-provisioning/rke-clusters/custom-nodes/agent-options/#ip-address-options. This cluster has NO FIREWALL, no firewalld or ufw. Once the cluster comes up, we then set up Calico policies to protect the public interface. I got help from someone on the team with a much better understanding of networks and security to derive these policies. This is for k8s v1.20, CentOS 8.3, and Docker v20.10.6.
So yes, it is possible to have a secure production cluster without a host firewall on the workers. This went against all my instincts.
This may seem obvious to some, but I hope this can help others. I wonder if anyone has any insights/objections to this approach?
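For anyone looking for the general shape of such policies: here is a minimal sketch (not the poster's actual policies) of protecting a public interface with a Calico HostEndpoint plus GlobalNetworkPolicy. The interface name eth1, node name worker1, address 203.0.113.10, and allowed ports are placeholders for illustration, and it assumes calicoctl with the projectcalico.org/v3 API. Calico keeps its failsafe ports (SSH, kubelet, etcd, etc.) open by default on host endpoints, but test carefully before rolling anything like this out.
# register the node's public interface as a host endpoint (names/IP are placeholders)
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: HostEndpoint
metadata:
  name: worker1-eth1
  labels:
    role: public-iface
spec:
  node: worker1
  interfaceName: eth1
  expectedIPs:
    - 203.0.113.10
EOF
# only allow HTTP/HTTPS ingress on interfaces labelled public-iface; allow all egress
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: public-iface-ingress
spec:
  selector: role == 'public-iface'
  order: 10
  ingress:
    - action: Allow
      protocol: TCP
      destination:
        ports: [80, 443]
  egress:
    - action: Allow
EOF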
This is probably the "correct" way of going forward.
Do you mind sharing the Calico policies?
Changing the firewalld zone target to default can be a workaround (see this calico issue comment).
Disabling the firewall causes our setup to be non-CIS compliant, as CIS requires a host firewall.
So what we tried is creating a separate firewalld zone with a policy of ACCEPT for the Pod CIDR.
Inspired by earlier comments we added the following saltstack state (https://docs.saltproject.io/en/latest/ref/states/all/salt.states.iptables.html#salt.states.iptables.delete):
iptables-FORWARD-remove-REJECT:
iptables.delete:
- chain: FORWARD
- table: filter
- jump: REJECT
- reject-with: icmp-host-prohibited
- save: False
I don't know the exact iptables command, but removing this line from the FORWARD chain seems to fix the issue. I hope it helps someone.
Edit: The command seems to be /usr/sbin/iptables --wait -t filter -D FORWARD --jump REJECT --reject-with icmp-host-prohibited. Use at your own risk.
For Calico (e.g. an RKE cluster for Rancher with mostly defaults), these rules work (they allow all traffic to/from the Calico interfaces):
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --permanent --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -o cali+ -j ACCEPT
firewall-cmd --direct --add-rule ipv4 filter FORWARD 99 -i cali+ -j ACCEPT
Since direct rules are now deprecated, does anyone have an idea how to convert this set of direct rules (which worked perfectly for us) to a firewalld policy (https://firewalld.org/documentation/man-pages/firewalld.policies.html)?
We're attempting to translate the same to a firewalld policy that can be applied to el8+ systems with nftables.
Or a rich rule (https://firewalld.org/documentation/man-pages/firewalld.richlanguage.html), for that matter.
firewall-cmd --permanent --delete-zone=kubernetes_pods
firewall-cmd --permanent --new-zone=kubernetes_pods
firewall-cmd --permanent --zone=kubernetes_pods --set-target=ACCEPT
firewall-cmd --permanent --zone=kubernetes_pods --add-source=<POD SUBNET CIDR>
firewall-cmd --reload
Switching to this works as well, and it's easier to translate if you use linux-system-roles.firewall or similar things on Red Hat based systems. This is what we have in ansible:
firewall:
- zone: kubernetes-pods
state: present
permanent: true
- zone: kubernetes-pods
target: ACCEPT
source: "{{ rancher_overlay_network_cidr }}"
state: enabled
permanent: true
Works for both iptables (el7) and nftables (el8+/fedora).
As one replaces el7 nodes with el8/9 in a cluster, one by one, this last bit about using the same configuration for all becomes all that more vital.
Side note: before we resolved it in this very way, we had suspected Felix might be recognising the backend improperly. See FELIX_IPTABLESBACKEND with Legacy, NFT, Auto at https://projectcalico.docs.tigera.io/archive/v3.19/reference/felix/configuration. Our first attempt of adding FelixConfiguration resources per node (node.<nodename>) didn't do anything, as the environment variables have higher precedence, but luckily changing FELIX_LOGSEVERITYSCREEN to Info already revealed felix merrily acting on nft where appropriate. What really mattered was a reboot to clean out all the various attempts of the migration.
Long story short: enable some verbosity on the calico-node daemonset via the environment variables, and make sure to reboot the node when making significant changes.
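A sketch of bumping that verbosity from the command line (the daemonset name calico-node and the k8s-app=calico-node label are assumptions; on RKE/Canal clusters the daemonset is typically called canal):
kubectl -n kube-system set env daemonset/calico-node FELIX_LOGSEVERITYSCREEN=Info
# then watch which iptables backend felix reports using
kubectl -n kube-system logs -l k8s-app=calico-node -c calico-node --tail=100 | grep -i iptables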
If you are not a heavy firewalld user, you can try my approach: replace firewalld with ufw. After I made the switch, the whole cluster ran normally.
yum install epel-release -y
systemctl stop firewalld
systemctl disable firewalld
yum install ufw -y
systemctl enable --now ufw.service
ufw default allow
ufw enable -y
ufw allow 22/tcp
ufw allow 10250/tcp
ufw allow 443/tcp
ufw allow 80/tcp
ufw allow 6443/tcp
ufw allow 8472/udp
ufw allow 6379/tcp
ufw allow 30000:49999/tcp
ufw allow 30000:49999/udp
ufw allow 9099/tcp
ufw allow 10254/tcp
ufw allow 19796/tcp
ufw allow 9796/tcp
ufw allow 4789/udp
ufw allow 6783:6784/udp
ufw allow 6783/tcp
ufw allow 2376/tcp
ufw allow 2379:2380/tcp
ufw default deny
ufw reload
systemctl restart docker
ufw status
What kind of request is this (question/bug/enhancement/feature request): Bug
Steps to reproduce (least amount of steps as possible):
In Rancher, update rke-metadata-config, then refresh Kubernetes metadata.
Prepare 1 node with Oracle Linux 7.7 by following the documentation: https://rancher.com/docs/rancher/v2.x/en/installation/options/firewall/ (or use this AMI which has the ports opened: ami-06dd5f94499093e3d).
In Rancher, add a v1.19.1-rancher1-1 custom cluster with 1 node using all roles.
Result: Cluster is stuck provisioning.
Error in kubelet:
Other details that may be helpful:
Environment information
Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI): rancher/rancher:master-head version 198ec5b
Cluster information
Kubernetes version (kubectl version): v1.19.1-rancher1-1
gz#14269