k3s-io / k3s

Lightweight Kubernetes
https://k3s.io
Apache License 2.0

Formally add support for CentOS 7 #1371

Closed davidnuzik closed 4 years ago

davidnuzik commented 4 years ago

We need to expand our testing and identify any issues that prevent us from formally supporting CentOS. Keep in mind K3s is expected to work fine on CentOS 7. This issue is to track the testing effort required to formally support and certify the operating system (See https://rancher.com/docs/k3s/latest/en/installation/node-requirements/#operating-systems )

Currently there are existing issues with the os/centos label, but take care to note that these issues are not all necessarily caused just by utilizing CentOS. As such, it makes sense to review those GitHub issues, but we need to execute some testing and identify any other issues. As needed, we'll need to resolve these issues so we may fully support CentOS.

SELinux support is also needed, which is tracked separately here: https://github.com/rancher/k3s/issues/1372

gz#9311

gz#9743

davidnuzik commented 4 years ago

@ShylajaDevadiga I have assigned this issue to you for now. This will require some testing and discovery. We need to identify any/all CentOS issues that prevent us from formally supporting CentOS in our next release. Work with me as needed.

davidnuzik commented 4 years ago

As a reminder we must support IPv6 as well.

ThomasADavis commented 4 years ago

iptables in Centos8 is now legacy.. they now use iptables-nft.

so on a centos 8 system, using iptables gives you this:

[root@mouse-r13 ~]# iptables -L -v -n
Chain INPUT (policy ACCEPT 154K packets, 264M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 957 packets, 53559 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 58311 packets, 10M bytes)
 pkts bytes target     prot opt in     out     source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them
[root@mouse-r13 ~]# iptables -t nat -L -v -n
Chain PREROUTING (policy ACCEPT 990 packets, 73523 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain INPUT (policy ACCEPT 59 packets, 3564 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain POSTROUTING (policy ACCEPT 693 packets, 45283 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 620 packets, 39433 bytes)
 pkts bytes target     prot opt in     out     source               destination         
# Warning: iptables-legacy tables present, use iptables-legacy to see them

and I believe firewalld is also not supported by k3s.

This doesn't mean k3s does not work, it's just not possible to see the iptables rules.
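
One way to tell which backend a host's iptables binary talks to is the suffix that iptables 1.8 and later print in `iptables --version`, either `(nf_tables)` or `(legacy)`. A minimal sketch, assuming that output format; the helper name `iptables_backend` is made up for illustration:

```shell
# Classify an `iptables --version` string by backend.
# Older builds (for example v1.4.21) print no suffix and are reported as "unknown".
iptables_backend() {
  case "$1" in
    *nf_tables*) echo nft ;;
    *legacy*)    echo legacy ;;
    *)           echo unknown ;;
  esac
}

# Usage on a live host (requires iptables to be installed):
#   iptables_backend "$(iptables --version)"
```

An "unknown" result on a pre-1.8 binary most likely means a legacy-only build, but that is an inference, not something the function checks.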

Lohann commented 4 years ago

+1

Lohann commented 4 years ago

Related #401 #1019

philipsparrow commented 4 years ago

I have been documenting all the steps I needed to get it working in CentOS7, I'll gladly share those steps. It worked out of the box on a Google Cloud VM but not on a local, freshly installed instance. Namely, installation of iptables and removal of firewalld and wiping out reject rules from INPUT and FORWARD chains and installation of semanage. I'll gladly share these steps although my procedure is a little heavy-handed

ThomasADavis commented 4 years ago

Reader digest version: Don't use Centos v8 because of nft/legacy iptables problems.

So, to help clarify: in my research, RHEL/Centos8 uses nft for iptables, not legacy iptables. At this time, nft is not supported by Kubernetes. There is iptables/iptables-legacy support, so the rules are still created and executed, but on RHEL/Centos8 they do not live in harmony with any other nft/iptables rules, unless theirs is the only ruleset you want to run.

You cannot see these iptables rule sets by default, since they live in the legacy iptables tables (the container has its own iptables binaries, not the nft ones), and RHEL/Centos8 does not provide the legacy iptables tools.

There are other distributions heading towards using nft instead of iptables, but so far, it appears that they do include the legacy iptable binaries.

This means that until nft support is in Kubernetes itself (not just k3s), RHEL/Centos8 and other distributions using nftables are not truly supported.

ThomasADavis commented 4 years ago

Well, maybe it's not so bad for RHEL/Centos8..

see https://github.com/kubernetes/kubernetes/issues/71305

sraillard commented 4 years ago

@philipsparrow I would be interested in the steps needed to make k3s work on CentOS 7.7. Even after a fresh install, removing firewalld, disabling SELinux, installing iptables-services, and adding "user_namespace.enable=1" to the kernel command line, k3s is still not working... It looks like a network issue, as the API server isn't reachable.

Lohann commented 4 years ago

@sraillard I wrote a step-by-step here, let me know if it works for you: https://github.com/rancher/k3s/issues/1019#issuecomment-593043089

philipsparrow commented 4 years ago

I don't think I have anything as good as @Lohann has provided, I got it working with only the following steps (caveat: I don't need Traefik so haven't worked on that):

systemctl stop firewalld
systemctl disable firewalld
yum update
yum install -y iptables-services policycoreutils-python
systemctl start iptables
systemctl enable iptables
grubby --args="user_namespace.enable=1" --update-kernel="$(grubby --default-kernel)"
iptables -F
# This gets rid of any DROP rules in the INPUT and FORWARD chains
iptables-save > /etc/sysconfig/iptables
reboot now

Then I installed K3s with no special options. FYI in my debugging, I found it enormously helpful to check both routes and firewall. Sometimes I was missing routes. ip a and ip route are your friend. From my cluster (single master, 2 worker nodes, flannel VXLAN) I expect to see routes that look like:

default via 10.126.126.1 dev eth0 proto dhcp metric 100
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
10.126.126.1 dev eth0 proto dhcp scope link metric 100
10.126.126.3 dev eth0 proto kernel scope link src 10.126.126.3 metric 100

I hope this helps
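
The route check described above can be scripted. This is a rough sketch that counts routes going through the `flannel.1` device in `ip route` output. With flannel VXLAN, the routes listed above show one such route per remote node. The function name `count_flannel_routes` is hypothetical:

```shell
# Read `ip route` output on stdin and print how many routes use flannel.1.
# grep -c exits non-zero when the count is 0, so mask that exit status.
count_flannel_routes() {
  grep -c 'dev flannel\.1' || true
}

# Usage on a live node:
#   ip route | count_flannel_routes
```

On the three-node cluster described above, each node would be expected to report 2. A lower number suggests missing routes.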

sraillard commented 4 years ago

Thank you @Lohann and @philipsparrow, I was able to make it work.

I'm not sure where the black magic is, I have installed the policycoreutils-python package and I have saved the iptables configuration once cleaned (I think I was missing that step).

To clean all iptables tables, I have used:

iptables -F
iptables -F -t nat
iptables -F -t mangle

philipsparrow commented 4 years ago

That ought to do it, but check that your iptables rules aren't re-populated on reboot. Saving the configuration with iptables-save > /etc/sysconfig/iptables worked wonders for me. I think the general idea here is to remove any DROP rules from the INPUT and FORWARD chains.
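
To verify after a reboot that nothing blocking has crept back in, one option is to scan `iptables -S` output for DROP or REJECT entries in the INPUT and FORWARD chains. This is only a sketch: the name `find_blocking_rules` is invented here, and it assumes the `-P CHAIN POLICY` / `-A CHAIN ... -j TARGET` line format that `iptables -S` prints.

```shell
# Read `iptables -S` output on stdin and print INPUT/FORWARD lines
# whose policy or target is DROP or REJECT.
find_blocking_rules() {
  grep -E '^-(A|P) (INPUT|FORWARD)( .*)? (DROP|REJECT)( |$)' || true
}

# Usage on a live host (as root):
#   iptables -S | find_blocking_rules
```

Empty output means no DROP/REJECT policy or rule was found in those two chains of the filter table. Other tables are not inspected.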

davidnuzik commented 4 years ago

We are still planning support in the v1.17.x scope however this is not going to make it in v1.17.4+k3s1. This will likely be in the next release.

parekhha commented 4 years ago

@Lohann @philipsparrow, I was able to make k3s work on CentOS, but I am facing the following problem.

I have 3 master nodes with external etcd (no worker nodes), and I have deployed an admission controller on this k3s cluster. What I have observed is that the k3s server takes too long (more than 1 minute) to connect to the admission control service if its pod is running on another k3s server host.

It seems that, from a k3s host, it takes a long time to connect via the clusterIP (not the pod IP) to a pod running on a different k3s host. If the pod is running on the same k3s host, there is no problem.

philipsparrow commented 4 years ago

@parekhha Does this issue happen only when this configuration is on CentOS? It doesn't sound like an OS issue (but I'm no expert).

parekhha commented 4 years ago

@philipsparrow It was resolved after I updated the kernel version.

Loki-Afro commented 4 years ago

If you install k3s on CentOS 7, the executables get written to /usr/local/bin. However, I installed it as root, and /usr/local/bin is normally not in root's $PATH. Maybe that is something to consider as well.

Also I'd like to add: since CentOS 7 doesn't seem to work for now, one could hack together a Vagrantfile (with CentOS 7) or something similar to have something to test with.

Drakkai commented 4 years ago

@Loki-Afro It is not true, /usr/local/bin is in root's $PATH by default. Here is $PATH from fresh installation of centos 7: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin

I've managed to install k3s on centos 7 (one node/master installation). The only problem (after node reboot) I have is with iptables and dial tcp 10.43.0.1:443: connect: no route to host in coredns but flush of rules helps.

CentOS Linux release 7.7.1908 (Core)
3.10.0-1062.18.1.el7.x86_64
iptables v1.4.21
k3s version v1.17.3+k3s1 (5b17a175)

erikwilson commented 4 years ago

fwiw, Centos8 is a no go until we make some changes to k3s & dependencies also, I am guessing it will be awhile before all of that stuff is worked out.

We would like to support firewalld, ufw, and any other firewall. For the most part this just means adding docs on how to whitelist the cni interface & poke holes for the k3s api service or other services.

We have also added selinux support for containerd/cri and it is enabled by default in 1.17.4. This can cause issues as related here: https://github.com/rancher/k3s/issues/1583#issuecomment-605169698 (summary: use --disable-selinux for old behavior, or install the k3s-selinux policy & deal with selinux going forward).
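
For firewalld, the "poke holes" approach might look roughly like the following. This is a sketch, not a documented procedure from this thread: it assumes the API server listens on 6443/tcp and that the cluster uses the pod and service ranges 10.42.0.0/16 and 10.43.0.0/16 (consistent with the 10.42.x routes and the 10.43.0.1 address quoted earlier, but still an assumption). The function name `k3s_firewalld_cmds` is invented, and it only prints the commands so they can be reviewed before being run.

```shell
# Print (do not execute) firewall-cmd invocations that open the API port
# and trust the pod and service networks. Both CIDRs are assumed defaults.
k3s_firewalld_cmds() {
  pod_cidr="${1:-10.42.0.0/16}"
  svc_cidr="${2:-10.43.0.0/16}"
  echo "firewall-cmd --permanent --add-port=6443/tcp"
  echo "firewall-cmd --permanent --zone=trusted --add-source=${pod_cidr}"
  echo "firewall-cmd --permanent --zone=trusted --add-source=${svc_cidr}"
  echo "firewall-cmd --reload"
}

# Usage: review the output first, then pipe it to a root shell if it looks right:
#   k3s_firewalld_cmds
#   k3s_firewalld_cmds 10.42.0.0/16 10.43.0.0/16 | sudo sh
```

Multi-node clusters would likely need further ports opened between nodes (for example for the flannel VXLAN traffic), which this sketch does not cover.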

sraillard commented 4 years ago

I can confirm what @Drakkai is saying about the default path: when installed using the root user on CentOS7, the k3s executable and the kubectl command are working just after k3s is installed. And as suspected, I think it's more a firewall management issue.

Loki-Afro commented 4 years ago

@sraillard @Drakkai well that is really strange. But I wasn't the first one :(

https://serverfault.com/questions/833762/where-does-the-bash-path-on-centos-7-get-usr-local-bin-from

and https://bugs.centos.org/view.php?id=7492

basically there is some inconsistency when /usr/local/bin is added to the PATH and when not. But maybe one should keep that one in mind ...

erikwilson commented 4 years ago

From my personal experience with CentOS7 /usr/local/bin is not in the path. I have modified my provisioning with Vagrant to include it, but for the rpm version of k3s we plan to install to /usr/bin.

Lohann commented 4 years ago

/usr/local/bin isn't in root's $PATH on CentOS 7/8 by default, and including it there isn't recommended.

https://security.stackexchange.com/questions/136990/whats-the-motivation-for-excluing-usr-local-bin-from-roots-path

More to the point, you're vulnerable if there is danger of bad stuff being installed at /usr/local/bin. By forcing yourself to use the full path (/usr/local/bin/whatever) you don't have any risk of accidentally invoking bad stuff via $PATH
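
Since reports in this thread differ on whether /usr/local/bin is in root's $PATH, it is easy to check on a given host. A small sketch (the helper name `path_contains` is hypothetical) that tests whether a directory is an exact entry of a PATH-style string:

```shell
# Succeed if directory $1 is an exact entry in the colon-separated list $2.
# Wrapping both sides in colons avoids matching substrings such as /usr/local/bin2.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Usage (run as the user in question, e.g. in a root login shell):
#   path_contains /usr/local/bin "$PATH" && echo "in PATH" || echo "not in PATH"
```

The result can differ between login shells, `sudo`, and `su`, so it is worth checking in the same way k3s will actually be invoked.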

Fodoj commented 4 years ago

See https://github.com/rancher/k3s/issues/1666

nemonik commented 4 years ago

I took the approach of

iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
service iptables save

to clear iptable rules.

I also installed these packages via yum

so that the k3s-selinux policy would work.

In addition, I'd also encourage you to run on each node

sudo ethtool -K flannel.1 tx-checksum-ip-generic off

otherwise you will run into the problems coreos/flannel#1243 and #1638 with accessing applications in your cluster from worker nodes as described in these issues.

nemonik commented 4 years ago

I found

sudo ethtool -K flannel.1 tx-checksum-ip-generic off

does not persist across reboots, so you have to wrap it in a systemd service

/etc/systemd/system/flannel-tx-checksum-ip-generic-off.service:

[Unit]
Description=Ensure TX (outgoing) checksum offloading is disabled on flannel.1
After=sys-devices-virtual-net-flannel.1.device

[Install]
WantedBy=sys-devices-virtual-net-flannel.1.device

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -K flannel.1 tx-checksum-ip-generic off

and then enable and start it. However, on reboots where the master comes up fine, the worker node does not come up okay: the k3s service is running, but routes that should be there are missing, and I cannot access the docker registry hosted on the master and exposed via a loadBalancerIP through metallb. If I swap out CentOS 7 for Ubuntu using my Ansible automation, the problem doesn't exist.
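
To confirm whether the offload setting actually took effect after a reboot, the current state can be read back from `ethtool -k flannel.1` (lowercase `-k` shows features, uppercase `-K` sets them). A sketch, with the hypothetical helper name `tx_checksum_state`, assuming the usual `feature-name: on|off` output lines:

```shell
# Read `ethtool -k <dev>` output on stdin and print the state
# ("on" or "off") of the tx-checksum-ip-generic feature.
tx_checksum_state() {
  awk '$1 == "tx-checksum-ip-generic:" { print $2; exit }'
}

# Usage on a live node:
#   ethtool -k flannel.1 | tx_checksum_state
```

If this prints "on" after a reboot, the unit probably ran before the interface existed or did not run at all.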

brandond commented 4 years ago

I am curious to see how this gets fixed in k8s and flannel. The core issue (according to https://github.com/kubernetes/kubernetes/issues/88986#issuecomment-620633097) is a bug in the kernel netfilter code that was exposed by some recent updates to k8s's netlink code, but the fix is unlikely to be back ported to RHEL7.

nemonik commented 4 years ago

Hmm, I just restarted the k3s-agent service and problem went away... hmmm... My flannel-tx-checksum-ip-generic-off.service appears to kick off correctly, the node is added, etc. But w/o a restart this issue is not yet resolved. I may try something different. Interesting, @brandond.

brandond commented 4 years ago

Looks like flannel is going to just disable it automatically. https://github.com/coreos/flannel/pull/1282#issuecomment-617209151

nemonik commented 4 years ago

'til a fix in the kernel shows up from RH. These don't usually come fast. So, now I gotta figure out how to replace the Flannel that ships with k3s with the patched one.

alpapad commented 4 years ago

This:

Set FirewallBackend=iptables in /etc/firewalld/firewalld.conf and restart firewalld.

Seems to be needed.

Comes from: https://github.com/rancher/k3s/issues/1711
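
The config change above can be applied with sed. A minimal sketch that rewrites an existing `FirewallBackend=` line, assuming the key is already present and uncommented in the file; the function name `set_firewalld_backend` is made up for illustration:

```shell
# Read firewalld.conf content on stdin and write it to stdout with
# FirewallBackend forced to iptables. Other lines pass through unchanged.
set_firewalld_backend() {
  sed 's/^FirewallBackend=.*/FirewallBackend=iptables/'
}

# Usage (as root), keeping a backup of the original file:
#   cp /etc/firewalld/firewalld.conf /etc/firewalld/firewalld.conf.bak
#   set_firewalld_backend < /etc/firewalld/firewalld.conf.bak > /etc/firewalld/firewalld.conf
#   systemctl restart firewalld
```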

ShylajaDevadiga commented 4 years ago

Adding my progress in testing with no changes made to iptables.

As the centos user, /usr/local/bin is in PATH.
As root, /usr/local/bin is not in PATH.

Node OS: CentOS 7
k3s: v1.18.4+k3s1
Rancher version: 2.4.5

With selinux set to Enforcing mode:

ShylajaDevadiga commented 4 years ago

Closing this issue as CentOS 7 validation is complete. Sonobuoy fails are tracked here #1960

noelmcloughlin commented 4 years ago

Just note, basic k3s installation on CentOS7 fails because there is no selinux-policy-base package. After some digging around k3s depends on selinux-policy-targeted/minimum on CentOS7.
This is not obvious so should be added to CentOS 7 validation learnings.

[vagrant@localhost ~]$ rpm -q --whatprovides selinux-policy-base
selinux-policy-targeted-3.13.1-266.el7_8.1.noarch
selinux-policy-minimum-3.13.1-266.el7_8.1.noarch
[vagrant@localhost ~]$ cat /etc/redhat-release 
CentOS Linux release 7.8.2003 (Core)

brandond commented 4 years ago

@noelmcloughlin do the steps documented here not work for you? https://rancher.com/docs/k3s/latest/en/advanced/#experimental-selinux-support

Since that package provides selinux-policy-base you should be able to simply yum install it as described.

ShylajaDevadiga commented 4 years ago

@noelmcloughlin Trying to get more info on this. selinux-policy-targeted.noarch is already installed. So we don't explicitly need to install selinux-policy-base package.

Do you see k3s installation failing on CentOS Linux release 7.8.2003 (Core)?

rpm -q --whatprovides selinux-policy-base
selinux-policy-targeted-3.13.1-266.el7.noarch

yum list installed |grep selinux
libselinux.x86_64                           2.5-15.el7                 installed
libselinux-python.x86_64                    2.5-15.el7                 installed
libselinux-utils.x86_64                     2.5-15.el7                 installed
selinux-policy.noarch                       3.13.1-266.el7             installed
selinux-policy-targeted.noarch              3.13.1-266.el7             installed

I am able to get it running using

yum install -y container-selinux
 rpm -i https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm
curl -sfL https://get.k3s.io | sh -
 kubectl get pods -A 
NAMESPACE     NAME                                     READY   STATUS      RESTARTS   AGE
kube-system   metrics-server-7566d596c8-vnhpv          1/1     Running     0          4m35s
kube-system   local-path-provisioner-6d59f47c7-j4mvd   1/1     Running     0          4m35s
kube-system   helm-install-traefik-hzflz               0/1     Completed   0          4m35s
kube-system   svclb-traefik-q4tx2                      2/2     Running     0          4m21s
kube-system   coredns-8655855d6-f5qsc                  1/1     Running     0          4m35s
kube-system   traefik-758cd5fc85-nmglj                 1/1     Running     0          4m21s

noelmcloughlin commented 4 years ago

@noelmcloughlin do the steps documented here not work for you? selinux-policy-targeted.noarch is already installed. So we don't explicitly need to install selinux-policy-base package.

Those instructions are for selinux enforcing I guess. I was testing with selinux permissive (i.e. not targeting selinux, just generic use case).
I think installing K3S via script on CentOS was failing because that package was missing. It was a few days ago when I was exploring the issue.

ShylajaDevadiga commented 4 years ago

@noelmcloughlin When you have selinux set to permissive mode you can skip the installation of rpms by setting INSTALL_K3S_SELINUX_WARN=true.

curl -sfL https://get.k3s.io | INSTALL_K3S_SELINUX_WARN=true sh -s -

noelmcloughlin commented 4 years ago

Thanks, I missed that one.

brandond commented 4 years ago

Yeah to me "a system where SELinux is enabled by default" means enforcing or permissive - not absent or disabled. Maybe worth a clarifying change to the docs?

noelmcloughlin commented 4 years ago

I remember the issue now. Running the script failed. It did not say you should have selinux=enforcing or set INSTALL_K3S_SELINUX_WARN=true but instead threw an error message saying "ensure selinux-policy-base is installed" so that indicated a packaging problem, not a SELinux != enforcing issue. The script error confused me.
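
The decision discussed above (set INSTALL_K3S_SELINUX_WARN=true when SELinux is permissive) can be made explicit in a wrapper. This is a sketch under that one assumption from the thread; the helper name `k3s_selinux_env` is hypothetical, and it takes the mode string that `getenforce` prints.

```shell
# Given the output of `getenforce`, print the environment assignment
# suggested in this thread for the k3s install script, or nothing.
k3s_selinux_env() {
  case "$1" in
    Permissive) echo "INSTALL_K3S_SELINUX_WARN=true" ;;
    *)          : ;;  # Enforcing or anything else: no override suggested here
  esac
}

# Usage on a live host:
#   k3s_selinux_env "$(getenforce)"
```

For Enforcing mode, the thread instead points at installing container-selinux and the k3s-selinux policy before running the installer.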

cjellick commented 4 years ago

@ShylajaDevadiga I don't think I want this issue closed until CentOS 7 is 100% validated. I can't see that happening until the conformance tests pass cleanly and successfully on an official release. So, I think it's fine that you opened an issue specifically for the conformance test failures, but this issue should be held open until that one works.

cjellick commented 4 years ago

To be honest, I'm also not sure that we can claim CentOS 7 support without revisiting selinux.

ShylajaDevadiga commented 4 years ago

Closing issue as conformance tests have passed. Results are tracked in https://github.com/rancher/k3s/issues/1960.

bbhenry commented 4 years ago

I just recently deployed K3S to a CentOS7 server. K3S was installed but the pods were not able to communicate to the api server just like described before. I had to disable firewalld to get things working. How is this ticket closed if the latest K3S should work on a CentOS7 environment? Am I missing something?

Fodoj commented 4 years ago

IMO this is expected behavior if you have firewall enabled. Installation of K3s doesn’t handle complete server configuration (correct me if I am wrong).

brandond commented 4 years ago

That is correct. It works on RHEL7 if you don't break it by blocking traffic or doing other things that would prevent it from working.

sraillard commented 4 years ago

I agree that k3s can't configure all the server settings. The fact is that firewalld is by default enabled, so that's classic issue (and many people have it). Maybe a solution could be checking some firewall rules and printing a warning if some rules may prevent k3s from working correctly?
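
A pre-install warning of the kind suggested above could start from the state that `systemctl is-active firewalld` reports. This is a rough sketch of the idea only, not something the k3s installer is known to do; the function name `firewalld_warning` is invented.

```shell
# Given the output of `systemctl is-active firewalld`, print a warning
# when the service is running. Any other state prints nothing.
firewalld_warning() {
  if [ "$1" = "active" ]; then
    echo "warning: firewalld is active and may block k3s traffic"
  fi
}

# Usage on a live host:
#   firewalld_warning "$(systemctl is-active firewalld)"
```

This only detects a local firewalld service. As noted elsewhere in the thread, a firewall outside the server can break connectivity as well.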

Fodoj commented 4 years ago

For example, CentOS 7 AMIs (and I guess other cloud images) have firewalld disabled by default, but yeah, standard ISO installation has it enabled normally. But then, firewall could be also outside the server and also break K8s/K3s.