rancher / rke

Rancher Kubernetes Engine (RKE) is an extremely simple, lightning-fast Kubernetes distribution that runs entirely within containers.

Canal containers give SELinux-related error message #1691

Closed nheinemans closed 4 years ago

nheinemans commented 4 years ago

RKE version: 0.3.0

Docker version: (docker version,docker info preferred)

Client: Docker Engine - Community
 Version:           19.03.3
 API version:       1.39 (downgraded from 1.40)
 Go version:        go1.12.10
 Git commit:        a872fc2f86
 Built:             Tue Oct  8 00:58:10 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.1
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.6
  Git commit:       4c52b90
  Built:            Wed Jan  9 19:06:30 2019
  OS/Arch:          linux/amd64
  Experimental:     false

Docker daemon.json:

{
  "selinux-enabled": true,
  "userland-proxy": false,
  "bip": "10.10.0.1/24",
  "fixed-cidr": "10.10.0.1/24"
}
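
For reference, a quick way to confirm the daemon is really running with SELinux support (just a sketch; getenforce and docker info are the standard tools for this):

getenforce
docker info --format '{{.SecurityOptions}}'   # should list name=selinux when selinux-enabled is in effect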

Operating system and kernel: (cat /etc/os-release, uname -r preferred)

NAME="Red Hat Enterprise Linux"
VERSION="8.0 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.0"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.0 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8.0:GA"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.0
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.0"

Type/provider of hosts: (VirtualBox/Bare-metal/AWS/GCE/DO) Doesn't matter

cluster.yml file:

cluster_name: name

nodes:
  - address: node1
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node2
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]
  - address: node3
    user: user
    ssh_key_path: /home/user/.ssh/id_rsa
    role: [controlplane,etcd,worker]

private_registries:
  - url: internal-registry
    is_default: true # All system images will be pulled using this registry. 

services:
  etcd:
    snapshot: true
    creation: 6h
    retention: 24h

Steps to Reproduce: rke up. When the cluster is built, I see problems with the canal pods:

kubectl -n kube-system get pods
NAME                                      READY   STATUS                  RESTARTS   AGE
canal-9vg2d                               1/2     Running                 0          45h
canal-ftfrv                               0/2     Init:CrashLoopBackOff   197        16h
canal-l5g2d                               2/2     Running                 0          147m
coredns-5c98fc7769-wbscd                  0/1     CrashLoopBackOff        487        45h
coredns-autoscaler-64c857cf7-qgqwc        1/1     Running                 0          167m
metrics-server-7cf4dfc846-2vvbl           1/1     Running                 34         167m
rke-coredns-addon-deploy-job-kn952        0/1     Completed               0          45h
rke-ingress-controller-deploy-job-f29cv   0/1     Completed               0          45h
rke-metrics-addon-deploy-job-hfsxx        0/1     Completed               0          45h
rke-network-plugin-deploy-job-lfnj4       0/1     Completed               0          45h

Looking into the cni-install pod, I see this error message:

mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-canal.conflist'; unable to remove target: Permission denied
Failed to mv files. This may be caused by selinux configuration on the host, or something else.

Results: The cluster doesn't work properly. Setting SELinux to permissive is not really an option.

carloscarnero commented 4 years ago

Exactly the same thing happened to me after updating a cluster from CentOS 7.6 to 7.7, leading me to believe that something changed in SELinux in the transition (I've checked their release notes and found nothing.)

I "fixed" it by changing the network plugin and using plain flannel for the time being (which was... laborious) but, because of this, I still haven't upgraded CentOS on the other clusters. Also, see projectcalico/calico#2704.

leodotcloud commented 4 years ago

While trying to reproduce the problem using a couple of different cloud providers, I see that the ip_tables module is not loaded by default in RHEL 8/CentOS 8 VMs.

[root@ip-172-31-16-240 ~]# lsmod | grep ip_tables
[root@ip-172-31-16-240 ~]#

This is causing problems with the install. Running modprobe ip_tables loads the module, and the installation goes through fine with SELinux set to 'Enforcing'.
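
For reference, a minimal sketch of loading the module now and persisting it across reboots (assuming a systemd-based host, where /etc/modules-load.d is read at boot):

modprobe ip_tables
echo ip_tables > /etc/modules-load.d/ip_tables.conf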

@nheinemans and @carloscarnero could you check if this step resolves your problem?

carloscarnero commented 4 years ago

I have not upgraded to CentOS 8 yet. Instead, I observed the problem going from 7.6 to 7.7. Thank you @leodotcloud for looking into this!

leodotcloud commented 4 years ago

@carloscarnero Where are your machines running (cloud/on-prem)? Any steps to reproduce the problem?

carloscarnero commented 4 years ago

> @carloscarnero Where are your machines running (cloud/on-prem)? Any steps to reproduce the problem?

The following is the rke configuration for a three-node on-premises 1.15.5 cluster (some data is obscured/anonymized) that uses an internal registry because this setup is (mostly) air-gapped:

---
cluster_name: development
nodes:
- address: cfdd9f3c.example.com
  user: dockeruser
  role:
  - controlplane
  - etcd
  - worker
- address: b5833011.example.com
  user: dockeruser
  role:
  - controlplane
  - etcd
  - worker
- address: 307309d8.example.com
  user: dockeruser
  role:
  - controlplane
  - etcd
  - worker
network:
  plugin: canal
dns:
  provider: coredns
  upstreamnameservers:
  - 8.8.8.8
ingress:
  provider: none
system_images:
  etcd: example.com/rancher/coreos-etcd:v3.3.10-rancher1
  alpine: example.com/rancher/rke-tools:v0.1.50
  nginx_proxy: example.com/rancher/rke-tools:v0.1.50
  cert_downloader: example.com/rancher/rke-tools:v0.1.50
  kubernetes: example.com/rancher/hyperkube:v1.15.5-rancher1
  kubernetes_services_sidecar: example.com/rancher/rke-tools:v0.1.50
  pod_infra_container: example.com/rancher/pause:3.1
  kubedns: example.com/rancher/k8s-dns-kube-dns-amd64:1.15.0
  dnsmasq: example.com/rancher/k8s-dns-dnsmasq-nanny-amd64:1.15.0
  kubedns_sidecar: example.com/rancher/k8s-dns-sidecar-amd64:1.15.0
  kubedns_autoscaler: example.com/rancher/cluster-proportional-autoscaler:1.3.0
  coredns: example.com/rancher/coredns:1.3.1
  coredns_autoscaler: example.com/rancher/cluster-proportional-autoscaler:1.3.0
  flannel: example.com/rancher/coreos-flannel:v0.11.0-rancher1
  flannel_cni: example.com/rancher/coreos-flannel-cni:v0.3.0-rancher5
  calico_node: example.com/rancher/calico-node:v3.7.4
  calico_cni: example.com/rancher/calico-cni:v3.7.4
  calico_controllers: example.com/rancher/calico-kube-controllers:v3.7.4
  calico_ctl: example.com/rancher/calico-ctl:v2.0.0
  canal_node: example.com/rancher/calico-node:v3.7.4
  canal_cni: example.com/rancher/calico-cni:v3.7.4
  canal_flannel: example.com/rancher/coreos-flannel:v0.11.0
  weave_node: example.com/rancher/weave-kube:2.5.2
  weave_cni: example.com/rancher/weave-npc:2.5.2
  ingress: example.com/rancher/nginx-ingress-controller:nginx-0.25.1-rancher1
  ingress_backend: example.com/rancher/nginx-ingress-controller-defaultbackend:1.5-rancher1
  metrics_server: example.com/rancher/metrics-server:v0.3.3

The nodes are based on CentOS 7.7, updated up to the last minute, and during basic system configuration the documented requirements were taken into account. SELinux is fully enabled, of course, and that's what's preventing calico/canal from starting.

leodotcloud commented 4 years ago

Thanks @carloscarnero for sharing the detailed info. I will try to reproduce it on my end. I have one more question. Do you use the stock CentOS ISO to bring up the machines or do you have any other customizations done?

carloscarnero commented 4 years ago

> Thanks @carloscarnero for sharing the detailed info. I will try to reproduce it on my end. I have one more question. Do you use the stock CentOS ISO to bring up the machines or do you have any other customizations done?

I'm using the CentOS minimal install option, practically vanilla, with close to no customizations, except that I remove the firewalld service and install iptables, which is a fully supported option (besides, it has worked like that for close to two years).
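
For reference, a minimal sketch of that customization on a stock CentOS 7 minimal install (assuming the iptables-services package, which provides the iptables systemd unit):

systemctl stop firewalld
systemctl disable firewalld
yum install -y iptables-services
systemctl enable iptables
systemctl start iptables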

One more thing I have just discovered: I was wrong that this happened during the upgrade from CentOS 7.6 to 7.7... I just checked a non-upgraded cluster and it was already failing (fragment):

NAMESPACE     NAME                      READY   STATUS                  RESTARTS   AGE
kube-system   canal-2xvgm               0/2     Init:CrashLoopBackOff   4289       15d
kube-system   canal-7x2tq               2/2     Running                 0          27d
kube-system   canal-jsfrm               0/2     Init:CrashLoopBackOff   2814       9d
kube-system   canal-rpgpd               0/2     Init:CrashLoopBackOff   2808       9d
kube-system   canal-tbvt5               0/2     Init:CrashLoopBackOff   2808       9d
kube-system   canal-vqbh8               2/2     Running                 0          27d
kube-system   canal-xcfqb               0/2     Init:CrashLoopBackOff   2808       9d
kube-system   coredns-795fc698b-68qjv   1/1     Running                 4          27d
kube-system   coredns-795fc698b-xthjd   1/1     Running                 5          27d

The above comes from a seven-node cluster, configured with the same settings as before; you can see that five canal pods are failing and two are running.

I can be 100% certain that the OS settings are the same, as they're managed via Ansible. The logs for the failing pods show exactly the same message as the opening post of this issue:

mv: inter-device move failed: '/calico.conf.tmp' to '/host/etc/cni/net.d/10-canal.conflist'; unable to remove target: Permission denied

Every time the previous message pops up, there's a corresponding entry in the SELinux audit log:

type=AVC msg=audit(1572642338.208:17704): avc:  denied  { unlink } for  pid=16000 comm="mv" name="10-canal.conflist" dev="sda1" ino=891087 scontext=system_u:system_r:container_t:s0:c387,c438 tcontext=system_u:object_r:container_file_t:s0:c308,c873 tclass=file permissive=0
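
For reference, denials like the one above can usually be pulled straight out of the audit log with ausearch, and audit2allow -w will explain which policy rule is being hit (a sketch, assuming auditd is running and the policycoreutils audit2allow tool is installed):

ausearch -m avc -ts recent -c mv
ausearch -m avc -c mv | audit2allow -w
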
carloscarnero commented 4 years ago

From the discussion in projectcalico/calico#2704 it seems that

securityContext:
  privileged: true

is needed in order to properly handle SELinux systems. Thus, I edited the running canal daemonset with kubectl -n kube-system edit daemonset/canal and added those lines to the init container named install-cni.

After saving, the pods immediately reached the running state, and no more errors were logged. Maybe this suggests that those lines are missing from the template?
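
For anyone who prefers a one-off patch over an interactive edit, the change can also be applied roughly like this (a sketch only; it assumes install-cni is the first init container, so check the index first):

kubectl -n kube-system get ds canal -o jsonpath='{.spec.template.spec.initContainers[*].name}'
kubectl -n kube-system patch daemonset canal --type=json \
  -p='[{"op": "add", "path": "/spec/template/spec/initContainers/0/securityContext", "value": {"privileged": true}}]'

Keep in mind that rke up reapplies the addon templates, so an edit like this is only a workaround until the template itself carries the securityContext.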

carloscarnero commented 4 years ago

@leodotcloud I have tried the fix above in another different cluster, and it seems to work.

superseb commented 4 years ago

RHEL8 support is tracked in https://github.com/rancher/rancher/issues/23045.

To validate the new templates (they should show privileged: true, while the old templates show nothing):

Canal

kubectl get ds -n kube-system -l k8s-app=canal -o json | jq .items[].spec.template.spec.initContainers[].securityContext

Calico

kubectl get ds -n kube-system -l k8s-app=calico-node -o json | jq .items[].spec.template.spec.initContainers[].securityContext

superseb commented 4 years ago

@carloscarnero If you can test this change on some lab machines which are identical to the ones that were exhibiting the problem, that would be appreciated

soumyalj commented 4 years ago

Reproduced the issue with RKE version v0.3.2 and the Canal network plugin: the security context for the template returns null, as shown below:

soumyas-MBP:rke soumya$ kubectl --kubeconfig kube_config_clusterzero.yml get ds -n kube-system -l k8s-app=canal -o json | jq .items[].spec.template.spec.initContainers[].securityContext 
null

Tested with RKE version v1.1.0-rc11. Created a 3-node cluster (all three roles on each node) with K8s version 1.15.10-rancher1-2 for the different configs below. The cluster came up successfully, and the security context for the template returns privileged: true.

1. RHEL 7.7 nodes, native docker and SELINUX ON - Canal network plugin

   kubectl --kubeconfig kube_config_clusterzero.yml get ds -n kube-system -l k8s-app=canal -o json | jq .items[].spec.template.spec.initContainers[].securityContext
   {
     "privileged": true
   }

2. RHEL 7.7 nodes, native docker and SELINUX ON - Calico network plugin

   soumyas-MBP:rke soumya$ kubectl --kubeconfig kube_config_clusterzero.yml get ds -n kube-system -l k8s-app=calico-node -o json | jq .items[].spec.template.spec.initContainers[].securityContext
   {
     "privileged": true
   }
   {
     "privileged": true
   }

3. RHEL 7.7 nodes, native docker and SELINUX OFF - Canal network plugin

   kubectl get ds -n kube-system -l k8s-app=canal -o json | jq .items[].spec.template.spec.initContainers[].securityContext
   {
     "privileged": true
   }

4. RHEL 7.7 nodes, upstream docker and SELINUX ON - Canal network plugin

   kubectl get ds -n kube-system -l k8s-app=canal -o json | jq .items[].spec.template.spec.initContainers[].securityContext
   {
     "privileged": true
   }

Automation tests were also run on the above setups with the Canal network plugin and no issues were found.

carloscarnero commented 4 years ago

> @carloscarnero If you can test this change on some lab machines which are identical to the ones that were exhibiting the problem, that would be appreciated

@superseb I'm not clear what I should test. I mean... should I use rke v1.1.0-rc11? If that's the case, should I test against one of that version's supported K8s?

EDIT: based on the previous comment, I will test with v1.1.0-rc11 and K8s1.15.10-rancher1-2. The operating system is CentOS 7.7, completely updated, with SELinux enabled and enforcing. This will take some time because all my setups are air-gapped and I have to prime the internal registry.

carloscarnero commented 4 years ago

Success using v1.1.0-rc11 and K8s1.15.10-rancher1-2 on CentOS 7.7 with SELinux enforcing! Note, however: the next test is upgrading from 1.15.5 to 1.15.10, and I will report back in this very comment to avoid further noise.

EDIT: A cluster upgrade from 1.15.5 to 1.15.10 was successful! The canal pods are privileged and running properly.

superseb commented 4 years ago

Thanks for testing