kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Unable to Deploy Kubernetes on CIS Hardened (Ubuntu) #7878

Closed tigerpeng2001 closed 2 years ago

tigerpeng2001 commented 3 years ago

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): 1

Network plugin used:

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

 $ ansible -i inventory/local/hosts.ini all -m debug
[WARNING]: Found both group and host with same name: bastion
ip-10-189-123-112.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-144.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-186.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-123.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-116.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-156.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-181.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
bastion | SUCCESS => {
    "msg": "Hello world!"
}

Command used to invoke ansible: ansible-playbook -i ./inventory/local/hosts.ini ./cluster.yml -e local_volumes_enabled=true -e cloud_provider=aws -e ansible_user=ubuntu -b --become-user=root --flush-cache

Output of ansible run:

TASK [etcd : Configure | Wait for etcd cluster to be healthy] **
fatal: [ip-10-189-123-116.eu-west-1.compute.internal]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "set -o pipefail && /usr/local/bin/etcdctl endpoint --cluster status && /usr/local/bin/etcdctl endpoint --cluster health 2>&1 | grep -v 'Error: unhealthy cluster' >/dev/null",
    "delta": "0:00:05.012099",
    "end": "2021-08-14 17:36:56.138848",
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2021-08-14 17:36:51.126749",
    "stderr": "{\"level\":\"warn\",\"ts\":\"2021-08-14T17:36:56.137Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-ea3ecd4b-a915-4679-ab48-77738b5ab511/10.189.123.116:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"}\nError: failed to fetch endpoints from etcd cluster member list: context deadline exceeded",
    "stderr_lines": [
        "{\"level\":\"warn\",\"ts\":\"2021-08-14T17:36:56.137Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-ea3ecd4b-a915-4679-ab48-77738b5ab511/10.189.123.116:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"}",
        "Error: failed to fetch endpoints from etcd cluster member list: context deadline exceeded"
    ],
    "stdout": "",
    "stdout_lines": []
}

Anything else do we need to know: Deployment to a non-hardened Ubuntu 20.04 AMI (ami-0a8e758f5e873d1c1) is successful. I tried adding the following user-data in the Terraform that brings up the instances:

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
systemctl stop firewalld
systemctl stop iptables
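These commands are RHEL-oriented and are effectively no-ops on Ubuntu: SELinux is normally absent there (AppArmor is used instead, and /etc/sysconfig/selinux does not exist), and CIS Ubuntu images typically install raw iptables rules rather than a firewalld or iptables service. A hedged sketch for checking what the image actually ships, rather than assuming the RHEL service names:

```shell
# Hedged sketch: check which firewall frontends the image actually has.
# On Ubuntu the setenforce/sed steps above do nothing, and the CIS
# hardening usually lands directly in iptables rules, not in a service.
for svc in firewalld ufw; do
  if command -v "$svc" >/dev/null 2>&1; then
    echo "$svc: present"
  else
    echo "$svc: not found"
  fi
done
# The hardened rules themselves live in iptables (listing needs root):
iptables -L INPUT -n 2>/dev/null || echo "run as root to list iptables rules"
```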

On one of the etcd instances:

root@ip-10-189-123-116:~# docker logs etcd1 --tail=20
2021-08-14 19:26:10.245726 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 is starting a new election at term 897
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 became candidate at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 received MsgVoteResp from d5675049be20a7a8 at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to 35c1ab1d72f23a5a at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to b824a46a6d1774a6 at term 898
2021-08-14 19:26:15.227443 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:15.233765 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:15.238887 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:15.245846 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:20.133402 E | etcdserver: publish error: etcdserver: request timed out
2021-08-14 19:26:20.227562 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:20.233932 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:20.239007 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:20.246002 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 is starting a new election at term 898
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 became candidate at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 received MsgVoteResp from d5675049be20a7a8 at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to 35c1ab1d72f23a5a at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to b824a46a6d1774a6 at term 899
root@ip-10-189-123-116:~# iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
DROP       all  --  localhost/8          anywhere            
ACCEPT     tcp  --  anywhere             anywhere             state ESTABLISHED
ACCEPT     udp  --  anywhere             anywhere             state ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere             state ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootpc state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:ntp state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:323 state NEW

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     tcp  --  anywhere             anywhere             state NEW,ESTABLISHED
ACCEPT     udp  --  anywhere             anywhere             state NEW,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere             state NEW,ESTABLISHED

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere   
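The raft log above shows dial timeouts to the peers on 2380, which matches this INPUT chain: the policy is DROP and there is no rule for the etcd client/peer ports. A quick hedged check from one etcd node (peer IPs taken from the log output above; `/dev/tcp` is a bash builtin, so no extra tooling is needed):

```shell
# Hedged diagnostic: probe the etcd client (2379) and peer (2380) ports
# on the two peers that the raft log reports as unreachable. A firewall
# DROP shows up here as a connect timeout -> "blocked".
for peer in 10.189.123.156 10.189.123.181; do
  for port in 2379 2380; do
    if timeout 2 bash -c "exec 3<>/dev/tcp/$peer/$port" 2>/dev/null; then
      echo "$peer:$port open"
    else
      echo "$peer:$port blocked"
    fi
  done
done
```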
tigerpeng2001 commented 3 years ago

On the non-hardened Ubuntu, the iptables ruleset is much shorter:

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere 
tigerpeng2001 commented 3 years ago

Using user-data in Terraform to clean up iptables, I was able to bring up Kubernetes. Please improve Kubespray's support for deploying k8s onto a CIS-hardened OS without weakening the OS.

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
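An alternative that keeps the CIS DROP policies would be to insert ACCEPT rules only for the ports the cluster needs, instead of flushing everything. A hedged sketch that prints the rules as a dry run; the port list follows the upstream Kubernetes defaults and common CNI choices, not a Kubespray-published requirement list, so treat it as a starting point and adjust for your configuration:

```shell
# Hypothetical allow-list that preserves the hardened DROP policies.
# Prints the iptables commands (dry run); review, then pipe to `sudo sh`.
allow() {  # args: proto port(s) comment
  echo "iptables -I INPUT -p $1 --dport $2 -m state --state NEW -j ACCEPT  # $3"
}
allow tcp 6443        "kube-apiserver"
allow tcp 2379:2380   "etcd client and peer"
allow tcp 10250       "kubelet API"
allow tcp 10257       "kube-controller-manager (10252 on older releases)"
allow tcp 10259       "kube-scheduler (10251 on older releases)"
allow tcp 30000:32767 "NodePort services"
# CNI-specific examples (assumption: Calico defaults; drop if unused)
allow tcp 179         "Calico BGP"
allow udp 4789        "VXLAN overlay, if enabled"
```

Pod-to-pod traffic through the FORWARD chain is already handled here by the Docker/CNI chains shown above; it is the INPUT policy that blocks etcd.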
cristicalin commented 3 years ago

@tigerpeng2001 your commands above effectively weaken the CIS hardening.

Note that kubespray does not test against CIS-hardened configurations, nor does the code check for non-standard setups like SELinux enabled on Debian / Ubuntu. There may be some non-trivial issues with these kinds of configurations. If you require SELinux support for your environment, you would be better served by a CentOS or Red Hat flavor, on which we actively do testing.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue or PR with `/reopen`
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/7878#issuecomment-1040298896):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.