kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Unable to Deploy Kubernetes on CIS Hardened (Ubuntu) #7878

Closed tigerpeng2001 closed 2 years ago

tigerpeng2001 commented 3 years ago

Environment:

Kubespray version (commit) (git rev-parse --short HEAD): 1

Network plugin used:

Full inventory with variables (ansible -i inventory/sample/inventory.ini all -m debug -a "var=hostvars[inventory_hostname]"):

 $ ansible -i inventory/local/hosts.ini all -m debug
[WARNING]: Found both group and host with same name: bastion
ip-10-189-123-112.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-144.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-186.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-123.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-116.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-156.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
ip-10-189-123-181.eu-west-1.compute.internal | SUCCESS => {
    "msg": "Hello world!"
}
bastion | SUCCESS => {
    "msg": "Hello world!"
}

Command used to invoke ansible: ansible-playbook -i ./inventory/local/hosts.ini ./cluster.yml -e local_volumes_enabled=true -e cloud_provider=aws -e ansible_user=ubuntu -b --become-user=root --flush-cache

Output of ansible run:

TASK [etcd : Configure | Wait for etcd cluster to be healthy] **
fatal: [ip-10-189-123-116.eu-west-1.compute.internal]: FAILED! => {
    "attempts": 4,
    "changed": false,
    "cmd": "set -o pipefail && /usr/local/bin/etcdctl endpoint --cluster status && /usr/local/bin/etcdctl endpoint --cluster health 2>&1 | grep -v 'Error: unhealthy cluster' >/dev/null",
    "delta": "0:00:05.012099",
    "end": "2021-08-14 17:36:56.138848",
    "msg": "non-zero return code",
    "rc": 1,
    "start": "2021-08-14 17:36:51.126749",
    "stderr": "{\"level\":\"warn\",\"ts\":\"2021-08-14T17:36:56.137Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-ea3ecd4b-a915-4679-ab48-77738b5ab511/10.189.123.116:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"}\nError: failed to fetch endpoints from etcd cluster member list: context deadline exceeded",
    "stderr_lines": [
        "{\"level\":\"warn\",\"ts\":\"2021-08-14T17:36:56.137Z\",\"caller\":\"clientv3/retry_interceptor.go:62\",\"msg\":\"retrying of unary invoker failed\",\"target\":\"endpoint://client-ea3ecd4b-a915-4679-ab48-77738b5ab511/10.189.123.116:2379\",\"attempt\":0,\"error\":\"rpc error: code = DeadlineExceeded desc = context deadline exceeded\"}",
        "Error: failed to fetch endpoints from etcd cluster member list: context deadline exceeded"
    ],
    "stdout": "",
    "stdout_lines": []
}

Anything else do we need to know: Deployment to a non-hardened Ubuntu 20.04 AMI (ami-0a8e758f5e873d1c1) is successful. I tried adding the following user-data in the Terraform that brings up the instances:

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
systemctl stop firewalld
systemctl stop iptables
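These commands are RHEL-oriented and are effectively no-ops on Ubuntu: SELinux is normally absent there (AppArmor is used instead, and /etc/sysconfig/selinux does not exist), and CIS Ubuntu images typically install raw iptables rules rather than a firewalld or iptables service. A hedged sketch for checking what the image actually ships, rather than assuming the RHEL service names:

```shell
# Hedged sketch: check which firewall frontends the image actually has.
# On Ubuntu the setenforce/sed steps above do nothing, and the CIS
# hardening usually lands directly in iptables rules, not in a service.
for svc in firewalld ufw; do
  if command -v "$svc" >/dev/null 2>&1; then
    echo "$svc: present"
  else
    echo "$svc: not found"
  fi
done
# The hardened rules themselves live in iptables (listing needs root):
iptables -L INPUT -n 2>/dev/null || echo "run as root to list iptables rules"
```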

On one of the etcd instances:

root@ip-10-189-123-116:~# docker logs etcd1 --tail=20
2021-08-14 19:26:10.245726 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 is starting a new election at term 897
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 became candidate at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 received MsgVoteResp from d5675049be20a7a8 at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to 35c1ab1d72f23a5a at term 898
raft2021/08/14 19:26:14 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to b824a46a6d1774a6 at term 898
2021-08-14 19:26:15.227443 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:15.233765 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:15.238887 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:15.245846 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:20.133402 E | etcdserver: publish error: etcdserver: request timed out
2021-08-14 19:26:20.227562 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:20.233932 W | rafthttp: health check for peer 35c1ab1d72f23a5a could not connect: dial tcp 10.189.123.156:2380: i/o timeout
2021-08-14 19:26:20.239007 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
2021-08-14 19:26:20.246002 W | rafthttp: health check for peer b824a46a6d1774a6 could not connect: dial tcp 10.189.123.181:2380: i/o timeout
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 is starting a new election at term 898
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 became candidate at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 received MsgVoteResp from d5675049be20a7a8 at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to 35c1ab1d72f23a5a at term 899
raft2021/08/14 19:26:21 INFO: d5675049be20a7a8 [logterm: 1, index: 3] sent MsgVote request to b824a46a6d1774a6 at term 899
root@ip-10-189-123-116:~# iptables -L
Chain INPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
DROP       all  --  localhost/8          anywhere            
ACCEPT     tcp  --  anywhere             anywhere             state ESTABLISHED
ACCEPT     udp  --  anywhere             anywhere             state ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere             state ESTABLISHED
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:ssh state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootpc state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:ntp state NEW
ACCEPT     udp  --  anywhere             anywhere             udp dpt:323 state NEW

Chain FORWARD (policy DROP)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy DROP)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     tcp  --  anywhere             anywhere             state NEW,ESTABLISHED
ACCEPT     udp  --  anywhere             anywhere             state NEW,ESTABLISHED
ACCEPT     icmp --  anywhere             anywhere             state NEW,ESTABLISHED

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere   
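The raft log above shows dial timeouts to the peers on 2380, which matches this INPUT chain: the policy is DROP and there is no rule for the etcd client/peer ports. A quick hedged check from one etcd node (peer IPs taken from the log output above; `/dev/tcp` is a bash builtin, so no extra tooling is needed):

```shell
# Hedged diagnostic: probe the etcd client (2379) and peer (2380) ports
# on the two peers that the raft log reports as unreachable. A firewall
# DROP shows up here as a connect timeout -> "blocked".
for peer in 10.189.123.156 10.189.123.181; do
  for port in 2379 2380; do
    if timeout 2 bash -c "exec 3<>/dev/tcp/$peer/$port" 2>/dev/null; then
      echo "$peer:$port open"
    else
      echo "$peer:$port blocked"
    fi
  done
done
```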
tigerpeng2001 commented 3 years ago

On the non-hardened Ubuntu, the iptables ruleset is much shorter:

# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
DOCKER-USER  all  --  anywhere             anywhere            
DOCKER-ISOLATION-STAGE-1  all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
DOCKER     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain DOCKER (1 references)
target     prot opt source               destination         

Chain DOCKER-ISOLATION-STAGE-1 (1 references)
target     prot opt source               destination         
DOCKER-ISOLATION-STAGE-2  all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-ISOLATION-STAGE-2 (1 references)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere            
RETURN     all  --  anywhere             anywhere            

Chain DOCKER-USER (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere 
tigerpeng2001 commented 3 years ago

Using user-data in Terraform to clean up iptables, I was able to bring up Kubernetes. Please improve Kubespray's support for deploying k8s onto a CIS-hardened OS without weakening the OS.

setenforce 0
sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -t nat -F
iptables -t mangle -F
iptables -F
iptables -X
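An alternative that keeps the CIS DROP policies would be to insert ACCEPT rules only for the ports the cluster needs, instead of flushing everything. A hedged sketch that prints the rules as a dry run; the port list follows the upstream Kubernetes defaults and common CNI choices, not a Kubespray-published requirement list, so treat it as a starting point and adjust for your configuration:

```shell
# Hypothetical allow-list that preserves the hardened DROP policies.
# Prints the iptables commands (dry run); review, then pipe to `sudo sh`.
allow() {  # args: proto port(s) comment
  echo "iptables -I INPUT -p $1 --dport $2 -m state --state NEW -j ACCEPT  # $3"
}
allow tcp 6443        "kube-apiserver"
allow tcp 2379:2380   "etcd client and peer"
allow tcp 10250       "kubelet API"
allow tcp 10257       "kube-controller-manager (10252 on older releases)"
allow tcp 10259       "kube-scheduler (10251 on older releases)"
allow tcp 30000:32767 "NodePort services"
# CNI-specific examples (assumption: Calico defaults; drop if unused)
allow tcp 179         "Calico BGP"
allow udp 4789        "VXLAN overlay, if enabled"
```

Pod-to-pod traffic through the FORWARD chain is already handled here by the Docker/CNI chains shown above; it is the INPUT policy that blocks etcd.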
cristicalin commented 3 years ago

@tigerpeng2001 your commands above effectively weaken the CIS hardening.

Note that kubespray does not test against CIS-hardened configurations, nor does the code check for non-standard setups like SELinux enabled on Debian / Ubuntu. There may be some non-trivial issues with these kinds of configurations. If you require SELinux support for your environment, you would be better served by a CentOS or Red Hat flavor, on which we actively do testing.

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle stale`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Close this issue or PR with `/close`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue or PR with `/reopen`
- Mark this issue or PR as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/7878#issuecomment-1040298896):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.