kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

CoreDNS is in CrashLoopBackOff because of kubelet_systemd_hardening: true #11019

Closed dv29 closed 8 months ago

dv29 commented 8 months ago

What happened?

CoreDNS is in CrashLoopBackOff because of kubelet_systemd_hardening: true. As soon as that option is turned off, CoreDNS works; otherwise it does not start. CoreDNS fails its readiness and health probes and never reaches the Running state.
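
For reference, the failing state can be inspected with the commands below (a minimal sketch; the pod name placeholder and exact probe messages will differ per cluster). With the default CoreDNS configuration the readiness probe is typically served on port 8181 (/ready) and the liveness probe on port 8080 (/health), so the describe output usually shows those probes timing out:

# List the CoreDNS pods (they carry the standard k8s-app=kube-dns label)
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide

# Show the probe failure events for one of the pods (<coredns-pod-name> is a placeholder)
kubectl -n kube-system describe pod <coredns-pod-name>

# Check the logs of the last crashed container for more detail
kubectl -n kube-system logs <coredns-pod-name> --previous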

What did you expect to happen?

CoreDNS should run without any problems.

How can we reproduce it (as minimally and precisely as possible)?

ansible-playbook -v cluster.yml -i inventory/inficluster/hosts.yaml -b --become-user=root -e '@inventory/inficluster/hardening.yaml'

Hardening

# Hardening
---

## kube-apiserver
authorization_modes: ['Node', 'RBAC']
# AppArmor-based OS
kube_apiserver_feature_gates: ['AppArmor=true']
kube_apiserver_request_timeout: 120s
kube_apiserver_service_account_lookup: true

# enable kubernetes audit
kubernetes_audit: true
audit_log_path: "/var/log/kube-apiserver-log.json"
audit_log_maxage: 30
audit_log_maxbackups: 10
audit_log_maxsize: 100

tls_min_version: VersionTLS12
tls_cipher_suites:
  - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305

# enable encryption at rest
kube_encrypt_secret_data: true
kube_encryption_resources: [secrets]
kube_encryption_algorithm: "secretbox"

kube_apiserver_enable_admission_plugins:
  - EventRateLimit
  - AlwaysPullImages
  - ServiceAccount
  - NamespaceLifecycle
  - NodeRestriction
  - LimitRanger
  - ResourceQuota
  - MutatingAdmissionWebhook
  - ValidatingAdmissionWebhook
  - PodNodeSelector
  - PodSecurity
kube_apiserver_admission_control_config_file: true
# Creates config file for PodNodeSelector
# kube_apiserver_admission_plugins_needs_configuration: [PodNodeSelector]
# Define the default node selector, by default all the workloads will be scheduled on nodes
# with label network=srv1
# kube_apiserver_admission_plugins_podnodeselector_default_node_selector: "network=srv1"
# EventRateLimit plugin configuration
kube_apiserver_admission_event_rate_limits:
  limit_1:
    type: Namespace
    qps: 50
    burst: 100
    cache_size: 2000
  limit_2:
    type: User
    qps: 50
    burst: 100
kube_profiling: false

## kube-controller-manager
kube_controller_manager_bind_address: 127.0.0.1
kube_controller_terminated_pod_gc_threshold: 50
# AppArmor-based OS
kube_controller_feature_gates: ["RotateKubeletServerCertificate=true", "AppArmor=true"]
# kube_controller_feature_gates: ["RotateKubeletServerCertificate=true"]

## kube-scheduler
kube_scheduler_bind_address: 127.0.0.1
# AppArmor-based OS
kube_scheduler_feature_gates: ["AppArmor=true"]

## etcd
etcd_deployment_type: kubeadm

## kubelet
kubelet_authorization_mode_webhook: true
kubelet_authentication_token_webhook: true
kube_read_only_port: 0
kubelet_rotate_server_certificates: true
kubelet_protect_kernel_defaults: true
kubelet_event_record_qps: 1
kubelet_rotate_certificates: true
kubelet_streaming_connection_idle_timeout: "5m"
kubelet_make_iptables_util_chains: true
kubelet_feature_gates: ["RotateKubeletServerCertificate=true"]
kubelet_seccomp_default: true
kubelet_systemd_hardening: true
# In case you have multiple interfaces in your
# control plane nodes and you want to specify the right
# IP addresses, kubelet_secure_addresses allows you
# to specify the IP from which the kubelet
# will receive the packets.
kubelet_secure_addresses: "192.168.1.17 192.168.1.18 localhost link-local"

# additional configurations
kube_owner: root
kube_cert_group: root

# create a default Pod Security Configuration and deny running of insecure pods
# kube_system namespace is exempted by default
kube_pod_security_use_default: true
kube_pod_security_default_enforce: restricted

kubelet_csr_approver_values:
  providerRegex: '^node\d+$'
  bypassDnsResolution: true
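
For context on why kubelet_secure_addresses matters here: with kubelet_systemd_hardening: true, kubespray restricts the kubelet's network traffic at the systemd unit level, using the addresses listed in kubelet_secure_addresses as the allow list. A rough sketch of the resulting unit settings (illustrative only; the exact directives and file path come from kubespray's kubelet service template):

# /etc/systemd/system/kubelet.service (hardening excerpt, illustrative)
[Service]
# Deny all IP traffic for sockets of the kubelet unit ...
IPAddressDeny=any
# ... except the addresses taken from kubelet_secure_addresses
IPAddressAllow=192.168.1.17 192.168.1.18 localhost link-local

Because the allow list above does not include the pod CIDR, the kubelet cannot reach pod IPs (for example when running HTTP readiness and liveness probes against pods), which appears to be what breaks CoreDNS in this setup.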

OS

Linux 6.5.0-25-generic x86_64
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Version of Ansible

ansible [core 2.16.4]
  config file = /home/tyson/Project/kubespray/ansible.cfg
  configured module search path = ['/home/tyson/Project/kubespray/library']
  ansible python module location = /home/tyson/Project/kubespray-venv/lib/python3.10/site-packages/ansible
  ansible collection location = /home/tyson/.ansible/collections:/usr/share/ansible/collections
  executable location = /home/tyson/Project/kubespray-venv/bin/ansible
  python version = 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (/home/tyson/Project/kubespray-venv/bin/python3)
  jinja version = 3.1.2
  libyaml = True

Version of Python

Python 3.10.12

Version of Kubespray (commit)

a1cf8291a

Network plugin used

calico

Full inventory with variables

Not included; it contains some private vars.

Command used to invoke ansible

ansible-playbook -v cluster.yml -i inventory/inficluster/hosts.yaml -b --become-user=root -e '@inventory/inficluster/hardening.yaml'

Output of ansible run

node2 : ok=732 changed=27 unreachable=0 failed=0 skipped=1188 rescued=0 ignored=1
node3 : ok=591 changed=13 unreachable=0 failed=0 skipped=1058 rescued=0 ignored=1

Anything else we need to know

No response

dv29 commented 8 months ago

Fixed it with this; I had to add these entries:

kubelet_secure_addresses: "<host_addresses> localhost link-local 10.233.64.0/18"

https://github.com/kubernetes-sigs/kubespray/issues/10744#issue-2054743692
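
A plausible reading of the fix (an assumption based on the linked issue, not verified against the template): adding the pod CIDR 10.233.64.0/18 to kubelet_secure_addresses extends the systemd IPAddressAllow list for the kubelet unit, so the kubelet can again reach pod IPs when probing CoreDNS. On an affected node, the effective allow/deny lists and the rendered unit can be checked with:

# Inspect the IP filtering applied to the kubelet unit (requires systemd with IP accounting, v235+)
systemctl show kubelet.service -p IPAddressAllow -p IPAddressDeny

# Confirm the rendered unit actually contains the pod CIDR
# (path as in the sketch above; adjust if kubespray places the unit elsewhere)
grep -i ipaddress /etc/systemd/system/kubelet.service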