kubernetes-sigs / kubespray


[BUG] Errors with Kubernetes and Hardening values #10102

Closed pomland-94 closed 10 months ago

pomland-94 commented 1 year ago

Hey, I installed my cluster with Kubespray and hardened it following these two guides: https://kubespray.io/#/docs/hardening and https://github.com/kubernetes-sigs/kubespray/blob/master/docs/cgroups.md

But every time I try to set up workloads I get lots of errors in my cluster, for example with the Bitnami chart for cert-manager:

helm install certificate-manager oci://registry-1.docker.io/bitnamicharts/cert-manager \
--namespace certificate-manager --create-namespace \
--set installCRDs=true \
--set replicaCount=2 \
--set controller.replicaCount=2 \
--set controller.podSecurityContext.enabled=true \
--set controller.containerSecurityContext.enabled=true \
--set webhook.replicaCount=2 \
--set webhook.podSecurityContext.enabled=true \
--set webhook.containerSecurityContext.enabled=true \
--set webhook.securityContext.allowPrivilegeEscalation=false \
--set webhook.securityContext.capabilities.drop="{ALL}" \
--set webhook.securityContext.seccompProfile.type="RuntimeDefault" \
--set cainjector.podSecurityContext.enabled=true \
--set cainjector.containerSecurityContext.enabled=true \
--set cainjector.securityContext.allowPrivilegeEscalation=false \
--set cainjector.securityContext.capabilities.drop="{ALL}" \
--set cainjector.securityContext.seccompProfile.type="RuntimeDefault" \
--set metrics.enabled=true \
--set rbac.create=true

I get the following errors:

E0517 22:57:20.943031   16889 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 22:57:20.963913   16889 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 22:57:21.038029   16889 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 22:57:21.048496   16889 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 22:57:21.068323   16889 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 22:57:21.089424   16889 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 22:57:21.160222   16889 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 22:57:21.177626   16889 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 22:57:21.197235   16889 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 22:57:21.218133   16889 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
W0517 22:57:24.900463   16889 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "cainjector" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "cainjector" must set securityContext.capabilities.drop=["ALL"]), seccompProfile (pod or container "cainjector" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0517 22:57:24.981046   16889 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "cert-manager" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "cert-manager" must set securityContext.capabilities.drop=["ALL"]), seccompProfile (pod or container "cert-manager" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
W0517 22:57:25.074086   16889 warnings.go:70] would violate PodSecurity "restricted:latest": allowPrivilegeEscalation != false (container "cert-manager-webhook" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "cert-manager-webhook" must set securityContext.capabilities.drop=["ALL"]), seccompProfile (pod or container "cert-manager-webhook" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")

I also get errors with the Metrics Server:

kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

or when I try to list Namespaces or delete Resources:

kubectl describe ns cert-manager
E0517 23:02:33.255386   17999 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 23:02:33.284783   17999 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 23:02:33.311342   17999 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 23:02:33.333329   17999 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 23:02:33.355377   17999 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 23:02:33.376425   17999 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0517 23:02:33.399115   17999 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0517 23:02:33.421441   17999 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Name:         cert-manager
Labels:       kubernetes.io/metadata.name=cert-manager
              name=cert-manager
Annotations:  <none>
Status:       Terminating
Conditions:
  Type                                         Status  LastTransitionTime               Reason                  Message
  ----                                         ------  ------------------               ------                  -------
  NamespaceDeletionDiscoveryFailure            True    Wed, 17 May 2023 22:45:11 +0200  DiscoveryFailed         Discovery failed for some groups, 2 failing: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request, projectcalico.org/v3: the server is currently unable to handle the request
  NamespaceDeletionGroupVersionParsingFailure  False   Wed, 17 May 2023 22:45:13 +0200  ParsedGroupVersions     All legacy kube types successfully parsed
  NamespaceDeletionContentFailure              False   Wed, 17 May 2023 22:45:13 +0200  ContentDeleted          All content successfully deleted, may be waiting on finalization
  NamespaceContentRemaining                    False   Wed, 17 May 2023 22:45:13 +0200  ContentRemoved          All content successfully removed
  NamespaceFinalizersRemaining                 False   Wed, 17 May 2023 22:45:13 +0200  ContentHasNoFinalizers  All content-preserving finalizers finished

No resource quota.

No LimitRange resource.
kubectl get ns
E0518 12:24:30.427770    9025 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0518 12:24:30.438847    9025 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0518 12:24:30.470631    9025 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0518 12:24:30.492286    9025 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0518 12:24:30.514814    9025 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0518 12:24:30.536244    9025 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0518 12:24:30.557950    9025 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0518 12:24:30.580869    9025 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME                  STATUS        AGE
calico-apiserver      Active        19h
cert-manager          Terminating   13h
certificate-manager   Active        13h
default               Active        19h
kube-node-lease       Active        19h
kube-public           Active        19h
kube-system           Active        19h
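
The repeated memcache.go lines in the outputs above all name the same two aggregated API groups. A minimal sketch, assuming a POSIX shell with sed and sort, that collapses kubectl's error output into the distinct failing groups. The function name failing_api_groups is made up for illustration, and the pattern is taken from the log lines in this post rather than from any documented kubectl format:

```shell
# Read kubectl error output on stdin and print each failing API group once.
# Matches only lines of the form seen in this thread:
#   "... couldn't get resource list for <group/version>: ..."
failing_api_groups() {
  sed -n "s/.*couldn't get resource list for \([^:]*\): .*/\1/p" | sort -u
}

# Usage (sends stderr into the pipe and discards the normal output):
#   kubectl get ns 2>&1 >/dev/null | failing_api_groups
```

For the outputs in this post it should print only metrics.k8s.io/v1beta1 and projectcalico.org/v3, which narrows the problem to those two aggregated APIs.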

What is going wrong inside my cluster? Did I make any mistakes during the installation? I also posted this on the Slack channel but got no response.

https://kubernetes.slack.com/archives/C2V9WJSJD/p1684355192603989

pomland-94 commented 1 year ago

Inside the calico-apiserver namespace, the calico-apiserver deployment didn't come up:

kubectl -n calico-apiserver get deployments
E0519 17:59:31.057627   57664 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.061121   57664 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.093885   57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.120301   57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.146815   57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.172858   57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.199450   57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.225139   57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
calico-apiserver   0/1     0            0           17m
kubectl -n calico-apiserver describe deployments/calico-apiserver
E0519 17:59:48.703006   58498 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.729254   58498 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:48.759945   58498 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.786707   58498 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:48.813128   58498 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.840746   58498 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Name:               calico-apiserver
Namespace:          calico-apiserver
CreationTimestamp:  Fri, 19 May 2023 17:41:47 +0200
Labels:             apiserver=true
                    k8s-app=calico-apiserver
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           apiserver=true
Replicas:           1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:           apiserver=true
                    k8s-app=calico-apiserver
  Service Account:  calico-apiserver
  Containers:
   calico-apiserver:
    Image:      quay.io/calico/apiserver:v3.24.5
    Port:       <none>
    Host Port:  <none>
    Args:
      --secure-port=5443
    Liveness:   http-get https://:5443/version delay=90s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/code/filecheck] delay=5s timeout=1s period=10s #success=1 #failure=5
    Environment:
      DATASTORE_TYPE:  kubernetes
    Mounts:
      /code/apiserver.local.config/certificates from calico-apiserver-certs (rw)
  Volumes:
   calico-apiserver-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-apiserver-certs
    Optional:    false
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
  Progressing      False   ProgressDeadlineExceeded
OldReplicaSets:    <none>
NewReplicaSet:     calico-apiserver-7ff786649f (0/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  18m   deployment-controller  Scaled up replica set calico-apiserver-7ff786649f to 1

The Metrics Server also didn't come up.

kubectl -n kube-system get deployments
E0519 18:03:01.871549   59533 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:01.894591   59533 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:01.926151   59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:01.953288   59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:01.981432   59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:02.006916   59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:02.032445   59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:02.058194   59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
calico-kube-controllers   1/1     1            1           20m
calico-typha              1/1     1            1           21m
coredns                   2/2     2            2           20m
dns-autoscaler            1/1     1            1           20m
metrics-server            0/3     3            0           20m
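
To spot deployments like metrics-server above without reading the whole table, here is a small sketch that assumes the column layout shown in this output (NAME in column 1, READY in column 2 as ready/desired). The function name unready_deployments is hypothetical:

```shell
# Read `kubectl get deployments --no-headers` output for ONE namespace on stdin
# and print every deployment whose ready count is below its desired count.
unready_deployments() {
  awk '{ split($2, r, "/"); if (r[1] + 0 < r[2] + 0) print $1, $2 }'
}

# Usage:
#   kubectl -n kube-system get deployments --no-headers | unready_deployments
```

With -A / --all-namespaces the namespace becomes the first column, so the field numbers would need to shift by one.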

I checked the certificate signing requests with kubectl get csr and there were lots of pending ones. I approved all of them, but nothing happened.

pomland-94 commented 1 year ago

When I look at the API services, I see that the ones for metrics-server and Calico (projectcalico.org) are not available.

kubectl get apiservices
E0519 18:10:32.151229   67689 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.175224   67689 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.197726   67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.219736   67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.241704   67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.264789   67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.286956   67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.308669   67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME                                   SERVICE                       AVAILABLE                  AGE
v1.                                    Local                         True                       34m
v1.admissionregistration.k8s.io        Local                         True                       34m
v1.apiextensions.k8s.io                Local                         True                       34m
v1.apps                                Local                         True                       34m
v1.authentication.k8s.io               Local                         True                       34m
v1.authorization.k8s.io                Local                         True                       34m
v1.autoscaling                         Local                         True                       34m
v1.batch                               Local                         True                       34m
v1.certificates.k8s.io                 Local                         True                       34m
v1.coordination.k8s.io                 Local                         True                       34m
v1.crd.projectcalico.org               Local                         True                       25m
v1.discovery.k8s.io                    Local                         True                       34m
v1.events.k8s.io                       Local                         True                       34m
v1.networking.k8s.io                   Local                         True                       34m
v1.node.k8s.io                         Local                         True                       34m
v1.policy                              Local                         True                       34m
v1.rbac.authorization.k8s.io           Local                         True                       34m
v1.scheduling.k8s.io                   Local                         True                       34m
v1.storage.k8s.io                      Local                         True                       34m
v1beta1.flowcontrol.apiserver.k8s.io   Local                         True                       34m
v1beta1.metrics.k8s.io                 kube-system/metrics-server    False (MissingEndpoints)   27m
v1beta1.storage.k8s.io                 Local                         True                       34m
v1beta2.flowcontrol.apiserver.k8s.io   Local                         True                       34m
v2.autoscaling                         Local                         True                       34m
v2beta2.autoscaling                    Local                         True                       34m
v3.projectcalico.org                   calico-apiserver/calico-api   False (MissingEndpoints)   28m
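
The two unavailable entries in this table can be pulled out directly. A minimal sketch, assuming the four-column layout shown above (NAME, SERVICE, AVAILABLE, AGE) and using a made-up function name, unavailable_apiservices:

```shell
# Read `kubectl get apiservices --no-headers` output on stdin and print the
# name, backing service, and availability (including the reason in
# parentheses, if any) of every APIService that is not "True".
unavailable_apiservices() {
  awk '$3 != "True" {
    s = $3
    # Everything between the AVAILABLE value and the last field (AGE)
    # belongs to the reason, e.g. "(MissingEndpoints)".
    for (i = 4; i < NF; i++) s = s " " $i
    print $1, $2, s
  }'
}

# Usage:
#   kubectl get apiservices --no-headers | unavailable_apiservices
```

The SERVICE column then shows which namespace and service to inspect next, here kube-system/metrics-server and calico-apiserver/calico-api.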

When I inspect an APIService, for example the one for the metrics server, its status reports an error:

kubectl get apiservices v1beta1.metrics.k8s.io -o yaml
E0519 18:12:39.867790   69941 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:39.891313   69941 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:39.922355   69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:39.948392   69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:39.974621   69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:40.001085   69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:40.029530   69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:40.057437   69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile"},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: "2023-05-19T15:43:01Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: v1beta1.metrics.k8s.io
  resourceVersion: "2413"
  uid: e06c307c-6600-4994-9d65-0e8b1a1b9fa5
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2023-05-19T15:43:01Z"
    message: endpoints for service/metrics-server in "kube-system" have no addresses
      with port name "https"
    reason: MissingEndpoints
    status: "False"
    type: Available

When I deploy my cluster without the hardening guide everything works well.

pomland-94 commented 1 year ago

This is my hardening config:

---

## kube-apiserver
authorization_modes: ['Node', 'RBAC']
# AppArmor-based OS
kube_apiserver_feature_gates: ['AppArmor=true']
kube_apiserver_request_timeout: 120s
kube_apiserver_service_account_lookup: true

# enable kubernetes audit
kubernetes_audit: true
audit_log_path: "/var/log/kube-apiserver-log.json"
audit_log_maxage: 30
audit_log_maxbackups: 10
audit_log_maxsize: 100
audit_policy_file: "{{ kube_config_dir }}/audit-policy/apiserver-audit-policy.yaml"

tls_min_version: VersionTLS12
tls_cipher_suites:
  - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  - TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305

# enable encryption at rest
kube_encrypt_secret_data: true
kube_encryption_resources: [secrets]
kube_encryption_algorithm: "secretbox"

kube_apiserver_enable_admission_plugins:
  - EventRateLimit
  - AlwaysPullImages
  - ServiceAccount
  - NamespaceLifecycle
  - NodeRestriction
  - LimitRanger
  - ResourceQuota
  - MutatingAdmissionWebhook
  - ValidatingAdmissionWebhook
  - PodNodeSelector
  - PodSecurity
kube_apiserver_admission_control_config_file: true
# EventRateLimit plugin configuration
kube_apiserver_admission_event_rate_limits:
  limit_1:
    type: Namespace
    qps: 50
    burst: 100
    cache_size: 2000
  limit_2:
    type: User
    qps: 50
    burst: 100
kube_profiling: false

## kube-controller-manager
# kube_controller_manager_bind_address: 127.0.0.1
kube_controller_terminated_pod_gc_threshold: 50
# AppArmor-based OS
kube_controller_feature_gates: ["RotateKubeletServerCertificate=true", "AppArmor=true"]

## kube-scheduler
# kube_scheduler_bind_address: 127.0.0.1
# AppArmor-based OS
kube_scheduler_feature_gates: ["AppArmor=true"]

## kubelet
kubelet_authorization_mode_webhook: true
kubelet_authentication_token_webhook: true
kube_read_only_port: 0
kubelet_rotate_server_certificates: true
kubelet_protect_kernel_defaults: true
kubelet_event_record_qps: 1
kubelet_rotate_certificates: true
kubelet_streaming_connection_idle_timeout: "5m"
kubelet_make_iptables_util_chains: true
kubelet_feature_gates: ["RotateKubeletServerCertificate=true", "SeccompDefault=true"]
kubelet_seccomp_default: true
kubelet_systemd_hardening: true
# In case you have multiple interfaces in your
# control plane nodes and you want to specify the right
# IP addresses, kubelet_secure_addresses allows you
# to specify the IP from which the kubelet
# will receive the packets.
kubelet_secure_addresses: "10.0.0.20 10.0.0.21 10.0.0.22"

# additional configurations
kube_owner: root
kube_cert_group: root

# create a default Pod Security Configuration and deny running of insecure pods
# kube_system namespace is exempted by default
kube_pod_security_use_default: true
kube_pod_security_default_enforce: restricted

# Custom Flags for Kubernetes Components

My System is: Debian GNU/Linux 11 (bullseye) Kernel: 5.10.0-21-arm64 containerd://1.6.15 Kubernetes: v1.25.6

pomland-94 commented 1 year ago

I think I found the error. In hardening.yaml I commented out the following lines:

kubelet_systemd_hardening: true
kubelet_secure_addresses: "10.0.0.20 10.0.0.21 10.0.0.22"

Now everything works.

Then I have to manually approve some Certificates with the following command:

kubectl get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty kubectl certificate approve
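
The command above approves every CSR that has no status yet, whatever its signer. A narrower sketch, for illustration only, selects just the pending kubelet serving certificates. It assumes the default kubectl get csr table layout (NAME, AGE, SIGNERNAME, ..., CONDITION as the last column) and the built-in signer name kubernetes.io/kubelet-serving. The function name pending_kubelet_serving_csrs is hypothetical:

```shell
# Read `kubectl get csr --no-headers` output on stdin and print the names of
# CSRs that were issued for the kubelet-serving signer and are still Pending.
pending_kubelet_serving_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Usage (review the printed names before approving anything):
#   kubectl get csr --no-headers | pending_kubelet_serving_csrs
#   kubectl get csr --no-headers | pending_kubelet_serving_csrs \
#     | xargs --no-run-if-empty kubectl certificate approve
```

It is worth checking the requestor and the requested names of each CSR before approving, since manual approval skips the checks that kubelet-csr-approver would normally perform.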

batazor commented 1 year ago

@pomland-94 I ran into the same problem. In my case I set the environment variable BYPASS_DNS_RESOLUTION to true:

kubelet_csr_approver_values:
  # Do not check DNS resolution in testing (not recommended in production)
  bypassDnsResolution: true

Perhaps we should change the hardening documentation, change the providerIpPrefixes system host list, or set a custom providerRegex.

EmyLIEUTAUD commented 1 year ago

Hi @batazor

Have you found a way to get around this problem without using the ENV variable BYPASS_DNS_RESOLUTION?

batazor commented 1 year ago

@EmyLIEUTAUD No, I haven't looked into it.

kukacz commented 1 year ago

If all kubelet certificate requests stay in the pending state and kubelet_rotate_server_certificates is enabled (which it is by default in the hardening template), then most probably kubelet-csr-approver did not come up. I suggest you check the logs and events related to this application.

I experienced a similar problem. It turned out that kubelet-csr-approver requires the network backend to be up, which I had intentionally disabled by setting kube_network_plugin: cni. The approver did not start and therefore could not approve kubelet certificates. The deployment timed out after an hour with no helpful message in the Ansible logs. The only way to troubleshoot this is to connect to the cluster under deployment and check the Kubernetes resources.

arusa commented 12 months ago

In my case the kubelet-csr-approver is running fine, but all CSR requests get denied because the DNS name could not be resolved.

When checking an individual denied csr, I can see that the request was for the hostname of the cp node "cp-2", which obviously is not an FQDN.

So I assume that if I change the hostnames of all my worker and control-plane nodes to something that can be resolved, the problem should disappear.
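
A quick way to check which node names fail to resolve is sketched below, under the assumption that getent is available on the machine running it. The function name unresolvable_names is made up. The result reflects the resolver of the machine where it runs, which may differ from the DNS that kubelet-csr-approver sees inside the cluster:

```shell
# Read host names on stdin (one per line) and print those that the resolver
# command cannot resolve. The resolver defaults to `getent hosts`; any other
# command that exits non-zero on failure can be passed as arguments instead.
unresolvable_names() {
  if [ "$#" -eq 0 ]; then
    set -- getent hosts
  fi
  while IFS= read -r name; do
    [ -n "$name" ] || continue
    # Redirect stdin so the resolver cannot consume the remaining names.
    "$@" "$name" </dev/null >/dev/null 2>&1 || printf '%s\n' "$name"
  done
}

# Usage:
#   kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' \
#     | unresolvable_names
```

Any name printed here, such as a short hostname like cp-2, is a candidate for the denied CSRs described above.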

VannTen commented 10 months ago

Closing as the question seems to have been answered. /close Feel free to open a bug report / feature request if hardening is causing specific problems.

k8s-ci-robot commented 10 months ago

@VannTen: Closing this issue.
