Closed pomland-94 closed 10 months ago
Inside the calico-apiserver namespace, the calico-apiserver deployment didn't come up:
kubectl -n calico-apiserver get deployments
E0519 17:59:31.057627 57664 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.061121 57664 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.093885 57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.120301 57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.146815 57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.172858 57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:31.199450 57664 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:31.225139 57664 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
calico-apiserver   0/1     0            0           17m
kubectl -n calico-apiserver describe deployments/calico-apiserver
E0519 17:59:48.703006 58498 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.729254 58498 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:48.759945 58498 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.786707 58498 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 17:59:48.813128 58498 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 17:59:48.840746 58498 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
Name:               calico-apiserver
Namespace:          calico-apiserver
CreationTimestamp:  Fri, 19 May 2023 17:41:47 +0200
Labels:             apiserver=true
                    k8s-app=calico-apiserver
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           apiserver=true
Replicas:           1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:           apiserver=true
                    k8s-app=calico-apiserver
  Service Account:  calico-apiserver
  Containers:
   calico-apiserver:
    Image:      quay.io/calico/apiserver:v3.24.5
    Port:       <none>
    Host Port:  <none>
    Args:
      --secure-port=5443
    Liveness:   http-get https://:5443/version delay=90s timeout=1s period=10s #success=1 #failure=3
    Readiness:  exec [/code/filecheck] delay=5s timeout=1s period=10s #success=1 #failure=5
    Environment:
      DATASTORE_TYPE:  kubernetes
    Mounts:
      /code/apiserver.local.config/certificates from calico-apiserver-certs (rw)
  Volumes:
   calico-apiserver-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-apiserver-certs
    Optional:    false
Conditions:
  Type             Status  Reason
  ----             ------  ------
  Available        False   MinimumReplicasUnavailable
  ReplicaFailure   True    FailedCreate
  Progressing      False   ProgressDeadlineExceeded
OldReplicaSets:     <none>
NewReplicaSet:      calico-apiserver-7ff786649f (0/1 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  18m   deployment-controller  Scaled up replica set calico-apiserver-7ff786649f to 1
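The conditions above show that the ReplicaSet controller could not create the pod (ReplicaFailure / FailedCreate). The concrete rejection reason is normally recorded as a warning event on the ReplicaSet itself, so kubectl -n calico-apiserver describe rs calico-apiserver-7ff786649f is the next place to look. To spot the same pattern across many deployments, here is a minimal Python sketch that scans the conditions in kubectl get deployment ... -o json output. The function name and the rule for which conditions count as failures are our own assumptions, not something kubectl defines.

```python
def failing_conditions(deployment: dict) -> list[tuple[str, str, str]]:
    """Return (type, status, reason) for each Deployment condition that signals trouble.

    Assumption for illustration: Available/Progressing are bad when "False",
    ReplicaFailure is bad when "True".
    """
    problems = []
    for cond in deployment.get("status", {}).get("conditions", []):
        ctype, status = cond.get("type"), cond.get("status")
        bad_if_false = ctype in ("Available", "Progressing") and status == "False"
        bad_if_true = ctype == "ReplicaFailure" and status == "True"
        if bad_if_false or bad_if_true:
            problems.append((ctype, status, cond.get("reason", "")))
    return problems
```

Save the output of kubectl -n calico-apiserver get deployment calico-apiserver -o json to a file, load it with json.load and pass the dict to failing_conditions; for the deployment described above it would report all three conditions.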
The Metrics Server also didn't come up.
kubectl -n kube-system get deployments
E0519 18:03:01.871549 59533 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:01.894591 59533 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:01.926151 59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:01.953288 59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:01.981432 59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:02.006916 59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:03:02.032445 59533 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:03:02.058194 59533 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
calico-kube-controllers   1/1     1            1           20m
calico-typha              1/1     1            1           21m
coredns                   2/2     2            2           20m
dns-autoscaler            1/1     1            1           20m
metrics-server            0/3     3            0           20m
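Both failing deployments (calico-apiserver at 0/1 and metrics-server at 0/3) can be picked out mechanically. A minimal Python sketch, assuming the JSON from kubectl get deployments -A -o json has already been loaded into a dict; the function name is our own.

```python
def unready_deployments(deploy_list: dict) -> list[str]:
    """List "namespace/name (ready/desired)" for Deployments below their desired replica count."""
    result = []
    for item in deploy_list.get("items", []):
        meta = item.get("metadata", {})
        desired = item.get("spec", {}).get("replicas", 1)
        # readyReplicas is left out of the status entirely when it is 0.
        ready = item.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            result.append(f"{meta.get('namespace')}/{meta.get('name')} ({ready}/{desired})")
    return result
```

For the kube-system listing above it would report only kube-system/metrics-server (0/3).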
I checked the server CSRs with
kubectl get csr
and there were lots of pending certificates. I approved all of them, but nothing happened.
When I list the API services, I see that the metrics-server and Project Calico APIs are not available:
kubectl get apiservices
E0519 18:10:32.151229 67689 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.175224 67689 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.197726 67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.219736 67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.241704 67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.264789 67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:10:32.286956 67689 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:10:32.308669 67689 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
NAME                                   SERVICE                       AVAILABLE                  AGE
v1.                                    Local                         True                       34m
v1.admissionregistration.k8s.io        Local                         True                       34m
v1.apiextensions.k8s.io                Local                         True                       34m
v1.apps                                Local                         True                       34m
v1.authentication.k8s.io               Local                         True                       34m
v1.authorization.k8s.io                Local                         True                       34m
v1.autoscaling                         Local                         True                       34m
v1.batch                               Local                         True                       34m
v1.certificates.k8s.io                 Local                         True                       34m
v1.coordination.k8s.io                 Local                         True                       34m
v1.crd.projectcalico.org               Local                         True                       25m
v1.discovery.k8s.io                    Local                         True                       34m
v1.events.k8s.io                       Local                         True                       34m
v1.networking.k8s.io                   Local                         True                       34m
v1.node.k8s.io                         Local                         True                       34m
v1.policy                              Local                         True                       34m
v1.rbac.authorization.k8s.io           Local                         True                       34m
v1.scheduling.k8s.io                   Local                         True                       34m
v1.storage.k8s.io                      Local                         True                       34m
v1beta1.flowcontrol.apiserver.k8s.io   Local                         True                       34m
v1beta1.metrics.k8s.io                 kube-system/metrics-server    False (MissingEndpoints)   27m
v1beta1.storage.k8s.io                 Local                         True                       34m
v1beta2.flowcontrol.apiserver.k8s.io   Local                         True                       34m
v2.autoscaling                         Local                         True                       34m
v2beta2.autoscaling                    Local                         True                       34m
v3.projectcalico.org                   calico-apiserver/calico-api   False (MissingEndpoints)   28m
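The two groups named in the repeated memcache.go errors, metrics.k8s.io/v1beta1 and projectcalico.org/v3, are exactly the two APIServices marked False here: kubectl's discovery cannot reach an aggregated API that has no backing endpoints. A minimal Python sketch that pulls such rows out of the plain-text table; it assumes the header is the first line and AGE is the last column, and relies on the E0519 lines going to stderr so that only the table is captured from stdout. The function name is our own.

```python
def unavailable_apiservices(table: str) -> list[tuple[str, str, str]]:
    """Parse 'kubectl get apiservices' text output; return rows whose AVAILABLE is not True."""
    rows = []
    for line in table.strip().splitlines()[1:]:  # skip the NAME/SERVICE/... header
        parts = line.split()
        if len(parts) < 4:
            continue  # not a table row
        name, service = parts[0], parts[1]
        # AVAILABLE may contain a space, e.g. "False (MissingEndpoints)",
        # so join everything between SERVICE and the trailing AGE column.
        available = " ".join(parts[2:-1])
        if not available.startswith("True"):
            rows.append((name, service, available))
    return rows
```

Run against the table above, it returns the metrics.k8s.io and projectcalico.org rows, each with "False (MissingEndpoints)".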
When I inspect one of these APIServices, for example the one for the Metrics Server, its status reports an error:
kubectl get apiservices v1beta1.metrics.k8s.io -o yaml
E0519 18:12:39.867790 69941 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:39.891313 69941 memcache.go:287] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:39.922355 69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:39.948392 69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:39.974621 69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:40.001085 69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
E0519 18:12:40.029530 69941 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0519 18:12:40.057437 69941 memcache.go:121] couldn't get resource list for projectcalico.org/v3: the server is currently unable to handle the request
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile"},"name":"v1beta1.metrics.k8s.io"},"spec":{"group":"metrics.k8s.io","groupPriorityMinimum":100,"insecureSkipTLSVerify":true,"service":{"name":"metrics-server","namespace":"kube-system"},"version":"v1beta1","versionPriority":100}}
  creationTimestamp: "2023-05-19T15:43:01Z"
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
  name: v1beta1.metrics.k8s.io
  resourceVersion: "2413"
  uid: e06c307c-6600-4994-9d65-0e8b1a1b9fa5
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
    port: 443
  version: v1beta1
  versionPriority: 100
status:
  conditions:
  - lastTransitionTime: "2023-05-19T15:43:01Z"
    message: endpoints for service/metrics-server in "kube-system" have no addresses
      with port name "https"
    reason: MissingEndpoints
    status: "False"
    type: Available
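The condition message means the metrics-server Service currently has no ready endpoint on its port named https, i.e. no metrics-server pod is ready. A minimal Python sketch of that check against the JSON from kubectl -n kube-system get endpoints metrics-server -o json; it mirrors the message text rather than the aggregator's own code, and the function name is our own.

```python
def has_ready_address_for_port(endpoints: dict, port_name: str) -> bool:
    """True if any Endpoints subset exposes port_name and has at least one ready address.

    Ready pods are listed under "addresses"; pods failing readiness go to
    "notReadyAddresses" and do not count.
    """
    for subset in endpoints.get("subsets") or []:
        names = {p.get("name") for p in subset.get("ports", [])}
        if port_name in names and subset.get("addresses"):
            return True
    return False
```

An Endpoints object with an empty or missing subsets list, or one that only has notReadyAddresses, yields False, which is the state the APIService condition above describes.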
When I deploy my cluster without applying the hardening guide, everything works well.
This is my hardening config:
---
## kube-apiserver
authorization_modes: ['Node', 'RBAC']
# AppArmor-based OS
kube_apiserver_feature_gates: ['AppArmor=true']
kube_apiserver_request_timeout: 120s
kube_apiserver_service_account_lookup: true
# enable kubernetes audit
kubernetes_audit: true
audit_log_path: "/var/log/kube-apiserver-log.json"
audit_log_maxage: 30
audit_log_maxbackups: 10
audit_log_maxsize: 100
audit_policy_file: "{{ kube_config_dir }}/audit-policy/apiserver-audit-policy.yaml"
tls_min_version: VersionTLS12
tls_cipher_suites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305
# enable encryption at rest
kube_encrypt_secret_data: true
kube_encryption_resources: [secrets]
kube_encryption_algorithm: "secretbox"
kube_apiserver_enable_admission_plugins:
- EventRateLimit
- AlwaysPullImages
- ServiceAccount
- NamespaceLifecycle
- NodeRestriction
- LimitRanger
- ResourceQuota
- MutatingAdmissionWebhook
- ValidatingAdmissionWebhook
- PodNodeSelector
- PodSecurity
kube_apiserver_admission_control_config_file: true
# EventRateLimit plugin configuration
kube_apiserver_admission_event_rate_limits:
  limit_1:
    type: Namespace
    qps: 50
    burst: 100
    cache_size: 2000
  limit_2:
    type: User
    qps: 50
    burst: 100
kube_profiling: false
## kube-controller-manager
# kube_controller_manager_bind_address: 127.0.0.1
kube_controller_terminated_pod_gc_threshold: 50
# AppArmor-based OS
kube_controller_feature_gates: ["RotateKubeletServerCertificate=true", "AppArmor=true"]
## kube-scheduler
# kube_scheduler_bind_address: 127.0.0.1
# AppArmor-based OS
kube_scheduler_feature_gates: ["AppArmor=true"]
## kubelet
kubelet_authorization_mode_webhook: true
kubelet_authentication_token_webhook: true
kube_read_only_port: 0
kubelet_rotate_server_certificates: true
kubelet_protect_kernel_defaults: true
kubelet_event_record_qps: 1
kubelet_rotate_certificates: true
kubelet_streaming_connection_idle_timeout: "5m"
kubelet_make_iptables_util_chains: true
kubelet_feature_gates: ["RotateKubeletServerCertificate=true", "SeccompDefault=true"]
kubelet_seccomp_default: true
kubelet_systemd_hardening: true
# In case you have multiple interfaces in your
# control plane nodes and you want to specify the right
# IP addresses, kubelet_secure_addresses allows you
# to specify the IP from which the kubelet
# will receive the packets.
kubelet_secure_addresses: "10.0.0.20 10.0.0.21 10.0.0.22"
# additional configurations
kube_owner: root
kube_cert_group: root
# create a default Pod Security Configuration and deny running of insecure pods
# kube_system namespace is exempted by default
kube_pod_security_use_default: true
kube_pod_security_default_enforce: restricted
# Custom Flags for Kubernetes Components
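The EventRateLimit limits in this config are set through qps and burst. As we understand the plugin, each limit behaves like a token bucket: up to burst events are admitted immediately, after which capacity refills at qps per second; limit_1 keeps one bucket per namespace (tracking up to cache_size namespaces) and limit_2 one per user. A minimal, generic Python sketch of a token bucket with qps=50 and burst=100; it illustrates the semantics only and is not the admission plugin's implementation.

```python
class TokenBucket:
    """Generic token bucket: refills at `qps` tokens/second and holds at most `burst`."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = 0.0             # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous call, capped at burst.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With TokenBucket(qps=50, burst=100), 100 calls at t=0 succeed and the 101st is rejected; one second later another 50 are admitted.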
My system:
Debian GNU/Linux 11 (bullseye)
Kernel: 5.10.0-21-arm64
containerd://1.6.15
Kubernetes: v1.25.6
I think I found the error. In hardening.yaml I commented out the following lines:
kubelet_systemd_hardening: true
kubelet_secure_addresses: "10.0.0.20 10.0.0.21 10.0.0.22"
Now everything works.
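One plausible reading of why commenting out those two lines helped: with kubelet_systemd_hardening enabled, the kubelet is restricted to the addresses in kubelet_secure_addresses (see the comment next to that setting in the config above), so a client outside that list, such as a pod on the pod network that scrapes the kubelet on port 10250 the way metrics-server does, would be refused. The minimal Python sketch below checks an address against such a space-separated list. It assumes entries are single IPs or CIDRs; the function name, the pod IP 10.233.64.5 and the pod CIDR 10.233.64.0/18 used in the usage are hypothetical.

```python
import ipaddress


def kubelet_would_accept(source_ip: str, secure_addresses: str) -> bool:
    """True if source_ip is covered by one of the space-separated entries.

    Entries are assumed to be single IPs ("10.0.0.20") or CIDRs ("10.0.0.0/24");
    a bare IP is treated as a /32 (or /128 for IPv6) network.
    """
    ip = ipaddress.ip_address(source_ip)
    for entry in secure_addresses.split():
        if ip in ipaddress.ip_network(entry, strict=False):
            return True
    return False
```

For example, kubelet_would_accept("10.233.64.5", "10.0.0.20 10.0.0.21 10.0.0.22") is False, whereas appending the (hypothetical) pod CIDR 10.233.64.0/18 to the list makes it True.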
After that, I have to approve some certificates manually with the following command:
kubectl get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs --no-run-if-empty kubectl certificate approve
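The go-template above selects CSRs whose status is still empty, i.e. ones that are neither approved nor denied. A rough Python equivalent working on kubectl get csr -o json output, with an optional filter on spec.signerName so that only kubelet serving certificates (signer kubernetes.io/kubelet-serving) are picked. Approving everything that is pending skips any checks an approver would otherwise have made, so narrowing the selection seems prudent. The function name and the CSR names in the test data are our own.

```python
from typing import Optional


def pending_csr_names(csr_list: dict, signer: Optional[str] = None) -> list[str]:
    """Names of CSRs that have no condition yet (neither Approved nor Denied).

    If signer is given (e.g. "kubernetes.io/kubelet-serving"), only CSRs
    requested for that signer are returned.
    """
    names = []
    for item in csr_list.get("items", []):
        if signer is not None and item.get("spec", {}).get("signerName") != signer:
            continue
        conditions = (item.get("status") or {}).get("conditions") or []
        if not conditions:
            names.append(item["metadata"]["name"])
    return names
```

Printing the returned names one per line gives the same kind of list that the go-template feeds to xargs kubectl certificate approve.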
@pomland-94 I hit the same problem. In my case I set the environment variable BYPASS_DNS_RESOLUTION to true:
kubelet_csr_approver_values:
  # Do not check DNS resolution in testing (not recommended in production)
  bypassDnsResolution: true
Perhaps we should update the hardening documentation, adjust the providerIpPrefixes system host list, or set a custom providerRegex.
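To make those knobs concrete, here is a deliberately simplified Python model of the three checks discussed in this thread: the requested hostname must match providerRegex, the requested IPs must fall inside providerIpPrefixes, and, unless bypassDnsResolution is set, the hostname must resolve. It is an illustration under those assumptions, not kubelet-csr-approver's actual code, which may apply further or stricter rules. The function name, the regex cp-\d+ and the prefix 10.0.0.0/24 in the test data are hypothetical, and DNS is passed in as a function so the logic can be tried offline.

```python
import ipaddress
import re
from collections.abc import Callable, Iterable


def would_approve(hostname: str, ips: Iterable[str], *,
                  provider_regex: str,
                  provider_ip_prefixes: list[str],
                  bypass_dns_resolution: bool,
                  resolves: Callable[[str], bool]) -> bool:
    """Simplified, hypothetical model of a kubelet serving-CSR approval decision."""
    # 1. The requested hostname must match the provider regex (full match here).
    if re.fullmatch(provider_regex, hostname) is None:
        return False
    # 2. Every requested IP must fall inside one of the allowed prefixes.
    networks = [ipaddress.ip_network(p, strict=False) for p in provider_ip_prefixes]
    for ip in ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in networks):
            return False
    # 3. Unless DNS resolution is bypassed, the hostname must resolve.
    return bypass_dns_resolution or resolves(hostname)
```

In this model a node named cp-2 that does not resolve is only approved once bypass_dns_resolution is True, which matches the workaround described above.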
Hi @batazor, have you found a way to get around this problem without setting BYPASS_DNS_RESOLUTION?
@EmyLIEUTAUD No, I haven't looked into it.
If all kubelet certificate requests stay pending and kubelet_rotate_server_certificates is enabled (which it is by default in the hardening template), then most probably the kubelet-csr-approver did not come up. I suggest checking the logs and events related to that application.
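Following that suggestion, a minimal Python sketch that scans the JSON from kubectl get pods -A -o json for pods whose name contains kubelet-csr-approver and reports any that are not Running or have a waiting container, or reports that no such pod exists at all. Matching on the name fragment is an assumption about how the pods are named in your deployment, and the function name and pod name in the test data are hypothetical; kubectl logs and kubectl get events on whatever it reports are the natural next step.

```python
def approver_pod_problems(pod_list: dict, name_fragment: str = "kubelet-csr-approver") -> list[str]:
    """Report pods whose name contains name_fragment and that are not healthy.

    A pod counts as unhealthy if its phase is not Running or any container is
    in a waiting state (e.g. ContainerCreating, CrashLoopBackOff).
    """
    problems = []
    found = False
    for pod in pod_list.get("items", []):
        meta, status = pod.get("metadata", {}), pod.get("status", {})
        name = meta.get("name", "")
        if name_fragment not in name:
            continue
        found = True
        phase = status.get("phase", "Unknown")
        waiting = [
            cs["state"]["waiting"].get("reason", "")
            for cs in status.get("containerStatuses", [])
            if "waiting" in cs.get("state", {})
        ]
        if phase != "Running" or waiting:
            detail = " ".join([phase] + waiting)
            problems.append(f"{meta.get('namespace')}/{name}: {detail}")
    if not found:
        problems.append(f"no pod matching {name_fragment!r}")
    return problems
```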
I experienced a similar problem. It turned out that kubelet-csr-approver requires the network backend to be up, which I had intentionally disabled by setting kube_network_plugin: cni. The approver did not start and therefore could not approve the kubelet certificates. The deployment timed out after an hour with no helpful message in the Ansible logs. The only way to troubleshoot this is to connect to the cluster under deployment and check the Kubernetes resources.
In my case the kubelet-csr-approver is running fine, but all CSRs get denied because the DNS name cannot be resolved.
When I check an individual denied CSR, I can see that the request was for the hostname of the control-plane node, "cp-2", which obviously is not an FQDN.
So I assume that once I change the hostnames of all my nodes and control-plane nodes to something that can be resolved, the problem should disappear.
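Before renaming nodes, it may help to check which of the current node names resolve at all. A minimal Python sketch using socket.getaddrinfo; note that it resolves from wherever you run it, which may differ from the resolver the approver itself uses. The function names and the hostnames in the test data are hypothetical.

```python
import socket
from collections.abc import Callable, Iterable


def system_resolves(name: str) -> bool:
    """True if the local resolver returns at least one address for name."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def unresolvable(hostnames: Iterable[str],
                 resolves: Callable[[str], bool] = system_resolves) -> list[str]:
    """Return the hostnames that resolves() cannot find."""
    return [h for h in hostnames if not resolves(h)]
```

unresolvable(["cp-1", "cp-2", "cp-3"]) lists the names your resolver cannot find; the resolves parameter lets you substitute a different lookup, for example in tests.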
Closing as the questions seem to have been answered. /close Feel free to open a bug report / feature request if hardening is causing specific problems.
@VannTen: Closing this issue.
Hey, I installed my cluster with Kubespray and used this guide to harden it: https://kubespray.io/#/docs/hardening and this one: https://github.com/kubernetes-sigs/kubespray/blob/master/docs/cgroups.md
But every time I try to set up workloads, I get lots of errors inside my cluster, for example with the Bitnami chart for cert-manager:
I get the following errors:
I also get errors with the Metrics Server:
or when I try to list Namespaces or delete Resources:
What is going wrong inside my cluster? Did I make any mistakes during the installation? I also posted this on the Slack channel but got no response.
https://kubernetes.slack.com/archives/C2V9WJSJD/p1684355192603989