Closed rgarcia closed 7 months ago
Can you please show the output of kubectl get pods -A from the workload cluster?
Hi @batistein here is the output, really appreciate your help!
NAMESPACE     NAME                                                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-5dd5756b68-hr545                                          0/1     Pending   0          5m27s
kube-system   coredns-5dd5756b68-m8cgs                                          0/1     Pending   0          5m27s
kube-system   etcd-management-cluster-control-plane-gxw2s                       1/1     Running   0          5m27s
kube-system   kube-apiserver-management-cluster-control-plane-gxw2s             1/1     Running   0          5m27s
kube-system   kube-controller-manager-management-cluster-control-plane-gxw2s    1/1     Running   0          5m27s
kube-system   kube-proxy-62nkw                                                  1/1     Running   0          4m13s
kube-system   kube-proxy-74chj                                                  1/1     Running   0          5m27s
kube-system   kube-proxy-gs79r                                                  1/1     Running   0          4m14s
kube-system   kube-proxy-l474v                                                  1/1     Running   0          4m14s
kube-system   kube-scheduler-management-cluster-control-plane-gxw2s             1/1     Running   0          5m27s
And here is kubectl describe on one of the coredns pods that isn't coming up:
kubectl describe pod coredns-5dd5756b68-hr545 --namespace kube-system
Name:                 coredns-5dd5756b68-hr545
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 <none>
Labels:               k8s-app=kube-dns
                      pod-template-hash=5dd5756b68
Annotations:          <none>
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        ReplicaSet/coredns-5dd5756b68
Containers:
  coredns:
    Image:       registry.k8s.io/coredns/coredns:v1.10.1
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-nt6sv (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-nt6sv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 45s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 45s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  34m                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  8m37s (x5 over 29m)  default-scheduler  0/4 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 3 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  6m59s                default-scheduler  0/7 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 6 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling..
  Warning  FailedScheduling  119s                 default-scheduler  0/9 nodes are available: 1 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 8 node(s) had untolerated taint {node.cluster.x-k8s.io/uninitialized: }. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling..
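(Note for anyone hitting the same FailedScheduling events: the uninitialized taints named in the messages can be listed directly from the nodes. A quick sketch of one way to do this; your node names will differ:)

# Show every node together with its taint keys; the
# node.cloudprovider.kubernetes.io/uninitialized and
# node.cluster.x-k8s.io/uninitialized taints are removed automatically
# once the CCM and the bootstrap/CNI initialization complete.
kubectl get nodes -o custom-columns='NODE:.metadata.name,TAINTS:.spec.taints[*].key'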
@rgarcia Have you installed a CNI (for example Cilium) and the CCM (cloud controller manager)?
Can you please post the output of this command twice (once from the management cluster and once from the workload cluster):
go run github.com/guettli/check-conditions@latest all
@rgarcia From the list of pods running in your workload cluster it's clear that the CCM and CNI are missing. CoreDNS cannot run without a CNI.
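(For anyone landing here later, a minimal sketch of installing both components with Helm. The repo URLs and chart names below are assumptions based on the upstream Cilium and Hetzner cloud-controller-manager charts; check the CAPH docs for the exact values your setup needs:)

# CNI: Cilium from the upstream chart
helm repo add cilium https://helm.cilium.io/
helm install cilium cilium/cilium --namespace kube-system

# CCM: Hetzner cloud-controller-manager from the upstream chart
helm repo add hcloud https://charts.hetzner.cloud
helm install hccm hcloud/hcloud-cloud-controller-manager --namespace kube-system

Once both are running, the uninitialized taints are lifted and the coredns pods should schedule.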
🤦 should have kept reading through the docs. Sorry for the noise. After installing cilium and ccm everything looks good. Thank you!
/kind bug
What steps did you take and what happened:
1. kind create cluster
2. Created a new project in Hetzner named "caph"
3. Generated an SSH key (id_ed25519{,.pub}) and added it to the project
4. Generated an API token caph-api-token with read and write access
5. Generated a webservice user password in Hetzner Robot
6. Set up env: added HCLOUD_TOKEN etc. as secrets per the docs
7. Generated the cluster yaml
8. Applied it: kubectl apply -f $CLUSTER_NAME.yaml
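(For context, the secret and manifest-generation steps above usually look roughly like this. The secret name "hetzner" and its key names are taken from my reading of the CAPH quickstart and should be treated as assumptions; verify against the current docs:)

# Credentials the provider reads from the management cluster
# (secret/key names per the CAPH quickstart; verify before use)
kubectl create secret generic hetzner \
  --from-literal=hcloud=$HCLOUD_TOKEN \
  --from-literal=robot-user=$HETZNER_ROBOT_USER \
  --from-literal=robot-password=$HETZNER_ROBOT_PASSWORD

# Generate the workload cluster manifest and apply it
# (clusterctl picks up the rest of the configuration from env vars)
clusterctl generate cluster $CLUSTER_NAME > $CLUSTER_NAME.yaml
kubectl apply -f $CLUSTER_NAME.yaml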
What did you expect to happen: Cluster to start up! The control plane comes online but then the worker nodes fail to register themselves:
caph-controller-manager logs:
I ssh'd onto one of the worker nodes and here are the kubelet logs:
Anything else you would like to add: Thank you for any assistance you can provide!
Environment:
- kubectl version: v1.29.1
- OS (/etc/os-release): macOS