Closed by queso 7 months ago
I can see during the install that it is adding namespaces:
Name: default
Labels: kubernetes.io/metadata.name=default
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
Name: kube-node-lease
Labels: kubernetes.io/metadata.name=kube-node-lease
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
Name: kube-public
Labels: kubernetes.io/metadata.name=kube-public
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
Name: kube-system
Labels: kubernetes.io/metadata.name=kube-system
Annotations: <none>
Status: Active
No resource quota.
No LimitRange resource.
Name: metallb-system
Labels: kubernetes.io/metadata.name=metallb-system
objectset.rio.cattle.io/hash=fc1016f2d449e33945c25d61c449a1c8b3278935
pod-security.kubernetes.io/audit=privileged
pod-security.kubernetes.io/enforce=privileged
pod-security.kubernetes.io/warn=privileged
Annotations: objectset.rio.cattle.io/applied:
H4sIAAAAAAAA/4yQzU7DMBCEXwXN2QmkSUtjiQNnJI7cN/a2NXHsyN6mqqq+O0oREiBRerTmx/PtCTS6N07ZxQCNqYJC74KFxisNnEcyDIWBhSwJQZ9AIUQhcTHk+Rm7dzaSWcrkYm...
objectset.rio.cattle.io/id:
objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
objectset.rio.cattle.io/owner-name: metallb-crds
objectset.rio.cattle.io/owner-namespace: kube-system
Status: Active
No resource quota.
No LimitRange resource.
I managed to turn on the k3s-init logs and see a lot of this:
Mar 13 16:09:24 valhalla1 k3s[4049]: time="2024-03-13T16:09:24-04:00" level=info msg="Reconciling ETCDSnapshotFile resources"
Mar 13 16:09:24 valhalla1 k3s[4049]: time="2024-03-13T16:09:24-04:00" level=info msg="Reconciliation of ETCDSnapshotFile resources complete"
Mar 13 16:09:24 valhalla1 k3s[4049]: time="2024-03-13T16:09:24-04:00" level=error msg="Failed to record snapshots for cluster: nodes \"valhalla1\" not found"
Mar 13 16:09:24 valhalla1 k3s[4049]: time="2024-03-13T16:09:24-04:00" level=info msg="Waiting for control-plane node valhalla1 startup: nodes \"valhalla1\" not found"
Mar 13 16:09:24 valhalla1 k3s[4049]: {"level":"info","ts":"2024-03-13T16:09:24.867777-0400","caller":"traceutil/trace.go:171","msg":"trace[1138961132] transaction","detail":"{read_only:false; response_revision:1396; number_of_response:1; }","duration":"107.321117ms","start":"2024-03-13T16:09:24.760415-0400","end":"2024-03-13T16:09:24.867736-0400","steps":["trace[1138961132] 'process raft request' (duration: 107.091305ms)"],"step_count":1}
Mar 13 16:09:25 valhalla1 k3s[4049]: W0313 16:09:25.131721 4049 handler_proxy.go:93] no RequestInfo found in the context
Mar 13 16:09:25 valhalla1 k3s[4049]: E0313 16:09:25.132333 4049 controller.go:113] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: Error, could not get list of group versions for APIService
Mar 13 16:09:25 valhalla1 k3s[4049]: I0313 16:09:25.132551 4049 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
Mar 13 16:09:25 valhalla1 k3s[4049]: W0313 16:09:25.132385 4049 handler_proxy.go:93] no RequestInfo found in the context
Mar 13 16:09:25 valhalla1 k3s[4049]: E0313 16:09:25.133030 4049 controller.go:102] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
Mar 13 16:09:25 valhalla1 k3s[4049]: , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
Mar 13 16:09:25 valhalla1 k3s[4049]: I0313 16:09:25.133559 4049 controller.go:109] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
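Most of that output is info-level noise, so it can help to filter it down to the error lines. The following is only a sketch: it assumes the two log formats visible in the excerpt above (logrus-style level=error and klog-style lines starting with E plus the month and day), and the function name k3s_errors_only is made up for illustration.

```shell
# Hypothetical helper: keep only error lines from k3s journal output read on stdin.
# Matches logrus-style 'level=error' and klog-style 'E<MMDD> ' prefixes,
# which are the two formats seen in the excerpt above.
k3s_errors_only() {
  grep -E 'level=error|\]: E[0-9]{4} '
}
```

It could be used as `journalctl -u k3s-init --no-pager | k3s_errors_only`, assuming the unit is actually named k3s-init on your nodes.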
So it looks like this is related to NFS and how containerd works: the default overlayfs snapshotter wasn't working on the NFS-backed root filesystem.
I had to install fuse-overlayfs: sudo apt-get install fuse-overlayfs
and then I added this to my extra server args:
--snapshotter=fuse-overlayfs
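If you want to check whether a node is in the same situation before changing the server args, something like the sketch below may help. It assumes that any NFS-backed filesystem needs fuse-overlayfs and that everything else can keep the k3s default of overlayfs; that rule is a simplification of what happened here, not something k3s documents in this form. Both function names are hypothetical.

```shell
# Hypothetical helper: map a filesystem type to a containerd snapshotter for k3s.
# Assumption: any NFS-backed filesystem needs fuse-overlayfs; everything else
# keeps the k3s default (overlayfs).
recommend_snapshotter() {
  case "$1" in
    nfs*) echo "fuse-overlayfs" ;;
    *)    echo "overlayfs" ;;
  esac
}

# Print the filesystem type backing a path (defaults to /).
# Prints "unknown" if findmnt is missing or the lookup fails.
fs_type_of() {
  findmnt -n -o FSTYPE -T "${1:-/}" 2>/dev/null || echo "unknown"
}

# Print the flag that would go into the extra server args for this machine.
echo "--snapshotter=$(recommend_snapshotter "$(fs_type_of /)")"
```

Pass the k3s data directory instead of / if it lives on a different mount than the root filesystem.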
I have a PXE/NFS-booted Pi cluster. Everything appears to install, but the playbook then hangs while checking whether the server nodes have joined.
Expected Behavior
Setup should finish
Current Behavior
Setup errors out:
Steps to Reproduce
ansible-playbook site.yml -i inventory/valhalla/hosts.ini
Context (variables)
Operating system:
Raspbian Bookworm
Hardware:
Raspberry Pi 4 8GB with PoE HATs
Variables Used
all.yml
Hosts
host.ini
Possible Solution
I connected to the node and tried to see what I could with get nodes:
I also ran k3s check-config, and that came back clean for the box.
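For anyone debugging the same hang by hand, a polling loop along these lines can show whether the server node ever registers. This is a rough sketch and not the check the playbook itself runs; retry_until and node_registered are hypothetical names, and it assumes the kubectl bundled with k3s is available on the node.

```shell
# Hypothetical helper: run a command up to <attempts> times,
# sleeping <delay> seconds between tries. Returns 0 on the first success.
retry_until() {
  _attempts=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$_i" -lt "$_attempts" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}

# Succeeds once the named node exists in the API server.
node_registered() {
  k3s kubectl get node "$1" >/dev/null 2>&1
}
```

For example, `retry_until 20 10 node_registered valhalla1` would poll for up to roughly three minutes. If it never succeeds, the node is not registering at all, which matches the "nodes \"valhalla1\" not found" messages in the logs.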