Closed. rkage closed this 3 years ago.
My run on a 3-node (1x2) cluster failed while joining the two nodes to the single control plane node:
fatal: [k-test02]: FAILED! => changed=true
  cmd:
  - kubeadm
  - join
  - --config
  - /etc/kubernetes/kubeadm-join.yaml
  delta: '0:05:06.400650'
  end: '2021-02-12 01:10:44.492383'
  msg: non-zero return code
  rc: 1
  start: '2021-02-12 01:05:38.091733'
  stderr: |2-
    [WARNING SystemVerification]: missing optional cgroups: hugetlb
    error execution phase preflight: couldn't validate the identity of the API Server: Get "https://192.168.91.240:8443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    To see the stack trace of this error execute with --v=5 or higher
  stderr_lines: <omitted>
  stdout: '[preflight] Running pre-flight checks'
  stdout_lines: <omitted>
I'm investigating
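For anyone hitting the same timeout: a quick way to rule out plain network problems is to check, from the joining node, whether the API endpoint in the error accepts TCP connections at all. This is only a rough sketch, not part of the playbooks. The helper name `can_connect` is made up for illustration, and the host and port are copied from the error message above.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (endpoint taken from the kubeadm error, adjust for your cluster):
#   can_connect("192.168.91.240", 8443)
```

If this returns False on the joining node, the problem is probably reachability (VIP, load balancer, firewall) rather than anything in the kubeadm join config. A True result only shows the port is open, not that the API server is healthy.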
False alarm. PEBCAK
Possibly needs netaddr installed via pip/pip3 for the local Ansible, at least on MacBooks.
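A small check for the netaddr point, in case it helps someone else. It has to be run with the same Python interpreter the local Ansible uses, which is an assumption you need to verify yourself (on macOS there are often several). The function name `module_available` is just illustrative.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("netaddr"):
        print(f"netaddr is available for {sys.executable}")
    else:
        # Suggest installing into this exact interpreter rather than a bare `pip`.
        print(f"netaddr is missing; try: {sys.executable} -m pip install netaddr")
```

Installing with `python -m pip` against the reported interpreter avoids the common case where `pip` and `pip3` point at different Python installations.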
There are some weird issues with /etc/cni/* not getting removed, so a nuke doesn't necessarily undo changes. After running all.yml, I was stuck with a dead cluster. Running nuke.yml followed by rm -rf /etc/cni/* allowed me to deploy successfully. I'm not sure why, but the remove cni net.d folder task isn't running?
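To confirm whether a nuke really left CNI files behind on a node, something like the following could be run there after nuke.yml. It is a sketch only: `leftover_entries` is a hypothetical helper, the default path is the /etc/cni directory mentioned above, and reading it may require root.

```python
from pathlib import Path
from typing import List


def leftover_entries(directory: str = "/etc/cni") -> List[str]:
    """List every file and folder still present under `directory` (empty if it is gone)."""
    root = Path(directory)
    if not root.is_dir():
        return []
    # rglob("*") walks the tree recursively; the root itself is not included.
    return sorted(str(p) for p in root.rglob("*"))


if __name__ == "__main__":
    for entry in leftover_entries():
        print(entry)
```

An empty result after nuke.yml would mean the cleanup task did its job; any output points at files the nuke skipped.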
I know I kind of shoehorned a few fixes in here, but it looks like everything passes from all.yml to nuke.yml, at least as far as a 1x2 config with calico goes; the bare minimum is working.
Description
This PR refactors the cluster init and join role.