@neolit123 It seems that we cannot run kubeadm join --dry-run without an actual Kubernetes control plane. :sweat_smile: The join phase will try to fetch the cluster-info ConfigMap even in dry-run mode.
I0412 10:01:39.066015 149 join.go:530] [preflight] Discovering cluster-info
I0412 10:01:39.067243 149 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "127.0.0.1:6443"
I0412 10:01:39.101380 149 round_trippers.go:553] GET https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s in 10 milliseconds
I0412 10:01:39.104282 149 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 127.0.0.1:6443: connect: connection refused
I0412 10:01:45.026633 149 round_trippers.go:553] GET https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s in 5 milliseconds
I0412 10:01:45.027445 149 token.go:217] [discovery] Failed to request cluster-info, will try again: Get "https://127.0.0.1:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 127.0.0.1:6443: connect: connection refused
Therefore, we need at least one worker node to complete the dry-run tests.
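For reference, an invocation along these lines (illustrative endpoint and token, not the exact command from the CI job) hits the same discovery failure whenever nothing is serving on 127.0.0.1:6443:

```bash
# Illustrative only: join discovery polls the cluster-info ConfigMap in the
# kube-public namespace through the API server, so without a real control
# plane this never gets past discovery, even with --dry-run.
kubeadm join 127.0.0.1:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-unsafe-skip-ca-verification \
  --dry-run
```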
Hm, i think we should fix that and use a fake CM or the dry-run client... in dry-run mode other API calls work like that. But if we use the dry-run client we will probably have to skip validation of the CM as well 🤔
If you prefer we can merge the initial test job without the join test, but it seems we have to fix this in k/k eventually.
EDIT: Or maybe ... join does need a control plane and it will fail later even if we use fake cluster-info?
Emm... It doesn't look like the kubeadm join --dry-run command is using a fake client at all.
https://github.com/kubernetes/kubernetes/blob/7380fc735aca591325ae1fabf8dab194b40367de/cmd/kubeadm/app/cmd/join.go#L551-L563
...
Maybe it requires a major refactor to use a fake client for kubeadm join --dry-run :(
i think the refactor is doable. we need to rename the "init" dry-run client to be a generic one and use it for "join" as well. it's probably not that much work, but i haven't looked at all the details.
we can merge the current PR, but keep this issue open until we can do that in k/k after code freeze for 1.24.
@SataQiu looks like the current e2e tests are flaky.
https://testgrid.k8s.io/sig-cluster-lifecycle-kubeadm#kubeadm-kinder-dryrun-latest
the error is:
[preflight] Some fatal errors occurred: [ERROR CRI]: container runtime is not running: output: time="2022-04-20T18:56:52Z" level=fatal msg="connect: connect endpoint 'unix:///var/run/containerd/containerd.sock', make sure you are running as root and the endpoint has been started: context deadline exceeded" , error: exit status 1
my guess is that we are running kubeadm ... inside the nodes before the container runtime has started. i can't remember if kinder's kubeadm-init action has a "wait for CRI" of sorts, but likely it does. one option would be to add e.g. sleep 10 before the first kubeadm init in task-03-init-dryrun.
EDIT: unclear if sleep is in the node images, possibly yes.
alternatively this could be a weird bug where containerd in the nodes is simply refusing to start for some reason, despite:
I0420 20:57:18.894578 107 initconfiguration.go:117] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
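a possible shape for that "wait for CRI" guard is sketched below; the crictl polling loop, timeout, and placement are assumptions rather than what kinder actually does:

```bash
# hypothetical guard before the first kubeadm call in the dry-run task:
# poll the containerd socket via crictl instead of a fixed sleep 10.
for _ in $(seq 1 30); do
  if crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock info >/dev/null 2>&1; then
    break
  fi
  sleep 1
done
kubeadm init --dry-run
```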
It looks like the kubeadm-kinder-dryrun job was deleted by https://github.com/kubernetes/test-infra/commit/c69405228b92b7037662b1f7e89db34350f5832f :sweat:
Oh, looks like @RA489 's PR deleted it in the 1.24 updates and i didn't see that. Can you please send it again?
Sure!
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
You can:
- Mark this issue or PR as fresh with /remove-lifecycle stale
- Mark this issue or PR as rotten with /lifecycle rotten
- Close this issue or PR with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
Yeah, we can skip testing join in this PR but keep the related issue open. Once k/k code freeze is lifted we should make the change in k/k and add the join test. I don't think it's that complicated, but I haven't looked in more detail.
reopening to update the e2e test after https://github.com/kubernetes/kubernetes/pull/126776 merges.
kubeadm is currently missing integration / e2e tests for --dry-run. this means if we happen to break our dry-run support for a particular command (e.g. init) we will not know about it until users report it to us.
xref https://github.com/kubernetes/kubeadm/issues/2649
kubeadm has integration tests here: https://github.com/kubernetes/kubernetes/tree/master/cmd/kubeadm/test/cmd
these tests execute a precompiled kubeadm binary to perform some checks and look for exit status 0.
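for illustration, such a check can be as small as running the binary and asserting on the exit status (a minimal sketch, not one of the actual test cases):

```bash
# minimal sketch of an exit-status check for a dry-run command
# (illustrative; the real tests live under cmd/kubeadm/test/cmd)
if ! kubeadm reset --dry-run --force; then
  echo "kubeadm reset --dry-run returned a non-zero exit status" >&2
  exit 1
fi
```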
we can use the same method for the init, join, reset tests with --dry-run, because the dry-run will be reentrant. but we cannot use this method for upgrade * commands, because the --dry-run for upgrade expects an existing cluster.
to have everything in the same place we can add dry-run as part of a kinder e2e test workflow: https://github.com/kubernetes/kubeadm/tree/main/kinder/ci/workflows
the workflow can look like the following:
- kubeadm init --dry-run on it (add --upload-certs and other special flags, how to test external CA?)
- kubeadm join --dry-run on it (add --control-plane, --certificate-key and other flags?)
- kubeadm reset --dry-run on it
- kubeadm init ... to create an actual k8s node
- kubeadm upgrade apply --dry-run to dry run the "primary node" upgrade of this node
- kubeadm upgrade node --dry-run to dry run the "secondary node" upgrade of this node

tasks: