kubermatic / kubeone

Kubermatic KubeOne automates cluster operations on all your cloud, on-prem, edge, and IoT environments.
https://kubeone.io
Apache License 2.0

Adding a new static worker node results in a preflight check failure on existing nodes #2802

Open xmudrii opened 1 year ago

xmudrii commented 1 year ago

What happened?

Trying to add a new static worker node results in the following error:

```console
+ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml
W0613 19:21:47.950292   27890 initconfiguration.go:331] [config] WARNING: Ignored YAML document with GroupVersionKind kubeadm.k8s.io/v1beta3, Kind=JoinConfiguration
W0613 19:21:47.958412   27890 initconfiguration.go:119] Usage of CRI endpoints without URL scheme is deprecated and can cause kubelet errors in the future. Automatically prepending scheme "unix" to the "criSocket" with value "/run/containerd/containerd.sock". Please update your configuration!
W0613 19:21:47.958515   27890 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.96.0.10]; the provided value is: [169.254.20.10]
    [WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10259]: Port 10259 is in use
    [ERROR Port-10257]: Port 10257 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
```
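
The failing checks are consistent with a node that is already running the control plane rather than with a genuinely broken environment. As a hypothetical sanity check (plain OS tooling, not a KubeOne command), you can confirm on one of the existing nodes that the manifests and ports flagged above are owned by the running components:

```console
# Hypothetical check on an already-provisioned control plane node.
# The files flagged by preflight are the existing static Pod manifests:
$ ls /etc/kubernetes/manifests/
etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
# ...and the "port in use" errors point at the running components, e.g. the API server:
$ sudo ss -tlnp 'sport = :6443'
```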

What happens is that joining a new static worker node triggers the WithFullInstall workflow, which is also used to provision the cluster from scratch. As part of that workflow, we run kubeadm preflight checks on each node to verify that the VMs satisfy the requirements to be Kubernetes nodes.

That works the first time we provision the cluster, but subsequent runs (e.g. when adding a new static worker node) fail on the existing nodes because the cluster is already provisioned: the manifest files already exist and the ports are taken by the running Kubernetes components. A manifest sketch of the triggering change is shown below.
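
For context, the trigger is nothing more than appending a host under `staticWorkers.hosts` and re-running `kubeone apply`. A minimal sketch of such a manifest (all names, addresses, versions, and paths below are placeholders, not taken from the affected cluster):

```yaml
apiVersion: kubeone.k8c.io/v1beta2
kind: KubeOneCluster
name: example-cluster
versions:
  kubernetes: "1.26.6"
cloudProvider:
  none: {}   # bare metal / static infrastructure
controlPlane:
  hosts:
    - publicAddress: "192.0.2.10"   # existing, already-provisioned node
      sshUsername: ubuntu
      sshPrivateKeyFile: "/home/ubuntu/.ssh/id_rsa"
staticWorkers:
  hosts:
    - publicAddress: "192.0.2.20"   # existing static worker
      sshUsername: ubuntu
      sshPrivateKeyFile: "/home/ubuntu/.ssh/id_rsa"
    - publicAddress: "192.0.2.21"   # newly added static worker that triggers the failing run
      sshUsername: ubuntu
      sshPrivateKeyFile: "/home/ubuntu/.ssh/id_rsa"
```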

Expected behavior

How to reproduce the issue?

What KubeOne version are you using?

```console
{
  "kubeone": {
    "major": "1",
    "minor": "6",
    "gitVersion": "v1.6.0-rc.2-36-g0536063a",
    "gitCommit": "0536063ab064601ba217c2abd41abd4c80a02477",
    "gitTreeState": "",
    "buildDate": "2023-06-13T21:16:41+02:00",
    "goVersion": "go1.20.4",
    "compiler": "gc",
    "platform": "darwin/arm64"
  },
  "machine_controller": {
    "major": "",
    "minor": "",
    "gitVersion": "8e5884837711fb0fc6b568d734f09a7b809fc28e",
    "gitCommit": "",
    "gitTreeState": "",
    "buildDate": "",
    "goVersion": "",
    "compiler": "",
    "platform": "linux/amd64"
  }
}
```

Provide your KubeOneCluster manifest here (if applicable)

What cloud provider are you running on?

Baremetal

What operating system are you running in your cluster?

Ubuntu 20.04.6

Additional information

We can mitigate this issue by ignoring those failures, but in some cases those failures are real issues that would prevent the cluster from being provisioned.
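
For reference, "ignoring those failures" boils down to the `--ignore-preflight-errors` flag that kubeadm itself suggests in the output above. A hypothetical manual re-run on an existing node (check names copied from the errors above) would look like the following, with the same caveat: skipping these checks also skips ones that could catch genuine problems on new nodes.

```console
# Hypothetical re-run on an already-provisioned node; the ignored check names
# are taken verbatim from the preflight errors above. Passing "all" instead
# would skip every check, including ones that matter on genuinely new nodes.
$ sudo kubeadm init phase preflight --config=./kubeone/cfg/master_0.yaml \
    --ignore-preflight-errors=DirAvailable--var-lib-etcd,Port-6443,Port-10259,Port-10257,Port-10250,Port-2379,Port-2380,FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml,FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml,FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml,FileAvailable--etc-kubernetes-manifests-etcd.yaml
```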

kubermatic-bot commented 1 year ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 1 year ago

/remove-lifecycle stale

kubermatic-bot commented 10 months ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 10 months ago

/remove-lifecycle stale

kubermatic-bot commented 6 months ago

Issues go stale after 90d of inactivity. After a further 30 days, they will turn rotten. Mark the issue as fresh with /remove-lifecycle stale.

If this issue is safe to close now please do so with /close.

/lifecycle stale

xmudrii commented 6 months ago

/remove-lifecycle stale

xmudrii commented 1 month ago

I'm not working on this at the moment. /unassign