outscale / cluster-api-provider-outscale

[Bug]: unable to create a cluster with 3 control plane replicas #383

Open pli01 opened 3 days ago

pli01 commented 3 days ago

What happened

Unable to create a cluster with 3 control plane nodes using any of the templates provided in the templates or example directories.

Only configurations with 1 control plane node work.

Steps to reproduce

1. Choose any template, for example the default: https://github.com/outscale/cluster-api-provider-outscale/blob/main/templates/cluster-template.yaml
2. Choose any image named ubuntu-2204-2204-kubernetes-v1xxxx.
3. Set 3 replicas in the control-plane section.

Expected to happen

A cluster with 3 control plane nodes.

Add anything

The second control plane node failed to join the cluster:

...
[  388.990481] cloud-init[1077]: [2024-10-29 14:39:23] {"level":"warn","ts":"2024-10-29T14:39:23.610631Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0001f8e00/10.0.4.234:2379","attempt":0,"error":"rpc error: code = FailedPrecondition desc = etcdserver: can only promote a learner member which is in sync with leader"}
[  388.990591] cloud-init[1077]: [2024-10-29 14:39:25] [etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[  388.990708] cloud-init[1077]: [2024-10-29 14:39:25] The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[  388.990820] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[  388.990986] cloud-init[1077]: [2024-10-29 14:39:25] [mark-control-plane] Marking the node ip-10-0-4-95 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[  388.991056] cloud-init[1077]: [2024-10-29 14:39:58] [kubelet-check] Initial timeout of 40s passed.
[  388.991170] cloud-init[1077]: [2024-10-29 14:41:25] error execution phase control-plane-join/mark-control-plane: error applying control-plane label and taints: nodes "ip-10-0-4-95" not found
[  388.991285] cloud-init[1077]: [2024-10-29 14:41:25] To see the stack trace of this error execute with --v=5 or higher
[  388.991403] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - cc_scripts_user.py[WARNING]: Failed to run module scripts_user (scripts in /var/lib/cloud/instance/scripts)
[  388.991514] cloud-init[1077]: [2024-10-29 14:41:25] 2024-10-29 14:41:25,857 - util.py[WARNING]: Running module scripts_user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed

cluster-api output

# logs capi-controller-manager
I1029 14:36:39.251111       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="484c2cb6-c494-4601-9107-1c965c79ee2a" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319060       1 machine_controller_phases.go:306] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"
I1029 14:44:32.319170       1 machine_controller_noderef.go:61] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="default/osc-c1-dev-control-plane-6dxf4" namespace="default" name="osc-c1-dev-control-plane-6dxf4" reconcileID="cd3f6a37-ecb7-4354-bdcb-a64c5d0b8cb4" KubeadmControlPlane="default/osc-c1-dev-control-plane" Cluster="default/osc-c1-dev" OscMachine="default/osc-c1-dev-control-plane-6dxf4"

Environment

- Kubernetes version: (use `kubectl version`): 
- OS (e.g. from `/etc/os-release`):
- Kernel (e.g. `uname -a`): ubuntu
- cluster-api-provider-outscale version: v0.3.1
- cluster-api version: v1.8.4
- Install tools:
- Kubernetes Distribution:
- Kubernetes Distribution version:

pierreozoux commented 2 days ago

I work with @pli01 and I came to the same conclusion.

I think it is linked to this issue: https://github.com/outscale/cluster-api-provider-outscale/issues/380

In my tests, when I add a public IP to the first node and/or the second node, at some point it starts to work. I didn't manage to find the failing curl :/
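One way to narrow down which request is failing is to probe the control plane ports from the joining node. This is a minimal sketch, and it assumes the problem is node-to-node reachability, which nothing in this thread confirms. The port numbers are the usual kubeadm defaults (6443 for the API server, 2379 and 2380 for etcd) and may not match this cluster. The `probe` and `probe_all` helpers are hypothetical names, not part of any provider tooling.

```python
import socket
import sys

# Assumed kubeadm default ports; adjust if the cluster is configured differently.
PORTS = {"kube-apiserver": 6443, "etcd-client": 2379, "etcd-peer": 2380}


def probe(host, port, timeout=3.0):
    """Return None if a TCP connection succeeds, otherwise the error as text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"{type(exc).__name__}: {exc}"


def probe_all(host, ports=PORTS, timeout=3.0):
    """Probe every named port on host and map each name to its result."""
    return {name: probe(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 probe.py <ip-of-existing-control-plane-node>
    for name, err in probe_all(sys.argv[1]).items():
        print(f"{name}: {'ok' if err is None else err}")
```

Running it from the second control plane node against the first one (10.0.4.234 in the etcd log above) would show whether any of these ports is unreachable. A successful TCP connection only proves reachability, not that TLS or etcd membership is healthy.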

@outscale-hmi, I'd love to pair program with you to debug this :)