kubernetes-sigs / cloud-provider-azure

Cloud provider for Azure
https://cloud-provider-azure.sigs.k8s.io/
Apache License 2.0

Crash on enforcement failure of clustercidr on existing nodes #6670

Closed vpatelsj closed 1 month ago

vpatelsj commented 1 month ago

What happened:

When the cloud-controller-manager detects a mismatch between the clustercidr passed to it as an argument and the node.spec.podcidrs of existing nodes, it treats the mismatch as a fatal condition, halts execution, and crashes. This mismatch is not fatal: the controller manager should tolerate it and keep trying to reconcile instead of crashing. The side effect of the crash is that when a new node joins the cluster, it cannot get a new podcidr from the controller manager, because the controller manager keeps crashing.

What you expected to happen:

If the controller manager tolerates this mismatch, old nodes can keep running with the old podcidr while new nodes run with the new podcidr, and that is a perfectly acceptable state to be in. In fact, when a user wants to live migrate a running cluster from one podcidr to another, the cluster goes through this exact scenario and gets blocked unless this issue is fixed.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment: