CecileRobertMichon opened this issue 2 years ago
/assign
I tried this with main and make tilt-up plus the machinepool flavor from Tilt, and it behaved correctly within a few minutes:
% k get azuremachinepool
NAME                     REPLICAS   READY   STATE
machinepool-27094-mp-0   2          true    Succeeded

% k get azuremachinepoolmachines
NAME                       VERSION   READY   STATE
machinepool-27094-mp-0-0   v1.23.9   true    Succeeded
machinepool-27094-mp-0-1   v1.23.9   true    Succeeded
I'll try again specifically with v1.5.1 and the quickstart route.
I can repro by following the quick start:
% clusterctl init --infrastructure azure
Fetching providers
Installing cert-manager Version="v1.9.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.2.3" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v1.2.3" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v1.2.3" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-azure" Version="v1.5.2" TargetNamespace="capz-system"
...
% k get azuremachinepool
NAME           REPLICAS   READY   STATE
test-mp-mp-0                      Updating

% k get azuremachinepoolmachines
NAME             VERSION   READY   STATE
test-mp-mp-0-0   v1.24.5           Succeeded
Edit: I think this failed because I hadn't followed through with installing the Calico CNI on the workload cluster. In further testing, that seems to be the key.
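A quick way to confirm that from the management cluster is to pull the workload kubeconfig and look at the nodes, which stay NotReady until a CNI is running (a sketch; the cluster name test-mp is taken from the repro above):

clusterctl get kubeconfig test-mp > test-mp.kubeconfig
kubectl --kubeconfig=./test-mp.kubeconfig get nodes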
Have you tried with tilt + the v1.5.1 tag? Just to know whether this is a tilt vs. clusterctl difference or a v1.5.1 vs. main branch difference.
MachinePool works just fine using make tilt-up in CAPZ with the v1.5.1 tag. Seems to be a clusterctl- or Quick Start-related issue, rather than a change in our code.
The template generated by clusterctl generate cluster test-mp --flavor machinepool is basically identical to the one generated by clicking the "machinepool" link in CAPZ Tilt. I just wanted to rule that out as a difference. I'll use the "known working" cluster template for further testing regardless.
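(A rough sketch of one way to compare the two, assuming the Tilt-generated manifest was saved locally; both file names here are just placeholders:)

clusterctl generate cluster test-mp --flavor machinepool > clusterctl-template.yaml
diff clusterctl-template.yaml tilt-template.yaml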
I'm seeing this behavior (AzureMachinePoolMachines come up, but the AzureMachinePool stays stuck at "Updating") if I don't install Calico as recommended for Azure in the Quick Start. Once I install the manifest and Calico starts running, both AMP resource types soon move to READY=true and STATE=Succeeded.
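For anyone else who hits this, the workaround is to apply the Calico manifest to the workload cluster, roughly as the Quick Start describes (the cluster name and manifest URL below are assumptions based on the CAPZ docs, so double-check them against the current Quick Start):

clusterctl get kubeconfig test-mp > test-mp.kubeconfig
kubectl --kubeconfig=./test-mp.kubeconfig apply -f https://raw.githubusercontent.com/kubernetes-sigs/cluster-api-provider-azure/main/templates/addons/calico.yaml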
Maybe there's a more informative status we could apply to an AMP in this case?
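For context, a way to dump what the AMP currently reports in its conditions (resource name taken from the repro above; the exact condition types and reasons will vary):

kubectl get azuremachinepool test-mp-mp-0 -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\n"}{end}'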
This is my experience too: without a working CNI the nodes never become Ready, and so the AMP gets stuck.
@mboersma - will this be fixed, or is it already fixed, by any of your PRs? People shouldn't have to install Calico specifically to make it work (e.g. versus Azure CNI), and if we require a CNI provider (even if not Calico), we should definitely document that.
/milestone v1.8
@CecileRobertMichon: The provided milestone is not valid for this repository. Milestones in this repository: [next, v1.9]. Use /milestone clear to clear the milestone.
/milestone v1.9
/milestone v1.11
/milestone next
@willie-yao: You must be a member of the kubernetes-sigs/cluster-api-provider-azure-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Cluster API Provider Azure Maintainers and have them propose you as an additional delegate for this responsibility.
/unassign
/milestone next
I haven't made any progress on this unfortunately and I'm not likely to for this release cycle.
/milestone next
@Jont828: You must be a member of the kubernetes-sigs/cluster-api-provider-azure-maintainers GitHub team to set the milestone. If you believe you should be able to issue the /milestone command, please contact your Cluster API Provider Azure Maintainers and have them propose you as an additional delegate for this responsibility.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle rotten
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:
- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
/remove-lifecycle stale
/kind bug
[Before submitting an issue, have you checked the Troubleshooting Guide?]
What steps did you take and what happened: [A clear and concise description of what the bug is.]
Create a cluster with the "machinepool" flavor following the quick start instructions:
export WORKER_MACHINE_COUNT=1
clusterctl generate cluster test-mp --flavor machinepool | kubectl apply -f -
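(The Quick Start also expects the usual Azure settings to be exported before generating the template; the exact variable list depends on the CAPZ version, so treat this as a sketch and check the Quick Start for the authoritative set:)

export AZURE_SUBSCRIPTION_ID="<subscription-id>"
export AZURE_TENANT_ID="<tenant-id>"
export AZURE_CLIENT_ID="<client-id>"
export AZURE_CLIENT_SECRET="<client-secret>"
export AZURE_LOCATION="eastus"
export AZURE_CONTROL_PLANE_MACHINE_TYPE="Standard_D2s_v3"
export AZURE_NODE_MACHINE_TYPE="Standard_D2s_v3"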
Notice that the VMSS becomes ready, and the MachinePoolMachines are in the Succeeded state, but the AzureMachinePool stays stuck in Updating:
This repros with v1.5.1.
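(A quick manual way to watch for this, checking the MachinePool's ready replicas against its spec; the MachinePool name is assumed from the steps above:)

kubectl get machinepool test-mp-mp-0 -o jsonpath='spec={.spec.replicas} ready={.status.readyReplicas}{"\n"}'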
What's interesting is that this is seemingly not reproducing in our e2e tests, which test release-1.5: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-provider-azure#capz-periodic-e2e-full-v1beta1 (double checked that the test waits for the MachinePool ready replicas to be == to the spec replicas, which would time out in the scenario above).

What did you expect to happen:
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.]
Environment:
- Kubernetes version (use kubectl version):
- OS (e.g. from /etc/os-release):