Closed saurabrana closed 2 months ago
We are facing a similar issue. Did you find any solution?
Hi @saurabrana, have you tried with a newer version of CAPZ? Could you please share repro steps (what do your cluster and machines look like)?
@hermesimi the VM extension failing just means that the k8s node join failed; it could be for a variety of reasons. Are you also seeing this on Windows? What k8s version? What CAPZ version?
Hey there @CecileRobertMichon. We did not test on Windows. We updated all versions last week just to make sure.
NAME                    NAMESPACE                           TYPE                     CURRENT VERSION   NEXT VERSION
bootstrap-kubeadm       capi-kubeadm-bootstrap-system       BootstrapProvider        v1.5.2            Already up to date
control-plane-kubeadm   capi-kubeadm-control-plane-system   ControlPlaneProvider     v1.5.2            Already up to date
cluster-api             capi-system                         CoreProvider             v1.5.2            Already up to date
infrastructure-azure    capz-system                         InfrastructureProvider   v1.11.1           Already up to date
The same thing happened again today. The VMSS serial logs showed:
[FAILED] Failed to start Execute cloud user/final scripts.
[ 338.420057] cloud-init[1554]: [2023-09-26 17:03:23] Cloud-init v. 23.2.2-0ubuntu0~22.04.1 running 'modules:final' at Tue, 26 Sep 2023 17:03:23 +0000. Up 33.74 seconds.
See 'systemctl status cloud-final.service' for details.
[ 338.420195] cloud-init[1554]: [2023-09-26 17:03:25] [preflight] Running pre-flight checks
[  OK  ] Reached target Cloud-init target.
[ 338.420593] cloud-init[1554]: [2023-09-26 17:08:27] error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "e9xgm3"
[ 338.421017] cloud-init[1554]: [2023-09-26 17:08:27] To see the stack trace of this error execute with --v=5 or higher
[ 338.421413] cloud-init[1554]: [2023-09-26 17:08:27] 2023-09-26 17:08:27,867 - cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
[ 338.421840] cloud-init[1554]: [2023-09-26 17:08:27] 2023-09-26 17:08:27,867 - util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_scripts_user.py'>) failed
[ 338.422256] cloud-init[1554]: [2023-09-26 17:08:27] Cloud-init v. 23.2.2-0ubuntu0~22.04.1 finished at Tue, 26 Sep 2023 17:08:27 +0000. Datasource DataSourceAzure [seed=/dev/sr0]. Up 338.38 seconds
2023-09-26T17:18:15.965353Z INFO Daemon Agent WALinuxAgent-2.9.1.1 launched with command 'python3 -u bin/WALinuxAgent-2.9.1.1-py3.8.egg -run-exthandlers' is successfully running
Any pointers?
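One thing that can be read off the serial log above is how long the join spent in preflight before it gave up. A minimal Python sketch, assuming the bracketed wall-clock timestamps cloud-init prefixes to each line; the helper names (`first_timestamp`, `preflight_duration`) are made up for illustration, and the two sample lines are copied from the log above with the error message shortened:

```python
import re
from datetime import datetime

# Matches the bracketed wall-clock timestamp on cloud-init lines,
# e.g. "[2023-09-26 17:03:25]". The kernel-style "[ 338.420195]" prefix
# and "cloud-init[1554]" do not match this pattern.
TS_RE = re.compile(r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def first_timestamp(line):
    """Return the first wall-clock timestamp on the line, or None."""
    m = TS_RE.search(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")


def preflight_duration(lines):
    """Seconds between the preflight start line and the preflight error line.

    Returns None if either marker is missing.
    """
    start = end = None
    for line in lines:
        ts = first_timestamp(line)
        if ts is None:
            continue
        if start is None and "[preflight] Running pre-flight checks" in line:
            start = ts
        elif start is not None and "error execution phase preflight" in line:
            end = ts
            break
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Two lines taken from the serial log in this thread (error text shortened).
SAMPLE = [
    "[ 338.420195] cloud-init[1554]: [2023-09-26 17:03:25] [preflight] Running pre-flight checks",
    "[ 338.420593] cloud-init[1554]: [2023-09-26 17:08:27] error execution phase preflight: couldn't validate the identity of the API Server",
]

if __name__ == "__main__":
    print(preflight_duration(SAMPLE))
```

On these two lines the gap is 302 seconds, so the node kept retrying discovery for about five minutes before failing. That would be consistent with kubeadm retrying until its discovery timeout (5 minutes by default, as far as I know) rather than being rejected immediately, but that reading is an assumption and not something confirmed in this thread.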
Are those VMs getting created as part of the original cluster creation or is this a scaling event outside of CAPZ by any chance (autoscaler, Azure portal, etc)?
Can you please share repro steps?
error execution phase preflight: couldn't validate the identity of the API Server: could not find a JWS signature in the cluster-info ConfigMap for token ID "e9xgm3"
seems like the bootstrap token isn't valid
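For context on that error: with token-based discovery, kubeadm looks in the `cluster-info` ConfigMap (namespace `kube-public`) for a `jws-kubeconfig-<token-id>` entry matching the token the node was given. If the token expired or was deleted, that entry is gone and the join fails as above. A rough Python sketch of the same check, assuming the ConfigMap's `data` has already been loaded into a dict; the function names and the sample data are made up for illustration:

```python
import re

# Bootstrap tokens have the form "<id>.<secret>": a 6-character ID and a
# 16-character secret, both lowercase alphanumerics.
TOKEN_RE = re.compile(r"^([a-z0-9]{6})\.([a-z0-9]{16})$")
TOKEN_ID_RE = re.compile(r"^[a-z0-9]{6}$")

JWS_KEY_PREFIX = "jws-kubeconfig-"


def token_id_of(token):
    """Return the ID part of a bootstrap token, or raise ValueError."""
    m = TOKEN_RE.match(token)
    if m is None:
        raise ValueError("not a well-formed bootstrap token")
    return m.group(1)


def has_jws_signature(cluster_info_data, token_id):
    """True if cluster-info carries a non-empty signature for this token ID."""
    if TOKEN_ID_RE.match(token_id) is None:
        raise ValueError("not a well-formed bootstrap token ID")
    return bool(cluster_info_data.get(JWS_KEY_PREFIX + token_id))


# Hypothetical ConfigMap data: one signature, for a token ID other than
# the one in the error message. Values are placeholders.
SAMPLE_DATA = {
    "kubeconfig": "<kubeconfig yaml>",
    "jws-kubeconfig-abcdef": "<detached JWS>",
}

if __name__ == "__main__":
    # Token ID from the error message in this thread.
    print(has_jws_signature(SAMPLE_DATA, "e9xgm3"))
```

On a real cluster, the data can be inspected with `kubectl -n kube-public get configmap cluster-info -o yaml`, and the live tokens with `kubeadm token list` on a control plane node. If there is no `jws-kubeconfig-e9xgm3` key, the token the VM booted with no longer exists on the cluster.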
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After inactivity, lifecycle/stale is applied
After inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
After inactivity, lifecycle/stale is applied
After inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
After inactivity, lifecycle/stale is applied
After inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After inactivity since lifecycle/rotten was applied, the issue is closed
You can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
/kind bug
[Have you checked the Troubleshooting Guide? Yes, we have.]
What steps did you take and what happened: [We are trying to launch an Azure VM, but it eventually fails to come into service due to a failure in the CAPZ extension.]
What did you expect to happen: The VM should launch successfully and always be able to connect to the cluster.
Anything else you would like to add: cloudbase-init.log Logs.zip
Environment: