Closed supershal closed 1 month ago
we were able to downgrade the cloud-init to 23.2.1-0ubuntu0~20.04.2
and create cluster successfully. https://github.com/mesosphere/konvoy-image-builder/pull/938
cc: @voor @cnmcavoy
We are still not sure of the root cause and change in cloud-init that resulted in this issue.
I was able to provide following override file to the image-builder and build AMI that can run CAPA cloud-init script successfully. pin-cloud-init-override.json :
{
"ansible_extra_vars": "pinned_debs=\"cloud-init=23.1.2-0ubuntu0~20.04.2\""
}
I built the image using following makefile target of image-builder
make build-ami-ubuntu-2004 PACKER_VAR_FILES=pin-cloud-init-override.json
We will have to now investigate what changes in 23.3.1-0ubuntu1~20.04.1
broke the CAPA cloud-init script.
Moving over some comments from slack so they're not lost in the sands of time:
- name: Downgrade cloud init.
apt:
deb: http://launchpadlibrarian.net/679992659/cloud-init_23.2.2-0ubuntu0~20.04.1_all.deb
state: present
force: true
- name: Pin cloud init to prevent version issues.
dpkg_selections:
name: "{{ item }}"
selection: hold
loop:
- cloud-init
For image-builder users who have hit this bug and are reading this issue:
We believe the root cause to be in cloud-init, and would like to fix it there (see https://github.com/canonical/cloud-init/issues/4572). We prefer to do this to the alternative, which is to "pin" an older, known-good cloud-init version in image-builder itself.
For now, if you use image-builder to create an Ubuntu 20.04 AMI, please use the workaround described in https://github.com/kubernetes-sigs/image-builder/issues/1333#issuecomment-1782093042.
This might be related to https://github.com/kubernetes-sigs/image-builder/pull/406 which historically caused issues with CAPA.
@supershal and I found that the feature override mechanism used in #406 does not work in the recent versions of cloud-init in Ubuntu 20.04. This mechanism was removed from cloud-init in https://github.com/canonical/cloud-init/pull/4228.
Patching cloud-init is the officially documented mechanism now:
Currently used upstream values for feature flags are set in
cloudinit/features.py
. Overrides to these values should be patched directly (e.g., via quilt patch) by downstreams.
I guess modifying the cloud-init python module to set ERROR_ON_USER_DATA_FAILURE = False
is something image-builder can do for now. But once Ubuntu 20.04 is EOL, the feature flag itself will be removed.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle stale
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/remove-lifecycle rotten
/close
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues according to the following rules:
lifecycle/stale
is appliedlifecycle/stale
was applied, lifecycle/rotten
is appliedlifecycle/rotten
was applied, the issue is closedYou can:
/reopen
/remove-lifecycle rotten
Please send feedback to sig-contributor-experience at kubernetes/community.
/close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned".
What steps did you take and what happened:
The latest cloud-init version
23.3.1-0ubuntu1~20.04.1
that is shipped with base AMI for Ubuntu 20.04 is unable to run boothook https://cloudinit.readthedocs.io/en/latest/explanation/format.html#cloud-boothook provided by CAPA, https://github.com/kubernetes-sigs/cluster-api-provider-aws/blob/0bf78b04b305a77aec37a68c107102231faa7a16/pkg/cloud/services/secretsmanager/secret_fetch_script.go#L20 As a result the CAPA VMs are not initializing as expected.Steps to reproduce:
create an AMI using image-builder
Create CAPA cluster using the AMI created in step 1 using instructions at: https://cluster-api-aws.sigs.k8s.io/getting-started.html
Check logs at
/var/log/cloud-init-output.log
What did you expect to happen: Cloud-init run successfully on the VM
Anything else you would like to add: [Miscellaneous information that will assist in solving the issue.] Log from cloud-init.
Environment:
Project (Image Builder for Cluster API:
Additional info for Image Builder for Cluster API related issues:
/etc/os-release
, orcmd /c ver
): ubuntu-20.04kubectl version
):/kind bug [One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels]