lxc / lxc-ci

LXC continuous integration and build scripts
https://jenkins.linuxcontainers.org
Apache License 2.0

debian/11/cloud doesn't run cloud-init in lxd #771

Open antoonhuiskens opened 1 year ago

antoonhuiskens commented 1 year ago

Hi, I tend to run images with lxd on non-lxd managed networks (unmanaged networks in lxd speak).

With debian/11/cloud (bullseye/cloud), networking doesn't come up. This seems to be because cloud-init doesn't run, unlike on e.g. debian/12/cloud (bookworm/cloud).

Here's my environment:

$ snap list lxd
Name  Version       Rev    Tracking       Publisher   Notes
lxd   5.15-002fa0f  25112  latest/stable  canonical✓  -
$ lxc --version
5.15
$ lxc network list -f compact | egrep 'NAME|br100'
    NAME       TYPE    MANAGED  IPV4  IPV6  DESCRIPTION  USED BY  STATE
  br100      bridge    NO                                16
$ lxc remote ls -fcompact | egrep 'NAME|linux'
       NAME                          URL                       PROTOCOL      AUTH TYPE   PUBLIC  STATIC  GLOBAL
  images           https://images.linuxcontainers.org        simplestreams  none         YES     NO      NO
$ lxc profile show default  # output is edited for brevity
config: {}
description: Default LXD profile
devices:
  eth0:
    name: eth0
    nictype: bridged
    parent: br100
    type: nic
  root:
    path: /
   ...
name: default
used_by:
- /1.0/instances/d11c
- /1.0/instances/d12c

Note that at this stage I'm not passing in any cloud-init configuration yet (that happens in images that derive from this one).

Now when I launch "vanilla" cloud images from both, I observe (in addition to networking being down) that cloud-init is disabled on bullseye but not on bookworm:

$ lxc launch --ephemeral images:debian/11/cloud d11c
Creating d11c
Starting d11c
$ lxc launch --ephemeral images:debian/12/cloud d12c
Creating d12c
Starting d12c
$ lxc exec d11c -- cloud-init status
status: disabled
$ lxc exec d12c -- cloud-init status
status: done

This behaviour is triggered by the systemd generator for cloud-init:

$ lxc exec d11c -- cat /run/cloud-init/cloud-init-generator.log
/usr/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (container[lxc]: pid 1 cmdline): /sbin/init
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=1
ds-identify _RET=notfound
cloud-init is enabled but no datasource found, disabling
already disabled: no change needed [no /run/systemd/generator.early/multi-user.target.wants/cloud-init.target]

whereas the debian/12/cloud image reports this:

$ lxc exec d12c -- cat /run/cloud-init/cloud-init-generator.log
/usr/lib/systemd/system-generators/cloud-init-generator normal=/run/systemd/generator early=/run/systemd/generator.early late=/run/systemd/generator.late
kernel command line (container[lxc]: pid 1 cmdline): /sbin/init
kernel_cmdline found unset
etc_file found unset
default found enabled
checking for datasource
ds-identify rc=0

Similarly:

$ for i in d11c d12c ; do echo -n "${i}: " ;lxc exec $i -- cat /run/cloud-init/.ds-identify.result ; done
d11c: 1
d12c: 0
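The same comparison can be made from the generator log instead of the result file. The following is a minimal sketch, not part of cloud-init: the function name `ds_identify_rc` is hypothetical, and it only assumes the `ds-identify rc=N` line format visible in the logs above.

```shell
#!/bin/sh
# Hypothetical helper: read a cloud-init-generator.log on stdin and print
# the ds-identify return code (prints nothing if the line is absent).
ds_identify_rc() {
    sed -n 's/^ds-identify rc=\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Example use against the instances from this report (names are examples):
#   for i in d11c d12c; do
#       printf '%s: ' "$i"
#       lxc exec "$i" -- cat /run/cloud-init/cloud-init-generator.log | ds_identify_rc
#   done
```

A non-zero value here corresponds to the "no datasource found, disabling" branch in the bullseye log.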

In short: I think my woes stem from cloud-init being disabled because datasource identification fails in bullseye, whereas it works in bookworm.

Upon further investigation:

$ for i in d11c d12c ; do echo -n "${i}: " ;lxc exec $i -- fgrep '_LXD(' /usr/lib/cloud-init/ds-identify; echo ;done
d11c:
d12c: dscheck_LXD() {

And indeed, bullseye's default datasource list has no LXD entry:

$ for i in d11c d12c ; do echo -n "${i}: " ;lxc exec $i -- grep -A3 '^DI_DSLIST_DEFAULT' /usr/lib/cloud-init/ds-identify; echo ;done
d11c: DI_DSLIST_DEFAULT="MAAS ConfigDrive NoCloud AltCloud Azure Bigstep \
CloudSigma CloudStack DigitalOcean AliYun Ec2 GCE OpenNebula OpenStack \
OVF SmartOS Scaleway Hetzner IBMCloud Oracle Exoscale RbxCloud"
DI_DSLIST=""

d12c: DI_DSLIST_DEFAULT="MAAS ConfigDrive NoCloud AltCloud Azure Bigstep \
CloudSigma CloudStack DigitalOcean Vultr AliYun Ec2 GCE OpenNebula OpenStack \
OVF SmartOS Scaleway Hetzner IBMCloud Oracle Exoscale RbxCloud UpCloud VMware \
LXD NWCS"
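The two checks above can be folded into a single yes/no test. This is only a sketch: `has_lxd_datasource` is a hypothetical name, and it assumes (as the `fgrep` output above suggests) that LXD support in `ds-identify` shows up as a `dscheck_LXD()` function definition at the start of a line.

```shell
#!/bin/sh
# Hypothetical helper: read the ds-identify script on stdin and succeed
# (exit 0) only if it defines a dscheck_LXD() function.
has_lxd_datasource() {
    grep -q '^dscheck_LXD()'
}

# Example use (instance name is an example):
#   if lxc exec d11c -- cat /usr/lib/cloud-init/ds-identify | has_lxd_datasource; then
#       echo "LXD datasource detection available"
#   else
#       echo "no LXD datasource detection in this cloud-init"
#   fi
```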

My current take is that this lack of lxd capability is due to the cloud-init version:

$ for i in d11c d12c ; do echo "${i}: " ;lxc exec $i -- dpkg -l cloud-init ; echo ;done
d11c:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version          Architecture Description
+++-==============-================-============-========================================================
ii  cloud-init     20.4.1-2+deb11u1 all          initialization system for infrastructure cloud instances

d12c:
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-========================================================
ii  cloud-init     22.4.2-1     all          initialization system for infrastructure cloud instances

When I install cloud-init from bullseye-backports and reboot (to re-run the systemd generator for cloud-init), everything works as expected and behaves consistently with bookworm:

root@d11c:~# dpkg -l cloud-init
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version        Architecture Description
+++-==============-==============-============-========================================================
ii  cloud-init     22.2-1~bpo11+1 all          initialization system for infrastructure cloud instances
root@d11c:~# cloud-init status
status: done

To sum up, I'd like to make the case for building debian/11/cloud with cloud-init 22.2 from bullseye-backports. That version supports LXD environments (the stock bullseye cloud-init is effectively broken under LXD), and it would give consistent cloud-init behaviour across OS families.

At the very least, I hope the above documents the workaround well enough, although implementing it requires rebooting the container in my Packer build pipeline, which is a bit of a burden.

antoonhuiskens commented 1 year ago

The snippet below is from my Packer workaround:

  provisioner "file" {
    destination = "/etc/apt/sources.list.d/bullseye-backports.list"

    content = <<-EOF
      deb http://deb.debian.org/debian bullseye-backports main
      EOF
    only = ["lxd.debian-bullseye-cloud"]
  }

  provisioner "shell" {
    inline = [
      "dhclient",
      "apt update && apt install -y -t bullseye-backports cloud-init",
      "systemctl enable systemd-networkd.service",
      "shutdown -r now"
    ]
    only        = ["lxd.debian-bullseye-cloud"]
    pause_after = "5s"
  }

stgraber commented 11 months ago

Is this still happening? We have automated daily tests of our images and they've not flagged this image as having an issue.

kkremitzki commented 10 months ago

This may be caused by the debian/11/cloud image having an incorrect sources.list. Unless cloud-init is installed from bullseye-backports, the template /etc/cloud/templates/sources.list.debian.tmpl has no logic to handle the security suite rename in bullseye (from <release>/updates to <release>-security).

root@d11-cloud:~# grep updates /etc/apt/sources.list
## Major bug fix updates produced after the final release of the
deb http://security.debian.org/ bullseye/updates main
deb-src http://security.debian.org/ bullseye/updates main
deb http://deb.debian.org/debian bullseye-updates main
deb-src http://deb.debian.org/debian bullseye-updates main
root@d11-cloud:~# apt update
Hit:1 http://deb.debian.org/debian bullseye InRelease
Ign:2 http://security.debian.org bullseye/updates InRelease
Hit:3 http://deb.debian.org/debian bullseye-updates InRelease
Err:4 http://security.debian.org bullseye/updates Release
  404  Not Found [IP: 199.232.30.132 80]
Hit:5 http://deb.debian.org/debian bullseye-backports InRelease
Reading package lists... Done
E: The repository 'http://security.debian.org bullseye/updates Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
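If the stale security entries are indeed the problem, they can be rewritten to the bullseye form (`bullseye-security` under `http://security.debian.org/debian-security`). The sketch below is a hypothetical filter, named `fix_security_suite` here for illustration; it assumes the entries look exactly like the two `bullseye/updates` lines shown above and leaves every other line untouched. Note that cloud-init may regenerate the file from its template, so this is a stopgap rather than a fix for the image.

```shell
#!/bin/sh
# Hypothetical filter: read a sources.list on stdin and rewrite the old
# "bullseye/updates" security entries to the "bullseye-security" suite.
fix_security_suite() {
    sed -E 's#^(deb(-src)?) +http://security\.debian\.org/? +bullseye/updates#\1 http://security.debian.org/debian-security bullseye-security#'
}

# Preview the result without modifying anything:
#   fix_security_suite < /etc/apt/sources.list
```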