vexxhost / magnum-cluster-api

Cluster API driver for OpenStack Magnum
Apache License 2.0

magnum-cluster-api-image-builder creates ubuntu 2204 images with broken networking #378

Open jrosser opened 1 month ago

jrosser commented 1 month ago

magnum-cluster-api-image-builder pulls the latest daily build of the Ubuntu server live-cd installer ISO and passes it to image-builder as "iso_url".

In the daily-build Ubuntu ISO, the subiquity installer runs from a snap at revision 5741, which corresponds to subiquity 24.04.1:

curl -H 'Snap-Device-Series: 16' http://api.snapcraft.io/v2/snaps/info/subiquity | jq .

{
  "channel-map": [
    {
      "channel": {
        "architecture": "amd64",
        "name": "stable",
        "released-at": "2024-04-25T14:19:35.106214+00:00",
        "risk": "stable",
        "track": "latest"
      },
      "created-at": "2024-04-17T14:39:49.085646+00:00",
      "download": {
        "deltas": [],
        "sha3-384": "859092c0d2e92279e23827f920421fa730a2ab4dd4435e26710a27612032d41be01d7f1729222df94e7532d9d43836da",
        "size": 21020672,
        "url": "https://api.snapcraft.io/api/v1/snaps/download/ba2aj8guta0zSRlT3QM5aJNAUXPlBtf9_5741.snap"
      },
      "revision": 5741,
      "type": "app",
      "version": "24.04.1"
    },

By contrast, the release-day ISO for Ubuntu 22.04.4 contains revision 5495 of the subiquity snap.
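To make the revision comparison repeatable, here is a minimal sketch that reads the snap store response and reports the revision per channel. The function name `channel_revisions` and the trimmed `SAMPLE` payload are illustrative assumptions; only the fields visible in the output quoted above are used.

```python
import json


def channel_revisions(snap_info, architecture="amd64"):
    """Map 'track/risk' to (revision, version) for one architecture.

    `snap_info` is the parsed JSON body of the snap info response.
    """
    result = {}
    for entry in snap_info.get("channel-map", []):
        channel = entry["channel"]
        if channel["architecture"] != architecture:
            continue
        key = f"{channel['track']}/{channel['risk']}"
        result[key] = (entry["revision"], entry["version"])
    return result


# Trimmed sample in the shape of the response quoted in this issue.
SAMPLE = json.loads("""
{
  "channel-map": [
    {
      "channel": {
        "architecture": "amd64",
        "name": "stable",
        "risk": "stable",
        "track": "latest"
      },
      "revision": 5741,
      "type": "app",
      "version": "24.04.1"
    }
  ]
}
""")

if __name__ == "__main__":
    for name, (revision, version) in channel_revisions(SAMPLE).items():
        print(name, revision, version)
```

Feeding it the real `curl` output (for example via `json.load(sys.stdin)`) should show whether the stable channel has moved past the revision a given ISO was released with.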

Subiquity has changed the names of the network configuration files it drops during installation: https://github.com/canonical/subiquity/commit/2af582984c47ebe0a8d3bbc2733e0767d04cda0f

image-builder only accounts for the previous subiquity behaviour: https://github.com/kubernetes-sigs/image-builder/blob/main/images/capi/ansible/roles/sysprep/tasks/debian.yml

As a result, the file /etc/cloud/cloud.cfg.d/90-installer-network.cfg is left behind in the built image:

ubuntu@capi-image-build:~$ sudo cat /mnt/etc/cloud/cloud.cfg.d/90-installer-network.cfg
# This is the network config written by 'subiquity'
network:
  ethernets:
    ens4:
      dhcp4: true
  version: 2
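A built image can be checked for this leftover before it is uploaded. The sketch below scans a mounted image root (such as `/mnt` in the command above) for cloud-init config fragments carrying the subiquity marker comment. The function name and the marker-matching heuristic are assumptions for illustration, not part of image-builder.

```python
from pathlib import Path

# Marker taken from the comment at the top of the leftover file quoted above.
SUBIQUITY_MARKER = "written by 'subiquity'"


def find_subiquity_network_configs(image_root):
    """Return cloud.cfg.d files under image_root that subiquity left behind.

    Heuristic: a file is reported if it contains the subiquity marker
    comment and a top-level "network:" key.
    """
    cfg_dir = Path(image_root) / "etc" / "cloud" / "cloud.cfg.d"
    if not cfg_dir.is_dir():
        return []
    hits = []
    for path in sorted(cfg_dir.glob("*.cfg")):
        text = path.read_text(errors="replace")
        has_marker = SUBIQUITY_MARKER in text
        has_network = any(
            line.startswith("network:") for line in text.splitlines()
        )
        if has_marker and has_network:
            hits.append(path)
    return hits


if __name__ == "__main__":
    for leftover in find_subiquity_network_configs("/mnt"):
        print("leftover installer network config:", leftover)
```

Matching on the marker rather than a fixed file name is deliberate, since the file name is exactly what changed between subiquity revisions.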

The end result is that, when the magnum instance boots, cloud-init reads the configuration for the (non-existent) ens4 interface and removes the IP address previously acquired by DHCP from neutron from the real interface (enp3s0 in the log below):

May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 09:03:53,168 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', '0.0.0.0/0', 'via', '192.168.10.1', 'dev', 'enp3s0'] with allowed return codes [0] (shell=False, capture=True)
May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 09:03:53,170 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', '169.254.169.254/32', 'via', '192.168.10.1', 'dev', 'enp3s0'] with allowed return codes [0] (shell=False, capture=True)
May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 09:03:53,171 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'link', 'set', 'dev', 'enp3s0', 'down'] with allowed return codes [0] (shell=False, capture=True)
May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 09:03:53,173 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'del', '192.168.10.73/24', 'dev', 'enp3s0'] with allowed return codes [0] (shell=False, capture=True)

The magnum instance networking is now broken.
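When triaging an affected instance, it helps to pull the teardown commands out of the cloud-init debug log. This is a minimal sketch that assumes the `subp.py` log line format shown in the excerpt above; the function names are hypothetical, and the sample lines are copied from this issue.

```python
import ast
import re

# Matches the argv list that subp.py logs before running a command.
SUBP_RE = re.compile(r"Running command (\[.*?\]) with allowed return codes")


def extract_commands(log_lines):
    """Return the argv list of every command logged by cloud-init's subp.py."""
    commands = []
    for line in log_lines:
        match = SUBP_RE.search(line)
        if match:
            # The logged value is a Python list literal of strings.
            commands.append(ast.literal_eval(match.group(1)))
    return commands


def teardown_commands(commands):
    """Keep only `ip` invocations that delete something or set a link down."""
    return [
        argv for argv in commands
        if argv and argv[0] == "ip" and ("del" in argv or "down" in argv)
    ]


SAMPLE_LOG = [
    "May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 "
    "09:03:53,171 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', "
    "'link', 'set', 'dev', 'enp3s0', 'down'] with allowed return codes [0] "
    "(shell=False, capture=True)",
    "May 28 09:03:53 localhost.local cloud-init[524]: [CLOUDINIT]2024-05-28 "
    "09:03:53,173 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', "
    "'addr', 'del', '192.168.10.73/24', 'dev', 'enp3s0'] with allowed return "
    "codes [0] (shell=False, capture=True)",
]

if __name__ == "__main__":
    for argv in teardown_commands(extract_commands(SAMPLE_LOG)):
        print(" ".join(argv))
```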

mnaser commented 1 month ago

Dang, I was bitten by this for a while. I was hoping we could speed up the deploy by reducing the apt upgrade time, but it seems this has somehow introduced another issue.

jrosser commented 3 weeks ago

Fixed by https://github.com/kubernetes-sigs/image-builder/pull/1480