tinkerbell / cluster-api-provider-tinkerbell

Cluster API Infrastructure Provider
Apache License 2.0

Figure out Tinkerbell workflow for provisioning Ubuntu instance with cloud-init userdata #6

Closed invidian closed 3 years ago

invidian commented 3 years ago

To be able to use the existing cloud-init configs generated by Cluster API, we should figure out how Tinkerbell can provision Ubuntu on worker nodes with specific cloud-init userdata.

This should allow easy bootstrapping of clusters.

Regarding delivery of the cloud-init userdata to the provisioning process, I think that for the MVP we could store the userdata base64-encoded directly in the workflow template, the same way we do when installing Flatcar on Tinkerbell: https://github.com/kinvolk/tinkerbell.org/blob/invidian/flatcar/content/examples/flatcar-container-linux/_index.md#preparing-template.

While a bit hacky, this will allow delivering userdata without any middleware like Hegel, which should keep things simple to start with.


I figured out how to install an Ubuntu Cloud ISO image on a machine and make it use cloud-init. Here are the steps:

Now your VM should boot Ubuntu Cloud with the provided user-data applied.

invidian commented 3 years ago

The next step is to convert the script into a Tinkerbell workflow.

Some things to consider:

displague commented 3 years ago

https://github.com/canonical/cloud-init/pull/680 is relevant since the metadata formats are related.

displague commented 3 years ago

I was discussing this with @dustinmiller1337 @pereztr5 and @mmlb very recently. In the Equinix Metal world, the DMI fields do not uniformly identify that the devices are running in EM. It is advantageous for us to do this so that users and tools like cloud-init can identify their environment.

In Tinkerbell, I don't know if we can make the same assumptions about manipulating the device DMI. In some cases, the DMI can only be updated with hardware-specific software.

This problem is top of mind for me right now, especially as it relates to this conversation: https://github.com/canonical/cloud-init/pull/680#discussion_r536386252. It is not preferable for cloud-init to base its data source detection on the existence of a known metadata service.

invidian commented 3 years ago

Okay, I figured out how to run the code above as a Workflow. Findings:

Dockerfile content for the ubuntu-install image:

FROM alpine:3.12

RUN apk add -U qemu-img

And workflow template:

version: "0.1"
name: ubuntu-install
global_timeout: 1800
tasks:
  - name: "ubuntu-install"
    worker: "{{.machine1}}"
    volumes:
      - /dev:/dev
      - /statedir:/statedir
    actions:
      - name: "dump-cloud-init"
        image: ubuntu-install
        command:
          - sh
          - -c
          - |
            echo '{{.cloudInit}}' | base64 -d > /statedir/90_dpkg.cfg
      - name: "download-image"
        image: ubuntu-install
        command:
          - sh
          - -c
          - |
            # TODO: Pull image from Tinkerbell nginx and convert it there, so we can pipe
            # wget directly into dd.
            /usr/bin/wget https://cloud-images.ubuntu.com/focal/current/focal-server-cloudimg-amd64.img -O /statedir/focal-server-cloudimg-amd64.img
      - name: "write-image-to-disk"
        image: ubuntu-install
        command:
          - sh
          - -c
          - |
            /usr/bin/qemu-img convert -f qcow2 -O raw /statedir/focal-server-cloudimg-amd64.img /dev/vda
      - name: "write-cloud-init-config"
        image: ubuntu-install
        command:
          - sh
          - -c
          - |
            set -eux
            partprobe /dev/vda
            mkdir -p /mnt/target
            mount -t ext4 /dev/vda1 /mnt/target
            cp /statedir/90_dpkg.cfg /mnt/target/etc/cloud/cloud.cfg.d/
            umount /mnt/target
      - name: "reboot" # This action shouldn't really be here, but for now Tinkerbell offers no other way to reboot the worker into the target OS.
        image: ubuntu-install
        command:
          - sh
          - -c
          - |
            echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger

Then, create a workflow like:

tink workflow create -t <template ID> -r '{"machine1": "<mac>", "cloudInit": "<base64 encoded cloud-init data>"}'
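
The JSON argument passed to -r above can be generated rather than written by hand. The following is a minimal sketch, assuming the cloud-init data is available as a string. The function name build_workflow_data, the sample MAC address, and the sample cloud-init content are hypothetical; only the "machine1" and "cloudInit" keys come from the template above.

```python
import base64
import json


def build_workflow_data(mac: str, cloud_init: str) -> str:
    """Return the JSON string for the -r argument (hypothetical helper).

    The keys match the {{.machine1}} and {{.cloudInit}} placeholders
    used in the workflow template.
    """
    # Standard base64 output contains only A-Z, a-z, 0-9, '+', '/' and '=',
    # so it is safe inside the single-quoted echo in the template.
    encoded = base64.b64encode(cloud_init.encode("utf-8")).decode("ascii")
    return json.dumps({"machine1": mac, "cloudInit": encoded})


# Hypothetical sample values, for illustration only.
sample_cloud_init = "#cloud-config\nhostname: worker-1\n"
sample_mac = "08:00:27:00:00:01"

payload = build_workflow_data(sample_mac, sample_cloud_init)
print(payload)
```

The printed string can then be passed, single-quoted, as the value of -r. Decoding the "cloudInit" value with base64 -d, as the dump-cloud-init action does, should give back the original cloud-init data.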

invidian commented 3 years ago

The MVP part is figured out, so let's close this as done.