lavabit / robox

The tools needed to robotically create/configure/provision a large number of operating systems, for a variety of hypervisors, using packer.

generic/ubuntu2204 (libvirt): populated machine-id file in image #257

Closed davehewitt closed 1 year ago

davehewitt commented 1 year ago

When using the generic/ubuntu2204 (https://app.vagrantup.com/generic/boxes/ubuntu2204) base box image, I get a duplicate IP address when creating multiple VMs. This is due to an accidentally hard-coded /var/lib/dbus/machine-id file that is baked into the image. This was introduced sometime after version 4.1.8 (which works; the file is symlinked to /etc/machine-id). Versions 4.2.2 and 4.2.4 contain the hard-coded machine-id file in /var/lib/dbus and get duplicate IPs.
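
For anyone who wants to confirm whether a given box version is affected, here is a minimal sketch of a check. The function name `dbus_machine_id_state` and the optional root-prefix argument are made up for illustration and are not part of robox. It only distinguishes the working case described above (a symlink) from a regular file with content baked in.

```shell
#!/bin/sh
# Hypothetical helper (not from the robox repo).
# $1 is an optional root prefix, e.g. a mounted image; empty means the live system.
dbus_machine_id_state() {
  f="${1:-}/var/lib/dbus/machine-id"
  if [ -L "$f" ]; then
    # The layout reported as working in 4.1.8: a symlink (to /etc/machine-id).
    echo "symlink"
  elif [ -s "$f" ]; then
    # A regular, non-empty file: the ID is stored in the image itself.
    echo "baked-in"
  else
    echo "empty-or-missing"
  fi
}
```

Running `dbus_machine_id_state` with no argument inside a freshly created VM should print `baked-in` on an affected box, assuming the file is as described in this report.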

lmm-git commented 1 year ago

This also seems to affect (at least) generic/ubuntu2210

ladar commented 1 year ago

Hi @davehewitt, thanks for catching this. We have the machine.sh script, which is one of the few "common" scripts where we try to get away with a single script for every box, and it should be truncating /etc/machine-id. So I'm wondering why /var/lib/dbus/machine-id isn't getting truncated, and, more importantly, what the proper fix is. Can we add a check for the /var/lib/dbus/machine-id file and remove or truncate it as well?

I'm going to try to troubleshoot the issue now, before I kick off the 4.2.8 build run, but I might run out of time.

ladar commented 1 year ago

So I'm looking at the 4.2.6 images, and it looks like the /etc/machine-id file is being reset. What's strange is that the /var/lib/dbus/machine-id file is NOT a symlink or hardlink to the /etc/machine-id file, yet it still appears to hold the same value. And at least according to the metadata, the dbus version hasn't been modified since the box was built.

I'm wondering if what's really happening is that the machine ID generation process is broken, and it's generating the same "random" value multiple times (i.e. during the box build process, and then again when a box is cloned). Anybody have a thought on this?

root@ubuntu2204:~# stat /var/lib/dbus/machine-id
  File: /var/lib/dbus/machine-id
  Size: 33          Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 3671148     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-12-08 07:28:01.245916803 +0000
Modify: 2022-11-30 01:50:51.253914374 +0000
Change: 2022-11-30 01:50:51.257914381 +0000
 Birth: 2022-11-30 01:50:51.253914374 +0000
root@ubuntu2204:~# stat /etc/machine-id 
  File: /etc/machine-id
  Size: 33          Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 1442489     Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2022-12-08 06:27:52.176000000 +0000
Modify: 2022-12-08 06:27:52.164000000 +0000
Change: 2022-12-08 06:27:52.164000000 +0000
 Birth: 2022-11-30 01:44:43.533016426 +0000
root@ubuntu2204:~# cat /etc/machine-id 
9119b8e5947f4efb83c018b73cf9d13e
root@ubuntu2204:~# cat /var/lib/dbus/machine-id 
9119b8e5947f4efb83c018b73cf9d13e

ladar commented 1 year ago

Interesting. I did an experiment: I truncated just /etc/machine-id and rebooted, and the same value popped up after the reboot. Then I tried truncating both /etc/machine-id and /var/lib/dbus/machine-id and rebooted. That led to a new value.

ladar commented 1 year ago

And I think I found the answer buried in a man page:

When a machine is booted with systemd(1) the ID of the machine will be established. If systemd.machine_id= or --machine-id= options (see first section) are specified, this value will be used. Otherwise, the value in /etc/machine-id will be used. If this file is empty or missing, systemd will attempt to use the D-Bus machine ID from /var/lib/dbus/machine-id, the value of the kernel command line option container_uuid, the KVM DMI product_uuid or the devicetree vm,uuid (on KVM systems), and finally a randomly generated UUID.
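
Given that fallback order, a cleanup step has to empty both files, otherwise systemd picks the old ID back up from the D-Bus copy. The following is only a sketch of that idea and is not the actual machine.sh from the repo; the function name `reset_machine_ids` and the optional root-prefix argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch (not the robox machine.sh script).
# $1 is an optional root prefix; empty means the live system (needs root).
reset_machine_ids() {
  root="${1:-}"
  for f in "$root/etc/machine-id" "$root/var/lib/dbus/machine-id"; do
    # Only truncate regular files. A symlink is skipped because its
    # target (/etc/machine-id) is truncated on its own; missing files
    # are left missing.
    if [ -f "$f" ] && [ ! -L "$f" ]; then
      : > "$f"
    fi
  done
}
```

With both files empty, the next boot should fall through to the later sources in the list quoted above, which matches the result of the truncate-both experiment.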

electrofelix commented 1 year ago

@ladar looks like this hasn't been released yet based on looking at https://github.com/lavabit/robox/compare/4.2.6...master given that 4.2.6 is the latest available for the boxes in https://app.vagrantup.com/generic. Should there be one soon enough? Looks like it's also after 4.2.8 which didn't appear for the vagrant boxes either, but did for the docker ones.

cwegener commented 1 year ago

@electrofelix The commit with the fix is in 4.2.10 but there's no 4.2.10 version on the Vagrant hub. :frowning_face: My workaround is to pin my generic/ubuntu boxes to the old 4.1.8 version that the OP mentioned.

ladar commented 1 year ago

It takes between 48 and 72 hours to build all of the boxes at this point, depending on the hypervisor (each platform has a blade server dedicated to building just those box files, or in the case of Parallels, a Mac mini). Once the build is finished, I have to follow up and look for/resolve any failures. That's why there is usually a lag between when I push a new tag/version to the repo, and when the box files end up on Vagrant Cloud.

I often find/fix problems after a build has been kicked off, like with this issue, but unless it's super critical I usually don't go back and remove/rebuild the affected boxes. As a result, a fix might end up being applied to one platform but not another. In this case there was an issue with the blade used to build the Docker images, so I used a robot typically used to build another platform, once it finished.

Hopefully I'll be able to find someone willing to donate newer/faster blades at some point, so I can get the build/release cycle back to 24 hours (or less). And if it's a lot less, add more distros/arches and/or specialized variants, like boxes with a graphical desktop (right now only the Magma developer box includes a GUI), etc.