MattSnow-amd opened 10 months ago
Hi,
If I understand you right, your problem is not with this packer plugin, but with the image and the infrastructure that image is intended to be used in later. You were able to run the linked packer build, and it did run the provisioning steps and produced an image. Then later, you tried to use that image with (what I guess was) dmacvicar's libvirt terraform provider, only to see that it failed to run the provided cloud-init script.
I would suggest that you closely monitor the boot process of the image (boot with `nosplash`, without `quiet`, etc.) to see if you can spot any log/message related to cloud-init. You should at least be able to see the cloud-init related systemd units starting.
Also, cloud-init should create log files like `/var/log/cloud-init-output.log`. The logging configuration should reside in `/etc/cloud/cloud.cfg.d/05_logging.cfg` (or something similar).
You can also try to run cloud-init manually (with `cloud-init init -d` or something similar; check the docs).
Also, to narrow things down, can you run your cloud-init script on the original cloud image from Canonical? If that image also fails to run your cloud-init script, then the issue is probably with your infra environment / cloud-init script, and not with this packer plugin and config.
Another idea would be to make sure that you are building the image with a VM config similar to the environment intended for later use. For example, make sure that you are running on the same chipset and that both systems run either BIOS or UEFI.
Probably I should also mention that I have a very vague memory of something similar happening to me previously, when for some mysterious reason the image I created wouldn't boot when the VM was created with terraform. If I recall correctly, it was something to do with the libvirt terraform plugin messing up bus types and addresses, and for some unexplainable reason it prevented the system from booting. I don't remember much, except that it was really annoying to debug.
I can share some snippets from my terraform and packer configs that might inspire you on your debugging journey.
This is the last provisioning step for my builds:
```hcl
build {
  // ...
  provisioner "shell" {
    inline = [
      "echo 'Cleaning up cloudinit'",
      "sudo cloud-init clean --logs",
      "",
      "truncate -s 0 ~/.ssh/authorized_keys",
    ]
  }
}
```
I have a separate terraform module for managing "compute" nodes and another for IPAM. Here is the cloud-init part.
```hcl
resource "libvirt_cloudinit_disk" "this" {
  name           = "${var.id}-${var.name}-cloudinit"
  pool           = local.root_storage_pool
  meta_data      = local.meta_data
  network_config = local.network_config
  user_data      = var.user_data
}
```
```hcl
locals {
  default_network_config = {
    version = 2
    ethernets = {
      eth = {
        match = {
          macaddress = macaddress.this.address
        }
        "set-name" = "eth"
        addresses = [
          "${module.ipam.ip_address}/${module.ipam.cidr}"
        ]
        gateway4 = module.ipam.gateway
        nameservers = {
          search = module.ipam.search_domains
          addresses = [
            module.ipam.nameserver
          ]
        }
      }
    }
  }

  default_meta_data = <<EOM
instance-id: ${var.id}-${var.name}
local-hostname: ${var.name}
EOM

  meta_data      = var.meta_data != null ? var.meta_data : local.default_meta_data
  network_config = var.network_config != null ? var.network_config : jsonencode(local.default_network_config)
}
```
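For what it's worth, the value `jsonencode(local.default_network_config)` hands to `libvirt_cloudinit_disk` is just an ordinary netplan v2 document. Here is a minimal Python sketch of the same structure (the MAC, addresses, and domain names are invented placeholders standing in for `macaddress.this.address` and the `module.ipam` outputs), which can be handy for eyeballing what cloud-init actually receives:

```python
import json

# Placeholder values standing in for macaddress.this.address and module.ipam outputs
network_config = {
    "version": 2,
    "ethernets": {
        "eth": {
            "match": {"macaddress": "52:54:00:12:34:56"},
            "set-name": "eth",
            "addresses": ["192.0.2.10/24"],
            "gateway4": "192.0.2.1",
            "nameservers": {
                "search": ["example.com"],
                "addresses": ["192.0.2.53"],
            },
        }
    },
}

# Equivalent of terraform's jsonencode(); JSON is valid YAML, so cloud-init
# accepts the encoded document as a network-config file as-is
print(json.dumps(network_config, indent=2))
```

If cloud-init runs but networking comes up wrong, dumping this document from the cidata ISO and comparing it against what netplan actually applied is usually the quickest check.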
To generate the cloud-init file, I use a `template_cloudinit_config` data source.
```hcl
data "template_cloudinit_config" "vm" {
  gzip          = false
  base64_encode = false

  part {
    filename     = "init.cfg"
    content_type = "text/cloud-config"
    content = yamlencode({
      ssh_authorized_keys = local.ssh_authorized_keys
      users = [
        {
          name          = "terraform"
          groups        = ["sudo"]
          shell         = "/bin/bash"
          hashed_passwd = random_password.vm.bcrypt_hash
          lock_passwd   = false
          ssh_authorized_keys = concat(local.ssh_authorized_keys, [
            tls_private_key.mgmt.public_key_openssh,
            tls_private_key.terraform.public_key_openssh
          ])
        }
      ]
      packages = [
        "python3",
        "python3-pip",
        "python3-wheel",
        "python3-virtualenv",
        "python3-netaddr",
        "git",
        "ipvsadm",
      ]
    })
  }
}
```
```hcl
module "vm" {
  // ...
  user_data = data.template_cloudinit_config.vm.rendered
}
```
And here is my XSLT, which causes perpetual diffs but makes my nodes look the way I want them to:
```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Making the worker headless -->
  <xsl:template match="/domain/devices/graphics"/>
  <xsl:template match="/domain/devices/video"/>
  <xsl:template match="/domain/devices/audio"/>
  <xsl:template match="/domain/devices/input[@type='mouse' or @type='keyboard']"/>

  <!-- SEE https://github.com/dmacvicar/terraform-provider-libvirt/issues/667 -->
  <!-- Thanks dariush, https://gist.github.com/dariush/7405cbf62835e03d0b5c953d798a87cd -->

  <!-- replace <target dev='hdd'...> with <target dev='sdd'...> -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/target/@dev">
    <xsl:attribute name="dev">
      <xsl:value-of select="'sdd'"/>
    </xsl:attribute>
  </xsl:template>

  <!-- replace <target bus='ide'...> with <target bus='sata'...> -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/target/@bus">
    <xsl:attribute name="bus">
      <xsl:value-of select="'sata'"/>
    </xsl:attribute>
  </xsl:template>

  <!-- replace <target bus='scsi'...> with <target bus='sata'...> on regular disks -->
  <xsl:template match="/domain/devices/disk[@device='disk' and target/@bus='scsi']">
    <xsl:copy>
      <xsl:apply-templates select="@*|*[not(self::wwn) and not(self::target)]"/>
      <target bus="sata">
        <xsl:attribute name="dev"><xsl:value-of select="target/@dev"/></xsl:attribute>
      </target>
    </xsl:copy>
  </xsl:template>

  <!-- replace <alias...> with nothing, i.e. delete the <alias...> element -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/alias"/>
</xsl:stylesheet>
```
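For debugging, the cdrom-related part of the stylesheet boils down to three edits: force `dev` to `sdd`, force `bus` to `sata`, and drop the `<alias>` element. Here is a standalone Python sketch (stdlib only; the domain XML is an invented minimal sample, not a real `virsh dumpxml` dump) that applies the same rewrite, useful for previewing what the transformed definition should look like:

```python
import xml.etree.ElementTree as ET

# Minimal invented domain snippet; a real dump from `virsh dumpxml` has much more
domain = ET.fromstring("""
<domain>
  <devices>
    <disk type='file' device='cdrom'>
      <target dev='hdd' bus='ide'/>
      <alias name='ide0-0-1'/>
    </disk>
  </devices>
</domain>
""")

# Same net effect as the three cdrom templates in the XSLT above
for disk in domain.findall("./devices/disk[@device='cdrom']"):
    target = disk.find("target")
    target.set("dev", "sdd")   # <target dev='hdd'...> -> dev='sdd'
    target.set("bus", "sata")  # <target bus='ide'...> -> bus='sata'
    alias = disk.find("alias")
    if alias is not None:      # delete the <alias...> element entirely
        disk.remove(alias)

print(ET.tostring(domain, encoding="unicode"))
```

Running a dumped domain definition through something like this and diffing it against the live XML makes it obvious whether the provider (rather than the stylesheet) is the one rewriting the bus/dev values.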
I'm using the `0.7.1` version of dmacvicar's libvirt plugin.
Also, all my machines are now provisioned with the `q35` machine type and UEFI.
Hope this helps. Let us know if and how you managed to figure out your issue.
> Hi,
> If I understand you right, your problem is not with this packer plugin, but with the image and the infrastructure that image is intended to be used in later. You were able to run the linked packer build and it did run the provisioning steps and produced an image. Then later, you tried to use that image with (what I guess was) dmacvicar's libvirt terraform provider, only to see that it failed to run the provided cloud-init script.
Correct. I am able to successfully run and build an image using packer with the libvirt builder and a cloud-init image. I am also successfully able to build domains using 'fresh' cloud-init images using dmacvicar's libvirt terraform provider.
In both of these cases, starting with an unmodified cloud image (so far I have only tried Ubuntu-22.04) I am able to successfully apply cloud-init configurations.
> I would suggest to you to closely monitor the booting process of the image (boot with `nosplash`, without `quiet`, etc.) to see if you can spot any log/message related to cloud-init. You should at least be able to see the cloud-init related systemd units starting. Also, cloud-init should create log files like `/var/log/cloud-init-output.log`. The logging configuration should reside in `/etc/cloud/cloud.cfg.d/05_logging.cfg` (or something similar).

I have not modified the grub boot options to remove those flags, but I am able to monitor the console of both the terraform-built domain and the packer build by running `virsh console <domain>`. Again, starting from an unbooted cloud-init image, I am able to see cloud-init start and run to completion in both packer- and terraform-built domains. As soon as I try to pass the packer-built image as a source to terraform, I no longer see cloud-init starting and running, even after the various cloud-init reset steps:

- `sudo cloud-init clean [--logs|--seed|--machine-id]`
- `DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force`
- `systemctl enable cloud-init[-local|-config|-final]`
> You can also try to run cloud-init manually (with `cloud-init init -d` or something similar, check the docs). Also, to narrow things down, can you run your cloud-init script on the original cloud image from Canonical? If that image also fails to run your cloud-init script, then the issue is probably with your infra environment / cloud-init script, and not with this packer plugin and config.
Great point! I had tried this already but was trying to keep my problem statement a bit too condensed. As mentioned above, I can successfully apply cloud-init configs in either packer or terraform, but cannot have the packer built image passed into terraform.
> Another idea would be to make sure that you are building the image with a similar VM config to the environment intended for later use. For example, make sure that you are running on the same chipset and both systems run either BIOS or UEFI.
> Probably I should also mention that I have a very vague memory of something similar happening to me previously, when for some mysterious reason, the image I created wouldn't boot when the VM was created with terraform. If I recall it correctly, it was something to do with the libvirt terraform plugin messing up bus types and addresses and for some unexplainable reason, it prevented the system from booting. I don't remember much, but that it was really annoying to debug.
> I can share some snippets from my terraform and packer configs that might inspire you on your debug journey.
snip
> And here is my XSLT causing perpetual diffs but making my nodes as I wanted them to be:

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Making the worker headless -->
  <xsl:template match="/domain/devices/graphics"/>
  <xsl:template match="/domain/devices/video"/>
  <xsl:template match="/domain/devices/audio"/>
  <xsl:template match="/domain/devices/input[@type='mouse' or @type='keyboard']"/>

  <!-- SEE https://github.com/dmacvicar/terraform-provider-libvirt/issues/667 -->
  <!-- Thanks dariush, https://gist.github.com/dariush/7405cbf62835e03d0b5c953d798a87cd -->

  <!-- replace <target dev='hdd'...> with <target dev='sdd'...> -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/target/@dev">
    <xsl:attribute name="dev">
      <xsl:value-of select="'sdd'"/>
    </xsl:attribute>
  </xsl:template>
```
I compared the optical drive sections of both the packer + packer-plugin-libvirt and terraform + terraform-libvirt-provider domains by running `virsh dumpxml --domain [domain]` on both libvirt instances.
```xml
<disk type='volume' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <source pool='packer' volume='ubuntu-2204-focal_PUBLIC-cloudinit' index='1'/>
  <backingStore/>
  <target dev='sdb' bus='sata'/>
  <readonly/>
  <alias name='sata0-0-1'/>
  <address type='drive' controller='0' bus='0' target='0' unit='1'/>
</disk>
```
I am guessing you may already be aware, but I feel it is worth pointing out: the terraform-provider-libvirt code hard-codes the device type (`cdrom`), target bus (`ide`), and dev (`hdd`).
```xml
<disk type='file' device='cdrom'>
  <driver name='qemu' type='raw'/>
  <source file='/scratch/libvirt/terraform/pool/commoninit.my-tf-hostname01.example.com.iso' index='1'/>
  <backingStore/>
  <target dev='hdd' bus='sata'/>
  <readonly/>
  <serial>cloudinit</serial>
  <alias name='sata0-0-3'/>
  <address type='drive' controller='0' bus='0' target='0' unit='3'/>
</disk>
```
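The difference between the two definitions is easy to check mechanically; here is a small Python sketch that pulls the `<target>` attributes out of trimmed copies of the two dumps above:

```python
import xml.etree.ElementTree as ET

# Trimmed versions of the two <disk> dumps above (full dumps carry more elements)
packer_disk = "<disk type='volume' device='cdrom'><target dev='sdb' bus='sata'/></disk>"
terraform_disk = "<disk type='file' device='cdrom'><target dev='hdd' bus='sata'/></disk>"

targets = {}
for label, snippet in [("packer", packer_disk), ("terraform", terraform_disk)]:
    target = ET.fromstring(snippet).find("target")
    targets[label] = (target.get("dev"), target.get("bus"))

print(targets)  # the bus matches, but the dev values differ
```

The bus matches in both, so the `dev` value (`sdb` vs `hdd`) is the only attribute left to suspect, which lines up with the manual test below.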
As a test, I did the following:
1) Built the packer image and ran the `cloud-init clean` commands to reset the cloud-init state.
2) Exported the packer image to the staging directory for terraform to pick up.
3) Ran `terraform apply` using the packer-built image.
4) Quickly destroyed the domain with `virsh destroy --domain my-tf-hostname01.example.com` while the kernel was still loading, after terraform reported success but before systemd had run.
5) Modified the terraform-built domain with `virsh edit --domain my-tf-hostname01.example.com` and changed the `target` attributes `dev` and `bus` to match the packer domain; specifically, the `dev` value is changed to `sdb` and nothing else is changed.
6) Started the domain.
The result: The terraform deployed domain starts up and runs cloud-init successfully as expected. I have run through this a couple of times now and can confirm this process produces the desired result. However, if I let the terraform created domain continue with startup and systemd runs, any future modification to the domain XML definition will not enable cloud-init to run without further intervention.
```xml
  <!-- replace <target bus='ide'...> with <target bus='sata'...> -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/target/@bus">
    <xsl:attribute name="bus">
      <xsl:value-of select="'sata'"/>
    </xsl:attribute>
  </xsl:template>

  <!-- replace <target bus='scsi'...> with <target bus='sata'...> on regular disks -->
  <xsl:template match="/domain/devices/disk[@device='disk' and target/@bus='scsi']">
    <xsl:copy>
      <xsl:apply-templates select="@*|*[not(self::wwn) and not(self::target)]"/>
      <target bus="sata">
        <xsl:attribute name="dev"><xsl:value-of select="target/@dev"/></xsl:attribute>
      </target>
    </xsl:copy>
  </xsl:template>

  <!-- replace <alias...> with nothing, i.e. delete the <alias...> element -->
  <xsl:template match="/domain/devices/disk[@device='cdrom']/alias"/>
</xsl:stylesheet>
```
> I'm using the `0.7.1` version of dmacvicar's libvirt plugin. Also, all my machines are now provisioned on `q35` machine type and UEFI. Hope this helps. Let us know if and how you managed to figure out your issue.

I am using the same version as well, and the same machine type on the terraform end. It seems my packer domain is starting with the `pc-i440fx-focal` machine type. I don't believe UEFI is enabled anywhere in my environment yet.
This was extremely helpful and I appreciate all the support very much!
I have what I believe is a simple use case but am struggling to figure out a solution. I will try to explain.
Context: I have an Ubuntu machine set up as a hypervisor. I use this ansible playbook as the basis.
I have this repo of terraform code that provisions several instances of Ubuntu 22.04 from a cloud-init image source (see osimage.tf) and customizes them with a cloud-init config.
Intended use: I want to create a CI pipeline for building golden images that have some basic configurations like packages, CA certs, and users. I would like to make this process an intermediate step that generates "golden" images which are then used by the terraform code.
Gist of the packer build HCL: https://gist.github.com/MattSnow-amd/3b36f82364fe6105ac52cc7a68dc3812

I have tried a variety of combinations of manually deleting files generated by the packer build process and running the `cloud-init clean` commands from the documentation.

Problem: The terraform-created VM/domain boots up, but none of the cloud-init configurations are applied and the network is not configured. I am able to communicate between virsh and the VM's qemu-guest-agent via `virsh domifaddr --domain mymachinename.example.com --source agent`. Sample output from `virsh domifaddr`:

I can also `virsh console` into the running domain and confirm that the cdrom at /dev/sr0 is presented in the domain, and that the cidata image can be mounted and contains all of the terraform-templated values in the user-data file.

Any guidance or pointers are much appreciated. Thank you for your effort in writing such a useful tool!