thomasklein94 / packer-plugin-libvirt

Packer Plugin for Libvirt
Mozilla Public License 2.0

External source doesn't work due to volume is not created #50

Closed. minhuw closed this issue 1 year ago

minhuw commented 1 year ago

Thank you for your great work, first of all. I tried the example given in another issue and got the following error messages:

Warning: Pool isn't set for volume , using the 'default' pool

  on ./ubuntu.pkr.hcl line 1:
  (source code not available)

Warning: Volume name was not set, using 'packer-cgnvt9dem3486dkseji0-ua-artifact' as volume name instead.

  on ./ubuntu.pkr.hcl line 1:
  (source code not available)

libvirt.example: output will be in this color.

==> libvirt.example: Unsupported communicator type 'none', no communication will be established to domain
==> libvirt.example: Preparing volumes...
    libvirt.example: Preparing volume default/packer-cgnvt9dem3486dkseji0-ua-artifact
==> libvirt.example: Retrieving default/packer-cgnvt9dem3486dkseji0-ua-artifact
==> libvirt.example: Trying https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64-disk-kvm.img
==> libvirt.example: Trying https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64-disk-kvm.img?checksum=af663afe6d6352e6c0dde61eea75535e73e74bf4703f424b22d82e327d31c0d3
    libvirt.example: jammy-server-cloudimg-amd64-disk-kvm.img 622.19 MiB / 622.19 MiB [===============] 100.00% 29s
==> libvirt.example: https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64-disk-kvm.img?checksum=af663afe6d6352e6c0dde61eea75535e73e74bf4703f424b22d82e327d31c0d3 => /home/minhu/.cache/packer/42606e149ce4f7270a800006d2f2526019fad555
==> libvirt.example: Sending the domain definition to libvirt
==> libvirt.example: Starting the Libvirt domain
==> libvirt.example: DomainCreate.RPC: internal error: process exited while connecting to monitor: 2023-04-07T11:30:17.326428Z qemu-system-x86_64: -blockdev {"driver":"file","filename":"/var/lib/libvirt/images/packer-cgnvt9dem3486dkseji0-ua-artifact","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}: Could not open '/var/lib/libvirt/images/packer-cgnvt9dem3486dkseji0-ua-artifact': Permission denied

If I understand the code correctly, images downloaded from external sources are uploaded to libvirtd as a volume. However, I found nothing in /var/lib/libvirt/images after I executed the command. The definition of the default pool is as follows:

<pool type='dir'>
  <name>default</name>
  <uuid>f031c16c-4b2b-4084-ad6c-6f0406145d53</uuid>
  <capacity unit='bytes'>500802019328</capacity>
  <allocation unit='bytes'>18854342656</allocation>
  <available unit='bytes'>481947676672</available>
  <source>
  </source>
  <target>
    <path>/var/lib/libvirt/images</path>
    <permissions>
      <mode>0711</mode>
      <owner>0</owner>
      <group>0</group>
    </permissions>
  </target>
</pool>

The /var/lib/libvirt directory, created by apt, seems to have mixed users/groups. I am not sure whether this is the source of the trouble:

drwxr-xr-x  7 root         root 4096 Apr  6 15:09 .
drwxr-xr-x 46 root         root 4096 Apr  6 15:09 ..
drwx--x--x  2 root         root 4096 Apr 20  2022 boot
drwxr-xr-x  2 root         root 4096 Apr  7 11:30 dnsmasq
drwx--x--x  2 root         root 4096 Apr  7 11:30 images
drwxr-x---  9 libvirt-qemu kvm  4096 Apr  7 11:30 qemu
drwx------  2 root         root 4096 Apr 20  2022 sanlock

Does anyone have an idea about this error?

My environment if needed: Ubuntu 20.04 x86_64 + libvirtd 6.0.0 + Packer 1.8.6 + packer-plugin-libvirt 0.4.4

thomasklein94 commented 1 year ago

Hi @minhuw,

Based on the log lines you provided, libvirt should have accepted and saved the file stream without issues before creating and starting the domain. When creating a domain, libvirt starts a new qemu process, which might not have permission to read the file, while libvirt itself has the necessary permissions to manipulate the file.

The libvirt Packer plugin tries to clean up after itself, meaning it deletes the uploaded volume before exiting. If you wish to inspect your system after an error, you can add the -on-error=ask flag to your packer command. To debug your build, you can also add the -debug flag, which makes Packer prompt you after every step it executes.

Using the -debug flag, please check whether libvirt creates the file after the stepPrepareVolumes step and, if it does, what the file permissions are. Have you checked libvirt's logs? Is there any notable log line there that could bring us closer to understanding this issue?

minhuw commented 1 year ago

Thank you for your help! You are correct that I could not find the volume because Packer cleaned it up after the failure. I verified that libvirtd created the image file successfully, with mode 600 and owner root:root, after the stepPrepareVolumes step. I modified /etc/config/qemu.conf to run the qemu process as root too, but the error persisted.

Though I found nothing in libvirtd's log, I found the following AppArmor audit entry in the dmesg output:

[1207134.568655] audit: type=1400 audit(1680931593.219:153): apparmor="DENIED" operation="open" profile="libvirt-1f6c65a1-390f-44ad-8a03-942ea8d7dd0f" name="/var/lib/libvirt/images/packer-cgoflv5em347qsbokbcg-ua-artifact" pid=114661 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

It seems that AppArmor rejects access to the image file. So I disabled AppArmor for libvirt by setting security_driver = "none" in /etc/config/qemu.conf, and everything works well now.

Though the issue is resolved and may be closed, I still have some questions. This is the first time I have encountered an AppArmor problem when using libvirt. When I wrote libvirt domain XML manually and created virtual disks from a file source like the one below, AppArmor never complained about file permissions.
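For anyone trying to confirm the same diagnosis, here is a minimal sketch that pulls the relevant fields out of an AppArmor audit line like the one above. The function name `parse_apparmor_denial` is hypothetical, and the parsing assumes the `key=value` / `key="value"` layout seen in that single log line; other kernels or audit setups may format records differently.

```python
import re

# Matches key=value and key="value" pairs in a kernel audit record.
_PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_apparmor_denial(line):
    """Return the audit fields as a dict if the line is an AppArmor denial, else None.

    Hypothetical helper: assumes the record layout shown in the dmesg line above.
    """
    if 'apparmor="DENIED"' not in line:
        return None
    fields = {}
    for key, raw, quoted in _PAIR.findall(line):
        # Keep the inner text for quoted values, the raw token otherwise.
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


if __name__ == "__main__":
    sample = (
        '[1207134.568655] audit: type=1400 audit(1680931593.219:153): '
        'apparmor="DENIED" operation="open" '
        'profile="libvirt-1f6c65a1-390f-44ad-8a03-942ea8d7dd0f" '
        'name="/var/lib/libvirt/images/packer-cgoflv5em347qsbokbcg-ua-artifact" '
        'pid=114661 comm="qemu-system-x86" requested_mask="r" denied_mask="r" '
        'fsuid=0 ouid=0'
    )
    denial = parse_apparmor_denial(sample)
    print(denial["profile"], denial["name"], denial["denied_mask"])
```

The `name` field is the path qemu was refused, and `profile` identifies the per-domain profile that lacked a rule for it.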

  <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/media/data/virt/jammy.img'/>
      <target dev='hda'/>
  </disk>

I checked the XML definition sent by Packer to libvirtd. It defines the disk as a volume reference:

  <disk type='volume' device='disk'>
    <driver name='qemu' type='qcow2'/>
    <source pool='default' volume='packer-cgog54tem340h0t2v12g-ua-artifact'/>
    <target dev='sda' bus='sata'/>
    <alias name='ua-artifact'/>
    <address type='drive' controller='0' bus='0' target='0' unit='0'/>
  </disk>

I guess that libvirtd adds paths of file sources to AppArmor configs in the first case but is not smart enough to add files backing volumes in the second case, so AppArmor rejects the access.
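To make the difference between the two disk definitions concrete, the sketch below separates a domain's disks into those that carry an explicit file path and those that only reference a pool and volume. The function name `disks_by_source` and the wrapping `<domain>` element in the usage example are assumptions for illustration. It only inspects the XML and says nothing about what libvirt actually writes into the AppArmor profile.

```python
import xml.etree.ElementTree as ET


def disks_by_source(domain_xml):
    """Split a domain's disks into file paths and (pool, volume) references.

    Hypothetical helper: only type='file' and type='volume' disks are handled.
    """
    root = ET.fromstring(domain_xml)
    file_paths, volume_refs = [], []
    for disk in root.iter("disk"):
        source = disk.find("source")
        if source is None:
            continue
        if disk.get("type") == "file":
            # The path is stated directly in the domain XML.
            file_paths.append(source.get("file"))
        elif disk.get("type") == "volume":
            # Only the pool and volume names are given; the path must be resolved.
            volume_refs.append((source.get("pool"), source.get("volume")))
    return file_paths, volume_refs


if __name__ == "__main__":
    example = """
    <domain>
      <devices>
        <disk type='file' device='disk'>
          <source file='/media/data/virt/jammy.img'/>
        </disk>
        <disk type='volume' device='disk'>
          <source pool='default' volume='packer-cgog54tem340h0t2v12g-ua-artifact'/>
        </disk>
      </devices>
    </domain>
    """
    print(disks_by_source(example))
```

Any entry in the second list is a disk whose backing path is not spelled out in the domain XML, which is the case that triggered the denial here.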

minhuw commented 1 year ago

After some searching, I found that libvirtd actually generates a dedicated AppArmor profile for each VM. One such profile, copied from one of my existing VMs created with virsh, looks like this:

# DO NOT EDIT THIS FILE DIRECTLY. IT IS MANAGED BY LIBVIRT.
  "/var/log/libvirt/**/manager-nixos.log" w,
  "/var/lib/libvirt/qemu/domain-manager-nixos/monitor.sock" rw,
  "/var/lib/libvirt/qemu/domain-7-manager-nixos/*" rw,
  "/run/libvirt/**/manager-nixos.pid" rwk,
  "/run/libvirt/**/*.tunnelmigrate.dest.manager-nixos" rw,
  "/media/data/nixos-virt/nixos-manager.qcow2" rwk,
  "/var/lib/libvirt/qemu/domain-7-manager-nixos/{,**}" rwk,
  "/var/lib/libvirt/qemu/channel/target/domain-7-manager-nixos/{,**}" rwk,
  "/var/lib/libvirt/qemu/ram/7-manager-nixos/{,**}" rwk,
  "/var/lib/libvirt/qemu/domain-7-manager-nixos/master-key.aes" rwk,
  "/var/lib/libvirt/qemu/domain-7-manager-nixos/fs0-fs.sock" rwk,
  "/dev/net/tun" rwk,

It includes the file source /media/data/nixos-virt/nixos-manager.qcow2, explaining why AppArmor never rejected my VM before. I cannot get the AppArmor profile of the VM created by Packer, but I guess it doesn't include the source file. The problem has been discussed elsewhere: https://github.com/coreos/bugs/issues/2083.

The discussion there suggests that adding /var/lib/libvirt/images/* r, to /etc/apparmor.d/abstractions/libvirt-qemu solves the problem without disabling the security driver.
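If the pool lives somewhere other than /var/lib/libvirt/images, the same kind of rule can be derived from the pool definition. The sketch below reads a directory pool's XML (as shown earlier in this thread) and builds the read-only rule line. The function name `apparmor_rule_for_dir_pool` is hypothetical, the sketch handles only type='dir' pools, and it merely produces the text; adding it to the abstraction file and reloading AppArmor is left to the reader.

```python
import xml.etree.ElementTree as ET


def apparmor_rule_for_dir_pool(pool_xml, permissions="r"):
    """Build an AppArmor rule granting access to every file directly inside a dir pool.

    Hypothetical helper: only type='dir' pools are handled in this sketch.
    """
    root = ET.fromstring(pool_xml)
    if root.get("type") != "dir":
        raise ValueError("only directory-backed pools are handled in this sketch")
    path = root.findtext("target/path")
    if not path:
        raise ValueError("pool definition has no target path")
    # Normalise a trailing slash so the glob is appended exactly once.
    return f"{path.rstrip('/')}/* {permissions},"


if __name__ == "__main__":
    pool = """
    <pool type='dir'>
      <name>default</name>
      <target>
        <path>/var/lib/libvirt/images</path>
      </target>
    </pool>
    """
    print(apparmor_rule_for_dir_pool(pool))
```

For the default pool in this issue, the result is the same rule quoted from the linked discussion.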

thomasklein94 commented 1 year ago

Thanks for the detailed description of the underlying issue. I did some additional research based on what you found, and I think you are right: the way the volume is referenced determines how libvirt generates the AppArmor config. According to this issue from the libvirt project, they have been aware of this bug in virt-aa-helper for at least 2 years.

Based on your findings and mine, I think this issue should be resolved on the libvirt side, not on the Packer side. Even if this Packer plugin were able to work around it for filesystem-based pools, other pool types like LVM would still hit the same issue, and I think it would be impossible to figure out the actual device path from Packer. Therefore, I suggest configuring AppArmor to allow qemu to access the whole pool, or disabling AppArmor altogether until this is resolved.

I will add a small notice to the documentation of this plugin to make others aware of this bug in libvirt in the future. Thank you again for raising this issue.

hswong3i commented 1 year ago

I was able to make locally backed volumes accessible by adding the following lines to /etc/apparmor.d/local/abstractions/libvirt-qemu:

"/var/lib/libvirt/images/" r,
"/var/lib/libvirt/images/**" rwk,