vmware-samples / packer-examples-for-vsphere

Packer Examples for vSphere
https://vmware-samples.github.io/packer-examples-for-vsphere/

Unable to boot Debian 11 with secure boot EFI #964

Closed: vwesisolak closed this issue 1 month ago

vwesisolak commented 1 month ago

Code of Conduct

Project Version

latest

VMware vSphere

7.0.3

HashiCorp Packer

1.6.6

HashiCorp Packer Plugin for VMware vSphere

1.6.6?

Guest Operating System

Debian 11.9.0

Environment Details

No response

Description

After cloning from the Debian 11 VM template, the clone fails to boot: every boot option either times out or fails. Both the template and the clone use UEFI with Secure Boot enabled, and the install was done in UEFI-only mode. If I boot into rescue mode from the installation media and choose to reinstall the boot loader, it adds an entry pointing to /EFI/debian/shimx64.efi, and the VM then boots. Manually adding this entry to the UEFI boot options also works. I believe this also happens with Debian 12, but I haven't tested it recently. The template build itself completes without any issues logged.

Expected Behavior

Successful boot after cloning.

Actual Behavior

All boot methods failed.

Steps to Reproduce

Deploy a VM with UEFI and Secure Boot enabled, and with the following in storage.pkrtpl.hcl:

# Force UEFI booting ('BIOS compatibility' will be lost). Default: false.
d-i partman-efi/non_efi_system boolean true

Log Fragments and Files

No response

Screenshots

[Screenshot: deb11-efibootmgr] Only Boot0000 through Boot0003 exist after cloning. Adding Boot0004 (or equivalent) allows a successful boot.
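The check shown in the screenshot can be scripted. The following is a minimal Python sketch, not part of this repository, that parses `efibootmgr -v` output and reports whether any boot entry points at `shimx64.efi`. It assumes each entry line has the form `Boot0004* label device-path` (the device path is printed with `-v`); `parse_boot_entries`, `has_loader`, and `BootEntry` are hypothetical names.

```python
import re
from typing import List, NamedTuple

# Assumed efibootmgr entry format: "Boot" + 4 hex digits + optional "*" (active) + label/path.
# Lines such as "BootCurrent:" and "BootOrder:" do not match because they lack the 4 hex digits.
_ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})(\*?)\s+(.*)$")


class BootEntry(NamedTuple):
    number: str       # e.g. "0004"
    active: bool      # True when efibootmgr marks the entry with "*"
    description: str  # label, plus the device path when run with -v


def parse_boot_entries(output: str) -> List[BootEntry]:
    """Extract Boot#### entries from efibootmgr output."""
    entries = []
    for line in output.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            number, star, rest = match.groups()
            entries.append(BootEntry(number.upper(), star == "*", rest))
    return entries


def has_loader(entries: List[BootEntry], loader: str = "shimx64.efi") -> bool:
    """True if any entry's description mentions the given EFI loader (case-insensitive)."""
    return any(loader.lower() in entry.description.lower() for entry in entries)
```

On a cloned VM, the input could come from `subprocess.run(["efibootmgr", "-v"], capture_output=True, text=True).stdout`; `has_loader` returning `False` matches the state in the screenshot, where no entry points at the Debian shim.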

Additional Context

No response

vwesisolak commented 1 month ago

I confirmed this is also affecting Debian 12.

burnsjared0415 commented 1 month ago

I'm having issues with the Debian mirror during deployment. I tested with Packer and also with a manual build. Once I get the mirror figured out, I will test this.

burnsjared0415 commented 1 month ago

I tested this with a build and clone on Debian 11 without issues. I can keep trying to reproduce the issue, but I was able to get this to work with the current config.

burnsjared0415 commented 1 month ago

I tested 12.7.0 and had no issues with the clone. Can you send me any configuration you can share?

vwesisolak commented 1 month ago

What would be helpful for you to see? For what it's worth, we forked off of v0.19.0 or v0.19.1, though none of the commits or issues I saw for v0.20.0 seemed related.

These are the variables we are using (which seem to match the example):


# Copyright 2023-2024 Broadcom. All rights reserved.
# SPDX-License-Identifier: BSD-2

/*
    DESCRIPTION:
    Debian 11 build variables.
*/

// Guest Operating System Metadata
vm_guest_os_language = "en_US"
vm_guest_os_keyboard = "us"
vm_guest_os_timezone = "UTC"
vm_guest_os_family   = "linux"
vm_guest_os_name     = "debian"
vm_guest_os_version  = "11"

// Virtual Machine Guest Operating System Setting
vm_guest_os_type = "other5xLinux64Guest"

// Virtual Machine Hardware Settings
vm_firmware              = "efi-secure"
vm_cdrom_type            = "sata"
vm_cdrom_count           = 1
vm_cpu_count             = 2
vm_cpu_cores             = 1
vm_cpu_hot_add           = false
vm_mem_size              = 2048
vm_mem_hot_add           = false
vm_disk_size             = 40960
vm_disk_controller_type  = ["pvscsi"]
vm_disk_thin_provisioned = true
vm_network_card          = "vmxnet3"

// Removable Media Settings
iso_datastore_path       = "iso/linux/debian"
iso_content_library_item = "debian-11.9.0-amd64-netinst"
iso_file                 = "debian-11.9.0-amd64-netinst.iso"

// Boot Settings
vm_boot_order = "-"
vm_boot_wait  = "5s"

// Communicator Settings
communicator_port    = 22
communicator_timeout = "30m"

burnsjared0415 commented 1 month ago

That helps. I will see if I can reproduce it; I need to figure out what is different.

burnsjared0415 commented 1 month ago

One question: what is the vSphere version? It should not matter, but I want to see if there is an issue with the way hardware versions are working.

vwesisolak commented 1 month ago

The VM template is being built on vCenter 8.0.2, and my clone test is being done on VCD 10.5.1 with vCenter 7.0.3. The VM hardware version is 19. I wonder if you might be onto something with the versioning. That is a fairly large difference, and the build completes without any issues.

tenthirtyam commented 1 month ago

Do you get the same result with a native deployment to vSphere, taking VCD out of the flow?

vwesisolak commented 1 month ago

I was able to test with VCD 10.5.1 and vCenter 8.0.2, with the same result (not sure it is relevant, but the ESXi version on the host was still 7.0.3). I am working to see if I can test without VCD.

burnsjared0415 commented 1 month ago

Let me do some testing around 7.0.3; I need to figure out how to reproduce the issue.

vwesisolak commented 1 month ago

I found that the templates are actually being built on vCenter 7.0.3. I was also able to clone the template on that vCenter (no VCD), and the clone booted properly. So that is progress for sure.

tenthirtyam commented 1 month ago

So it seems to be related to VCD at this time.

vwesisolak commented 1 month ago

Yeah, that seems most likely, though I am not finding any similar issues and have no idea what would cause it. That said, I am testing the VM created by our pipeline rather than the template that was exported as OVF and uploaded to the content library. It is possible that the export process is introducing the issue, so I'm not ready to close this issue yet, but it is on hold while I try to get more info for you (or confirm it is VCD).

vwesisolak commented 1 month ago

I got to the bottom of this issue, and it was a combination of two things. By default, Packer does not include the .nvram file in the OVF export (related issue), but the .nvram file is where the EFI firmware variables, and therefore the boot options, are stored. In addition, most Linux distributions also use the "removable media" EFI path as a fallback for compatibility, but Debian does not. The end result is that, with the fallback, most distributions can boot far enough to then restore the variables, but without it, Debian is simply unable to boot.

The solution is either to include the .nvram file in the OVF export (I did not test this, but it should work) or to configure Debian to use the removable media fallback (this worked in my testing). I added the following to builds/linux/debian/11/data/ks.pkrtpl.hcl:

d-i grub-installer/force-efi-extra-removable boolean true

Note that the preseed directive given for it in the wiki did not work on Debian 11.
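To confirm that a template built with `force-efi-extra-removable` actually carries the fallback loader, one could inspect the EFI System Partition (ESP). Below is a minimal Python sketch, assuming the ESP is mounted at `/boot/efi` (Debian's usual mount point) and using the removable-media file names from the UEFI specification (`\EFI\BOOT\BOOTX64.EFI` on x86_64). `fallback_loader`, `_find_child`, and `FALLBACK_LOADERS` are hypothetical names, not part of this repository.

```python
import os
from typing import Optional

# Removable-media default loader names from the UEFI specification
# (\EFI\BOOT\BOOT<machine type>.EFI), keyed by a short architecture label.
FALLBACK_LOADERS = {
    "x64": "BOOTX64.EFI",
    "ia32": "BOOTIA32.EFI",
    "aa64": "BOOTAA64.EFI",
    "arm": "BOOTARM.EFI",
}


def _find_child(directory: str, name: str) -> Optional[str]:
    """Return the entry of `directory` matching `name` case-insensitively (FAT ignores case)."""
    try:
        for entry in os.listdir(directory):
            if entry.lower() == name.lower():
                return os.path.join(directory, entry)
    except OSError:  # missing directory, or a file where a directory was expected
        return None
    return None


def fallback_loader(esp_root: str, arch: str = "x64") -> Optional[str]:
    """Return the path of EFI/BOOT/BOOT<arch>.EFI under the ESP, or None if it is absent."""
    current: Optional[str] = esp_root
    for part in ("EFI", "BOOT", FALLBACK_LOADERS[arch]):
        current = _find_child(current, part)
        if current is None:
            return None
    return current if os.path.isfile(current) else None
```

On a clone, `fallback_loader("/boot/efi")` returning `None` would indicate the situation described above: the VM depends entirely on the NVRAM boot entries, which are lost if the .nvram file is not exported.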

tenthirtyam commented 1 month ago

If you need the .nvram file after performing an export to OVF, add the following to the build definition using the plugin's export options:

    # ...
    vm_name = "example-ubuntu"
    # ...
    export {
      force            = true
      output_directory = "./output-artifacts"
      # Include additional image files attached to the VM, such as the .nvram file.
      image_files      = true
    }
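As a post-build sanity check, a small script could verify that the export actually produced an NVRAM file. This minimal Python sketch assumes the exported NVRAM keeps the `.nvram` extension and searches the `output_directory` from the snippet above recursively; `exported_nvram_files` and `require_nvram` are hypothetical helpers, not part of Packer or this repository.

```python
import glob
import os
from typing import List


def exported_nvram_files(output_directory: str) -> List[str]:
    """Return every *.nvram file under output_directory, including subdirectories."""
    pattern = os.path.join(output_directory, "**", "*.nvram")
    return sorted(glob.glob(pattern, recursive=True))


def require_nvram(output_directory: str) -> str:
    """Return the first exported .nvram file, or raise if the export has none."""
    files = exported_nvram_files(output_directory)
    if not files:
        raise FileNotFoundError(
            f"no .nvram file under {output_directory}; "
            "EFI variables (including boot entries) were not exported"
        )
    return files[0]
```

A pipeline step could call `require_nvram("./output-artifacts")` before uploading the OVF to a content library, so a missing NVRAM file fails the build instead of surfacing later as an unbootable clone.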

github-actions[bot] commented 1 week ago

I'm going to lock this issue because it has been closed for 30 days. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.