hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

Generation 2 Hyper-V VM boots too fast for boot_command to trigger #7278

Closed KimRechnagel closed 5 years ago

KimRechnagel commented 5 years ago

I'm trying to build a generation 2 Windows Server 2016 VM on Windows 10 with the Hyper-V role installed. I have the exact same issue as janegilring in the quote below. More info in a minute, just need to get out of this "Reference new issue" popup which just closed on me.

"I was just setting up the same Packer build configuration in a different environment (lab - slower hardware). The issue in that environment seems to be the opposite: While Packer is in the "Starting the virtual machine..." state, the VM has already started and the "Press any key to start installation" screen is gone when Packer gets to the waiting state. Even when setting the boot wait to 0 seconds, Packer is too slow to type the boot commands. However, I suppose that's another issue so I'll create one after some more testing.

Originally posted by @janegilring in https://github.com/hashicorp/packer/issues/6208#issuecomment-384878910"

KimRechnagel commented 5 years ago

Log output:

      ==> hyperv-iso: Creating build directory...
      ==> hyperv-iso: Retrieving ISO
          hyperv-iso: Using file in-place: file:///C:/Automation/ISO/Newest2016/windows2016.ISO
      ==> hyperv-iso: Starting HTTP server on port 8068
      ==> hyperv-iso: Creating switch 'internal_switch' if required...
      ==> hyperv-iso: switch 'internal_switch' already exists. Will not delete on cleanup...
      ==> hyperv-iso: Creating virtual machine...
      ==> hyperv-iso: Enabling Integration Service...
      ==> hyperv-iso: Setting boot drive to os dvd drive C:/Automation/ISO/Newest2016/windows2016.ISO ...
      ==> hyperv-iso: Mounting os dvd drive C:/Automation/ISO/Newest2016/windows2016.ISO ...
      ==> hyperv-iso: Skipping mounting Integration Services Setup Disk...
      ==> hyperv-iso: Mounting secondary DVD images...
      ==> hyperv-iso: Mounting secondary dvd drive ./windows/2016/answer.iso ...
      ==> hyperv-iso: Configuring vlan...
      ==> hyperv-iso: Starting the virtual machine...
      ==> hyperv-iso: Attempting to connect with vmconnect...
      ==> hyperv-iso: Host IP for the HyperV machine: 192.168.10.103
      ==> hyperv-iso: Typing the boot command...
      ==> hyperv-iso: Waiting for WinRM to become available...

When Packer gets to the "Typing the boot command...." part the VM is already way past the "Press any key to boot from cd or dvd" prompt.

I have tried to start up in headless mode but the VM still starts too fast. I'm not really sure if there is any solution to this other than building an ISO which doesn't prompt me to press a key to start the installation. I have had plenty of success with building generation 1 VMs on the same Windows 10 machine, but I don't see the prompt here though. Below is the template I'm using.

KimRechnagel commented 5 years ago

      {
        "builders": [
          {
            "boot_wait": "0s",
            "boot_command": [
              "aaaaaaa"
            ],
            "configuration_version": "9.0",
            "vm_name": "windows2016",
            "type": "hyperv-iso",
            "disk_size": 76800,
            "floppy_files": [],
            "secondary_iso_images": [
              "./windows/2016/answer.iso"
            ],
            "headless": false,
            "http_directory": "./windows/common/http/",
            "guest_additions_mode": "disable",
            "iso_url": "../ISO/Newest2016/windows2016.ISO",
            "iso_checksum_type": "none",
            "iso_checksum": "e3779d4b1574bf711b063fe457b3ba63",
            "communicator": "winrm",
            "winrm_username": "vagrant",
            "winrm_password": "vagrant",
            "winrm_timeout": "4h",
            "shutdown_command": "shutdown /s /t 10 /f /d p:4:2 /c \"Packer Shutdown\"",
            "ram_size": 2048,
            "cpu": 1,
            "generation": 2,
            "switch_name": "internal_switch",
            "enable_secure_boot": true
          }
        ],
        "provisioners": [
          {
            "type": "powershell",
            "elevated_user": "vagrant",
            "elevated_password": "vagrant",
            "scripts": [
              "./windows/common/cleanup.ps1"
            ]
          }
        ],
        "post-processors": [
          {
            "type": "vagrant",
            "keep_input_artifact": false,
            "output": "{{.Provider}}_windows-2016.box"
          }
        ]
      }

SwampDragons commented 5 years ago

@marcinbojko I know you've done a lot with generation 2 windows VMs -- do you have any insights for a workaround here? I don't think there's really anything Packer can do here because Gen 2 vms just blast through the boot sequence so fast.

marcinbojko commented 5 years ago

@SwampDragons - what's funny - having lots of different Hyper-V stacks (different baremetal and versions) and different DVDs/ISOs to test, I can say one thing: it's unpredictable ;) Unfortunately, the only workaround I've found is to make a boot loop with:

      "boot_command": [
        "a<enter><wait>a<enter><wait>a<enter><wait>a<enter>"
      ],
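
The retry pattern above can be generated instead of typed by hand. The following is a minimal sketch in Python, assuming the goal is simply to repeat one keypress plus `<enter>` with a `<wait>` between attempts. The helper name `retry_boot_command` and its parameters are illustrative only and are not part of Packer.

```python
import json


def retry_boot_command(key="a", attempts=4, pause="<wait>"):
    """Build one boot_command string that sends `key` + <enter> `attempts` times.

    `pause` is placed between attempts only, so the string does not end
    with a trailing wait.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return pause.join(f"{key}<enter>" for _ in range(attempts))


# Emit the fragment in the shape a Packer JSON template expects.
fragment = {"boot_command": [retry_boot_command()]}
print(json.dumps(fragment, indent=2))
```

With the defaults this reproduces the string shown above. Raising `attempts` widens the window in which a keypress can land on the prompt, though any keystrokes sent after the prompt was caught may end up in the installer instead.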

marcinbojko commented 5 years ago

@SwampDragons I'd suggest maybe using a feature called 'start delay', as it's better for packer to wait a second or ten than to just let a Gen 2 VM fly. [image]

The name of a feature is here:

 Get-VM -Name ito-el6-n1.spcph.local | Select-Object Name, AutomaticStartDelay

Name                   AutomaticStartDelay
----                   -------------------
ito-el6-n1.spcph.local                   0

SwampDragons commented 5 years ago

Startup_delay is a great hint! I'll add it to the hyper-v docs.

KimRechnagel commented 5 years ago

Hey guys, I really appreciate your suggestions on the issue here. Unfortunately the AutomaticStartDelay setting won't help much here as it doesn't slow down the boot process when the VM gets the initial start trigger.

What AutomaticStartDelay really does is prevent a boot storm when a Hyper-V host, or an entire Hyper-V cluster, running many VMs is rebooted.

Example:

  1. VM1 is running on host1.
  2. VM1 has AutomaticStartDelay set to 60 seconds.
  3. Host1 is rebooted.
  4. VM1 was running on host1 prior to the reboot, so VM1 will automatically start up again once the Hyper-V service has started.
  5. Hyper-V waits 60 seconds before powering on/starting VM1.
  6. After 60 seconds VM1 powers on and runs through the boot process as fast as possible.

I'll take a look at the boot_command tweak suggested here. My boot_command string is currently: "boot_command": [ "a<wait>a<wait>a<wait>a<wait>a<wait>a<wait>a" ],

It doesn't seem to have any effect in the VM though as I don't see the VM rebooting multiple times. It could actually work though I guess. I'll grab some screenshots in order to get you a better understanding of what happens at my end.

KimRechnagel commented 5 years ago

Hmm I tried the following settings, but the VM doesn't seem to get any input from packer at all: "boot_wait": "5s", "boot_command": ["<leftCtrlOn><leftAltOn><endOn><leftCtrlOff><leftAltOff><endOff><wait>a<enter>"],

marcinbojko commented 5 years ago

@KimRechnagel - my initial understanding of your problem was that packer was too slow to start interfering with the VM's boot menu - in that case AutomaticStartDelay is the key to it. I don't recall having these issues, even on super-duper fast hosts with SSD storage. Could you start gathering data? Packer version, what terminal you're using (cmd, powershell, conemu). Also I'd say - let's try changing the ISO, as I recall some of the latest releases (I am using the Partner channel though) had problems with boot_command - can you download and check just the generic Windows 2016 Evaluation ISO? Last but not least, could you try my templates? https://github.com/marcinbojko/hv-packer

marcinbojko commented 5 years ago

@SwampDragons - it's not so great, as it has to be set by packer during the VM creation ;) I'd suggest to add this option to packer commands (of course in the code also) to be able to slow down a little for super fast VMs.

KimRechnagel commented 5 years ago

@marcinbojko Yes, packer is too slow to start interfering with the boot menu, or the VM is too fast for vmconnect.exe (which, as I can see in the code, packer uses to connect to the VM).

I'm not trying to be rude, but AutomaticStartDelay has nothing to do with this issue, as this setting works exactly as I described above. I tested it locally on my machine by setting AutomaticStartDelay to 10 seconds and then starting the VM. It doesn't delay anything after the start request has been sent to the VM; it just tells the host to wait X seconds before sending the start request to the VM, e.g. after the host has been rebooted.

I'll test with another ISO and will also collect data about my system, versions etc. as per your suggestion.

Thanks for your feedback.

marcinbojko commented 5 years ago

@KimRechnagel - no worries, startdelay would be recommended in our first understanding of your problem - which we already ruled out.

KimRechnagel commented 5 years ago

Hmm maybe the "solution" could be as simple as getting packer to connect to the VM before sending the Start-VM cmdlet.

I just "tested" it manually and what happens is that I connect to the VM and see the black console. When I hit the start button it still takes vmconnect about 3-4 seconds to actually display the boot screen. I see the "Press any key to boot from CD or DVD..." for about 1 second before it times out and tries to PXE boot instead.

I guess the issue might just be that vmconnect.exe is too slow to connect. Well, I'll look into that as well.
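
The "connect first, then start" idea can be tried outside Packer with a few lines of scripting. This is a sketch only, assuming `vmconnect.exe` and the Hyper-V PowerShell module are available on the host; the function names, the `settle_seconds` value, and the injectable `spawn`/`run`/`sleep` hooks are hypothetical and are not how Packer's Hyper-V driver is structured.

```python
import subprocess
import time


def vmconnect_argv(vm_name, host="localhost"):
    """Command line that opens the VM console: vmconnect.exe <host> <vm name>."""
    return ["vmconnect.exe", host, vm_name]


def start_vm_argv(vm_name):
    """Command line that starts the VM through the Hyper-V PowerShell module."""
    # Single quotes are doubled so a name containing ' stays one PowerShell string.
    quoted = vm_name.replace("'", "''")
    return ["powershell.exe", "-NoProfile", "-Command",
            f"Start-VM -Name '{quoted}'"]


def connect_then_start(vm_name, host="localhost", settle_seconds=3.0,
                       spawn=subprocess.Popen, run=subprocess.check_call,
                       sleep=time.sleep):
    """Open the console first, give it time to attach, then start the VM."""
    spawn(vmconnect_argv(vm_name, host))  # returns at once; the window stays open
    sleep(settle_seconds)                 # assumed settle time, tune per host
    run(start_vm_argv(vm_name))           # raises CalledProcessError on failure


# Example (only meaningful on a Hyper-V host, so left commented out):
# connect_then_start("windows2016")
```

Going by the manual test described above, the console still needed a few seconds to show the boot screen after the start request, so this ordering is an experiment to measure rather than a confirmed fix.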

marcinbojko commented 5 years ago

@KimRechnagel - what would happen if you switch to enhanced session (in vmconnect) for this particular packer VM?

KimRechnagel commented 5 years ago

@marcinbojko Enhanced session was already enabled. I disabled it but unfortunately it didn't change anything.

I did test something else, though it raises a lot of other challenges with DHCP/PXE etc. If I change the boot order to:

  1. Hard drive
  2. Network Adapter
  3. DVD Drive (my install ISO)
  4. DVD Drive (answer.iso with autounattend.xml etc.)

Then the VM waits for PXE to time out and vmconnect has plenty of time to connect to the VM. The problem with this is that I then only have a small window to send the boot commands, between the end of the PXE timeout and the moment "Press any key to boot...." times out. Furthermore, if I had a DHCP/BOOTP server on my network, that would complicate the boot process even more.

A question regarding boot_command on Hyper-V: the documentation states that I can add "On" to e.g. <LeftCtrl> in order for packer to hold down the key, which would allow me to send Ctrl, Alt, End (reboot), but it doesn't seem to work. Maybe because the scancodes haven't been implemented in the same way on Hyper-V as on e.g. VirtualBox, VMware etc.?

I tried with "boot_command": ["<leftCtrlOn><leftAltOn><endOn><leftCtrlOff><leftAltOff><endOff><wait>a<enter>"], but it didn't do anything. Well, maybe my issue is that the boot_command isn't sent at all :-)

I never saw the "Press any key to boot..." when creating Gen1 VMs, so I don't actually know if boot_command works on my setup.

Still waiting for the eval ISO to download.

marcinbojko commented 5 years ago

My current settings: [image]

KimRechnagel commented 5 years ago

How far into the installation is this? Did it just start? I don't see the bootmgfw.efi in my settings.

marcinbojko commented 5 years ago

That's interesting - my packer just goes through 3rd batch of WU. As far as I know (in 2016/2019) Gen2 machine should have this file.

KimRechnagel commented 5 years ago

I tested the templates you linked from your github repo. I used the ISO which I have downloaded from the VLSC site. Same issue. I'll test again with the eval ISO in about 20 minutes when it has finished downloading.

My settings with your template: [image]

KimRechnagel commented 5 years ago

It seems like your Hyper-V host is physical, or at least running on Server 2016? I'm testing on my laptop with the latest version of Windows 10. It might make a difference when building Gen 2 machines.

marcinbojko commented 5 years ago

True. I am not a windows guy, however I'll try with w10.

KimRechnagel commented 5 years ago

Ok the evaluation ISO finished downloading. I didn't change anything but the ISO, I used your templates... and it works. It's very odd... it seems like the eval ISO waits just about 1-2 seconds longer at the "Press any key to boot" prompt, which means that packer has time to connect and send the boot_command.

marcinbojko commented 5 years ago

Yup, that's what I noticed in thread you were mentioning. Switching to different ISO (Partner channel) broke my deployment flow. BLAME Microsoft?

KimRechnagel commented 5 years ago

It does not work with the template I modified myself. I tested two times now and the boot_command does not seem to be sent. I'll tweak the settings one line at a time until I figure out what triggers this.

KimRechnagel commented 5 years ago

Wow this is weird. I managed to "break" your template as well by changing: "iso_url": ".\\iso\\Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO",

To: "iso_url": "../ISO/Newest2016/Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO",

Changed it back, and it worked again

KimRechnagel commented 5 years ago

Well, now your template fails again. It consistently failed three times in a row. This is odd. There is a very very fine balance between when it works and not. I'll keep testing.

marcinbojko commented 5 years ago

I just tested with w10 1803 (don't have 1809 as it fails to upgrade). With the Partner ISO I have no way of even booting: 'press a key' displays for 1 second and it's gone. @SwampDragons - I am sorry to say it, but this is related to the previous issue - packer has no chance to react that fast in the current setup. As we probably cannot rely on Microsoft to rebuild all ISO images, we probably need better control over how fast vmconnect reacts.

With debug and headless: false:

      ==> hyperv-iso: Configuring vlan...
      ==> hyperv-iso: Starting the virtual machine...
      ==> hyperv-iso: Attempting to connect with vmconnect...
      ==> hyperv-iso: Host IP for the HyperV machine: 169.254.2.24
      ==> hyperv-iso: Typing the boot command...
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Sending char 'a', code '1e9e', shift false
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Sending char 'a', code '1e9e', shift false
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Special code 'Press' '' found, replacing with: &{[1c] [9c]}
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Sending char 'a', code '1e9e', shift false
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Special code 'Press' '' found, replacing with: &{[1c] [9c]}
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Sending char 'a', code '1e9e', shift false
      2019/02/06 11:30:05 packer.exe: 2019/02/06 11:30:05 Special code 'Press' '' found, replacing with: &{[1c] [9c]}
      2019/02/06 11:30:11 packer.exe: 2019/02/06 11:30:11 [DEBUG] Unable to get address during connection step: No ip address.
      2019/02/06 11:30:11 packer.exe: 2019/02/06 11:30:11 Waiting for WinRM, up to timeout: 8h0m0s

Before vmconnect displays, it's long gone, and I can get a PXE boot menu only.
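
The codes in the debug log follow the usual make/break convention of PC scancode set 1: a key press sends the make code, and the release sends the same code with the high bit set. A small sketch that reproduces the '1e9e' and [1c] [9c] values from the log is below; the table covers only the two keys that appear there, and the names are illustrative rather than Packer's internals.

```python
# Make (key-down) codes from PC scancode set 1 for the keys seen in the log.
MAKE_CODES = {"a": 0x1E, "enter": 0x1C}


def break_code(make):
    """Key-up code: the make code with the high bit (0x80) set."""
    return make | 0x80


def press_and_release(key):
    """Return the [make, break] pair for one full keystroke."""
    make = MAKE_CODES[key]
    return [make, break_code(make)]


def as_hex(codes):
    """Render codes the way the log prints them, e.g. '1e9e'."""
    return "".join(f"{code:02x}" for code in codes)


print(as_hex(press_and_release("a")))      # 1e9e
print(as_hex(press_and_release("enter")))  # 1c9c
```

Since the log shows well-formed press/release pairs going out, it suggests the keystrokes themselves are being generated correctly and simply arrive after the prompt has gone.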

KimRechnagel commented 5 years ago

I did some tests with your template, @marcinbojko. Here are the results, as well as the changes I made during testing. I guess the conclusion is: Do not build Gen 2 VMs on Hyper-V using super fast storage.

Attempt #

  1. Failed
  2. Failed
  3. Failed
  4. Change: Set vmcompute.exe priority to low - worked once
  5. Failed
  6. Failed
  7. Change: Set vmcompute.exe affinity to CPU0 - worked once
  8. Failed
  9. Failed
  10. Failed
  11. Failed
  12. Change: Started five sequential checksum checks of the ISO with: fciv .\Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO - worked once
  13. Worked
  14. Change: Made an infinite loop checksum check .cmd file - Failed
  15. Failed
  16. Worked
  17. Failed
  18. Change: Started four parallel infinite checksum checks (poor SSD - CPU maxed out) - The VM took forever to start - Cancelled one checksum and the VM started - packer boot_command had timed out - Failed
  19. Change: Three parallel infinite checksum checks - Worked, I did see a boot manager shortly before the install started
  20. Change: Two parallel infinite checksum checks - Worked
  21. Worked
  22. Worked
  23. Worked
  24. Worked
  25. Worked
  26. Worked
  27. Tested my original template - still with two infinite checksum checks running - Worked
  28. Worked
  29. Worked
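
The background load used in attempts 14 to 29 can be reproduced without a .cmd loop around fciv. The sketch below repeatedly hashes a file from several threads; it is a diagnostic aid for testing the "host is too fast" theory, not a recommended workaround. The function names are hypothetical and MD5 is an arbitrary choice of hash.

```python
import hashlib
import threading


def file_md5(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so a multi-gigabyte ISO never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_checksum_load(path, workers=2, stop_event=None, max_passes=None):
    """Hash `path` over and over in `workers` threads.

    Blocks until `stop_event` is set or every worker has done `max_passes`
    passes. Returns the number of passes each worker completed.
    """
    stop_event = stop_event or threading.Event()
    completed = [0] * workers

    def worker(index):
        while not stop_event.is_set():
            if max_passes is not None and completed[index] >= max_passes:
                break
            file_md5(path)
            completed[index] += 1

    threads = [threading.Thread(target=worker, args=(i,), daemon=True)
               for i in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


# Example: two parallel loops, as in attempt 20 (runs until interrupted):
# run_checksum_load(r".\Windows_Server_2016_Datacenter_EVAL_en-us_14393_refresh.ISO")
```

Repeated reads of the same file are likely to be served from the OS file cache after the first pass, so this tends to load the CPU more than the disk, which is consistent with the "CPU maxed out" note in attempt 18.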

marcinbojko commented 5 years ago

Tested with EVAL ISO (in form of https://link) - works 100% of the time on SSD - boot manager from the ISO waits LONGER.

KimRechnagel commented 5 years ago

All the above tests were made with the evaluation ISO except for attempt 27-29. Evaluation ISO: https://www.microsoft.com/en-us/evalcenter/evaluate-windows-server-2016

marcinbojko commented 5 years ago

Hmmm... now it behaves even more funny (both isos). I can see the PXE menu for 10-15 seconds. Packer still waits for WinRM (it is past the boot menu keystroke). Then out of the blue, when PXE fails, the VM reboots to the ISO DVD and continues deployment (successfully).

KimRechnagel commented 5 years ago

Did you change the boot order? It sounds like your NIC is above the Install ISO.

marcinbojko commented 5 years ago

Nope. DVD still first.(iso), harddrive, network adapter, secondary iso. BUT I've enabled PackerDebug.

marcinbojko commented 5 years ago

The same with packer_log=0. It takes approx. 1 minute to time out PXE, then it just continues.

KimRechnagel commented 5 years ago

I wonder if anyone else has this problem. I started using packer last week as I'm going to build multiple base templates using packer for hyper-v and soon VMWare.

My next step is to get Ansible to interact with our hyper-v clusters and build VMs using the packer templates. I'm currently testing everything locally but will eventually move everything to a dedicated server. I guess when I move to a virtual server with hyper-v installed, then I probably won't run into this issue anymore.

KimRechnagel commented 5 years ago

When my build fails I see this for about 1 minute: [image]

Then this - it stays here until I stop the VM: [image]

marcinbojko commented 5 years ago

W10 1803 here, but I can test it into hv2019 cluster.

KimRechnagel commented 5 years ago

1809 here. I'll build a dedicated packer server and start testing on that instead.

marcinbojko commented 5 years ago

Yup, we'll compare notes.

KimRechnagel commented 5 years ago

@marcinbojko Thank you very very much for your help and feedback on this issue, I really appreciate it. I guess this "issue" is out of the hands of the packer developers as it's not really a packer coding issue.

marcinbojko commented 5 years ago

You're very welcome. I've started to test on 2019 and soon we'll know more.

marcinbojko commented 5 years ago

@KimRechnagel , @SwampDragons - I'd like to confirm what's been said: on W2019 (me) and w10 (Kim) packer is unable to boot from DVD if run on quite fast storage. In my case it was S2D built completely on SSD. I've used packer 1.3.5 from the issue "spaces in switch name", as my switch does have spaces in its name.

SwampDragons commented 5 years ago

This is very frustrating but I don't think there's anything Packer can do about this; googling shows that this "windows moves through the boot screen too fast on gen 2 vms" issue exists for people who aren't using Packer, too. I'm going to mark this as an upstream bug and close, but if anyone has any good ideas for reliable workarounds, I'd love to add them to the documentation.

Thanks for all your help @marcinbojko.

marcinbojko commented 5 years ago

@SwampDragons sorry for answering on a closed issue - I'd like to try an approach with -AutomaticStartDelay passed to New-VM or Set-VM. So the sequence would be: run vmconnect and WAIT for the VM to start. The problem is I have absolutely no clue about Golang. If it's not too much, can you point me to the piece of code that builds or sets the 'new-vm' or 'set-vm' part?

SwampDragons commented 5 years ago

Ah, sorry; didn't realize you were thinking of adding this option. The powershell scripts that comprise the hyperv driver are here, and the new-vm code specifically is here.

The new-vm code uses golang templating to produce a minimal powershell script and allow us to work around passing a ton of parameters into our Powershell call.

KimRechnagel commented 5 years ago

@marcinbojko I tested your template on a standalone physical Dell PowerEdge 815 Hyper-V 2012 R2 host with local hard drives. The funny thing is that I see the same behavior as you. The VM starts, I see the "Press any key" prompt for maybe 3-4 seconds (Packer seems to be connected here), then the VM goes into PXE boot, times out after 60 seconds, goes back to "Press any key" and THEN starts the installation.

marcinbojko commented 5 years ago

2012/2016, windows 10 up to 1803. W10 1809/2019=packer unusable.

KimRechnagel commented 5 years ago

@marcinbojko Just an update. I have built a nested hyper-v host on a hyper-v 2016 cluster. I have used my original 1809 ISO from the VLSC site as well as the evaluation ISO and so far I have not had any issues with packer connecting too slow. It seems like vmconnect.exe connects way faster in my current setup, so missing the boot_command is not an issue.

marcinbojko commented 5 years ago

Interesting. DNS issues?

KimRechnagel commented 5 years ago

I don't think so, as it wouldn't make sense if vmconnect relies on DNS to lookup local VMs.