packer-community / packer-windows-plugins

A suite of Packer plugins for provisioning Windows machines
113 stars 20 forks source link

Batch file in shutdown_command non-zero after 1.0.0 #51

Open timsutton opened 9 years ago

timsutton commented 9 years ago

The weird side effects of me trying to sysprep as a shutdown_command continue!

Using the released 1.0.0 binaries, I can put this command into a batch file:

c:\windows\system32\sysprep\sysprep.exe /oobe /generalize /shutdown

..and call it using `shutdown_command: "a:\run_sysprep.bat"

I can add an /unattend:c:\path\to\unattend.xml flag as well.

With the latest 1.0.1 prerelease (https://github.com/packer-community/packer-windows-plugins/releases/tag/pre-release), I get a non-zero exit status on the shutdown:

2015/05/11 13:48:00 ui: ==> vmware-windows-iso: Gracefully halting virtual machine...
==> vmware-windows-iso: Gracefully halting virtual machine...
2015/05/11 13:48:00 packer-builder-vmware-windows-iso: 2015/05/11 13:48:00 Executing shutdown command: a:\run_sysprep.bat
2015/05/11 13:48:00 packer-builder-vmware-windows-iso: 2015/05/11 13:48:00 starting remote command: a:\run_sysprep.bat
2015/05/11 13:48:33 packer-builder-vmware-windows-iso: 2015/05/11 13:48:33 Executing: /Applications/VMware Fusion.app/Contents/Library/vmrun [-T fusion list]
2015/05/11 13:48:33 packer-builder-vmware-windows-iso: 2015/05/11 13:48:33 stdout: Total running VMs: 1
2015/05/11 13:48:33 packer-builder-vmware-windows-iso: /Users/cda/windows-builds/output-vmware-windows-iso/packer-vmware-windows-iso.vmx
2015/05/11 13:48:33 packer-builder-vmware-windows-iso: 2015/05/11 13:48:33 stderr:
2015/05/11 13:48:33 ui: ==> vmware-windows-iso: Stopping virtual machine...
==> vmware-windows-iso: Stopping virtual machine...
2015/05/11 13:48:33 packer-builder-vmware-windows-iso: 2015/05/11 13:48:33 Executing: /Applications/VMware Fusion.app/Contents/Library/vmrun [-T fusion stop output-vmware-windows-iso/packer-vmware-windows-iso.vmx hard]
2015/05/11 13:48:35 packer-builder-vmware-windows-iso: 2015/05/11 13:48:35 stdout:
2015/05/11 13:48:35 packer-builder-vmware-windows-iso: 2015/05/11 13:48:35 stderr:
2015/05/11 13:48:35 packer-builder-vmware-windows-iso: 2015/05/11 13:48:35 Deleting floppy disk: /var/folders/61/snzd00312v76hn7yzjk_mmg00000kr/T/packer251798140
2015/05/11 13:48:35 ui: ==> vmware-windows-iso: Deleting output directory...
==> vmware-windows-iso: Deleting output directory...
2015/05/11 13:48:35 ui error: Build 'vmware-windows-iso' errored: Shutdown command has non-zero exit status.

Stdout: 
C:\Users\vagrant>c:\windows\system32\sysprep\sysprep.exe /oobe /generalize /shutdown 

Stderr: 
2015/05/11 13:48:35 Builds completed. Waiting on interrupt barrier...
2015/05/11 13:48:35 machine readable: error-count []string{"1"}
2015/05/11 13:48:35 ui error: 
==> Some builds didn't complete successfully and had errors:
2015/05/11 13:48:35 machine readable: vmware-windows-iso,error []string{"Shutdown command has non-zero exit status.\n\nStdout: \r\nC:\\Users\\vagrant>c:\\windows\\system32\\sysprep\\sysprep.exe /oobe /generalize /shutdown \r\n\n\nStderr: "}
2015/05/11 13:48:35 ui error: --> vmware-windows-iso: Shutdown command has non-zero exit status.

Stdout: 
C:\Users\vagrant>c:\windows\system32\sysprep\sysprep.exe /oobe /generalize /shutdown 

Stderr: 
2015/05/11 13:48:35 ui: 
==> Builds finished but no artifacts were created.
2015/05/11 13:48:35 waiting for all plugin processes to complete...
Build 'vmware-windows-iso' errored: Shutdown command has non-zero exit status.

Stdout: 
C:\Users\vagrant>c:\windows\system32\sysprep\sysprep.exe /oobe /generalize /shutdown 

Stderr: 

==> Some builds didn't complete successfully and had errors:
--> vmware-windows-iso: Shutdown command has non-zero exit status.

Stdout: 
C:\Users\vagrant>c:\windows\system32\sysprep\sysprep.exe /oobe /generalize /shutdown 

Stderr: 

==> Builds finished but no artifacts were created.
2015/05/11 13:48:35 /Users/cda/.packer.d/plugins/packer-provisioner-powershell: plugin process exited
2015/05/11 13:48:35 /Users/cda/.packer.d/plugins/packer-builder-vmware-windows-iso: plugin process exited

At first I thought this was related to a powershell provisioning step (since as far as I could tell the current prerelease only includes a change to the powershell timeout), but removing the step I still see the same issue. So currently, the template I'm testing has no provisioning steps.

Any ideas? Thanks!

mefellows commented 9 years ago

That is really weird. I really can't see how that would work between versions.

That being said, the VMWare builder does seem to care about the exit status, unlike the other builders. This would appear to be the problem you're having - Windows is returning a non-zero status code, but (as we've seen) this quite possibly could be expected (a quick Googling didn't help ).

Also, it only took 33s - does that sound right? Is there any way of you validating that the script ran successfully (and what the actual exit code was)? That would help us track down whether or not the exit code was in fact a failure vs expected.

In the meanwhile, I'll look for differences between those versions.

If it is expected that non-zero exit codes are allowed in sysprep during shutdown_command, then we could look to enhance the builder with the ability to specify valid exit codes (as per the Powershell provisioner). Otherwise there really could be a problem!

mefellows commented 9 years ago

I've just updated the VMWare builder to log the exit code on failure - this is a nice thing to do anyway, but should be helpful to your own diagnosis. You should be able to get the latest pre-release shortly and then we can check what the exit code actually is and what it means.

timsutton commented 9 years ago

The length of time is about right, yes. I skipped the /unattend option to remove that variable, which makes the time to sysprep very short. This morning, the length of time is about the same for both 1.0.0 and the current prerelease (16 and 19 seconds respectively).

Thanks for adding the exit code reporting for the vmware builder. The exit code is 1 when it fails - but still, if I revert the plugins back to the 1.0.0 release, there's no failing on non-zero exit codes. Isn't that insane? I'm only working with the released binaries at this point, I haven't tried building myself from different points in the master. Maybe I'll try that.

I was able to verify that at least the sysprep.exe command was executed in both cases, just from quickly jumping to the task manager. It's hard to further diagnose the output, however - if the build fails, Packer deletes the output machine vmx and files.. and running sysprep needs either a shutdown or reboot, so I can't easily type out the setuperr.log file. I can also make a temporary modification to the builder code so that it doesn't consider it an error and just keep going, just so I can debug further with the resultant machine.

timsutton commented 9 years ago

Some more diagnosis today: I built the plugin myself from HEAD, and then from the v1.0.0 tag, and both runs failed with the non-zero exit error. Other deps were at HEAD, however.

Looking further, masterzen/winrm#25 was merged after 1.0.0 and before the current builds. In my case the command finished before 60 seconds, but I'm still wondering if maybe it's related (this for example: https://github.com/hach-que/winrm/commit/0a5c43281845807dea5e920d5102f998b1811569). I don't see much other changes in this short timeframe.

So, I should go back to an earlier winrm and verify. One question: If I want to rebuild the plugins based on the earlier earlier revision of winner, is it enough to just checkout an earlier commit in github.com/masterzen/winrm and re-run the packer-windows-plugins build script, or should I nuke the winrm binaries as well?

mefellows commented 9 years ago

Nice work, that looks like a promising lead! WRT to bisecting the issue, you shouldn't have to blow away any binaries, the correct source revision as you suggest should suffice to build a set of Packer plugins as intended.

timsutton commented 9 years ago

And.. that was totally it. Specifically this commit: https://github.com/masterzen/winrm/commit/0a5c43281845807dea5e920d5102f998b1811569

Not being familiar with this codebase it's a bit hard for me to follow exactly why this is the case - I read that commit as returning the error of the remote command execution, rather than nil. Might this be because the actual execution of sysprep causes a restart/shutdown, and therefore causes an interruption of the remote execution (which returns an error)? If that's the case and there's no other clear way to shutdown a machine with a sysprep, then perhaps your suggestion of adding a "valid exit codes" (a la https://github.com/packer-community/packer-windows-plugins/blob/master/provisioner/powershell/provisioner.go#L80-L83) might be a feasible workaround? It sucks that this would mean another template config item like: shutdown_command_valid_exit_codes, though..

Thanks for the link on Slack about exit codes and PS, btw :smile:

mefellows commented 9 years ago

Great work! So the first part of the problem is solved - what changed. The second part might be harder. Have you been able to determine yet if the sysprep actually worked? I reckon if you remove the check in the VMWare builder for the exit code and let it finish you could test by spinning up the resultant image and seeing if its magic was performed. If it has, then your theory might be correct. If not then we march on until we find out the cause of the exit code.

As a side note, to avoid the packer / vm import/startup costs you might want to consider using the winrm client (from the library you referred to) on a running machine to test sysprep. Hopefully that should get you to a repro faster.

timsutton commented 9 years ago

Yeah, the sysprep completes successfully. I can boot the machine and things are sysprepped as expected.

I built winrm and tried to duplicate the action exactly at the CLI by calling sysprep within a bat file, as I did via the Packer template shutdown:

$ ./winrm -hostname 172.16.176.128 -username vagrant -password vagrant "c:\run_sysprep.bat"

C:\Users\vagrant>c:\windows\system32\sysprep\sysprep.exe /shutdown /oobe /generalize 

$ $?
-bash: 0: command not found

According to this that command exits with the exit code returned by the client, and as you can see above that exits 0. And this is the winrm client built from the current master, with those timeout/error fixes in.

So that gives me more confidence that the sysprep command itself is returning zero.. and makes me wonder if the difference is just in how Packer invokes the communicator, and there is some condition that results in error output. It still seems odd to me that the commit I pointed out changed the behaviour since it does not change the ExitCode, but rather the error contents. But it also looks like a less-than-straightforward code path following the execution of the command through winrm and back, so I could understand that there's some reason the exit status is different when actually invoked by Packer?

escapenguin commented 9 years ago

For what it's worth, I was seeing similar if not the same behavior using vmware-windows-iso with a plain jane packer build. It would get to the end and bomb out on shutdown_command even though the generated image was fine:

    vmware-windows-iso: ==> Script exiting with errorlevel 0
==> vmware-windows-iso: Gracefully halting virtual machine...
==> vmware-windows-iso: Stopping virtual machine...
==> vmware-windows-iso: Deleting output directory...
Build 'vmware-windows-iso' errored: Shutdown command has non-zero exit status

After that, shutdown would begin writing its usage stuff a couple times as if it didn't like what it was receiving. Couldn't get an exit code, which was a little strange. After installing the pre-release binaries, everything worked fine.

mefellows commented 9 years ago

I'm convinced there is a defect. Just need to set aside some time to work thru. Perhaps @masterzen has something to add?

mefellows commented 9 years ago

Issue has been identified, see this comment in #54 for more detail.