packer-community / packer-windows-plugins

A suite of Packer plugins for provisioning Windows machines
112 stars 20 forks source link

Reboot provisioner has stopped working - can't figure out what has changed. #54

Open robcoward opened 9 years ago

robcoward commented 9 years ago

While debugging my last issue with windows updates timing out after 2 hours, I definitely had a series of reboot provisioners successfully rebooting a Win2012 packer build. Now we have solved the timeout problems, it seems the reboot provisioner is now returning non-zero and killing the packer run.

What I am trying to achieve is: fresh win2012 build from ISO, install windows upates, reboot, install further updates, reboot, one final check for outstanding updates, create vagrant box file.

The provisioners section of my packer template looks like this:

        {
            "type": "powershell",
            "elevated_user": "{{user `winrm_userid`}}",
            "elevated_password": "{{user `winrm_password`}}",
            "inline": [
                "Import-Module Boxstarter.WinConfig",
                "Install-WindowsUpdate -acceptEula -suppressReboots"
            ]
        },
        {
            "type": "restart-windows"
        },
        {
            "type": "powershell",
            "elevated_user": "{{user `winrm_userid`}}",
            "elevated_password": "{{user `winrm_password`}}",
            "inline": [
                "Import-Module Boxstarter.WinConfig",
                "Install-WindowsUpdate -acceptEula -suppressReboots"
            ]
        },
        {
            "type": "restart-windows"
        },

Running packer with debug output gives the following:

2015/05/18 13:08:02 ui: ==> VBoxWin2012: Restarting Machine
2015-05-18 13:08:02 ==> VBoxWin2012: Restarting Machine
2015/05/18 13:08:02 packer-builder-virtualbox-windows-iso: 2015/05/18 13:08:02 starting remote command: shutdown /r /c "packer restart" /t 5 && net stop winrm
2015/05/18 13:08:03 packer-builder-virtualbox-windows-iso: 2015/05/18 13:08:03 [INFO] RPC endpoint: Communicator ended with: 1
2015/05/18 13:08:03 [INFO] 0 bytes written for 'stderr'
2015/05/18 13:08:03 [INFO] 0 bytes written for 'stdout'
2015/05/18 13:08:03 [INFO] RPC client: Communicator ended with: 1
2015/05/18 13:08:03 [INFO] RPC endpoint: Communicator ended with: 1
2015/05/18 13:08:03 packer-provisioner-restart-windows.orig: 2015/05/18 13:08:03 [INFO] RPC client: Communicator ended with: 1
2015/05/18 13:08:03 packer-provisioner-restart-windows.orig: 2015/05/18 13:08:03 [INFO] 0 bytes written for 'stdout'
2015/05/18 13:08:03 packer-provisioner-restart-windows.orig: 2015/05/18 13:08:03 [INFO] 0 bytes written for 'stderr'
2015/05/18 13:08:03 ui: ask: ==> VBoxWin2012: Pausing before cleanup of step 'StepUploadGuestAdditions'. Press enter to continue.

The VM does reboot, but packer then terminates due to the non-zero exit from the provisioner.

If I try to manually run the shutdown command from a powershell window, it throws the following error - "The token '&&' is not a valid statement separator in this version."

I have even tried reverting to an earlier version of the packer-provisioner-restart-windows binary but I still get the same error.

Can anyone suggest anywhere else I can try looking to figure out why the reboot is not working ?

timsutton commented 9 years ago

On first glance it looked to me like the cause may be similar as for #51. You could try building the plugins yourself, but first checking out masterzen/winrm to an earlier revision: https://github.com/masterzen/winrm/commit/813d86ee814a2d07cb1153d0c1cb922f3f8239b7

Or if you wanted to try the earlier binaries, I think the one you would want to revert to would be packer-communicator-winrm, since it's this plugin which depends on winrm, which recently had a change in how it reports back error status.

robcoward commented 9 years ago

Thanks @timsutton - I've pulled the packer-communicator-winrm executable from the 1.0.0-RC zip file (from 25th March) and it still has the same problem. I'll try sorting out a build environment next and try as you suggest to build with an earlier version of winrm.

mefellows commented 9 years ago

I have been able to repro this with a minimal setup - packer 0.7.5 + pre-release (sha f1267c26f35f03cecc687bed1928e6fd7ca3d06d) and HEAD locally.

As @timsutton has identified, masterzen/winrm@813d86e is the commit that broke this behaviour. Unfortunately, that new behaviour seems sensible - it assumes the command fails unless it gets a valid exit code from the remote command. In our case, the restart (and shutdown in the case of #51) prevents the output from being slurped as that request is met with a unknown error Post http://127.0.0.1:6142/wsman: EOF.

I suppose we could look at improving the slurp mechanism in masterzen/winrm (such as detecting such edge cases as this), but it does feel like we're working around the problem and that's definitely not ideal at the library level. Conversely, we could look to excuse non-zero exit codes in our restart-windows and vmware-* builders so that the shutdown/restart behaviour is simplified, at the small expense of not detecting certain user-errors. I'm in favour of the latter approach, but welcome ideas.

@dylanmei what are your thoughts on this one?

pecigonzalo commented 9 years ago

Im experiencing the same issues, does anyone have a workaround till this is fixed? im trying to reboot after sp1 install. Tried

{
      "type": "powershell",
      "inline": [ "echo \"Restart Computer\"; Start-Sleep 5; Restart-Computer -force; echo \"Stop WinRM\"; Stop-Service WinRM" ]
    },

without luck as i get non-zero (1) exit or EOF error

pecigonzalo commented 9 years ago

Ok, found a working workaround for now:

{
      "type": "powershell",
      "inline": [ "echo \"Restart Computer\"; Start-Sleep 5; Restart-Computer -force" ]
    },
{
      "pause_before": "20s",
      "type": "powershell",
      "inline": [ "echo \"Sleep 60\"; Start-Sleep 60" ]
    },

The key is to pause before the next command as that gives time for windows to shutdown winrm and such. Also worth mentioning that adding a net stop winrm or Stop-Service winrm makes it return exit code 1 and fail.

mefellows commented 9 years ago

Sorry for the delay @pecigonzalo, I will look at this again over the weekend if I can (I should have some time). Looks like you have a workaround, alternatively you could look at modifying the Go source code and disabling the check for non-zero response in this plugin.

pecigonzalo commented 9 years ago

No problems @mefellows, i appreciate your work., im playing around with the code, will submit a pull request if i make any good progress although i would like to try and work on a permanent fix. I did some tests and seems like terminating WinRM does not exit the connection, it waits and timesout , ill try it when i get back on my lab today piping a Exit inside of the command ... maybe that gives us a better result.

pecigonzalo commented 9 years ago

We should either close this or #51 to avoid duplicated information. ~~I think the issue is related to this: https://github.com/masterzen/winrm/blob/132339029dfa67fd39ff8edeed2af78f2cca4fbb/winrm/command.go#L40~~

That change came as part of: https://github.com/masterzen/winrm/pull/25

Check that the default error code to be returned is 1 and i believe due to the command not returning an error code because it exists without one when winrm is terminated it returns 1.~

Tested this and did not fix anything, although i welcome someone else verifying im correct as go is not my goto lang.

pecigonzalo commented 9 years ago

I made some changes but cant make the code build:

==> Getting dependencies...
# github.com/packer-community/packer-windows-plugins/common
common\winrm_config.go:22: undefined: packer.ConfigTemplate
# github.com/packer-community/packer-windows-plugins/provisioner/powershell
provisioner\powershell\provisioner.go:86: undefined: packer.ConfigTemplate
# github.com/packer-community/packer-windows-plugins/provisioner/restart
provisioner\restart\provisioner.go:30: undefined: packer.ConfigTemplate
# github.com/packer-community/packer-windows-plugins/provisioner/windows-shell
provisioner\windows-shell\provisioner.go:63: undefined: packer.ConfigTemplate
Makefile:9: recipe for target 'dev' failed
bin\make.exe: *** [dev] Error 2

Seems like ConfigTempalte was removed from packer, if you check chef-solo or chef it implements its own ConfigTemplate probably exactly because of this.

dylanmei commented 9 years ago

There have been a number of refactors in Packer core that we are behind on just now. Coding is going to be rough until the dust settles.

pecigonzalo commented 9 years ago

Yeah that is why i oppose to go pointing to head instead of to a release tag for deps, i think thats a major flaw. Now we have to manually figure out the correct commit point for those dependencies if we want to compile version 1.0.0 or refactor the entire solution.

Seems like the main packer project is importing the work of this project, should we continue this work over there? Otherwise their version compiles, so could just pull in the changes manually.

mitchellh commented 9 years ago

I read through this whole thing. This is now in Packer core, I'd love to find a solution for this, but I don't thinkI have anything meaningful to add yet :)