hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io
Other
15.08k stars 3.33k forks source link

Packer provisioners not completing entirely #9757

Open ghost opened 4 years ago

ghost commented 4 years ago

When filing a bug, please include the following headings if possible. Any example text in this template can be deleted.

Overview of the Issue

A paragraph or two about the issue you're experiencing. Not sure if there is a current card/solution for this, but I am experiencing issues with my packer provisioners not completing/skipping to the next provisioner before completing. This doesnt happen during every run but it does happen frequently. My provisioners are a series of powershell scripts that execute at time of AMI bake in AWS. At times it will recognize the script and run the provisioner flawlessly and at other times completely skip. Example below of the skip: image

In the example above install_dependencies.ps1 have a bunch of Write-Host messages that arent writing to log and after checking the server the services that are supposed to be installed are also missing on the ec2.

Reproduction Steps

Steps to reproduce this issue

Packer version

From packer version

Simplified Packer Buildfile

If the file is longer than a few dozen lines, please include the URL to the gist of the log or use the Github detailed format instead of posting it directly in the issue.

Operating system and Environment details

OS, Architecture, and any other information you can provide about the environment.

Log Fragments and crash.log files

Include appropriate log fragments. If the log is longer than a few dozen lines, please include the URL to the gist of the log or use the Github detailed format instead of posting it directly in the issue.

Set the env var PACKER_LOG=1 for maximum log detail.

nywilken commented 4 years ago

Thanks for the bug report @TravRat. We'll try to take a look as soon as we can.

In the meantime, is it possible to get a redacted log PACKER_LOG=1 packer build template.json and a sample of your template so that we can try to reproduce. If you could provide a sample of the script you are running that may help.

As for this being an open issue we have seen other issues, which may still be open, about the provisioner exiting too soon. That may be able to provide some helpful tips.

SwampDragons commented 4 years ago

I wonder if this is a duplicate of https://github.com/hashicorp/packer/issues/4916. Are you setting $LastExitCode in your script? If it exits without a failing code, Packer doesn't know it's failed.

ghost commented 4 years ago

@nywilken PACKER_LOG=1 has been added:

·[1;32m==> amazon-ebs: Pausing 15s before the next provisioner...·[0m 658 | ·[1;32m==> amazon-ebs: Provisioning with Powershell...·[0m 659 | ·[1;32m==> amazon-ebs: Provisioning with powershell script: ./iac/Packer/install_dependencies.ps1·[0m 660 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Opening ./iac/Packer/install_dependencies.ps1 for reading 661 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Uploading env vars to c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1 662 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 [INFO] 76 bytes written for 'uploadData' 663 | 2020/08/13 15:16:47 [INFO] 76 bytes written for 'uploadData' 664 | 2020/08/13 15:16:47 packer: 2020/08/13 15:16:47 Uploading file to 'c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1' 665 | 2020/08/13 15:16:48 packer: #< CLIXML 666 | 2020/08/13 15:16:48 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 #< CLIXML 667 | 2020/08/13 15:16:48 packer: 2020/08/13 15:16:48 [INFO] 1515 bytes written for 'uploadData' 668 | 2020/08/13 15:16:48 [INFO] 1515 bytes written for 'uploadData' 669 | 2020/08/13 15:16:48 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 2020/08/13 15:16:48 Uploading file to 'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1' 670 | 2020/08/13 15:16:49 packer: #< CLIXML 671 | 2020/08/13 15:16:49 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 #< CLIXML 672 | 2020/08/13 15:16:49 packer: System.Management.Automation.PSCustomObjectSystem.Object1Preparing modules for first use.0-1-1Completed-1 2020/08/13 15:16:49 [INFO] starting remote command: powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1; &'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1'; exit $LastExitCode }" 673 | ·[0;32m amazon-ebs: Grabbing the software from bucket...·[0m 674 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] command 'powershell -executionpolicy bypass "& { if (Test-Path variable:global:ProgressPreference){set-variable -name variable:global:ProgressPreference -value 'SilentlyContinue'};. c:/Windows/Temp/packer-ps-env-vars-5f355747-8339-85de-2e00-ee0ddb53a5d4.ps1; &'c:/Windows/Temp/script-5f355747-11c6-2a10-5190-0e0468e3ffc4.ps1'; exit $LastExitCode }"' exited with code: 0 675 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] RPC endpoint: Communicator ended with: 0 676 | 2020/08/13 15:16:53 [INFO] 0 bytes written for 'stderr' 677 | 2020/08/13 15:16:53 [INFO] 37 bytes written for 'stdout' 678 | 2020/08/13 15:16:53 [INFO] RPC client: Communicator ended with: 0 679 | 2020/08/13 15:16:53 [INFO] RPC endpoint: Communicator ended with: 0 680 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] 37 bytes written for 'stdout' 681 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] 0 bytes written for 'stderr' 682 | 2020/08/13 15:16:53 packer: 2020/08/13 15:16:53 [INFO] RPC client: Communicator ended with: 0 683 | 2020/08/13 15:16:53 [INFO] (telemetry) ending powershell 684 | 2020/08/13 15:16:53 [INFO] (telemetry) Starting provisioner file ·[1;32m==> amazon-ebs: Pausing 15s before the next provisioner...·[0m

Sample packer.json:

{
  "variables": {
    "build_version": "1.0.0",
    "aws_instance_type": "t2.medium",
    "aws_region": "us-east-1",
    "aws_source_ami": "",
    "aws_security_group_id": "sg-exampleBlah",
    "aws_subnet_id": "subnet-exampleBlah",
    "aws_vpc_id": "vpc-exampleBlah",
    "iam_instance_profile": "exampleBlah",
    "pipeline_exec_id": "",
    "pipeline_name": "exampleBlah-XXXXX"
  },

  "builders": [
    {
      "type": "amazon-ebs",
      "region": "exampleBlah-XXXXX",
      "source_ami": "exampleBlah-XXXXX",
      "instance_type": "exampleBlah-XXXXX",
      "temporary_key_pair_name": "exampleBlah-XXXXX",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "user_data_file": "./iac/Packer/packer_userdata.ps1",
      "winrm_use_ssl": true,
      "winrm_insecure": true,
      "winrm_port": XXXX,
      "windows_password_timeout": "20m",
      "iam_instance_profile": "{{user `iam_instance_profile`}}",
      "security_group_id": "{{user `aws_security_group_id`}}",
      "subnet_id": "{{user `aws_subnet_id`}}",
      "vpc_id": "{{user `aws_vpc_id`}}",
      "ami_name": "exampleBlah-XXXXX",
      "ami_description": "exampleBlah-XXXXX",
    }
  ],
  "provisioners": [
    {
      "type": "powershell",
      "pause_before": "15s",
      "script": "./iac/Packer/install_dependencies.ps1"
    },
    {
      "type": "file",
      "pause_before": "15s",
      "source": "./Ec2-AutoScallingGroup-SampleApp/",
      "destination": "C:\\inetpub\\wwwroot"
    },
    {
      "destination": "C:\\Program Files\\Amazon\\SSM\\Plugins\\awsCloudWatch\\AWS.EC2.Windows.CloudWatch.json",
      "source": "./iac/Packer/CloudWatch/AWS.EC2.Windows.CloudWatch.json",
      "type": "file",
      "pause_before": "15s"
    },
    {
      "type": "powershell",
      "pause_before": "120s",
      "script": "./iac/Packer/install_aws_services.ps1"
    }
  ],

  "post-processors": [
    {
      "type": "manifest",
      "output": "manifest.json",
      "strip_path": true
    }
  ]
}

install_dependencies.ps1 script is nothing fancy pretty simple:

#  Get sample app file from s3
try{
    Write-Host("Grabbing the software from bucket...")
    copy-s3object -bucketname packerSampleScript -keyprefix packerSampleScript1 -localfolder c:\packerSampleScript;
    Write-Host("Successfully copied software from bucket...")
}catch{
    write-host ("Failed to copy windows sample app file...")
}
get-childitem -Path c:\packerSampleScript

# Install sample software
try{
    Write-Host ("Installing sample app service...")
    $params = @{
        Name = "SampleApp"
        BinaryPathName = "c:\packerSampleScript\packerSampleScript.exe"
        DisplayName = "SampleApp"
        StartupType = "Automatic"
        Description = "This is the sample app service."
        }
    New-Service @params

}catch{
    Write-Host "Failed to create sample app service..."
}
# Start Sample App service
try{
    Write-Host "Starting sample app service..."
    Start-Service -Name SampleApp
    Get-Service -Name SampleApp
}catch{
    Write-Host "Failed to start Sample App service..."
}
ghost commented 4 years ago

@SwampDragons we are not

SwampDragons commented 4 years ago

Okay, so that suggests to me this is a duplicate of #4916 and what's happening is that your script is failing, but not telling Packer there's an error, so Packer just says "exit status is 0, so all must be good" and moves on to run the next provisioner rather than returning an error and aborting the build.

The other possibility is that your install_dependencies.ps1 script is triggering a reboot which causes Packer to disconnect before the script has finished running. This is a known issue; winrm doesn't really handle disconnects and reconnects nicely. We recommend using https://www.packer.io/docs/provisioners/windows-restart to handle reboots so Packer knows when it's rebooting and is able to wait for the reboot to complete. This means using a different provisioner between each reboot.

ghost commented 4 years ago

@SwampDragons thanks for the advice will present what I find, at the moment Im noticing after removing some winRM stops from the provisioner script taking place prior it isn't helping Im going to try some other things and let you know thanks

tylernap commented 4 years ago

I have been working with @TravRat on this for a while and it looks like we do have a race condition between the instance userdata and the provisioners. In order to get the instance joined to our domain, we need to configure WinRM and then restart the service. I suspect that this would be why the script is suddenly stopping and continuing. We are currently looking at adding more time in the pause_before parameter to allow the userdata to properly do its thing.

In the meantime, is there a way for Packer to be a little more strict or nuanced on how it handles WinRM interruptions? Or is there anyway that Packer can find out the status of the userdata execution?

ghost commented 4 years ago

also @SwampDragons is there a builder we have to add to:

  "builders": [
    {
      "type": "amazon-ebs",
      "region": "exampleBlah-XXXXX",
      "source_ami": "exampleBlah-XXXXX",
      "instance_type": "exampleBlah-XXXXX",
      "temporary_key_pair_name": "exampleBlah-XXXXX",
      "communicator": "winrm",
      "winrm_username": "Administrator",
      "user_data_file": "./iac/Packer/packer_userdata.ps1",
      "winrm_use_ssl": true,
      "winrm_insecure": true,
      "winrm_port": XXXX,
      "windows_password_timeout": "20m",
      "iam_instance_profile": "{{user `iam_instance_profile`}}",
      "security_group_id": "{{user `aws_security_group_id`}}",
      "subnet_id": "{{user `aws_subnet_id`}}",
      "vpc_id": "{{user `aws_vpc_id`}}",
      "ami_name": "exampleBlah-XXXXX",
      "ami_description": "exampleBlah-XXXXX",
    }

that allows us to say if the $lastExitCode is 1 stop the build? If I define $lastExitCode in my winRM script to determine failure or not what builder or identifier does Packer use to determine that it is a 1 or 0?

SwampDragons commented 4 years ago

@TravRat I believe that Inside your ps script, you need to call "exit $lastExitCode", and Packer will pick up the exit code from there. We have an option valid_exit_codes which you can use to set which codes will be considered successful. By default, exiting with anything other than a 0 will lead Packer to determine that the provisioning run was a failure.

@tylernap there isn't really any way for Packer to know the userdata has finished running. I'd recommend having your user data script create a file , e.g. finished.txt, and have the first part of your first provisioner check for that file and wait until it's present before continuing. Something like while (!(Test-Path "C:\finished.txt")) { Start-Sleep 10 }

Packer uses a third party library for its winrm connection, and it could probably use some improvements. I'm not sure when I'll be able to prioritize that work, though.

Probably a better choice than pause_before in the provisioner is the similar-but-different pause_before_connecting communicator option. Set this to a long enough wait so that it waits until after you domain join and this should allow you to refresh your winrm connection. There was a recent bug I fixed in that option so you may want to wait to try this until the 1.6.2 release later this week, or use the "nightly" release: https://github.com/hashicorp/packer/releases/tag/nightly

jcarasick-geog8r commented 3 years ago

I have the same (or similar) problem. I have a packer build file trying to provision a windows 2019 aws instance. If i use the "Administrator" account that comes with the EC2 as the winrm user and allow packer to determine the password, the provisioning executes in a serial fashion. If i create a local user in the Administrators group, and use that username/password in the winrm user and password, i get the provisioners that don't seem to complete (they seem to literally just stop executing or get cutoff), logs do not track any failure the process just stops. If i add in to the build file, "pause_before": "60m", for example in the following step, the previous provisioner will complete what seems to be successfully.

kintarowins commented 2 years ago

I'm running into a very similar problem and adding "windows-restart" provisioner AFTER my dependencies installation script seems to be helping. This was very annoying as it doesn't show you any messages or clues as to why it just skips straight to finishing. Took me almost a day of trial and error to finally found this thread.

FlorianHeigl commented 2 years ago

Be proud that you found it so fast!

kintarowins @.***> schrieb am Do., 1. Sept. 2022, 04:30:

I'm running into a very similar problem and adding "windows-restart" provisioner AFTER my dependencies installation script seems to be helping. This was very annoying as it doesn't show you any messages or clues as to why it just skips straight to finishing. Took me almost a day of trial and error to finally found this thread.

— Reply to this email directly, view it on GitHub https://github.com/hashicorp/packer/issues/9757#issuecomment-1233661851, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAVSVG5WF3KNVJUKLNXNCJTV4AIN5ANCNFSM4P5F7XEQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>