hashicorp / packer-plugin-amazon

Packer plugin for Amazon AMI Builder
https://www.packer.io/docs/builders/amazon
Mozilla Public License 2.0

Amazon Import Post Processor Fails #52

Open ghost opened 3 years ago

ghost commented 3 years ago

This issue was originally opened by @jeremymcgee73 as hashicorp/packer#10873. It was migrated here as a result of the Packer plugin split. The original body of the issue is below.


Overview of the Issue

I am having a problem using the Amazon Import post-processor after building an OVA with vmware-iso. The error does not happen every time; it seems to be random. I am setting AWS_POLL_DELAY_SECONDS=30 before running packer.

Packer version

1.70

Operating system and Environment details

RHEL 7.8 VMware Workstation 15

Log Fragments and crash.log files

==> vmware-iso.windows-base: Exporting virtual machine...
    vmware-iso.windows-base: Executing: ovftool
==> vmware-iso.windows-base: Running post-processor:  (type shell-local)
==> vmware-iso.windows-base (shell-local): Running local shell script: /packer/packer-shell731403097
==> vmware-iso.windows-base: Running post-processor:  (type amazon-import)
    vmware-iso.windows-base (amazon-import): Uploading 2019.ovaa to s3://NOT-REAL-BUCKET/packer-import-1617712990.ova
    vmware-iso.windows-base (amazon-import): Completed upload of 2019.ovaa to s3://NOT-REAL-BUCKET/packer-import-1617712990.ova
    vmware-iso.windows-base (amazon-import): Setting license type to 'AWS'
    vmware-iso.windows-base (amazon-import): Started import of s3://NOT-REAL-BUCKET/packer-import-1617712990.ova, task id import-ami-007c4145c618ddcd6
    vmware-iso.windows-base (amazon-import): Waiting for task import-ami-007c4145c618ddcd6 to complete (may take a while)
Build 'vmware-iso.windows-base' errored after 1 hour 26 minutes: 1 error(s) occurred:
* Post-processor failed: Import task import-ami-007c4145c618ddcd6 failed with status message: ClientError: The specified S3 resource does not exist. Reason 404 Not Found, error: ResourceNotReady: failed waiting for successful resource state
nywilken commented 3 years ago

Hi @jeremymcgee73, thanks for reaching out. Just a heads up: this issue moved to its new home, hashicorp/packer-plugin-amazon, as the Amazon components are now maintained in their own repository. With that said, it is possible that this is an eventual consistency issue.

Does the resource eventually get created, and is it reachable on S3?

If it is a timing issue, you can work around this by setting a very high value for AWS_MAX_ATTEMPTS and AWS_POLL_DELAY_SECONDS. If you still run into issues after a long wait time, I would double-check that you are able to create and read from the respective S3 resources.

jeremymcgee73 commented 3 years ago

Thanks for the reply! I am trying that now, and will let you know what I find out. That does make sense! I thought about that, but didn't think about setting those to be high.

If that is the problem, would it be worth adding a pause and retry for this?

jeremymcgee73 commented 3 years ago

I'm not 100% sure yet, but I am pretty certain this has solved my problem. Previously, about 1 out of 5 runs failed.

Thanks!

nywilken commented 3 years ago

Thanks for the quick turnaround on testing, @jeremymcgee73. If that is working, then I would say that the retry logic already in place is working; it's just that the default values are not high enough, at least for your use case. If this is an issue for others, then it might mean we need to reevaluate the defaults. Keeping this open for now.

To better assist with this issue, I would recommend adding the aws_polling configuration option to your templates to avoid having to set environment variables each time. This was added to override defaults for Amazon services that were not long enough for some users. Information on the configuration option can be found at https://www.packer.io/docs/builders/amazon/ebs#polling-configuration

jeremymcgee73 commented 3 years ago

I actually don't think that solved my problem. The OVA is getting copied to S3, because it remains there after the run. The problem is very intermittent; it happens maybe once every 10 runs. Let me know how else I can help.

Thanks!

Post-processor failed: Import task import-ami-0f0769d3d5ba527d6 failed with status message: ClientError: The specified S3 resource does not exist. Reason 404 Not Found, error: ResourceNotReady: failed waiting for successful resource state

The settings I'm passing in as ENV vars: AWS_POLL_DELAY_SECONDS=600 AWS_MAX_ATTEMPTS=100
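As a rough sanity check on those values, here is a small sketch of how long a waiter could keep polling with them. It assumes the waiter simply sleeps for the delay between attempts, which is my assumption and not something confirmed from the plugin source; `max_wait_seconds` is a made-up helper name.

```python
def max_wait_seconds(delay_seconds: int, max_attempts: int) -> int:
    """Approximate upper bound on total polling time.

    Assumes a fixed sleep of delay_seconds between attempts and ignores
    the time each status request itself takes.
    """
    return delay_seconds * max_attempts


budget = max_wait_seconds(delay_seconds=600, max_attempts=100)
print(f"{budget} s, about {budget / 3600:.1f} h")  # 60000 s, about 16.7 h
```

A budget that large is far beyond the 1 hour 26 minutes the originally reported build ran before erroring, so running out of poll attempts seems an unlikely explanation.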

jeremymcgee73 commented 3 years ago

I took another look at this, and I don't believe the AWS poll intervals will help. I believe this is failing on the initial import step, not on the checks to see if it's done. Maybe this particular error could be caught and the import tried again?

I believe it may be the size of the images that creates the race condition. Maybe you could add a step to your tests that copies a big file to the image (to get the total up to 5 or 6 GB) before the post-processor is run.
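To make the "catch and try again" idea concrete, here is a minimal sketch in Python. It is not the plugin's code (the plugin is written in Go), and `ImportFailed`, `run_import`, and `import_with_retry` are hypothetical names used only for illustration. The only detail taken from the thread is the error text matched on.

```python
import time
from typing import Callable

# Substring taken from the status message in the logs above.
TRANSIENT_MARKER = "The specified S3 resource does not exist"


class ImportFailed(Exception):
    """Hypothetical exception carrying the import task's status message."""


def import_with_retry(
    run_import: Callable[[], str],
    retries: int = 3,
    pause_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run_import, retrying only when the failure looks like the S3 404.

    run_import is a hypothetical callable that starts the import, waits for
    it, and returns the resulting image id or raises ImportFailed.
    """
    for attempt in range(1, retries + 1):
        try:
            return run_import()
        except ImportFailed as err:
            # Any other failure, or the last attempt, is re-raised unchanged.
            if TRANSIENT_MARKER not in str(err) or attempt == retries:
                raise
            sleep(pause_seconds)
    raise ValueError("retries must be at least 1")
```

Matching on the message text is brittle, so a real fix would presumably key off whatever structured status the import task exposes. The point is only that the whole import is retried after a pause, instead of polling a task that has already failed.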

testworksau commented 2 years ago

I would recommend adding the aws_polling configuration option to your templates

Hi @nywilken 👋🏼

Just wondering how to set the aws_polling configuration option on a build that is using the amazon-import post-processor?

We're using the hyperv-iso builder so we have something like the following defined:

source "hyperv-iso" "build" {
  communicator       = "winrm"
  ...
}
...
build {
  description = "Windows 10"
  ...
  provisioner "windows-restart" {
    restart_timeout = "15m"
  }
  ...
  post-processor "amazon-import" {
    aws_polling = {
      delay_seconds = 30
      max_attempts  = 600
    }
  }
}

The documentation for the post-processor only mentions environment variables. In other builds where we are using the amazon-ebs builder we can set the aws_polling configuration on the builder itself, but adding this configuration to the post-processor generates an error message:

Error: Failed preparing post-processor-block "amazon-import" ""
Unsupported argument; An argument named
 "aws_polling" is not expected here. Did you mean to define a block of type "aws_polling"?

Update: it seems I had the configuration wrong, as per the error message which I should have read properly. The aws_polling configuration should be defined as a block, i.e. without the equals sign:

  post-processor "amazon-import" {
    aws_polling {
      delay_seconds = 30
      max_attempts  = 600
    }
  }

Perhaps it would be useful if the documentation for the post-processor mentioned that the aws_polling block can be specified here too?