hashicorp / packer

Packer is a tool for creating identical machine images for multiple platforms from a single source configuration.
http://www.packer.io

AWS authentication from environment variables is not working #2611

Closed kylegoch closed 8 years ago

kylegoch commented 9 years ago

We bumped from version 0.7.5 to 0.8.5 with no other changes to templates or outside resources. We use IAM roles on Jenkins (where Packer is installed). They did not change and worked fine in version 0.7.5.

When calling packer build, we now get this error:

==> amazon-ebs: Error querying AMI: AuthFailure: AWS was not able to validate the provided access credentials
==> amazon-ebs: status code: 401, request id: []

Switching back from version 0.8.5 to 0.7.5 resolved the issue and we could build as normal. I went back and did an update from 0.7.5 to 0.8.0 and still got the error.

We have switched back to 0.7.5 for now.

cbednarski commented 9 years ago

Thanks for the report. In 0.8.0 we switched from goamz to the official AWS SDK for Go, so this might be related to that change.

To clarify, you're using a role for the instance running packer, but you are not using a role or creds in your packer template? The strategy we use for looking up credentials for packer to use is this:

  1. Static configs (from a template)
  2. Environment variables (AWS_ACCESS_KEY_ID, etc.)
  3. Config file
  4. EC2 Role

So if you have one of those other configurations set up on Jenkins, packer will use that instead.
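The lookup order above can be sketched as a chain of credential sources tried in sequence, where the first source that yields a key pair wins. This is only an illustration of the precedence idea: the `creds`, `provider`, `static`, and `firstAvailable` names are hypothetical, and none of this is Packer's or the AWS SDK's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// creds is a hypothetical stand-in for an AWS key pair; it is not an SDK type.
type creds struct {
	AccessKey, SecretKey, Source string
}

// provider returns credentials, or an error if this source has none.
type provider func() (creds, error)

var errNoCreds = errors.New("no credentials from this source")

// static builds a provider for a named source. An empty key or secret
// simulates a source that is not configured.
func static(source, accessKey, secretKey string) provider {
	return func() (creds, error) {
		if accessKey == "" || secretKey == "" {
			return creds{}, errNoCreds
		}
		return creds{AccessKey: accessKey, SecretKey: secretKey, Source: source}, nil
	}
}

// firstAvailable tries each provider in order and returns the first success.
func firstAvailable(chain []provider) (creds, error) {
	for _, p := range chain {
		if c, err := p(); err == nil {
			return c, nil
		}
	}
	return creds{}, errors.New("no valid credential source found")
}

func main() {
	// Same order as the list above; only the last source is configured here.
	chain := []provider{
		static("template", "", ""),
		static("environment", "", ""),
		static("config file", "", ""),
		static("ec2 role", "AKIAEXAMPLE", "example-secret"),
	}
	c, err := firstAvailable(chain)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using credentials from:", c.Source)
}
```

The practical consequence is that a stale value in an earlier source shadows a valid one later in the chain, so it is worth checking each source in order when debugging.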

kylegoch commented 9 years ago

Thanks for the reply.

That is correct. There are no AWS creds/role in the packer template. There is however an IAM role that is being applied to the instance that packer launches, but that has not changed either.

The only way Jenkins gets its AWS permissions is through an IAM role (there are no hard-coded AWS creds on Jenkins).

rahart commented 9 years ago

I'm having the same issue, could it be related to https://github.com/aws/aws-sdk-go/issues/345?

cbednarski commented 9 years ago

@rahart Thanks for the link! If this has been fixed upstream then this should be resolved with the next packer release.

leg100 commented 9 years ago

I'm getting the same issue relying on an IAM instance profile for the amazon-chroot builder (0.8.5):

Build 'zookeeper' errored: Error querying AMI: AuthFailure: AWS was not able to validate the provided access credentials
    status code: 401, request id: []

Rolling out a new release would be much appreciated!

daleki commented 9 years ago

In a preliminary test we ran today IAM Role support seems fixed in 0.8.6.

synthe commented 9 years ago

Upgrading to 0.8.6 also allowed instance IAM roles to start working for me as well.

kylegoch commented 9 years ago

We upgraded Jenkins this morning to 0.8.6. However we are still getting the same issue.

Also, I misspoke in my original post. We are doing an STS Assume Role with an IAM role on Jenkins. So it's not a direct IAM role trying to run Packer builds in the same account.

Sorry, I should have specified that earlier. But that may explain why I am still getting the error but @daleki and @synthe are not on version 0.8.6.

kylegoch commented 9 years ago

@cbednarski Any update on this, or any further info I can provide to help out with this issue?

kylegoch commented 8 years ago

@cbednarski

This issue seems to be related to the use of STS in our Jenkins script that then calls Packer. These environmental variables get set from that script with the results from the AWS STS call:

AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SECURITY_TOKEN

I got this working with 0.8.6 by adding these lines to our template:

{
  "variables": {
    "aws_access_key": "{{env `AWS_ACCESS_KEY_ID`}}",
    "aws_secret_key": "{{env `AWS_SECRET_ACCESS_KEY`}}",
    "aws_security_token": "{{env `AWS_SECURITY_TOKEN`}}",
    ....
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "token": "{{user `aws_security_token`}}",
    ...
  }]
}

I know this is related to the SDK switch, but I wasn't sure whether it should be noted or documented for anyone using STS, because in version 0.7.5 Packer just read our environment variables.

Or should I close this issue and open another, since it is no longer IAM role specific?

cbednarski commented 8 years ago

@kylegoch Thanks for following up. From what you're describing I think this is still a bug. I'm not too familiar with the details of assume role, personally. Typically when I run these builds I am running from my laptop against the EC2 API on an account I own.

Are you using federated accounts in AWS? If so this will be pretty complicated for me to test, unfortunately. I will need to spend some time reading the API docs and compare against the current implementation.

kylegoch commented 8 years ago

@cbednarski I don't think you will have to delve into assuming roles, as that is handled outside Packer. We just present Packer with the three environment variables outlined in my last post.

In 0.7.5, Packer picked up those environment variables with no extra configuration on our side. With 0.8.6 we had to declare them explicitly as user variables in the template for Packer to read them.

I have run into a similar issue when running Packer locally with my own access keys (but no security token). Previously Packer would pull them from the environment variables set by my bash profile, but with 0.8.6 I had to manually add the keys to the profile to get it to work. Not sure if that's related or not. That laptop runs Ubuntu 14.04.

Our Jenkins runs RHEL 6.7 for reference.
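For anyone wrapping Packer in a script like the one described above, a small preflight check can confirm that the three variables are actually present in the environment before the build starts. The sketch below is hypothetical (the `missingVars` helper is not part of Packer). It only checks presence and says nothing about whether a given Packer version will read the variables.

```go
package main

import (
	"fmt"
	"os"
)

// missingVars returns the names in want for which lookup reports
// no value or an empty value.
func missingVars(lookup func(string) (string, bool), want []string) []string {
	var missing []string
	for _, name := range want {
		if v, ok := lookup(name); !ok || v == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// The three variables set from the STS call, as named in this thread.
	want := []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SECURITY_TOKEN"}
	if missing := missingVars(os.LookupEnv, want); len(missing) > 0 {
		fmt.Println("not set:", missing)
		return
	}
	fmt.Println("all three variables are set")
}
```

Passing the lookup function as a parameter keeps the check testable without touching the real environment.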

cbednarski commented 8 years ago

Thanks for clarifying. I've updated the issue title to reflect what I think you're running into.

kylegoch commented 8 years ago

I would say the new title describes the issue perfectly.

Thanks!

chriswgerber commented 8 years ago

@kylegoch

See the comment here: https://github.com/mitchellh/packer/issues/3070#issuecomment-174659483

jghward commented 8 years ago

In case it helps anyone, I was having this issue on all versions of Packer since the switch to the official AWS SDK. For me it turned out to be an issue with the system time having drifted by about ten minutes. Installing ntp fixed the issue for me. (Debian Jessie, Packer 0.10)

devops-dude commented 8 years ago

I also had this issue and it was due to system time being off by 5 minutes. I'd suggest outputting an error stating that is why packer fails rather than 401 auth. If I hadn't found this issue I might have spent days trying to figure this out.

rickard-von-essen commented 8 years ago

@dsmithwfm It's not possible to determine why authorization fails. You should use ntp to manage system time. Most secure communication systems depend on system time.

rickard-von-essen commented 8 years ago

I'm closing this since it should be fixed in 0.8.6 as long as you have correct system time. If you still have problems please open a new issue and reference this.

devops-dude commented 8 years ago

I'm a sysadmin so I always install ntpd. I'm cleaning up a mess a developer created because, well he's devops.

Now I don't know the details of why Packer fails when ntpd is off. Perhaps it's something inside AWS. However, I do think that you could update the error to say, "You might want to check that system time is up to date as that is one reason this can fail". I mean, for the good of mankind, you could potentially save thousands of hours of people's lives. You might even stop some poor dog from getting kicked by a frustrated BOFH.

devops-dude commented 8 years ago

Also note that AWS authorization was not failing. I could download objects from S3 buckets using the CLI without issues. Seems that kind of points to this being a Packer issue and not an AWS auth issue (when time isn't synched) to me.

rickard-von-essen commented 8 years ago

@dsmithwfm If this is something you can reproduce, please open a new issue and provide the necessary details there. Packer doesn't really do much with auth for AWS; it's just handed over to aws-sdk-go, so this is probably either a bug there or "intended behaviour".

(I'm sorry that your org has a broken implementation of DevOps)

k-ong commented 8 years ago

@dsmithwfm I believe this is an AWS error message issue as calling packer with a delayed time as well as issuing an AWS CLI call from the command line produces the same error message.

It's helpful to note that AWS honours the time a call is made to its API endpoints, regardless of whether it's through the CLI or SDKs. Having time that is not synchronised with the AWS servers will cause the calls to fail. Certain types of calls did, in the past, inform you of the difference between the server time and your current time, but I cannot remember which, or whether that has been deprecated in favour of a more generic message like the one we are seeing here.

mwhooker commented 8 years ago

hmm, I'm sympathetic to the idea that some users might waste time trying to solve this while we could be helpful.

What if we added a section to https://github.com/mitchellh/packer/blob/master/website/source/docs/other/debugging.html.md explaining what to do if there's a 401 error, and then in builder/amazon/{chroot,instance,ebs}/builder.go added something like

if rawErr, ok := state.GetOk("error"); ok {
    if ec2err, ok := rawErr.(awserr.Error); ok && ec2err.Code() == "AuthFailure" {
        ui.Message("See https://www.packer.io/docs/other/debugging.html for help debugging this error.")
    }
}

I think this is a good user experience, however it's without precedent so far

rickard-von-essen commented 8 years ago

@mwhooker 👍 sounds good.

mwhooker commented 8 years ago

unfortunately the "error" inside state is a fmt.Errorf, so we lose all the type information. I opened https://github.com/mitchellh/packer/pull/3957 which just adds the error you might see and how to fix it to the docs. Hopefully that's good enough for now

gokhansengun commented 7 years ago

Had the same problem, in my case the time was correct but the clock was set manually (not automatically set by a time server - in a Macbook). Setting to sync with time.apple.com fixed the issue for me.

jcrben commented 7 years ago

In my case it seems that the time was correct, but it was still throwing the error. Eventually I restarted my Macbook and now it seems to work. A trifle baffling.

macropin commented 6 years ago

Restarting the Docker vm worked for me! 2018 and VM time drift is still a thing...
