hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

Yet another AWS-based authentication method #948

Closed copumpkin closed 8 years ago

copumpkin commented 8 years ago

Adding to the plethora of AWS authentication methods already proposed (#828 and #805), I figured I'd throw my last one out there:

Unlike #828, this one requires no separate mapping from instance ID to IAM role and can work for Lambda functions too, but unfortunately also feels like a bit of a hack, so do with it as you may.

Let's start clean and simple: imagine if AWS had a WhoAmI IAM API call, that did nothing but map the caller back to their effective ARN.

In that case, a brand new autoscaling instance (or Lambda, or whatever) could authenticate to Vault in the following super-simple way:

  1. Initiate trusted connection to Vault
  2. An instance, using local IAM instance credentials (temporary), uses the standard AWS signature algorithm to sign a WhoAmI request, but doesn't actually issue the API call to Amazon. In the request, it embeds a representation (DNS name or public key or something fancier) of the Vault endpoint it is speaking to (that is outside of the control of the server).
  3. The Vault server that the instance is authenticating to then actually passes the signed API request along to AWS, observes the response, and thus learns the ARN identity of the instance it is speaking to. A malicious/compromised server can't forward the WhoAmI request along to someone else because of the signed representation of its endpoint.
  4. The Vault server has now authenticated the instance and can treat it with the policy assigned to its ARN (similar to what I describe in #828).
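The client side of the steps above can be sketched as follows. This is a minimal illustration, not how Vault or AWS actually implements anything: the `WhoAmI` action is the hypothetical call from this proposal, the `x-vault-server` header name is invented here as one way to embed the Vault endpoint, and the region, service and host defaults are assumptions. The signing itself follows the standard AWS Signature Version 4 steps.

```python
import hashlib
import hmac
from datetime import datetime, timezone


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_whoami(access_key, secret_key, session_token, vault_addr, now=None,
                region="us-east-1", service="sts", host="sts.amazonaws.com"):
    """Build a signed (but unsent) request that is bound to one Vault endpoint."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date = now.strftime("%Y%m%d")
    body = "Action=WhoAmI&Version=2011-06-15"  # hypothetical action

    headers = {
        "host": host,
        "x-amz-date": amz_date,
        "x-amz-security-token": session_token,
        "x-vault-server": vault_addr,  # hypothetical header binding the request to Vault
    }
    # Every header above is signed, so the Vault endpoint cannot be swapped later.
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{k}:{headers[k].strip()}\n" for k in names)
    canonical_request = "\n".join([
        "POST", "/", "", canonical_headers, signed_headers,
        hashlib.sha256(body.encode("utf-8")).hexdigest(),
    ])

    scope = f"{date}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers, body
```

The instance would hand the returned headers and body to Vault instead of sending them to AWS. Because the endpoint header is part of the signature, a server that receives a request signed for a different endpoint cannot replay it as its own without invalidating the signature.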

Before you ask, yes, this is basically an intentional MITM attack on the AWS signature scheme. On the other hand, the temporary credentials can rotate and everything will still be perfect.

Now let's step back and examine the reality: there is no WhoAmI API call. What we do have, for better or for worse (and this is where I start really calling it a hack), is that issuing an unauthorized API call (in most cases) will result in an error message containing the full ARN of the identity that tried to perform the call. Thus, instances would simply need to run the scheme I mentioned above, except instead of WhoAmI, they'd issue an API call that they knew ahead of time they weren't allowed to issue, sign that, and pass it along to Vault. Vault would then parse the error message and do everything else I described.
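On the Vault side, the parsing step might look like the sketch below. The message shape ("User: <arn> is not authorized to perform: ...") is an assumption based on what AWS access-denied errors commonly look like; as noted later in the thread, nothing guarantees it stays stable. The function name is made up for illustration.

```python
import re

# Assumed shape of an access-denied message; not a documented, stable format.
ARN_IN_ERROR = re.compile(
    r"User: (arn:aws[\w-]*:(?:iam|sts)::\d{12}:\S+) is not authorized"
)


def arn_from_access_denied(message):
    """Return the caller ARN embedded in an access-denied message, or None."""
    match = ARN_IN_ERROR.search(message)
    return match.group(1) if match else None
```

Returning `None` rather than guessing matters here: if the message format ever changes, the safe behaviour is to refuse authentication.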

How's that for nasty? Anyway, I mostly just wanted to get this out there for feedback. I'm not sure which of the three schemes I'd actually prefer in practice, but this one has been my long-standing pet authentication scheme and I was wondering how others felt about it.

jefferai commented 8 years ago

I think it's a really cool idea that works ("works"? :-) ) as long as Amazon doesn't change their error API. I have no idea what their stability policy is there (if anything).

I also don't know how difficult it is for other actors to find out the full ARN of an instance. I'm guessing operators can, but if normal users with some kind of access to EC2 machines within a company can find it (perhaps by hopping on the server and issuing an API call? If that's possible?), that would probably be problematic.

copumpkin commented 8 years ago

> I think it's a really cool idea that works ("works"? :-) ) as long as Amazon doesn't change their error API. I have no idea what their stability policy is there (if anything).

Which is why we should band together and petition for a real WhoAmI API call :smile: but agree with you that it feels kind of shitty.

> I also don't know how difficult it is for other actors to find out the full ARN of an instance. I'm guessing operators can, but if normal users with some kind of access to EC2 machines within a company can find it (perhaps by hopping on the server and issuing an API call? If that's possible?), that would probably be problematic.

Other actors in what sense? Can you elaborate on the concern here? My only point is that you'd end up with a trustworthy assertion (i.e., by Amazon) about the ARN of the identity that's authenticating to you. Vault then uses that ARN to determine your access level.

jefferai commented 8 years ago

@copumpkin I guess I'm wondering if a user on the EC2 box would be able to discover the instance credentials in any ready/easy fashion, so that such a user could then create a new signed request, send it into Vault for submission to Amazon, and get a token issued that they can use with the machine's policies.

copumpkin commented 8 years ago

Oh, yes, they certainly could, but I think that's an issue with any of these proposals (the KMS one, the instance identity document, or this one). Basically they all rely on the local hypervisor-provided 169.254.169.254 endpoint being hard to reach. Anyone who can talk to it is effectively equivalent to the machine from an authentication standpoint.

Having said that, the machine's iptables can fairly trivially be set up to only allow certain users to hit that IP (I set it up fairly restrictively by default), and I don't generally envision the typical use case for this thing to be a multi-user system. If users were interacting with the machines on a regular basis, they could also inject Vault credentials into it. The primary use case here is autonomous machines that want secrets without those pesky meddling humans getting involved.

jefferai commented 8 years ago

That's fair. I just am not, myself, super well versed in the ins and outs of AWS and its API, so I'm trying to ensure I have a good picture.

issacg commented 8 years ago

I'm missing something here. What if, by accident, the AWS role had permissions to run the call in question? It's kinda hacky - #828 took care of authenticating the ARN already, so I'm not sure what you're suggesting can be gained here...

copumpkin commented 8 years ago

@issacg the point is the person sending the signed request would presumably know what they can and cannot do, and would decide what to send accordingly. There might also be things that are almost universally not allowed, like running certain s3 actions on a widely known public bucket that belongs to someone else. Anyway, I don't know. Just throwing ideas out there.

The delta from #828 is that:

  1. It works with Lambda and other things that don't have an identity document (I use Hologram to get credentials locally, for example)
  2. It doesn't require Vault to actually query any structured information from AWS beyond running the signed command. It's "natively" telling you the thing you (or I, at least) care about, which is the ARN of the principal that's authenticating.

issacg commented 8 years ago

1) Excellent point! 2) That depends on what you want to get from AWS :) The document from the instance-data contains all of the information needed to construct the same ARN without fetching anything else.

copumpkin commented 8 years ago

@issacg your point 2 is super interesting to me. The example I linked to in #828 didn't show that and last time I checked (admittedly a while ago) I don't remember seeing that, but if you're right then that makes me pretty excited!

issacg commented 8 years ago

Well, according to http://docs.aws.amazon.com/general/latest/gr/aws-arns-and-namespaces.html#arn-syntax-ec2 an EC2 ARN is arn:aws:ec2:region:account-id:instance/instance-id

In the document, we get the region, account id and instance id...
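As a small sketch of that construction, assuming the identity document is the JSON served by the instance metadata endpoint and that it carries `region`, `accountId` and `instanceId` fields (the function name is made up for illustration):

```python
import json


def instance_arn(identity_document: str) -> str:
    """Assemble an EC2 instance ARN from an instance identity document."""
    doc = json.loads(identity_document)
    # Follows the arn:aws:ec2:region:account-id:instance/instance-id syntax.
    return "arn:aws:ec2:{}:{}:instance/{}".format(
        doc["region"], doc["accountId"], doc["instanceId"]
    )
```

Note that this yields the ARN of the instance itself, not of any IAM role attached to it.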

copumpkin commented 8 years ago

@issacg hmm, when I run aws iam list-users (from an instance where I'm not allowed to do that), I get an error message telling me that my ARN is:

arn:aws:sts::<account>:assumed-role/<role-name>/<instance-id>

The <role-name> component is the part I was missing from the signed identity document :frowning:
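To make the difference concrete, here is a minimal sketch that splits an ARN of the shape shown above into its parts. The type and function names are hypothetical; the only assumption about the format is the `assumed-role/<role-name>/<session>` layout quoted in this comment, where the session name happens to be the instance ID.

```python
from typing import NamedTuple


class AssumedRole(NamedTuple):
    account: str
    role_name: str
    session_name: str  # the instance ID, for EC2 instance credentials


def parse_assumed_role_arn(arn: str) -> AssumedRole:
    """Split arn:aws:sts::<account>:assumed-role/<role-name>/<session>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sts":
        raise ValueError(f"not an STS ARN: {arn!r}")
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "assumed-role":
        raise ValueError(f"not an assumed-role ARN: {arn!r}")
    return AssumedRole(parts[4], resource[1], resource[2])
```

A policy lookup keyed on `role_name` is what the signed identity document alone cannot provide.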

joelthompson commented 8 years ago

@issacg -- the role not accidentally having access is easy to solve for. In the IAM language, denies take precedence over allows, so all you need to do is, e.g., create your own S3 bucket, throw a deny all IAM policy on it (it's possible to lock yourself out of an S3 bucket to the point you need the root account to remove the IAM policy using the API/CLI), and then you're good to go.
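The "denies take precedence over allows" rule can be illustrated with a toy evaluator. This is a deliberately simplified model, not the real IAM evaluation logic: it handles only single `Action` and `Resource` wildcard strings, ignores principals and conditions, and the bucket name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Toy policy check: any matching Deny wins, otherwise a matching Allow is needed."""
    allowed = False
    for stmt in statements:
        matches = (fnmatchcase(action, stmt["Action"])
                   and fnmatchcase(resource, stmt["Resource"]))
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # explicit deny overrides every allow
        allowed = True
    return allowed  # no matching statement at all means implicit deny


# Even a role with full access is refused on the reserved bucket.
STATEMENTS = [
    {"Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Effect": "Deny", "Action": "s3:*",
     "Resource": "arn:aws:s3:::vault-auth-canary*"},  # hypothetical bucket
]
```

With such a deny in place, a signed request against the reserved bucket is guaranteed to fail, which is exactly the property the error-message trick depends on.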

issacg commented 8 years ago

@copumpkin ahh, that's the ARN of the role making the signed request, which makes complete sense, but isn't quite what you said above: ARN identity of the instance it is speaking to :smile:

@joelthompson there are ways to ensure it; the point is (IMHO) it's hack-y as we're requiring the user to specifically reserve something which must fail. It works, but it just seems... wrong.

joelthompson commented 8 years ago

@issacg -- of course it's wrong! I mean, @copumpkin started the last paragraph in his first post with "How's that for nasty?" :)

cstrahan commented 8 years ago

Has Amazon been apprised of the desirability of a WhoAmI API?

copumpkin commented 8 years ago

Well, I've submitted a feature request via standard support mechanisms :) I'm sure it won't hurt for more people to do the same!

On Sat, Jan 23, 2016 at 15:30 Charles Strahan wrote:

> Has Amazon been apprised of the desirability of a WhoAmI API?

fieldju commented 8 years ago

I have spent the last couple months dealing with this problem and would like to chime in here too. I have dug through the following issues / PR's: https://github.com/hashicorp/vault/issues/406, https://github.com/hashicorp/vault/issues/828, https://github.com/hashicorp/vault/pull/805 and now this one. Each one has pros and cons.

I too have talked to AWS about the fabled WhoAmI call. I have also submitted the following to our AWS contacts, and raise you one feature request of our own that would be slightly better suited to the problem we face: jfiel2-AWSFeatureRequest-260116-1614-68.pdf

In the meantime, we have worked out a solution that works today without any additional work required in Vault, using the default auth backend of Tokens. I will post a more formal write-up about how it works later. The brief description is as follows.

You generate a token for the policies that you want a given IAM Role to have and stuff it into a DynamoDB table, using the IAM Role ARN as the key, and encrypt it at rest using KMS. You then create an inline policy for the IAM Role granting it read permissions for that entry only.

image

You then grant the IAM Role decrypt access to the KMS key used to encrypt the token.

image

Now an instance in AWS can grab its encrypted token from DynamoDB, decrypt it using KMS, and make requests to Vault to retrieve secrets.
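The screenshots are not reproduced here, so the following is only a guess at what the inline policy could look like, built as a Python dictionary. It assumes DynamoDB fine-grained access control via the `dynamodb:LeadingKeys` condition key, with the role ARN as the table's hash key; the table ARN and the function name are hypothetical, and the KMS grant is not shown.

```python
import json


def token_read_policy(table_arn: str, role_arn: str) -> dict:
    """Inline policy letting a role read only the item keyed by its own ARN."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "dynamodb:GetItem",
            "Resource": table_arn,
            "Condition": {
                # Restrict reads to items whose hash key equals the role ARN.
                "ForAllValues:StringEquals": {
                    "dynamodb:LeadingKeys": [role_arn],
                },
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(token_read_policy(
        "arn:aws:dynamodb:us-east-1:123456789012:table/vault-tokens",  # hypothetical
        "arn:aws:iam::123456789012:role/web",
    ), indent=2))
```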

image

image

I have a lot more information I would like to share but I need to go through my management to get the proper blessings to share. We have written a nice little YAML driven admin cli that drives the management of the DynamoDB Table and grants and interaction with Vault. So all an admin would need to do is manage a YAML file with a block that looks like the following

image

The admin cli would manage the creation and clean up of all related resources from adding, editing and removing entries in the YAML. I would like to open source the cli soon.

joelthompson commented 8 years ago

@fieldju -- interesting solution. It still doesn't seem ideal because it doesn't work well in an autoscale solution -- you either need to create new tokens as a new node is autoscaled up, or the autoscaled nodes need to re-use the same token, neither of which is particularly elegant. It also requires the nodes to talk out to the Dynamo and KMS APIs to retrieve and decrypt the tokens, and AWS doesn't have VPC endpoints for Dynamo or KMS, so we would have to give all our nodes some degree of internet access :/

I like your feature request, though, it's pretty cool. I would want the token to be expiring, though, to guard against replay attacks and stolen tokens, and the like, and it might also be cool to be able to specify, in the token generation request, the IAM policy that is allowed to verify it (which prevents the service from using that token to authenticate to other services which use the same API).

fieldju commented 8 years ago

@joelthompson The DynamoDB fine grained access control solution limits access to a token to an IAM Role. Autoscaling Groups have one instance profile which would have one IAM Role defined. So all instances in an ASG would have access to the same bucket of secrets. Under what condition would you want an instance in an ASG to have access to a different set of secrets than the other instances in the ASG?

As for DynamoDB and KMS only being available on the public internet that I do not control, but wouldn't that be true for all of the AWS Api based solutions proposed?

As for the feature request I mentioned above, I wish it worked similarly to this gist: https://gist.github.com/fieldju/6c51e966388136df0bd4 but with generateSignatureForToken() and verifySignatureForToken() being AWS IAM endpoints driven by IAM Roles / Instance Profiles. Basically we need AWS to implement an IAM-based JWT: https://jwt.io/introduction/

fieldju commented 8 years ago

@copumpkin I had suggested more or less exactly what you are recommending in your original post here in https://github.com/hashicorp/vault/pull/805 and wrote up a different POC Gist proving out how it could work with an STS assume role request: https://gist.github.com/fieldju/630c8c5375772297a612. Obviously using a GetWhoAmI request would be way safer.

fieldju commented 8 years ago

@jefferai

> I'm wondering if a user on the EC2 box would be able to discover the instance credentials in any ready/easy fashion, so that such a user could then create a new signed request, send it into Vault for submission to Amazon, and get a token issued that they can use with the machine's policies.

From https://gist.github.com/fieldju/630c8c5375772297a612

String credentialsResponse = new EC2MetadataClient().getDefaultCredentials();

String ACCESS_KEY_ID = "AccessKeyId";
String SECRET_ACCESS_KEY = "SecretAccessKey";
String TOKEN = "Token";

node = Jackson.jsonNodeOf(credentialsResponse);
accessKey = node.get(ACCESS_KEY_ID);
secretKey = node.get(SECRET_ACCESS_KEY);
token = node.get(TOKEN);

def credentials = new BasicSessionCredentials(accessKey.asText(),
        secretKey.asText(), token.asText());

def myRole = 'arn:aws:iam::000000000:role/vaultRoleThatMatchesAPolicy'

// REPLACE THIS REQUEST WITH ANY AWS REQUEST
AssumeRoleRequest assumeRoleRequest = new AssumeRoleRequest()
assumeRoleRequest.setDurationSeconds(900)
assumeRoleRequest.setRoleArn(myRole)
assumeRoleRequest.setRoleSessionName('vault-role-check')

Request<AssumeRoleRequest> request = new AssumeRoleRequestMarshaller().marshall(assumeRoleRequest)
request.setEndpoint(new URI('https://sts.amazonaws.com'))
request.setResourcePath('/')
request.setHttpMethod(HttpMethodName.GET)

AWS4Signer signer = new AWS4Signer()

signer.sign(request, credentials)

@jefferai What do you think about the DynamoDB solution that leverages already existing API's?

fieldju commented 8 years ago

@issacg, @copumpkin in https://github.com/hashicorp/vault/issues/828 what's the sequence for mapping an instance to a Vault policy? How would your suggested solution work end to end, from administration to an instance obtaining a secret from Vault?

joelthompson commented 8 years ago

@fieldju

> Under what condition would you want an instance in an ASG to have access to a different set of secrets than the other instances in the ASG?

It's not about wanting different sets of secrets, it's about being able to audit every instance individually. If one instance gets compromised, for example, then it would be easier to track down what the compromised instance tried to do in a forensics investigation.

> As for DynamoDB and KMS only being available on the public internet that I do not control, but wouldn't that be true for all of the AWS Api based solutions proposed?

True, though, for example, with #828, only the Vault server would need access to the APIs, not every single server that wishes to access Vault, which is a materially different security concern. And while I hope AWS expands VPC endpoints to additional services, they still feel a bit clunky in the way they work.

joelthompson commented 8 years ago

Also, with regards to different instances using different tokens, it also makes it simpler to expire tokens and rotate them out if every instance uses a unique token.

jefferai commented 8 years ago

@fieldju I'm not really sure what I'm looking at in that gist (I don't really know much about AWS other than the super basics) and I don't really have an opinion on the DynamoDB solution (for the same reason). :-/

jefferai commented 8 years ago

@fieldju I'm mostly staying out of the way while people that know way more about AWS than I do discuss methodology. I'm more than happy to evaluate/think about a proposed solution if one can be found that works for everyone.

issacg commented 8 years ago

@fieldju yes, anyone who can access the machine (no OS-level root required) can get all of the metadata required to authenticate to Vault.

Some suggestions I remember seeing include using an OS-level firewall to restrict access to Amazon metadata (root required to remove), or including a time-based timeout to limit how long after an instance started will Vault issue tokens. I'm also playing in my head for an idea that a given instance ID can only ever get a single vault token (meaning you can't reboot the instance ever, only create a new instance to replace it), but I haven't completely worked out the details for that.
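The last two suggestions (a launch-time window and one token per instance ID) could be combined along these lines. This is a rough sketch of the idea only: the class name and defaults are made up, state is kept in memory, and the launch time is assumed to come from something trustworthy such as the `pendingTime` field of the signed identity document.

```python
class InstanceLoginGate:
    """Admit each instance ID at most once, and only shortly after launch."""

    def __init__(self, max_age_seconds: int = 300):
        self.max_age_seconds = max_age_seconds
        self._seen = set()  # instance IDs that have already received a token

    def admit(self, instance_id: str, launched_at: float, now: float) -> bool:
        if instance_id in self._seen:
            return False  # a token was already issued for this instance
        if now - launched_at > self.max_age_seconds:
            return False  # the instance has been up too long
        self._seen.add(instance_id)
        return True
```

A real backend would have to persist the set of seen instance IDs, which is the kind of state the next paragraph argues for keeping out of core Vault.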

But as I repeatedly say, it just makes sense to me to not have this as part of core vault just to make hacky solutions like this easier (for things like allowing state, and accessing AWS APIs any which way - all of which would make life much easier for folks with other AWS setups).

issacg commented 8 years ago

@fieldju can you explain how "IAM based JWT" would help things? Why is STS not enough for anything you'd use JWT for?

olly commented 8 years ago

Unless I've misunderstood, it looks like you need the iam get-user API call.

From the documentation:

If you do not specify a user name, IAM determines the user name implicitly based on the AWS access key ID used to sign the request.

Output:

{
    "User": {
        "UserName": "Bob",
        "Path": "/",
        "CreateDate": "2012-09-21T23:03:13Z",
        "UserId": "AKIAIOSFODNN7EXAMPLE",
        "Arn": "arn:aws:iam::123456789012:user/Bob"
    }
}

copumpkin commented 8 years ago

That only uses users, whereas we're talking about IAM roles. GetUser does not tell you about roles and GetRole does not work in the same way :(

On Feb 5, 2016, at 03:58, Oliver Legg wrote:

Unless I've misunderstood, it looks like you need the iam get-user API call.

From the documentation:

If you do not specify a user name, IAM determines the user name implicitly based on the AWS access key ID used to sign the request.

Output:

{
    "User": {
        "UserName": "Bob",
        "Path": "/",
        "CreateDate": "2012-09-21T23:03:13Z",
        "UserId": "AKIAIOSFODNN7EXAMPLE",
        "Arn": "arn:aws:iam::123456789012:user/Bob"
    }
}

jefferai commented 8 years ago

> I'm also playing in my head for an idea that a given instance ID can only ever get a single vault token (meaning you can't reboot the instance ever, only create a new instance to replace it), but I haven't completely worked out the details for that.

@issacg This is something I've recommended to people as well who are concerned about how to handle renewals -- especially for those that are trying to auth in an insecure environment and using something like cubbyhole. If your application can drop nodes gracefully, simply having that application quit itself (or be killed, or the instance be killed) and letting your scheduler re-populate is a decent way to do this. It's also something that a lot of people think is a great idea (although I don't buy into it myself), which is why you see Heroku-style scheduled reboots.

If you trust your scheduler, this lets your scheduler produce tokens for the application (whether directly via a secure channel or via cubbyhole for insecure channels) without needing to have the scheduler be aware of what to do at specific periods when the token is expiring, or having a separate daemon to handle this case.

Bit of an aside from the discussion, but just wanted to echo that I think it's a useful paradigm.

jefferai commented 8 years ago

#1300 contains the backend that will be going into Vault as the official AWS auth backend. Closing, but I want to be clear that the final design was heavily inspired by all of the discussion around the various possibilities and we are very much appreciative of your efforts!

copumpkin commented 8 years ago

The horrible hack I needed for this is now no longer necessary, and this technique becomes far more viable: http://docs.aws.amazon.com/STS/latest/APIReference/API_GetCallerIdentity.html

copumpkin commented 7 years ago

> this technique becomes far more viable

More than just viable, since @joelthompson has now implemented exactly this scheme (minus the hackery, due to AWS adding GetCallerIdentity) in #1962.
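With GetCallerIdentity, the server no longer parses an error message; it reads the ARN from a structured response. A minimal sketch of that step, assuming the XML response layout and namespace given in the STS API reference (this is not the code from #1962, and the function name is made up):

```python
import xml.etree.ElementTree as ET

# Namespace assumed from the STS API reference for GetCallerIdentity.
STS_NS = {"sts": "https://sts.amazonaws.com/doc/2011-06-15/"}


def caller_arn(response_xml: str) -> str:
    """Extract the caller's ARN from a GetCallerIdentity XML response."""
    root = ET.fromstring(response_xml)
    node = root.find("./sts:GetCallerIdentityResult/sts:Arn", STS_NS)
    if node is None or not node.text:
        raise ValueError("response does not contain an Arn element")
    return node.text.strip()
```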

fieldju commented 7 years ago

Interesting indeed, much has happened since I was last watching all of these issues.