Closed copumpkin closed 8 years ago
I like it. You may want to look at https://github.com/hashicorp/vault/pull/805 for reference as another stab at authentication via AWS (also be sure to look at my first comment for notes about timing w.r.t. developing a backend yourself and the underlying frameworks). One thing I like about this method is that as long as the AWS key is known, you don't even need an AWS token in order for a Vault backend to verify validity.
Some thoughts: 1) Ensuring that an identity document is only ever used once could be problematic. If there are timestamps in the metadata, you could set an upper limit on how long it would be valid for authentication, and likewise could then purge old identity documents rather than storage increasing in an unbounded way (which would eventually become problematic for the physical backend).
2) There'd need to be a way -- possibly defined via instance metadata -- to identify what permissions such a machine should get. If the instance metadata come with some kind of group identifier, that could map to e.g. roles in the backend.
Thanks for the quick response and the reference to the other scheme! I'll take a closer look.
> Ensuring that an identity document is only ever used once could be problematic. If there are timestamps in the metadata, you could set an upper limit on how long it would be valid for authentication, and likewise could then purge old identity documents rather than storage increasing in an unbounded way (which would eventually become problematic for the physical backend).
I'm not sure the "only once" aspect is essential for the security of the scheme, but feels like good practice. Enforcing a time limit since the machine spun up seems like a reasonable approximation to the original goal, and a less stateful one.
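To make the "only once, but bounded" idea concrete, here is a minimal sketch of a replay cache that remembers instance IDs for a limited window and purges older entries so storage does not grow without bound. The class and method names are hypothetical, and the purge is only safe if documents older than the same window are rejected by a separate freshness check.

```python
import time


class SeenDocuments:
    """Remember instance IDs that already authenticated, for a bounded time.

    Hypothetical helper: not part of Vault. Purging is only safe when
    documents older than max_age_seconds are rejected elsewhere.
    """

    def __init__(self, max_age_seconds, clock=time.time):
        self.max_age_seconds = max_age_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._seen = {}     # instance ID -> time it was first used

    def purge(self):
        """Drop entries older than max_age_seconds."""
        cutoff = self.clock() - self.max_age_seconds
        self._seen = {k: t for k, t in self._seen.items() if t >= cutoff}

    def claim(self, instance_id):
        """Return True the first time an instance ID is seen, False on reuse."""
        self.purge()
        if instance_id in self._seen:
            return False
        self._seen[instance_id] = self.clock()
        return True
```

A real backend would keep this state in Vault's storage rather than in memory; the in-memory dict is just the smallest thing that shows the shape.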
> There'd need to be a way -- possibly defined via instance metadata -- to identify what permissions such a machine should get. If the instance metadata come with some kind of group identifier, that could map to e.g. roles in the backend.
Good point. The document currently seems to contain information of the following form:
{
"instanceId" : "i-c495bb93",
"billingProducts" : [ "bp-xxx" ],
"accountId" : "xxx",
"imageId" : "ami-e80xxxx",
"instanceType" : "c3.xlarge",
"kernelId" : "aki-825ea7eb",
"ramdiskId" : null,
"pendingTime" : "2015-02-24T14:38:43Z",
"architecture" : "x86_64",
"region" : "us-east-1",
"version" : "2010-08-31",
"availabilityZone" : "us-east-1c",
"privateIp" : "w.x.y.z",
"devpayProductCodes" : null
}
(stolen from here)
Which likely wouldn't be much use without being able to cross-reference (via AWS API) against other properties of the instance.
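As a rough illustration of what can be pulled out of the document before any cross-referencing, here is a small sketch that parses the JSON into a typed record. The field names come from the sample above; the `InstanceIdentity` type and `parse_identity_document` function are hypothetical names, and the parsing is only meaningful after the signature has been verified.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceIdentity:
    """The fields of the identity document most likely to matter for auth."""
    instance_id: str
    account_id: str
    region: str
    image_id: str
    pending_time: str


def parse_identity_document(raw):
    """Parse the (already signature-verified) JSON identity document."""
    doc = json.loads(raw)
    return InstanceIdentity(
        instance_id=doc["instanceId"],
        account_id=doc["accountId"],
        region=doc["region"],
        image_id=doc["imageId"],
        pending_time=doc["pendingTime"],
    )
```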
I do have another (hackier) scheme that would tie a brand new instance to its IAM role (likely more interesting for auth purposes) while still retaining the "as long as the AWS key is known, you don't even need an AWS token in order for a Vault backend to verify validity" property you liked. I'd like to run a little further with this one, and then write up the other one if this proves too annoying.
Don't worry too much about not needing an AWS key. It's a neat property, consuming AWS auth without a key, but practicality trumps academic interest :-)
@copumpkin what are your thoughts about https://github.com/hashicorp/vault/pull/805? You may also check out https://github.com/hashicorp/vault/issues/406
Some background reading: I believe 805 is based on http://ryandlane.com/blog/2015/06/16/custom-service-to-service-authentication-using-iamkms/, which was written by one of the devs for Confidant.
This seems so much simpler than #805, but the downside is that it still doesn't help to match the "user-id" (which seems to be what you're suggesting we use this to verify) with an "app-id", or am I missing something?
Regarding the time limit, AWS support just told me: "The 'pendingTime' date time value in the identity document represents when the instance was launched in a UTC format."
So there's a date we can trust (because it's part of the signed document) to limit the timespan to some degree. I feel like that could possibly be coupled with the cubbyhole authentication model to further force a single token for a given instance-id in a single timeframe, too.
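A minimal sketch of that time limit, assuming the timestamp always has the `2015-02-24T14:38:43Z` shape shown in the sample document. The `is_fresh` name and the `max_age` parameter are hypothetical; picking the actual window is a policy decision.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(pending_time, max_age, now=None):
    """Return True if the instance launched no more than max_age ago.

    pending_time is the signed "pendingTime" string from the document.
    Timestamps in the future are rejected as well.
    """
    launched = datetime.strptime(pending_time, "%Y-%m-%dT%H:%M:%SZ")
    launched = launched.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - launched
    return timedelta(0) <= age <= max_age
```

For example, `is_fresh(doc["pendingTime"], timedelta(minutes=10))` would only accept instances that launched within the last ten minutes.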
In this model how would a user-id get mapped to a policy?
You'd need some sort of separate mapping layer on the Vault side that can take the instance ID and account ID (probably the most useful parts of the document for this purpose) and map them back to permissions that are meaningful to Vault. It's not beautiful, but should only need minimal EC2 read-only access on Vault's side to map the instance ID to e.g., the autoscaling group it came from, tags on the instance, or the instance profile attached to it. Those can then be used to inform the actual authorization decisions in a user-specified manner.
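To make the mapping layer slightly less vague, here is a sketch under stated assumptions: `lookup_attributes` is a hypothetical callable standing in for whatever read-only EC2 query Vault would make (autoscaling group, tags, instance profile), and `rules` is a hypothetical user-supplied list of `(attribute, value, policies)` entries. None of this is an existing Vault API.

```python
def policies_for_instance(instance_id, account_id, trusted_accounts,
                          lookup_attributes, rules):
    """Map a verified instance ID and account ID to Vault policy names.

    trusted_accounts: set of AWS account IDs Vault is willing to accept.
    lookup_attributes: callable(instance_id) -> dict of instance attributes,
        e.g. {"instance_profile": "...", "autoscaling_group": "..."}.
    rules: list of (attribute_name, expected_value, [policy, ...]).
    """
    if account_id not in trusted_accounts:
        return []  # unknown account: grant nothing

    attributes = lookup_attributes(instance_id)
    policies = []
    for attr_name, expected, granted in rules:
        if attributes.get(attr_name) == expected:
            for policy in granted:
                if policy not in policies:  # keep order, avoid duplicates
                    policies.append(policy)
    return policies
```

Keeping the lookup as a plain callable means the same function works whether the attributes come from EC2 directly or from some external service.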
Sorry this is vague :smile:
@copumpkin where is the hard-to-find key located?
I had to contact AWS support for it (:open_mouth:) since I couldn't find it anywhere on the public internet. I asked them to update their docs and they said they would do that soon, but also asked me not to distribute it myself. It's a certificate, but it has no chain that's traceable back to a trusted root.
I'll ping them to see if they can publish it on the sooner side.
This is all really great stuff. The one issue is that this method solves for server-based assets while not solving for serverless ones (Lambda, as an example). The one thing about #805 is that you could imagine it working for something like Lambda.
I've really been mulling the idea of authenticating (specifically through auto-scaling groups) for a while (several months).
I think that at the end of the day, it boils down to what the Hashicorp folks mention in app-id: that there really needs to be an out-of-band process to decide who gets access and who doesn't. While there have been many ideas posted about how vault can do it (this one included), I don't think I'd want Vault to make the decision itself. Certainly, not by hard-coding a single method which needs to make API calls to Amazon or use a hardcoded secret embedded in Vault.
Also, at the end of the day, I want to design my setup to fit my needs as much as possible.
I might be more willing to go that path if/when Vault comes with an interface for external plugins, and the plugins can be managed out-of-band with my vault server.
If it's of any academic interest, I plan on using this document only as a means of verifying that the request really comes from AWS, and then using AWS APIs to query the instance-data from the machine (which contains the chef runlist and environment - I don't use roles). Since that essentially fits the idea of the "userid" and "appid" respectively, I'll issue a Vault token based on that. Because I'm free to implement it any way I like, I can further secure this by checking if the machine is in my AWS account, in a VPC that makes sense, and even if the instance is registered with the auto-scaling group it's supposed to be in. I plan on doing this externally to Vault.
Also, for academic interest, I got the following response from Amazon support yesterday:
I have received confirmation that you may share this public key outside your company.
The documentation team has been made aware of this and they will be publishing this information in a future revision of our docs (they did not give an ETA, but it should be added soon).
Please do let me know if I can do anything else to assist.
Best regards,
Michael M.
Amazon Web Services
Based on that, here's the public key needed to make this all work:
-----BEGIN CERTIFICATE-----
MIIC7TCCAq0CCQCWukjZ5V4aZzAJBgcqhkjOOAQDMFwxCzAJBgNVBAYTAlVTMRkw
FwYDVQQIExBXYXNoaW5ndG9uIFN0YXRlMRAwDgYDVQQHEwdTZWF0dGxlMSAwHgYD
VQQKExdBbWF6b24gV2ViIFNlcnZpY2VzIExMQzAeFw0xMjAxMDUxMjU2MTJaFw0z
ODAxMDUxMjU2MTJaMFwxCzAJBgNVBAYTAlVTMRkwFwYDVQQIExBXYXNoaW5ndG9u
IFN0YXRlMRAwDgYDVQQHEwdTZWF0dGxlMSAwHgYDVQQKExdBbWF6b24gV2ViIFNl
cnZpY2VzIExMQzCCAbcwggEsBgcqhkjOOAQBMIIBHwKBgQCjkvcS2bb1VQ4yt/5e
ih5OO6kK/n1Lzllr7D8ZwtQP8fOEpp5E2ng+D6Ud1Z1gYipr58Kj3nssSNpI6bX3
VyIQzK7wLclnd/YozqNNmgIyZecN7EglK9ITHJLP+x8FtUpt3QbyYXJdmVMegN6P
hviYt5JH/nYl4hh3Pa1HJdskgQIVALVJ3ER11+Ko4tP6nwvHwh6+ERYRAoGBAI1j
k+tkqMVHuAFcvAGKocTgsjJem6/5qomzJuKDmbJNu9Qxw3rAotXau8Qe+MBcJl/U
hhy1KHVpCGl9fueQ2s6IL0CaO/buycU1CiYQk40KNHCcHfNiZbdlx1E9rpUp7bnF
lRa2v1ntMX3caRVDdbtPEWmdxSCYsYFDk4mZrOLBA4GEAAKBgEbmeve5f8LIE/Gf
MNmP9CM5eovQOGx5ho8WqD+aTebs+k2tn92BBPqeZqpWRa5P/+jrdKml1qx4llHW
MXrs3IgIb6+hUIB+S8dz8/mmO0bpr76RoZVCXYab2CZedFut7qc3WUH9+EUAH5mw
vSeDCOUMYQR7R9LINYwouHIziqQYMAkGByqGSM44BAMDLwAwLAIUWXBlk40xTwSw
7HX32MxXYruse9ACFBNGmdX2ZBrVNGrN9N2f6ROk0k9K
-----END CERTIFICATE-----
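For anyone wanting to try the certificate out, here is a hedged sketch of the preparation step. As far as I know, the `/pkcs7` metadata endpoint returns bare base64 without PEM armor, so it needs a header and footer before `openssl smime` will read it. The function names and file paths are hypothetical, and the command is only built here, not executed.

```python
def to_pem_pkcs7(blob):
    """Wrap the bare base64 PKCS#7 blob from the metadata server in PEM armor."""
    body = "".join(blob.split())  # drop any newlines or other whitespace
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return ("-----BEGIN PKCS7-----\n"
            + "\n".join(lines)
            + "\n-----END PKCS7-----\n")


def openssl_verify_command(pkcs7_path, cert_path):
    """Build an openssl invocation that checks the signature against the cert.

    -noverify skips chain validation of the signer certificate, since the
    AWS certificate has no chain back to a trusted root; the signature
    itself is still checked against the certificate passed via -certfile.
    """
    return ["openssl", "smime", "-verify",
            "-in", pkcs7_path, "-inform", "PEM",
            "-certfile", cert_path, "-noverify"]
```

On success, openssl prints the signed JSON document on stdout, which can then be parsed and checked further.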
@issacg great, thanks for releasing that! I wonder why my support person said I couldn't, but yours said you could :confused: maybe they like you better!
Anyway, this seems pretty straightforward to implement the basic idea for now. I don't think this has to be very complicated or need much external help. I also don't see much difference in teaching Vault how to speak to EC2 to query which IAM role a given instance ID is in vs. having it call out to someone else that can do the same thing.
To expand a bit:
[...] PassRole powers inside AWS)
My use-case doesn't map the IAM role directly.
There are a lot of folks with a lot of use-cases. Once plugins are opened, it would make sense to add these as plugins, but I still (personally) don't think this should be part of "core" vault, so that people aren't forced to arrange their AWS setup a certain way.
@issacg just so I can better understand, what other mechanism would you want to use to automatically map your ASG nodes to a policy in vault? The only reason I'm going with IAM role is that it allows me to control access to it on the AWS side.
Just wanted to point people to #948 if they didn't see the reference, as more food for thought.
@copumpkin anything is controllable from the IAM side, based on what permissions you give your users. IAM is no safer than anything else - at the end of the day, any person or machine with the ability to launch an instance that works will be able to set the identifying data - if your instance needs AWS permissions, then you'll allow any operator authorized to launch machines the PassRole permissions.
Anyway, currently I'm looking at instance data which is more flexible (for me) than the role (roles are more shared in my setup)
@issacg my only point is that on the AWS side, I can't meaningfully restrict some users from spinning up instances with certain tags or metadata. If someone has RunInstances powers, they can trivially set a VaultRole = "Admin" tag, or put something equivalent to that in user-data.
What's unique (to me at least, in a federated environment where there are multiple IAM users with different levels of power) about the IAM role is that it requires PassRole. So I can give IAM user JoeShmoe the power to PassRole a VaultAdmin role to an instance, which gives that instance the power to do fancier things on Vault. IAM user BobLoblaw can also spin up instances, but I haven't granted him the power to use VaultAdmin, so BobLoblaw is effectively not able to make machines that have elevated access to Vault.
Does that make sense? I don't think the IAM language in AWS is powerful enough to say "BobLoblaw can only create instances if he applies certain tags to them" (yet?).
P.S: in practice, managed policies might be a better fit for this sort of thing, but also add complexity that a first iteration of the idea wouldn't want.
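To illustrate the JoeShmoe case, here is a sketch of the kind of IAM policy document that would grant `iam:PassRole` on a single role. The account ID `123456789012` is a placeholder and the helper name is hypothetical; BobLoblaw simply would not get a statement like this for `VaultAdmin`.

```python
import json


def pass_role_policy(account_id, role_name):
    """Build an IAM policy document allowing iam:PassRole on one role only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": f"arn:aws:iam::{account_id}:role/{role_name}",
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID; attach the result to the JoeShmoe IAM user.
    print(json.dumps(pass_role_policy("123456789012", "VaultAdmin"), indent=2))
```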
BobLoblaw @copumpkin just won the debate, show's over folks
(edit: no, not seriously)
I understood that, but that only works if the VaultAdmin role isn't actually needed by the EC2 instance for anything other than authenticating to vault. Since you can only have a single role on EC2, it creates a bit of a problem if you want to use the Role for something else, for example S3 or CloudWatch. In that case, JoeShmoe needs to PassRole to VaultAdmin anyway...
I suppose you could avoid it by using the Role exclusively for Vault, and then using Vault to get AWS credentials to actually do anything with S3, but it seems a bit wasteful, IMHO
@issacg that's what I was saying about managed policies (the limit is now 10 per role, which seems sufficient), but yes, you're right. It's not ideal to use the role directly, which then forces me to go back to needing to query EC2 even in #948. Why can't these things just be simple? :smile:
:)
I had the same thought when I realized I wasn't going to get my user-data signed by AWS, and would need to fetch it by querying EC2 here.
This is narrowing in on the solution we ended up with (which isn't ideal, but works). We knew we wanted to use IAM roles to secure access, so what we built is an out-of-band token management system that is protected via IAM roles. There are a bunch of ways you could go about doing this; we landed on combining the vault token auth system with IAM-protected S3/Dynamo storage with at-rest encryption. Think of it as manual #805 :)
Basically, in order to get into the "token" store you'll need access to those OOB resources, which means you will need to have a specific managed policy attached to the Instance/Lambda Function/etc that you are operating.
Not to stir up an anthill here, but I wanted to update that the EC2 identity document keys are now in the official AWS documentation: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-identity-documents.html
I just wanted to return to this thread to thank all involved in bringing this into vault. I had been watching patiently for some time and the discussion/design around this feature was both enlightening and came to a very good end. Getting to delete a bunch of custom code and SQS/SNS complexity to obtain the same outcome has really simplified our lives over here :).
@ewdurbin Happy to know that you find it useful!
I've asked AWS's security team if they would consider adding a timestamp or nonce value to the Instance Identity Document so there could be some kind of expiry, but they basically told me no. I see that Vault does take some measures to prevent a signed document being maliciously used, but to me they don't feel like enough.
I've often wondered how I might bootstrap an AWS instance to authenticate to vault with no out-of-band manual intervention, by trusting AWS as an identity provider.
The existing app-id auth backend specifies that [...] which can be painful for fully automated deployments, like in an AWS Autoscaling Group.
I'm wondering if we could use a simple scheme that relies on the oft-overlooked signed identity document provided by an EC2 instance's metadata server. It lives at http://169.254.169.254/latest/dynamic/instance-identity/pkcs7 and is a simple JSON document that includes the instance ID, some pieces of metadata, and a signature by a canonical AWS key (which is unfortunately remarkably hard to find, but can be obtained).
Since the document is signed once and there's no opportunity to inject a nonce into the document being signed, that document would need to be treated as a secret single-use authentication token. But if we do accept that, a brand new EC2 autoscaling instance could bootstrap itself into Vault in the following manner:
Are my goals clear? Does this seem like a sensible way to achieve them?