hashicorp / vault

A tool for secrets management, encryption as a service, and privileged access management
https://www.vaultproject.io/

RFC: First-Class AWS VPC Endpoint Support #5228

Open joelthompson opened 6 years ago

joelthompson commented 6 years ago

Overview

AWS supports VPC "interface" endpoints for both EC2 and STS in certain regions. (Sadly, there are no IAM endpoints.) This means AWS places what looks like a virtual NIC for these services inside your VPC, so you can access them without ever hitting the internet. The NIC has a unique DNS name associated with it, and the in-VPC DNS resolver can also be set up to resolve queries for the default endpoint hostnames to it.

The benefit to a Vault operator from using these is clear -- Vault no longer needs to go to the public internet (whether directly or through an HTTP proxy) to access these endpoints. Vault already has some rudimentary support for them (thanks to the endpoint, iam_endpoint, and sts_endpoint config items in both the auth and secret backends), but I think it would be good to build it in a first-class way through a combination of code changes and documentation changes.

I'm opening this up to start to catalog what I think needs to get done and to get some feedback on it.

One important limitation of VPC endpoints before going further: VPC endpoints can only be used to access services in the same region as the VPC. Accessing different regions still requires going over the internet in some fashion.

AWS Auth Backend

The AWS Auth backend requires talking to the STS, EC2, and IAM endpoints.

IAM

This is probably the easiest to describe -- not currently possible because no IAM VPC endpoint exists. Further, IAM is a global service and, unlike STS, has no regional endpoints, so I don't know what's going to happen long-term with this.

The IAM endpoint is used at these points in the code:

  1. When creating/updating a role, to resolve the role ARN to the unique ID. This can be worked around by disabling unique ID resolution.
  2. When logging in using either the EC2 auth method or the IAM auth method with inferencing, and there's a bound_iam_role_arn on the role, to look up the EC2 instance's IAM instance profile and find the role associated with the profile.
  3. When using the IAM auth method and the role has a wildcard bind, to resolve the role to the full ARN.

STS

While STS is a global service like IAM, it also has regional endpoints in addition to a single global endpoint. STS VPC endpoints (as of right now) are only available in us-west-2. The region is embedded in the AWS signature: for the global endpoint, AWS appears to expect a region of us-east-1, while for the regional endpoints, AWS appears to expect the region of the endpoint. See, e.g., https://github.com/hashicorp/vault-ruby/pull/161#issuecomment-355723269 and https://github.com/hvac/hvac/issues/251#issuecomment-416985958 for some more background. Furthermore, when using a VPC endpoint, AWS expects the signature to carry the region of the VPC the endpoint is in (e.g., I'm using a VPC endpoint in us-west-2, and I have to embed us-west-2 as the region in the AWS signature).
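The region rules described above can be sketched as a small lookup. This is only an illustration: the function name is hypothetical, and the VPC endpoint hostname shape (`<vpce-id>.sts.<region>.vpce.amazonaws.com`) is an assumption about how interface endpoint DNS names are laid out, not something Vault parses today.

```python
GLOBAL_STS_HOST = "sts.amazonaws.com"


def sts_signing_region(endpoint_host: str) -> str:
    """Guess which region the AWS signature should carry for an STS hostname.

    Hypothetical helper; the hostname patterns below are assumptions.
    """
    host = endpoint_host.lower().rstrip(".")
    if host == GLOBAL_STS_HOST:
        # Global endpoint: AWS appears to expect us-east-1.
        return "us-east-1"

    labels = host.split(".")
    # Regional endpoint: sts.<region>.amazonaws.com
    if len(labels) == 4 and labels[0] == "sts" and labels[2:] == ["amazonaws", "com"]:
        return labels[1]
    # Assumed VPC endpoint shape: <vpce-id>.sts.<region>.vpce.amazonaws.com
    if len(labels) == 6 and labels[1] == "sts" and labels[3:] == ["vpce", "amazonaws", "com"]:
        return labels[2]

    raise ValueError(f"cannot infer a signing region from {endpoint_host!r}")
```

A custom DNS name that hides the region would hit the `ValueError` branch, which is one reason an explicit region setting may be preferable to inferring it from the hostname.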

The STS endpoint is used in two locations:

  1. When logging in using the IAM auth method, the signed sts:GetCallerIdentity request is forwarded to this endpoint.
  2. When accessing either the IAM or the EC2 client, to use assumed-role credentials for those connections.

I've verified that the first one works when overriding the sts_endpoint to be the VPC endpoint hostname. However, as I noted in the linked issues, the caveat is that the client has to generate signed headers for the region of the endpoint. Now, the good news is that the client really only needs to know the region -- that is, the client can generate headers assuming the endpoint is sts.us-west-2.amazonaws.com and send them to Vault, which uses a VPC endpoint URL that differs from this, and it will work. So, I think the main thing that needs to be done here is to enhance the Vault CLI to support alternate regions when calling vault login -method=aws and then to document this.
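To make the "client only needs to know the region" point concrete, here is a minimal sketch of signing an sts:GetCallerIdentity request for a chosen region using only the Python standard library. It follows the general AWS Signature Version 4 steps. The function name and the example credentials are hypothetical, and packaging the result into a Vault login request is not shown.

```python
import datetime
import hashlib
import hmac

STS_BODY = "Action=GetCallerIdentity&Version=2011-06-15"


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_get_caller_identity(access_key, secret_key, region, now=None):
    """Return signed headers for a POST to the regional STS hostname.

    Sketch only: temporary credentials would also need a signed
    X-Amz-Security-Token header, which is omitted here.
    """
    now = now or datetime.datetime.now(datetime.timezone.utc)  # must be UTC
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")

    # The client signs for the default regional hostname, even if Vault
    # later forwards the request to a differently named VPC endpoint.
    headers = {
        "content-type": "application/x-www-form-urlencoded; charset=utf-8",
        "host": f"sts.{region}.amazonaws.com",
        "x-amz-date": amz_date,
    }
    names = sorted(headers)
    signed_headers = ";".join(names)
    canonical_headers = "".join(f"{n}:{headers[n]}\n" for n in names)
    payload_hash = hashlib.sha256(STS_BODY.encode("utf-8")).hexdigest()
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash]
    )

    # The region is part of the credential scope, so it is baked into
    # the signature itself.
    scope = f"{date_stamp}/{region}/sts/aws4_request"
    string_to_sign = "\n".join(
        [
            "AWS4-HMAC-SHA256",
            amz_date,
            scope,
            hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
        ]
    )

    key = ("AWS4" + secret_key).encode("utf-8")
    for part in (date_stamp, region, "sts", "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers
```

Calling `sign_get_caller_identity("AKIDEXAMPLE", "secret", "us-west-2")` yields headers whose credential scope names us-west-2, which is what matters when Vault forwards them to a VPC endpoint in that region.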

As for the second, I haven't tested it (yet). I suspect it'll get the region wrong when using a VPC endpoint in another region, so it needs to be tested (and possibly fixed/enhanced/documented).

EC2

EC2 is used when using the EC2 auth method or the IAM auth method with EC2 inferencing, in order to query the EC2 APIs about instance metadata.

When using the EC2 auth method, Vault extracts the region from the instance identity document and uses it to decide which EC2 endpoint to query. However, VPC endpoints can only be used for a single region, so any out-of-region EC2 calls would still need to go over the public internet. And if the endpoint config is set, every call goes to that endpoint, so in effect, setting the endpoint config value means EC2 clients in other regions will never work. My inclination is to leave this unchanged and just document it.

If Vault users have EC2 instances only in a single region that matches the Vault deployment, they can either configure Vault with the specific hostname of the VPC endpoint or configure the VPC to resolve the default endpoint hostname to the VPC endpoint.

If Vault users have EC2 instances outside the region Vault is in, they can't configure the EC2 endpoint in Vault. They will need to either configure the VPC resolver (which will work only for the region they're in) and accept that traffic for other regions goes over the internet, or just accept that all traffic goes over the internet. This should all be documented.
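The endpoint selection described above can be modeled in a few lines. This is a sketch of the behavior as I've described it, not Vault's actual code. Both function names are hypothetical, and the second function assumes the configured endpoint is a VPC endpoint in Vault's own region.

```python
from typing import Optional


def ec2_endpoint_for(instance_region: str, configured_endpoint: Optional[str] = None) -> str:
    """Pick the EC2 endpoint used to look up an instance (illustrative model)."""
    if configured_endpoint:
        # A configured endpoint wins regardless of the instance's region.
        return configured_endpoint
    # Otherwise use the default endpoint for the region named in the
    # instance identity document.
    return f"https://ec2.{instance_region}.amazonaws.com"


def ec2_lookup_expected_to_work(
    instance_region: str,
    vault_region: str,
    configured_endpoint: Optional[str] = None,
) -> bool:
    """Whether the lookup can succeed, assuming any configured endpoint is a
    VPC endpoint in vault_region that only serves that region."""
    if configured_endpoint is None:
        # Default regional endpoint, reached over the internet or through
        # the VPC resolver when the region matches.
        return True
    return instance_region == vault_region
```

With an endpoint configured in us-west-2, `ec2_lookup_expected_to_work("eu-west-1", "us-west-2", "https://vpce.example")` returns `False`, which is the multi-region failure mode that needs documenting.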

AWS Secret Backend

The secret backend requires talking to the IAM and/or STS endpoints.

IAM

This is used only when the credential_type is iam_user and is used for creating and deleting IAM users. As above, because there are no IAM VPC endpoints, this will still need to go out over the internet.

STS

This is significantly simpler than with the IAM auth method. This is used for the assumed_role and federation_token credential types. However, simply setting the sts_endpoint may not be sufficient here. I need to test this and, if it requires an explicit region, make sure the code supports what it takes to get this running and then document it appropriately.

AWS KMS Seal

AWS KMS Seal only uses the KMS endpoint. I did some tests and it seems to work just fine, but it needs to be documented. I've submitted #5618 to update the docs and then I think this should be good. However, as with all others, it requires the VPC Endpoint to be in the same region as the KMS key used.

joelthompson commented 5 years ago

With the movement of auto-unseal from enterprise to OSS, I realized the discussion above missed the AWS KMS seal. I've updated it with a note about the KMS seal.

joelthompson commented 5 years ago

One other interesting note: when I wrote this originally, AWS did not support accessing VPC Interface Endpoints over cross-region peering, so it required Vault to be in the same region as the service being accessed. However, that is now no longer the case, and so it might make sense to think about changing the notion of an "endpoint" to an endpoint/region. Not quite sure what makes sense here.
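One way the endpoint/region idea might look, purely as a thought experiment: keep a per-region map of endpoint overrides and fall back to the default regional hostname when no override exists. The class name, field names, and example hostname below are all hypothetical and do not correspond to any existing Vault config.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class RegionalEndpoints:
    """Hypothetical endpoint/region configuration for one AWS service."""

    service: str  # e.g. "sts" or "ec2"
    overrides: Dict[str, str] = field(default_factory=dict)  # region -> endpoint URL

    def resolve(self, region: str) -> Tuple[str, str]:
        """Return (endpoint URL, signing region) for a target region."""
        default = f"https://{self.service}.{region}.amazonaws.com"
        # The signing region stays the target region either way; only the
        # URL the request is sent to changes.
        return self.overrides.get(region, default), region
```

For example, `RegionalEndpoints("ec2", {"us-west-2": "https://vpce-example.internal"}).resolve("eu-west-1")` falls back to the default eu-west-1 hostname, while a lookup for us-west-2 goes to the override.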