hashicorp / vault-lambda-extension

Mozilla Public License 2.0

Improvement: Add New Runmode/Option to Remove Need for Network Connectivity #130

Open briankohler opened 8 months ago

briankohler commented 8 months ago

This extension is great and super useful. In particular, it offers a way to avoid a CI pattern I've seen too often: one where the build server/system has access to the secrets and injects them into the function as part of the build. While looking over the code (admittedly not in great detail), it seemed that it wouldn't be too much work to add a couple of features that would help with connectivity requirements and reduce reliance on Vault to a degree.

Introducing a runmode called SET_ENV_VARS (name needs work) would cause the Vault extension to look up the secrets and store them as function env vars, optionally (though it probably should be the default) encrypted with a KMS key. As I understand it, the extension uses the same IAM role as the function, so assuming the KMS key is created as part of some deployment and IAM is set up correctly, this seems like it would work. There is a specific IAM permission for changing function env vars (lambda:UpdateFunctionConfiguration). What this means is that the extension could first check for the existence of a defined set of secret env var keys on init and, possibly using the secret's TTL/MaxTTL (which could also be stored in an env var so it persists from one execution to the next), avoid making any calls at all to Vault in the vast majority of cases.

Lambda already encrypts a function's env vars at rest with KMS, so the need for an extra layer of encryption is up for debate. Also up for debate is whether the extension should handle decrypting the encrypted env vars, or whether the function code itself should be written to expect its secrets as KMS-encrypted env vars.

And while I realize that using a KMS key to encrypt a value is similar to making a call to Vault, it differs in a few critical ways. First, KMS is a managed service, whereas Vault is self-hosted (or at least not hosted by AWS). In my experience, if you can eliminate a dependency on something you run yourself, it's worth doing. There's also the issue of scale: I know the extension only runs once for the lifetime of a lambda instance (so Vault isn't called on every invocation), but lambdas can scale out practically without limit, which does pose some risk of overwhelming the Vault instance.
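A minimal sketch of the init-time check such a runmode might perform, written in Python for brevity. The env var names (VAULT_SECRET_<NAME>, VAULT_SECRETS_EXPIRES_AT) and the fetch_from_vault callback are hypothetical; KMS encryption of the stored values is left out. When the cached values are missing or expired, the sketch returns the variables that would need to be written back with UpdateFunctionConfiguration; otherwise Vault is never contacted.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical naming scheme for the SET_ENV_VARS runmode.
SECRET_PREFIX = "VAULT_SECRET_"
EXPIRY_KEY = "VAULT_SECRETS_EXPIRES_AT"

# Hypothetical Vault call: returns (secrets, ttl_seconds).
Fetcher = Callable[[], Tuple[Dict[str, str], int]]


def cached_secrets(env: Mapping[str, str], now: float) -> Optional[Dict[str, str]]:
    """Return secrets cached in env vars if present and unexpired, else None."""
    expires_at = env.get(EXPIRY_KEY)
    if expires_at is None or float(expires_at) <= now:
        return None
    secrets = {k[len(SECRET_PREFIX):]: v for k, v in env.items() if k.startswith(SECRET_PREFIX)}
    return secrets or None


def resolve_secrets(
    env: Mapping[str, str], fetch_from_vault: Fetcher, now: Optional[float] = None
) -> Tuple[Dict[str, str], Optional[Dict[str, str]]]:
    """Return (secrets, vars_to_write); vars_to_write is None if Vault was not called."""
    now = time.time() if now is None else now
    cached = cached_secrets(env, now)
    if cached is not None:
        return cached, None  # fast path: no Vault call at all
    secrets, ttl = fetch_from_vault()
    vars_to_write = {SECRET_PREFIX + name: value for name, value in secrets.items()}
    vars_to_write[EXPIRY_KEY] = str(now + ttl)  # persist expiry across executions
    return secrets, vars_to_write
```

One caveat with this design: env vars are read when an execution environment starts, so values written back would presumably only be seen by environments started after the update.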

A variant of this type of Vault lambda could invert the whole paradigm. The Vault lambda could have no Vault rights at all, and instead simply have the rights to encrypt data with a set of KMS keys and edit function env vars. It could receive SQS events that included another lambda's set of secrets (which could themselves be wrapped or otherwise secured): a sort of event-driven utility lambda that pushes secrets into lambda functions. Just an idea.
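A rough sketch of what that event-driven utility lambda's handler might look like. The SQS record shape ({"Records": [{"body": ...}]}) is what Lambda delivers for SQS triggers; the message format with function_name and secrets is a hypothetical assumption. The Lambda client and the encrypt function are injected so the sketch runs without AWS: in a real deployment they would presumably be boto3.client("lambda") and a wrapper around the KMS Encrypt call.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical: turns a plaintext secret into a KMS ciphertext string.
Encrypt = Callable[[str], str]


def handle_sqs_event(event: Dict[str, Any], lambda_client: Any, encrypt: Encrypt) -> int:
    """Push secrets from each SQS record into the target function's env vars.

    Assumed (hypothetical) message body:
        {"function_name": "...", "secrets": {"NAME": "value"}}
    Returns the number of functions updated.
    """
    updated = 0
    for record in event.get("Records", []):
        message = json.loads(record["body"])
        target = message["function_name"]
        # UpdateFunctionConfiguration replaces the whole variable map, so
        # merge the new (encrypted) secrets into the existing variables.
        current = lambda_client.get_function_configuration(FunctionName=target)
        variables = dict(current.get("Environment", {}).get("Variables", {}))
        for name, value in message["secrets"].items():
            variables[name] = encrypt(value)
        kwargs = {"FunctionName": target, "Environment": {"Variables": variables}}
        if "RevisionId" in current:
            # Fail rather than silently overwrite a concurrent config change.
            kwargs["RevisionId"] = current["RevisionId"]
        lambda_client.update_function_configuration(**kwargs)
        updated += 1
    return updated
```

The read-merge-write step matters here: without it, pushing one secret would wipe out every other env var the target function relies on.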

And then there's network connectivity. This extension assumes the lambda has connectivity to Vault. If the Vault instance is on a private network as a matter of company policy, then every lambda that uses this would need VPC connectivity, even if it didn't need any other VPC resources. Personally, I've used public-facing lambdas with no VPC connectivity to ingest inbound webhooks and send them to an SQS queue, a use case that needs no VPC connectivity. I didn't dig deep enough into the code to know, but if there were a way to run some sort of standing variant of the extension as its own lambda, then presumably any lambda with the right IAM permissions could invoke it, and it would have the necessary connectivity. There'd be some assume-role nuance or something similar to ensure that each lambda could only request its own secrets, but that seems like a solvable problem. Depending on how capable and robust a dedicated lambda version of this extension could be, it might even make sense not to run the Vault extension in every lambda, and instead always invoke a standing lambda proxy to Vault. This could also address the risk of overwhelming the Vault server, since the invoked lambda could have a concurrency limit set.
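A minimal sketch of the "each lambda can only request its own secrets" check inside such a proxy. It assumes the caller's role ARN has already been verified (that is the assume-role nuance, and it is not shown); the policy table, the example ARN, and the read_from_vault callback are all hypothetical.

```python
import fnmatch
from typing import Callable, Dict, List

# Hypothetical policy: which Vault paths each caller role may read.
ALLOWED_PATHS: Dict[str, List[str]] = {
    "arn:aws:iam::123456789012:role/webhook-ingest": ["secret/data/webhook/*"],
}


class Forbidden(Exception):
    """Raised when a caller asks for a path outside its policy."""


def proxy_read(
    caller_role: str,
    path: str,
    read_from_vault: Callable[[str], dict],
    policy: Dict[str, List[str]] = ALLOWED_PATHS,
) -> dict:
    """Read `path` from Vault on behalf of an already-verified `caller_role`."""
    patterns = policy.get(caller_role, [])
    # Note: fnmatch's "*" also matches "/", so a pattern covers nested paths.
    if not any(fnmatch.fnmatchcase(path, p) for p in patterns):
        raise Forbidden(f"{caller_role} may not read {path}")
    return read_from_vault(path)
```

The concurrency limit itself would not need code: it could be the proxy function's reserved concurrency setting, which caps how many copies of it can hit Vault at once.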

I hope I've explained myself well. This might be my first comment on any public GitHub codebase. I'm a huge fan of HashiCorp's software. It's entirely possible that this has already been considered and would make the code too complex. Please don't take my lack of a direct code contribution as anything other than a result of my relative inexperience with Golang. However, if even a novice first attempt at this change would make it easier or faster, I'd be happy to give it a Go (see what I did there?).