hashicorp / terraform-provider-vault

Terraform Vault provider
https://www.terraform.io/docs/providers/vault/
Mozilla Public License 2.0

Vault provider lookup-self on 127.0.0.1 instead of provided vault address in plan phase #829

Open t3hami opened 4 years ago

t3hami commented 4 years ago

Hi there,

I'm using Terraform to create a GKE cluster, deploy the Vault helm charts into the cluster, initialise Vault, and then create policies, auth backends, secrets, etc. I'm passing the Vault address (data.kubernetes_service.vault.load_balancer_ingress.0.ip), which comes from a kubernetes service data source, to the vault provider.

The problem: when I run terraform plan, terraform hits the local URL https://127.0.0.1:8200/v1/auth/token/lookup-self instead of the URL that will be produced by the kubernetes service data source. (Terraform can't use that address yet, because the GKE cluster isn't deployed and the kubernetes service data source depends on it.) When I set VAULT_ADDR to my local vault, the plan passes, and a subsequent terraform apply also works fine. The Terraform documentation says it automatically handles the depends_on graph when you feed data from one resource into another, since it knows what to create first. I need a way to skip the Vault lookup-self call during terraform plan.

Vault provider

provider "vault" {
  address = "http://${data.kubernetes_service.vault.load_balancer_ingress.0.ip}"
}

Note: I'm using depends_on = [null_resource.vault_init] in all vault resources.
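
A minimal sketch of that pattern (the policy resource and file path here are illustrative, not from the reporter's actual configuration):

```hcl
# Illustrative sketch: every vault_* resource waits on the init step.
resource "null_resource" "vault_init" {
  # ... runs the Vault initialisation once the helm release is up ...
}

resource "vault_policy" "example" {
  name   = "example-policy"
  policy = file("policies/example.hcl")

  depends_on = [null_resource.vault_init]
}
```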

matttrach commented 3 years ago

I have found that if you want to provision Vault and configure it in the same Terraform file/directory, Terraform does not handle the dependencies properly, as @t3hami described.

I would like to use modules that provision servers and modules that configure Vault after those servers are ready, but this provider doesn't seem to respect module dependencies.

I am using Terraform 0.13 and module dependencies as described here: https://github.com/hashicorp/terraform/tree/guide-v0.13-beta/module-depends

GJKrupa commented 3 years ago

I'm seeing the same thing when I run an import and pass a hard-coded variable for the address into my submodule, so I don't think this is a dependency issue:

provider "vault" {
  address      = var.vault_url
  token        = var.vault_token
  ca_cert_file = "certs/my-ca.pem"
}

What it actually uses is the VAULT_ADDR environment variable, if that's set.
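
In other words, exporting the variable before planning masks the problem as a stopgap (the address below is a placeholder, not from this thread):

```shell
# Stopgap: point the provider at a reachable Vault so `plan` can
# complete its lookup-self call. Replace the address with your own.
export VAULT_ADDR="https://vault.example.com:8200"
terraform plan
```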

mcanevet commented 3 years ago

Same problem here. I looked at the code but I could not figure out exactly what happens. I can see that the provider configuration function calls the api's DefaultConfig function, which configures the client to use https://127.0.0.1:8200 as the default address. I guess that at the time the provider is configured, address is empty, and hence the default is not overridden with the proper server address. I'm not sure on which side this should be fixed: the terraform provider or the API.

What I'm wondering is when the provider initialization is supposed to happen. The pattern of configuring a provider with the outputs of a resource clearly works for some providers (there's an example with the Kubernetes provider), but it clearly does not work with the Vault provider (even with proper dependencies set on every vault resource).

Some advice from a Terraform ninja would be welcome here. /cc @apparentlymart

mcanevet commented 3 years ago

I have a minimal working example to reproduce this: https://gist.github.com/mcanevet/f698b53a32ac28a03b729c40d9d07b9f When I remove the vault_* resources it works, but when I try to create vault_* resources I get Error: Get "https://127.0.0.1:8200/v1/auth/token/lookup-self": dial tcp 127.0.0.1:8200: connect: connection refused. If I add the lines back after Vault is up, everything works fine.

mcanevet commented 3 years ago

I think this "feature" is not officially supported yet (https://github.com/hashicorp/terraform/issues/4149), but somehow works for some providers.

apparentlymart commented 3 years ago

I think the root cause here is that the current Terraform SDK (which has no real name of its own, but we often call it helper/schema) doesn't handle the case when provider arguments are unknown, and instead treats them as if they aren't set at all. A provider that then tries to make use of these values in its configuration step can run into trouble, because it can mistakenly apply a default value as seems to be happening here with the vault provider assuming 127.0.0.1.

A way that other providers manage to avoid this situation is by deferring their connection until later on, when they are ready to perform an operation. For example, the hashicorp/mysql provider doesn't connect to the server until it's performing a real action, such as creating an object. Because most operations in that provider don't happen until the apply step, it rarely encounters the situation where its configuration is incomplete.

The vault provider could potentially take a similar strategy, but I don't think it would work out as well here: this provider has a lot of data sources that are typically read during planning, which requires the provider configuration to be complete even for the plan operation to finish.

I'm not familiar enough with the SDK implementation details to know if there's some way for the vault provider to actually detect when its address argument is unknown and treat that as different than it being unset. If so, it could potentially return an error explaining that the address argument must be known during planning, similar to what Terraform itself generates for unknown values in count and for_each, so it would at least fail explicitly rather than just doing something confusing and unexpected.


As @mcanevet noted, hashicorp/terraform#4149 is one way this might be addressed in the long run, by deferring certain operations entirely until a first round of changes have been applied. There is no plan to implement that in the short term because it's a significant change to Terraform's typical workflow (it might be necessary to run terraform apply multiple times to fully apply the plan, which is unprecedented), and once the Terraform Core team has more time to research that area further we're hoping to find other technical designs that don't have that disadvantage, though it remains to be seen what other designs are possible.

As I proposed it, hashicorp/terraform#4149 is basically the same as running Terraform with the -target option except that Terraform would calculate the necessary targets automatically and print out information about what it excluded and why. Given that, you can get the same effect today by explicitly adding the -target option. That is inconvenient when you're running Terraform in automation, but in practice I've seen that workaround work for most folks because typically it's only needed once when initially bootstrapping a configuration, unless they end up later on recreating some foundational object like a Kubernetes cluster or MySQL server.
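
Concretely, the two-phase bootstrap with -target might look like this (the module and resource names are hypothetical stand-ins for whatever provisions the cluster and Vault):

```shell
# Phase 1: build only the infrastructure Vault runs on, so its
# address becomes known before any vault_* resource is planned.
terraform apply -target=module.gke -target=null_resource.vault_init

# Phase 2: with the address now resolvable, apply everything else.
terraform apply
```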

Splitting the configuration into two parts that can be applied separately in sequence is the most robust, repeatable answer with today's Terraform.
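
A sketch of that split (directory names are illustrative; the first configuration outputs the Vault address, and the second reads it to configure the vault provider):

```shell
# 01-infra/  provisions GKE, the Vault helm release, and init,
#            and outputs the resulting Vault address.
# 02-vault/  configures the vault provider from that address and
#            manages policies, auth backends, and secrets.
terraform -chdir=01-infra apply
terraform -chdir=02-vault apply
```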

seanamos commented 2 years ago

For those who are running into this, there is a workaround. You can set skip_child_token = true.

Be aware of the potential security implications when using this workaround: https://registry.terraform.io/providers/hashicorp/vault/latest/docs#skip_child_token https://registry.terraform.io/providers/hashicorp/vault/latest/docs#using-vault-credentials-in-terraform-configuration
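
For example, reusing the variables from the earlier snippet in this thread:

```hcl
provider "vault" {
  address = var.vault_url
  token   = var.vault_token

  # Use the supplied token directly instead of creating a child
  # token at configure time; see the security notes linked above.
  skip_child_token = true
}
```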

It seems the vault provider wants to do a token capabilities lookup, probably to check whether it can create child tokens, but this happens regardless of resource dependencies. The lookup runs out of order and can end up using empty/default values for the address and token.

I know providers having dependencies isn't really something Terraform fully supports at the moment. However, it is something that can be supported/worked around at an individual provider level. The Vault provider is very close to having this working already, it just needs some changes around how it handles child tokens.

The Consul/Nomad providers don't have the same issue and do work well already with Terraform's existing resource dependencies.

4FunAndProfit commented 3 weeks ago

works with opentofu 😇