Thanks @dak1n1, this is a good set of example use cases to go by. I think we have a couple different issues here if I understand correctly. Not being able to pass an unknown value to a provider during apply is separate from refreshing the credentials being passed to a provider, and would need to be solved in different ways.
The fundamental issue with the unknown values is not that Terraform isn't refreshing data sources before calling the provider (in fact they are always read as soon as possible, in conjunction with configuring the provider as late as possible), but that if the provider depends on resources that are not yet created, there is no way to read data which does not yet exist. The only feasible method for configuring providers with resource data which is not known until after apply is to apply in multiple stages. This has to be done manually at the moment, but we are using #4149 to track any possible methods for extending Terraform to do this in the future. This may be in the form of automatically targeting the required resources and creating multiple plans internally, or it may be done with other workflow tools which have yet to be devised. There is a lot of design work to do around a feature like this, because not only would these be large changes to Terraform, but these new workflows would not fit into current automation workflows like Terraform Cloud.
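For illustration, a minimal sketch of the manual multi-stage approach with an EKS cluster (the wiring follows the AWS provider's documented attributes; the staging commands are shown as comments):

```hcl
# Stage 1: create only the cluster, so its attributes become known:
#   terraform apply -target=aws_eks_cluster.example
# Stage 2: apply the rest, now that the provider configuration is known:
#   terraform apply

resource "aws_eks_cluster" "example" {
  name     = "example"
  role_arn = var.cluster_role_arn # assumed to be defined elsewhere

  vpc_config {
    subnet_ids = var.subnet_ids
  }
}

data "aws_eks_cluster_auth" "example" {
  name = aws_eks_cluster.example.name
}

provider "kubernetes" {
  host                   = aws_eks_cluster.example.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.example.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.example.token
}
```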
You have also mentioned the expiration of credentials being passed to a provider, which would need a separate solution, since there is no mechanism within Terraform to accomplish this right now. In order to create a consistent plan, and ensure that we apply exactly what is indicated in the plan, data sources can only be read once in the process. If a data source were to return different values on each read, and they were updated multiple times during the process, we wouldn't have a way to create a stable plan with which to apply. We also run into the problem that providers are not necessarily equipped to be configured multiple times, or concurrently if this refresh needs to happen during the apply operation. Since changes are always applied concurrently when possible, there is always the likelihood that API calls are in flight at any moment, and we have no method for reconfiguring a provider that is already in use.
In order to do this within a Terraform configuration we would first need a new construct of some sort which expects to be called multiple times, maybe some type of provider-hosted function similar to a data source (which of course requires support from providers and the SDKs). We would also need a safe way to concurrently refresh these updated credentials in providers through the SDK.
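As a purely hypothetical sketch of that idea (the provider::aws::eks_get_token function is invented for illustration and does not exist):

```hcl
provider "kubernetes" {
  host = aws_eks_cluster.example.endpoint

  # Hypothetical provider-hosted function that Terraform could call again
  # whenever it needs a fresh token, rather than reading a data source once.
  token = provider::aws::eks_get_token(aws_eks_cluster.example.name)
}
```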
I'm inclined to target this enhancement request more towards solutions for refreshing of credentials, since the progressive apply concepts have been discussed elsewhere. Having a solution to do that would also benefit the cases where apply operations can take longer than the lifetime of the credentials themselves, or when credentials are generated during plan but expire before that plan can be applied.
> Thanks @dak1n1, this is a good set of example use cases to go by. I think we have a couple different issues here if I understand correctly. Not being able to pass an unknown value to a provider during apply is separate from refreshing the credentials being passed to a provider, and would need to be solved in different ways.
Ah, I see, let's focus on the refresh part then. I think that's the issue really plaguing my users.
> they are always read as soon as possible in conjunction with configuring the provider as late as possible
This is why the first apply works -- because the data source that reads the EKS cluster information is not called until after the cluster is built. That is, if you set up dependencies correctly. If you don't, then you hit the issue of "unknown values". But yeah, let's put aside that case, as you said, since it's covered in #4149.
> but if the provider depends on resources that are not yet created, there is no way to read that data which does not yet exist. The only feasible method for configuring providers with resource data which is not known until after apply is to apply in multiple stages.
This makes sense, but this particular scenario (for my users) mainly comes up when something changes the EKS cluster which the data source reads to fetch the credentials, which are passed into the Kubernetes provider. That's why I thought that refreshing the data source multiple times during apply might fix this. Like refresh prior to each call of the Kubernetes provider. (I'm not sure if that's an easier thing to accomplish in Terraform core, but I wanted to propose the idea since the solution of "multiple apply stages" seemed to be a very complicated engineering feat that might take a long time to implement).
> You have also mentioned the expiration of credentials being passed to a provider, which would need a separate solution, since there is no mechanism within terraform to accomplish this right now. In order to create a consistent plan, and ensure that we apply exactly what is indicated in the plan, data sources can only be read once in the process.
Ohh, of course. 🤦🏻 OK, so what I'm suggesting would give users one plan and then apply something different! That isn't good. Thanks for explaining. Although maybe, just for credentials like this being passed into a provider specifically, showing the values as known after apply might work.
> If a data source were to return different values on each read, and they were updated multiple times during the process, we wouldn't have a way to create a stable plan with which to apply. We also run into the problem that providers are not necessarily equipped to be configured multiple times, or concurrently if this refresh needs to happen during the apply operation. Since changes are always applied concurrently when possible, there is always the likelihood that API calls are in flight at any moment, and we have no method for reconfiguring a provider that is already in use.
There is this concept of "deferred initialization" that we use in the Kubernetes provider which might help here. Before we had deferred initialization, the provider failed right away when the credentials were missing. But now we don't create the Kubernetes API Client until a CRUD operation is being performed. That helps us to read the credentials later in the process. Here's where it was implemented, in case that helps: https://github.com/hashicorp/terraform-provider-kubernetes/commit/ea15241b71ea3490988e5babb3d580234af6dfeb
> I'm inclined to target this enhancement request more towards solutions for refreshing of credentials, since the progressive apply concepts have been discussed elsewhere. Having a solution to do that would also benefit the cases where apply operations can take longer than the lifetime of the credentials themselves, or when credentials are generated during plan but expire before that plan can be applied.
Perfect. Thank you!
This use-case makes me think of an early proposal I wrote up internally a while back, called "Ephemeral Resources". We haven't had a chance to iterate on it any further yet so please do take it with a giant pinch of salt, but just for the sake of keeping all of this related discussion in one spot I've exported the proposal text I originally wrote: Ephemeral Resources.
After some initial discussion about that proposal I heard feedback which seems even more relevant to this particular issue: explicit expiration/renew times on ephemeral resources. I've been intending to write another draft of this which includes a way for the provider to include a "valid until" or "renew after" timestamp in the response, such that Terraform would make an additional request to the provider if a particular operation takes long enough to exceed the validity period of the ephemeral object, allowing the provider to "refresh" or "renew" the object where possible, or in the unhappy path to detect more promptly that the object cannot be renewed and thus not to try any other operations with it.
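To make the shape of that concrete, a purely illustrative sketch (the block type and attribute names are invented; none of this syntax existed when the proposal was written):

```hcl
# Illustrative only: an ephemeral resource whose result Terraform may
# re-request from the provider if its validity window lapses mid-apply.
ephemeral "aws_eks_cluster_auth" "example" {
  name = aws_eks_cluster.example.name
  # The provider's response would carry a "valid until" / "renew after"
  # timestamp; once it passes, Terraform would ask the provider to renew
  # the object before using it for any further operations.
}

provider "kubernetes" {
  host  = aws_eks_cluster.example.endpoint
  token = ephemeral.aws_eks_cluster_auth.example.token
}
```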
This does have most of the same knock-on effects already described in this issue. In particular, Terraform Core would need to track which other objects depend on expired ephemeral resources (active provider configurations) and understand that it needs to close and re-open the provider with a different configuration before using it any further.
However, I had proposed this as a new resource mode rather than simply a new use-case for data resources in order to allow us the flexibility to define some different rules for them that better meet the use-cases, to make it clear that their behavior would be quite different, and to constrain their use to situations where the new behavior makes sense.
However, this idea still needs considerably more research and iteration before we could move forward with it. I'm sharing it here not as a concrete proposal to implement (please don't try to write a PR for this without further discussion!) but just to avoid the discussion getting split into two parts.
Hi everybody! Would it be possible to allow credential configuration in the resources themselves? For example, it would be great if I could declare a Kubernetes resource with the following config:
resource "kubernetes_storage_class" "example" {
# credential config
config_path = var.kubeconfig_filename
config_context = var.kubeconfig_context
# resource config
...
depends_on = [...]
}
By default, if not set, these additional arguments could be filled from the provider configuration. I understand that this is not an elegant solution, but it may cover other usage scenarios, such as deploying to different clusters created in the same apply operation (although this could be achieved with provider aliases, if the original issue could be solved elegantly). Plus, if Terraform somehow allowed sharing configuration between resources (much like a YAML anchor), the boilerplate in this palliative solution would stay manageable.
Please correct me if I'm wrong in my assumptions.
P.S.: in the example above, the kubeconfig file is generated during the apply operation from depended-on resources (only the filename and context are parameterized with input variables).
Is this still relevant? What about the exec plugin (exec-plugins)? E.g.:

```hcl
provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}
```

What I personally miss is that the token retrieved from the exec plugin can still time out after e.g. 10 minutes. So a refresh option where you specify when to re-execute the command would be useful.
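A sketch of what such an option might look like; the refresh_after argument is hypothetical and does not exist in the Helm or Kubernetes providers:

```hcl
exec {
  api_version = "client.authentication.k8s.io/v1beta1"
  command     = "aws"
  args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]

  # Hypothetical argument: re-run the command shortly before the token's
  # ~15-minute lifetime lapses, so long applies keep a valid credential.
  refresh_after = "10m"
}
```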
I believe the solution to the problem mentioned here could eventually solve my use case as well. Consider a CI pipeline with one job that runs the Terraform plan and saves the plan file, and another job that runs the Terraform apply using that plan file.
In my case I use Vault data sources to generate short-lived credentials for my other providers, but the main Vault token used by the Vault provider is revoked when a job ends. In that case, on a Terraform apply with a plan file, the credentials acquired from the Vault data source in the preceding job will have been revoked, so I'm forced to plan and apply again in the apply phase.
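For concreteness, a sketch of that pattern, assuming the Vault provider with a database secrets backend and a PostgreSQL provider downstream (the secret path and provider choice are illustrative):

```hcl
# Short-lived database credentials leased from Vault when the plan runs...
data "vault_generic_secret" "db" {
  path = "database/creds/app" # illustrative path
}

# ...and consumed by another provider during apply. If the CI job that
# produced the plan revokes its Vault token on exit, the leased
# credentials are revoked with it and the saved plan can't be applied.
provider "postgresql" {
  host     = var.db_host
  username = data.vault_generic_secret.db.data["username"]
  password = data.vault_generic_secret.db.data["password"]
}
```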
I am working on a prototype that includes features similar to what I described in an earlier comment over in https://github.com/hashicorp/terraform/pull/35078. If that prototype is successful then I expect that ephemeral values broadly, and ephemeral resources specifically, will be a plausible solution to this issue.
However, it's still early in that prototype and so there is a significant possibility that I will encounter "unknown unknowns" that will then require additional design iteration.
We have a similar issue: we apply a lot of Kubernetes resources via Terraform, and sometimes the apply takes more than 15 minutes (the token expiry). It would be nice if the provider refreshed the token in the background before hitting the expiry.
Current Terraform Version
Use-cases
I'm one of the maintainers of the Kubernetes provider and I'm trying to solve an issue that we see pretty often. Users would like to be able to pass credentials from other providers into the Kubernetes and Helm providers. This allows them to fetch their cluster credentials from EKS, AKS, GKE, etc., and pass them into the Kubernetes provider. The problem is that when they do this in a single apply with their cluster build, they run into the issue where they can't pass an unknown value to a provider configuration block, because this is not supported in Terraform Core. To quote the docs:

> You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied.
TL;DR: I would like to be able to pass unknown values into the provider block via data sources which are refreshed after provider initialization but before each CRUD operation. This would keep the credentials "fresh" and mitigate credential expiration.
Attempted Solutions
If the configuration is set up with the right dependencies, I can successfully build an EKS cluster and put Kubernetes resources on top of it in a single apply, despite not knowing the EKS credentials prior to apply time. However, since this is not supported, it does not work reliably in subsequent applies when the credentials change.
As a work-around, I tried removing the Kubernetes provider's resources from state (terraform state rm). This did solve the problem in many cases, but the manual intervention is not intuitive to users, and it sometimes has unwanted consequences like orphaned infrastructure.
Many other work-arounds were tried in an attempt to accommodate our users who prefer to use the unsupported single-apply configurations. But the end result is that work-arounds place too much of an operational burden on the user. And the alternative (supported configuration of using two applies or two Terraform states) places a requirement on the user to maintain a separate Terraform State for the cluster infrastructure, which becomes burdensome if the user needs different environments for Dev/Stage/Prod. The number of managed states then goes from 3 to 6 (dev/stage/prod state for their cluster infrastructure resources and dev/stage/prod for the Kubernetes resources). They would need to separate out their databases similarly, since the database providers also read in credentials from the cluster infrastructure. The users are understandably burdened by this.
Proposal
I'm hoping Core can consider the idea of refreshing the data sources used in a provider prior to using that provider. For example, in the provider block sketched below, I would want data.aws_eks_cluster and data.aws_eks_cluster_auth to be refreshed prior to running any CRUD operations using the Kubernetes provider. The token needs to be refreshed before any CRUD operations take place because the token expires every 15 minutes, and because it may have changed during apply.
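A minimal sketch of that wiring (the cluster name comes in via a variable; attribute names follow the AWS provider):

```hcl
data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)

  # This token expires roughly every 15 minutes, which is why it would
  # need to be re-read before each CRUD operation the provider performs.
  token = data.aws_eks_cluster_auth.cluster.token
}
```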
References