hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Configuring one provider with a dynamic attribute from another (was: depends_on for providers) #2430

Closed: dupuy closed this issue 4 months ago

dupuy commented 9 years ago

This issue was inspired by this question on Google Groups.

I've got some Terraform code that doesn't work because the EC2 instance running the Docker daemon doesn't exist yet, so I get "* Error pinging Docker server: Get http://${aws_instance.docker.public_ip}:2375/_ping: dial tcp: lookup ${aws_instance.docker.public_ip}: no such host" if I run plan or apply.

Some providers (docker and consul, and theoretically also openstack, though that's a stretch) target services that can themselves be provisioned by Terraform using other providers like AWS. If a Terraform deployment contains resources that use the docker or consul provider, those resources cannot be provisioned or managed in any way until the resources that implement the Docker server or Consul cluster have been successfully provisioned.

If there were a depends_on clause for providers like docker and consul, this kind of dependency could be managed automatically. In the absence of this, it may be possible to add depends_on clauses to all the resources using the docker or consul provider, but that does not fully address the problem, as Terraform will attempt (and fail, if they are not already provisioned) to discover the state of the docker/consul resources during the planning stage, long before it has completed the computation of dependencies. Multiple plan/apply runs may be able to resolve that specific problem, but a depends_on clause for providers would allow everything to be managed in a single pass.
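A minimal sketch of the failing pattern (resource names and AMI are illustrative): the docker provider's host is interpolated from an instance that doesn't exist yet, so the provider fails to initialise during plan:

provider "aws" {
  region = "us-east-1"
}

# The instance that will run the Docker daemon.
resource "aws_instance" "docker" {
  ami           = "ami-12345678"
  instance_type = "t2.micro"
}

# Configured from an attribute that is only known after the instance
# is created, so "terraform plan" fails when the provider tries to
# ping the daemon.
provider "docker" {
  host = "tcp://${aws_instance.docker.public_ip}:2375/"
}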

abdennour commented 4 years ago

After checking the majority of the solutions here, I can see that all the workarounds are specific to particular providers.

There is no single source of truth from the Terraform side on how to make this work.

I am struggling not with the helm provider, but with the helmfile provider. The helmfile provider does not require any arguments; as a consequence, there is no workaround :(

However, I have adopted the solution of BastienM.

containerpope commented 4 years ago

@abdennour the new Terraform 0.13 beta already includes the needed feature: see here

Terraform 0.13 highlights include:

  • Module-centric workflows are getting a boost with the count, depends_on, and for_each features of the Terraform configuration language.
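For reference, the 0.13 feature applies to module blocks rather than provider blocks; a minimal sketch (module paths illustrative):

module "cluster" {
  source = "./modules/eks"
}

module "k8s_resources" {
  source     = "./modules/k8s"
  depends_on = [module.cluster]
}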
Jasper-Ben commented 4 years ago

@abdennour the new Terraform 0.13 beta already includes the needed feature: see here

Terraform 0.13 highlights include:

  • Module-centric workflows are getting a boost with the count, depends_on, and for_each features of the Terraform configuration language.

@mkjoerg does it already, though? I tried installing Terraform v0.13.0-beta3 and adding a depends_on argument to a provider block still throws the following error:

Error: Reserved argument name in provider block

  on main.tf line 53, in provider "kubernetes":
  53:     depends_on          = [

The provider argument name "depends_on" is reserved for use by Terraform in a
future version.

P.S.: I just noticed your comment was referring to usage within modules. I'd hoped, however, that this would work with providers as well :disappointed:

chrisjaimon2012 commented 4 years ago

Similar issue: I'm trying to create an account using one provider, and then use another provider to deploy resources into it. Is there any other way to get this working?


### master account's provider ###
provider "aws" {
  region = var.aws_region
}
resource "aws_organizations_account" "account" {
  name  = "my_new_account"
  email = "john@doe.org"
  iam_user_access_to_billing = "ALLOW"
  role_name                  = "org-access"
  parent_id                  = var.root_id
}

### member accounts provider ###
provider "aws" {
  region  = var.aws_region
  alias   = "infra"
  assume_role {
    role_arn = "arn:aws:iam::${aws_organizations_account.account.id}:role/org-access"
  }
}
resource "aws_s3_bucket" "b" {
  provider = aws.infra
  bucket = "my-tf-test-bucket"
  acl    = "private"
}
MoritzKn commented 4 years ago

@chrisjaimon2012 unfortunately this use case is currently not properly supported. The current way to do this would be to use two Terraform projects. But it's a good example of why this is needed.

gibsonje commented 4 years ago

This is well discussed so far, but here's another scenario.

I create an AWS Managed Active Directory instance, then I want to create AD resources within it using the new active directory provider.

Same situation as creating an RDS instance and wanting to use a provider to manage resources related to that database.

Same situation as creating an EKS cluster and wanting to create kubernetes resources within it.

In some situations it makes more sense to split the configuration into a different folder, but sometimes I just want to get this done in one apply. Active Directory is one of those situations.

Lucasjuv commented 4 years ago

Just adding another case here:

I have a Terraform config that deploys an AKS instance and installs some Helm charts on it. So far no problems, but when I needed to use kubernetes-alpha to deploy some CRDs, it tries to connect to the AKS cluster before the cluster is deployed and gives me a gRPC error.

KernelPryanic commented 3 years ago

Just adding another case here:

I have a Terraform config that deploys an AKS instance and installs some Helm charts on it. So far no problems, but when I needed to use kubernetes-alpha to deploy some CRDs, it tries to connect to the AKS cluster before the cluster is deployed and gives me a gRPC error.

Good example; I'm facing that issue right now, trying to install cert-manager and then create a cert issuer with kubernetes-alpha, failing with Error: rpc error: code = Unknown desc = no matches for cert-manager.io/v1alpha2, Resource=Issuer due to the wrong order of resource execution.

ademariag commented 3 years ago

@mtekel @derFunk FYI I was in despair, but it appears lazy initialisation for terraform-provider-postgresql is coming soon: https://github.com/terraform-providers/terraform-provider-postgresql/pull/199

thenonameguy commented 3 years ago

The above PR for the postgresql provider got merged but not released. I needed this, so I published a fork for the time being. You can use it like this:

terraform {
  required_providers {
    postgresql = {
      # FIXME: check new official release for lazy connections: https://github.com/cyrilgdn/terraform-provider-postgresql
      source = "thenonameguy/postgresql"
      version = "2.0.1"
    }
  }
}

provider "postgresql" {
  # Configuration options
}

Thanks to Deep Impact for sponsoring this work.

Hoping for a more general solution eventually.

goodacres commented 3 years ago

Adding another case, although this scenario has already been briefly touched upon above:

If you have a simple TF project that provisions a new EKS cluster and then also tries to deploy k8s resources to the newly created cluster, this cannot be achieved with a single "terraform apply". Instead, since providers are initialised during the plan phase, the example project has to be split into two separate configurations, or layers. For anyone who has also come across this, the scenario and a solution are detailed here: https://blog.logrocket.com/dirty-terraform-hacks/ ("Break up dependent providers into staged Terraform runs").

From my experience of this scenario in an enterprise setting, having isolated TF state & configs for different "layers" of an app infrastructure can actually be quite handy. For example, if you are doing a lot of development work on TF-deployed k8s resources and find yourself doing a lot of destroys and applies, it is much easier when all of your k8s resource config is separated from the rest of the infrastructure that enables those k8s deployments to exist (EKS, ASG, VPC, etc.). It saves you having to specify --target as much and also protects the more static parts of your infra that you probably don't want to be dropping and spinning up every 5 minutes.

Opened a discussion @ https://discuss.hashicorp.com/t/eks-gke-aks-kubernetes-resources-provider-dependency/18217

iTaybb commented 3 years ago

Would like to see this supported.

patuzov commented 3 years ago

Just adding another case here:

I have a Terraform config that deploys an AKS instance and installs some Helm charts on it. So far no problems, but when I needed to use kubernetes-alpha to deploy some CRDs, it tries to connect to the AKS cluster before the cluster is deployed and gives me a gRPC error.

"So far no problems" - there is already a problem before you even use the kubernetes-alpha.

After creating the cluster and any resource in it, if you change a field in the cluster configuration that forces recreation, you will get an error during refresh:

"http://localhost/api/v1/namespaces/ingress": dial tcp 127.0.0.1:80: connect: connection refused

Here is a minimalistic script to reproduce it: gist

Can someone explain what exactly is happening here? Somehow, during the initial create, it gets the order of creating resources right. But what happens at refresh? Why can't it find the cluster at refresh?

doctor-eval commented 3 years ago

I'm really new to Terraform, but in the case of the postgresql and docker providers at least, the problem seems to be that the provider assumes it's connecting to some static IP address. I'd suggest that Docker and PostgreSQL hosts should be resources, not part of the provider configuration. It actually seemed really weird to me that the configuration for these providers contains a hostname; I don't see why a PG (or Docker) host is different from any other resource.

I know I'm probably missing something super obvious, but in terms of the difference between plan and apply, given a putative "postgresql_db" which refers to a "postgresql_host" which references a "vultr_vps", clearly if the VPS isn't up, the host isn't up, and the database isn't up, so the plan needs to recreate the host and then the database.

Anyway I'm sorry if this has all been hashed out already. For the record, I want to deploy thousands of PG servers and Docker hosts; every VM I spin up needs one of each. In this context, it seems nuts that I'd have to create thousands of providers with unique aliases... it feels like the assumption is that we have one big, precious docker or PG host, but that's not how it works in my place.

lukemarsden commented 3 years ago

I was able to work around this by specifying depends_on for the output of one module, using that output as an input to another module, and specifying the provider configuration in the second module in terms of that input.

https://www.terraform.io/docs/language/values/outputs.html#depends_on-explicit-output-dependencies
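A sketch of that wiring (module layout and names illustrative), assuming the cluster module renders a kubeconfig file:

# modules/cluster/outputs.tf
output "kubeconfig_path" {
  # A local_file resource elsewhere in the module writes the kubeconfig.
  value = local_file.kubeconfig.filename

  # Explicitly wait for the cluster itself, not just the file.
  depends_on = [aws_eks_cluster.this]
}

# Root module: one module's output feeds the other's input.
module "cluster" {
  source = "./modules/cluster"
}

module "apps" {
  source          = "./modules/apps"
  kubeconfig_path = module.cluster.kubeconfig_path
}

# modules/apps: configures its own provider from the input
# (the legacy pattern discussed in the next few comments).
variable "kubeconfig_path" {}

provider "kubernetes" {
  config_path = var.kubeconfig_path
}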

jbg commented 3 years ago

That works, but configuring providers inside modules referred to by other modules has been deprecated since v0.11, so I guess it might stop working one day.

lukemarsden commented 3 years ago

@jbg interesting, would you mind sharing a link to where this deprecation is documented please?

To be clear, I'm not configuring one module's provider from within another module. Each module just configures providers that only it uses. It's just that one module outputs a kubeconfig that is used by the second module as provider configuration, and the kubeconfig doesn't exist until the first module has done its provisioning.

In any case, it would be great if terraform provided a supported way to solve the "provision a thing, then provision another thing inside the thing you just provisioned" use case before deprecating the workarounds that we have to support this.

Doing teardown in the right order for this is another pain point, but I've got it working with judicious use of depends_on pointing to whole objects exported between modules to enforce the correct ordering for destroy.

jbg commented 3 years ago

https://www.terraform.io/docs/language/modules/develop/providers.html

A module intended to be called by one or more other modules must not contain any provider blocks, with the exception of the special "proxy provider blocks" discussed under Passing Providers Explicitly below.

For backward compatibility with configurations targeting Terraform v0.10 and earlier Terraform does not produce an error for a provider block in a shared module if the module block only uses features available in Terraform v0.10, but that is a legacy usage pattern that is no longer recommended.

Since v0.11, providers are supposed to only be configured in the root module, and then passed (implicitly or explicitly) into modules that the root module calls.

AFAIK the "supported" ways to do the "provision a thing, then provision another thing inside the thing you just provisioned" workflow are either to apply a set of changes, then add the dependent changes and apply again (not ideal), or to split your Terraform configuration into multiple configurations that interact at arm's length (which actually works quite well).

lukemarsden commented 3 years ago

Thanks! How is the arm's-length interaction pattern meant to work, particularly with respect to parameter passing, e.g. using the output from one configuration, such as a kubeconfig, as the input to another? Is it up to the system driving terraform apply to do that passing, or can they share a state file and reference one another's outputs?

I really appreciate your prompt and helpful responses, thank you!

jbg commented 3 years ago

https://www.terraform.io/docs/language/state/remote-state-data.html is one option: pull outputs from one root module as a data source in another root module.
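A sketch of that pattern, assuming the cluster layer keeps its state in S3 (bucket, key, and output names illustrative):

# In the infrastructure layer: expose what the next layer needs.
output "cluster_name" {
  value = aws_eks_cluster.this.name
}

# In the applications layer: pull it back in via remote state.
data "terraform_remote_state" "infra" {
  backend = "s3"
  config = {
    bucket = "my-tf-states"
    key    = "infra/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "kubernetes" {
  # By the time this layer is planned, the cluster already exists, so
  # the provider can be configured with known values. (host and
  # cluster_ca_certificate would come from remote state outputs too.)
  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args = ["eks", "get-token", "--cluster-name",
      data.terraform_remote_state.infra.outputs.cluster_name]
  }
}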

mabunixda commented 3 years ago

If you use this within TFC or TFE, you can use a similar solution.

j94305 commented 3 years ago

This is a somewhat old thread, but unfortunately this is still an issue. Meanwhile, we are resorting to Terragrunt for the extra layer of defining process steps across Terraform scripts. There should be a way of modelling some of these things as resources instead of providers; that way they can be the result of execution steps, and dependencies can be respected accordingly. The provider model in the current Terraform framework is too limited for resource-type objects (e.g., databases).

doctor-eval commented 3 years ago

AFAIK the "supported" ways to do the "provision a thing, then provision another thing inside the thing you just provisioned" workflow are either to apply a set of changes, then add the dependent changes and apply again (not ideal), or to split your Terraform configuration into multiple configurations that interact at arm's length (which actually works quite well).

What I'm trying to do is provision a VPS that runs Docker, and then provision things on Docker. IMO the problem is in the modelling of the Docker provider more so than in Terraform itself. There doesn't seem to be a good reason that the Docker endpoint isn't just a resource.

I seem to remember that the postgresql provider works the same way: the provider needs a hostname. But in my environment the host doesn't even exist until the VPS is created... why are the PG and Docker providers defined at the IP-address level when the VPS they live on isn't?

From the outside it looks like these providers have been built with a specific use-case in mind - "using terraform to manage docker containers" - rather than the general use case that Terraform is so good at - "using terraform to build infra".

gothrek22 commented 3 years ago

Another example up for consideration:

resource "null_resource" "kubeconfig" {
  provisioner "local-exec" {
    command = <<LOCAL_EXEC
export KUBECONFIG="${path.root}/kubeconfig"
aws eks update-kubeconfig --name eks-cluster
LOCAL_EXEC
  }
}

provider "kubernetes" {
  alias            = "eks"
  config_path      = "${path.root}/kubeconfig"
  depends_on = [
    null_resource.kubeconfig
  ]
}

provider "helm" {
  alias = "eks"
  kubernetes {
    config_path      = "${path.root}/kubeconfig"
  }

  depends_on = [
    null_resource.kubeconfig
  ]
}

Instead of repeating almost the same provider definition when using both the kubernetes and helm providers, this would allow making things a bit more DRY by using a depends_on between a resource and providers.

villesau commented 3 years ago

In addition to kube configs, e.g. Cloud SQL would also benefit from this. You probably want to grant DB privileges to your IAM users, but for that you need the postgres provider, and its configuration depends on the newly created database instance.
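A sketch of that dependency (assuming the google and cyrilgdn/postgresql providers; names and grant details are illustrative):

resource "google_sql_database_instance" "main" {
  name             = "app-db"
  database_version = "POSTGRES_13"
  region           = "europe-north1"

  settings {
    tier = "db-f1-micro"
  }
}

# The provider block references an attribute that is only known
# after the instance is created -- exactly the situation at issue.
provider "postgresql" {
  host     = google_sql_database_instance.main.public_ip_address
  username = "postgres"
  password = var.db_password
}

resource "postgresql_grant" "readonly" {
  database    = "app"
  role        = "analyst"
  schema      = "public"
  object_type = "table"
  privileges  = ["SELECT"]
}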

lukemarsden commented 3 years ago

I put together a draft spec and tooling called "terrachain" to make it easier to do the supported thing of running multiple apply steps in sequence, passing the output of each to the input of the next, to solve this issue.

https://combinator.ml/terrachain

https://github.com/combinator-ml/combinator/blob/main/docs/terrachain.md

Feedback welcome, and I'm happy to open source the couple of reference implementations I've got (one in Go and the original prototype in Python) if folks are interested.


MoritzKn commented 3 years ago

@lukemarsden I've been following this issue for a while now, and I think it's clear that this would be a desirable feature, even though it's hard to implement and contrary to some of Terraform's basic principles. In my opinion, outside tooling would be a great way to explore the possibility space and find out how to solve the problem of Terraform projects that need to be applied multiple times because of provider dependencies.

I myself contemplated a couple of times already whether I should sit down and write up a draft and proof of concept for this.

I think it would be great if you could open source your work and maybe one day we can actually get something like this landed in the mainline terraform CLI.

RealityAnomaly commented 3 years ago

I put together a draft spec and tooling called "terrachain" to make it easier to do the supported thing of running multiple apply steps in sequence, passing the output of each to the input of the next, to solve this issue. https://combinator.ml/terrachain https://github.com/combinator-ml/combinator/blob/main/docs/terrachain.md Feedback welcome and happy to open source the couple of reference implementations I've got (one in Go and the original prototype in Python) if folks are interested?

Please open source this! It would be great to have.

bodinsamuel commented 3 years ago

Also pushing for this 👍🏻. Since everyone is looking for good reasons, here is another one:

Secrets are better stored in Vault; HashiCorp won't say otherwise. So it's a pity that we cannot make other providers that require secrets wait for Vault to be ready.

# Not possible right now:

provider "vault" {
  address            = "<redacted>"
}

data "vault_generic_secret" "DATADOG_API_KEY" {
  path = "secret/DATADOG_API_KEY"
}

provider "datadog" {
  # .data is the secret's key/value map; "value" is an assumed key name.
  api_key = data.vault_generic_secret.DATADOG_API_KEY.data["value"]
}
dsogari commented 3 years ago

While I agree with @goodacres's comment about having separate Terraform states for different layers of the infrastructure, sometimes this separation may not be so clean-cut. For example, I might want to launch a K8s cluster and then deploy a CNI plugin to it, and I'd like to treat this combo as pertaining to the same "layer".

The more I think about this issue, the more I come to agree with @doctor-eval's comments (here and here). Currently, many providers are implemented in a way that requires specifying shared configuration (credentials, URLs, certificates, addresses) for access to a presumably existing infrastructure. They miss the point of being able to provision the underlying infrastructure in the same terraform apply operation.

IMO there are two things that could be done to help us users:

chrisroat commented 3 years ago

The solution by @lukemarsden fits a good deal of use cases.

One can create two modules: one for infrastructure and one for applications. The infra module exposes outputs containing the data needed to create a provider for the applications, and the root module passes that data to the application module. The application module takes that data as input variables; it has the correct dependencies, and it creates its own provider.

This seems to solve a general class of problems brought up in this thread. Can we ask the Terraform folks to hold off on deprecating the ability for child modules to create providers?

jbg commented 3 years ago

@dsogari

... and I'd like to treat this combo as pertaining to the same "layer"

This is, basically, the issue though. They're not the same "layer". If you have resources that require the k8s cluster to be up and running, then they exist in a "layer" above the "layer" with the k8s cluster. CNI plugins often have multiple parts (e.g. executables that must exist on the nodes, plus k8s objects that must be deployed to the k8s cluster), and those parts may exist in different "layers".

The solution of separate states for separate "layers" of infra is clean, works well today, and helps to encourage good architecture of infrastructure. I can imagine it feels a bit "heavy" for smaller infrastructure, but maybe some better tooling would help with that.

donwlewis commented 2 years ago

Another good use case I have run into is creating Azure subscriptions with the azurerm_subscription resource. In that scenario, you could be creating a subscription that you then need to create resources in; this would allow the subscription ID to be passed along once it was created. Unless someone knows another way of doing this, the only way I know of is to target the specific subscription first, but targeting resources seems to be discouraged.

jbg commented 2 years ago

It's another version of the same thing, yes. If you want to do it all in one Terraform state, you can use -target to apply the subscription first and then apply the rest, as sketched below. But a much nicer way is to recognise that these are different layers of infrastructure and keep a separate Terraform state for each layer.
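A sketch of the single-state -target variant (names illustrative; azurerm_subscription exports a subscription_id attribute):

# 1) terraform apply -target=azurerm_subscription.new
# 2) terraform apply

resource "azurerm_subscription" "new" {
  subscription_name = "team-sandbox"
  billing_scope_id  = var.billing_scope_id
}

# Aliased provider pointed at the subscription created above; its
# subscription_id is unknown until the first apply has run.
provider "azurerm" {
  alias           = "new_sub"
  subscription_id = azurerm_subscription.new.subscription_id
  features {}
}

resource "azurerm_resource_group" "rg" {
  provider = azurerm.new_sub
  name     = "rg-app"
  location = "westeurope"
}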

lukemarsden commented 2 years ago

I've just got back from parental leave, but it sounds like there's some interest in the terrachain approach. I'll tidy things up and put some code and a demo on GitHub when I get a chance.


uripre commented 2 years ago

I've got to be honest: it is so embarrassing that Terraform doesn't support this. Such basic functionality, IMO. I would have chosen AWS CDK in retrospect.

alexott commented 2 years ago

We hit the same issue in the Databricks Terraform provider: we need to refer to the URL of the workspace to be created, but because it's not yet created, Terraform will try to create Databricks objects, like users, and will fail because no workspace exists yet. The workaround is to add depends_on to all Databricks resources, but that's error-prone.

ryanisnan commented 2 years ago

We've hit the same issue trying to provision Kubernetes clusters in a module and then also use the helm provider to install default configuration into the cluster. This cannot be done in a single apply right now.

marcus-sa commented 2 years ago

I've got to be honest: it is so embarrassing that Terraform doesn't support this. Such basic functionality, IMO. I would have chosen AWS CDK in retrospect.

I'm considering migrating to Pulumi instead.

In Terraform, for providers to depend on other resources, you first have to create those resources in a different stack and use remote state to get the resource outputs. So much hassle for something that should be so simple.

hajdukd commented 2 years ago

Does anyone know whether Terragrunt allows for this? That is, first running the creation of one set of resources so that they can be used in a generated provider?
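Terragrunt's dependency blocks do support that kind of ordering; a minimal sketch (paths and output names illustrative):

# k8s-resources/terragrunt.hcl
# "terragrunt run-all apply" applies ../eks-cluster before this unit.
dependency "cluster" {
  config_path = "../eks-cluster"
}

inputs = {
  cluster_name     = dependency.cluster.outputs.cluster_name
  cluster_endpoint = dependency.cluster.outputs.cluster_endpoint
}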

xwing3 commented 2 years ago

Seems like this feature will never happen...

grzegorzjudas commented 2 years ago

I'm not sure yet another example will help here, but here's mine:

provider "kubernetes" {
  host                   = aws_eks_cluster.eks_cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.eks_cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    args        = ["eks", "get-token", "--cluster-name", "eks-cluster"]
    command     = "aws"
  }
}

resource "aws_eks_cluster" "eks_cluster" {
  name = "eks-cluster"
  role_arn = aws_iam_role.eks_cluster_role.arn
  version = "1.21"

  vpc_config {
    subnet_ids = flatten([
      aws_subnet.eks_public_subnet.id,
      aws_subnet.eks_private_subnet[*].id
    ])

    security_group_ids = [
      aws_security_group.eks_node_secgroup.id
    ]
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_role_policy_attachment
  ]
}

As you can see, it'd be perfect if I could initialise the kubernetes provider with output from eks_cluster after it's applied.

jbg commented 2 years ago

@grzegorzjudas It wouldn't in fact be perfect: EKS tokens last for 15 minutes, and Terraform data sources can't change value between plan and apply (that's by design: otherwise apply could perform different actions than what was planned).

If you use an aws_eks_cluster_auth data source to provide the authentication for the provider, your apply will have to always finish within 15 minutes of the plan starting, otherwise your token will have expired and you'll get spurious errors. This is probably incompatible with many (most?) CI/CD workflows.

The exec solution as you have shown here is actually the better way to do it, since it means the provider can get a new token at the start of apply. One improvement would be to pass the cluster name from the resource into the exec args rather than hard-coding it, as sketched below.
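A sketch of that improvement against the config above:

provider "kubernetes" {
  host                   = aws_eks_cluster.eks_cluster.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.eks_cluster.certificate_authority[0].data)

  exec {
    api_version = "client.authentication.k8s.io/v1alpha1"
    command     = "aws"
    # Cluster name taken from the resource instead of hard-coded.
    args = ["eks", "get-token", "--cluster-name", aws_eks_cluster.eks_cluster.name]
  }
}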

grzegorzjudas commented 2 years ago

That makes sense. Still, even with that change, the originally requested depends_on for providers could be used here, as I'd only want to initialise the provider once the EKS cluster is created.

jbg commented 2 years ago

Problem is, the provider needs to produce the plan. So if you create the EKS cluster during the apply phase, there is no way for the kubernetes provider to produce a plan, because it has no access to the cluster during the plan phase (it doesn't exist yet). You have to do it in two steps: plan cluster creation, apply, plan resources within the cluster, apply. This is why the general recommendation is to split your state into "layers", which solves this problem cleanly without needing to repeatedly plan and apply the same state.

Some kind of depends_on for providers alone doesn't solve this. You would need Terraform to grow the ability to do some kind of multi-stage series of plan+apply, or to defer some part of planning into the apply phase. IMO neither of those is a good solution; the status quo is better. What is really needed is more documentation, and better tooling (probably outside the core Terraform project), around managing multiple states for different layers of infrastructure.

j94305 commented 2 years ago

I don't see a general contradiction between the plan/apply approach and the need for provider dependencies.

If you take the postgresql provider as an example, you have to specify connection parameters in the provider definition itself, i.e., the resource must exist at the time of provider instantiation. In a way, the requirement to specify actual connection information there already breaks the declarative paradigm, because there is an implicit order of execution: provider initialisation comes before plan execution.

If this were a simple instantiation of a PostgreSQL provider (just knowing how to interact with a database of this sort, the database itself possibly but not necessarily having been created somewhere else), the connection could be treated as a resource. Subsequent operations to ascertain a schema, privileges, etc. could depend on this resource. This approach would allow resource planning to respect the dependencies of resources within a provider block on the implicit resources generated at the provider level.

However, somebody chose to have some provider-level resources allocated during the initialisation phase of providers, thus exempting them from proper planning. It seems to me the root cause of our issues is the placement of such initialisation-time resources with providers, which makes them unplannable by prohibiting them from being elements in the resource dependency graph.

Unfortunately, I don't speak Go well enough to make these modifications to the existing providers myself.

jbg commented 2 years ago

Producing a plan may require reading data sources, so providers need to be already initialised.

Being able to defer knowing postgres/k8s/etc. connection details until later in the process than planning implies that either (a) you can't have the plan depend on anything that requires provider initialisation (which means no data sources), or (b) you defer some of the planning to apply time (which would break the model of knowing what you are applying before you apply it).

A contrived example using the kubernetes provider:

data "kubernetes_all_namespaces" "all" {}

# Inject a secret into every namespace
resource "kubernetes_secret_v1" "super_secret" {
  for_each = toset(data.kubernetes_all_namespaces.all.namespaces)

  metadata {
    name = "super-secret"
    namespace = each.value
  }

  data = { "super" = "secret" }
}

You can't plan this without an already-initialised kubernetes provider with access to the cluster. These kinds of patterns (obviously less silly than this one) are very common in real TF configs.

In general there is no problem with establishing dependencies between providers and resources (implicitly, by having the provider configured based on attributes of resources). It's just that all those dependencies must be able to be resolved during the plan phase. You can ensure this is always the case when you use separate states for separate layers of infra, because you never create a resource in the same state as providers that depend on it.

The problems come when the provider depends on attributes that may be unknown until apply (e.g. creating or modifying a k8s cluster in the same state as the provider that connects to it, or creating or modifying a PostgreSQL server in the same state as the provider that connects to it).

grzegorzjudas commented 2 years ago

I see. Well, I'm quite new to Terraform, so if depends_on doesn't work and is architecturally debatable, what's the standard/common approach to such cases? The kubernetes provider depending on EKS is one thing, but the chain can easily get longer if, for example, you want to spawn a PostgreSQL database in the cluster and seed it, which adds a third item to the chain:

aws_eks_cluster -> kubernetes -> postgresql

I hope it's not something like "run apply three times manually from different folders".

jbg commented 2 years ago

To do that properly you need to plan & apply three times. That follows simply from the fact that in general the provider can't produce a plan until it's configured.

If you don't like the idea of multiple states, you can do it all in a single state (by adding the "layers" to your config one at a time, or by using -target), but then nothing (other than you, manually) handles the dependencies between those resources, and it will be brittle with regard to later modifications of those resources.

It's tidiest to use separate states for each layer of your infrastructure, and use remote_state data sources or other similar techniques to form dependencies between them. Designing this well at the beginning will save you from a lot of pain when your infrastructure grows.

There's no need to make the layers too small. The aws_eks_cluster could live in a state along with other aws_* resources representing your core cloud-managed infrastructure. The postgresql resources might live in the same state as resources representing the deployments of your applications that use those database resources.

The layered approach has other benefits. The requirement to express dependent values through a well-defined interface (e.g. remote_state) forces you to think more clearly about coupling in your infrastructure.

Being separate states, each layer also has its own state locks, allowing the ability to carry out multiple Terraform actions simultaneously when the changes affect different layers. This can be a huge benefit on large infrastructure where the monolithic approach often forces unrelated changes to wait for each other.

apparentlymart commented 2 years ago

Hi again, all!

As I mentioned in my very first comment right back at the start of this, the framing of this as "depends_on for providers" is misleading, because Terraform is respecting dependencies here, but the plan/apply execution model means that respecting dependencies isn't sufficient to get the desired behavior in practice.

To understand why, let's imagine the following configuration, which is intended to represent the most common situation where we run into trouble today: a single Terraform configuration is responsible for multiple layers at once, such that the results of creating one layer become arguments to the provider for the second:

provider "happycloud" {
  region = "uswest1"
}

resource "happycloud_compute_cluster" "example" {
  # ...
}

provider "compute" {
  compute_api = happycloud_compute_cluster.example.api_url
}

resource "compute_deployment" "example" {
  # ...
}

Terraform can see that the provider "compute" block depends on the happycloud_compute_cluster.example resource and so will wait until that is planned before configuring the compute provider to plan its own resources. That leads to the following chain of actions, in which I'll use a Terraform-language-like pseudocode notation where (known after apply) means the same thing it does in Terraform's plan output:

happycloud.ConfigureProvider({
  region = "uswest1"
})
happycloud.PlanResourceChange("happycloud_compute_cluster", {
  # ...
})
compute.ConfigureProvider({
  compute_api = (known after apply)
})
compute.PlanResourceChange("compute_deployment", {
  # ...
})

The crucial detail here is that happycloud_compute_cluster's api_url attribute is not predictable until the apply step, and so the compute provider configuration derived from it is also not fully predictable until the apply step. Terraform does still ask that provider to configure itself at the appropriate time, but it sends the provider an unknown value for its compute_api argument.

What happens next is up to the provider. Some providers choose to ignore that unknown value because they know that their planning actions won't make any API calls anyway, in which case this all works out. Some providers misunderstand the unknown value as an unset value due to a historical design flaw in the old plugin SDK, in which case they might try to use some default API URL like https://127.0.0.1:8443/ and then return an error if it's not reachable. Some providers just explicitly return an error.

What remains unclear here, and what the design effort for addressing this use-case must contend with, is what exactly the compute provider ought to do in this situation.

If we consider the most extreme case where "compute" is standing in for the hashicorp/kubernetes provider, Kubernetes is a system which supports dynamic schema and so the provider can't really do any useful work during planning if it can't reach the API to fetch the schema.

There are some less extreme cases where the schema is still fixed and so the provider could at least validate arguments during the planning phase, but would potentially fail to catch certain problems until the apply phase as we already see for managed resources and apply-time-deferred data resources today.

There is also a Terraform Core lifecycle wrinkle here that some others upthread have mentioned: Terraform currently decides whether to defer a data resource to the apply phase based entirely on its own rules, which essentially mean that if a data resource configuration refers to an unknown value or if it depends on a resource that has actions pending in the current plan then it will be deferred until the apply phase. Terraform Core doesn't give providers any way to decide using their own logic to defer until the apply phase, and so if Terraform determines that a data resource ought to be resolvable then the provider must either return a result for it or return an error. There is currently no compromise position where the provider could say "ask me again during the apply phase". Whether there should be is part of the design space here.

Some (including me) have suggested that it might be better to allow providers in the extreme situation like the Kubernetes provider to return a more extreme version of "unknown" which applies to the entire provider configuration or to an entire resource, where Terraform could then perhaps carve off a chunk of the graph as "deferred" and leave it to be handled on a subsequent plan and apply; that's what #4149 was about. However, that comes with the disadvantage that during initial creation you may need to run terraform apply multiple times in order to fully converge, where each run would plan and create gradually more objects similar to if you had manually used -target to force Terraform to create a partial plan.

Others might argue that a provider should "just" configure as normal and then return unknown values for anything it can't predict due to its configuration being incomplete. That would work within Terraform Core's current model (aside from the data resources quirk I mentioned above), but it isn't entirely clear what plugin SDK design would support such an approach, to avoid provider developers needing to separately handle that partially-configured situation in the planning logic for every resource type.

I mentioned in passing above that the old SDK had a design flaw where it would prevent providers from distinguishing between a provider configuration argument being unset or being unknown. That was represented as hashicorp/terraform-plugin-sdk#331 and ultimately deemed unfixable within the compatibility constraints of the old SDK, but the new framework gives the provider code access to that distinction, leaving it up to the provider developer to decide how to handle it. So far, the most elaborate provider using that new framework has been the AWS Cloud Control provider, which chooses to return an error if its configuration contains any unknown values.

What's needed for progress on this issue is some consensus (across provider developers, SDK developers, and Terraform Core developers) about what providers ought to do when they get partially-unknown configuration (whether that be a single rule for all providers, or something more nuanced for different situations), and then from there some design effort to understand what SDK/Framework abstractions would allow provider developers to meet that expectation in an intuitive way, and potentially introducing new mechanisms into Terraform Core if we conclude that this situation warrants some different UI feedback or a change to the usual plan/apply workflow.


I had originally left this issue titled "depends_on for providers" even though that's a misleading framing of the problem because it seemed like that was the framing that many people who wanted to open issues about it were coming from, and thus it allowed them to find this issue even though the title isn't accurate, and thus avoid creating duplicate issues. However, I can see that leaving it this way is causing the discussion to repeatedly veer back into "why not just add depends_on support", even though that isn't really a productive direction for the conversation.

Given that, I'm going to retitle this in a way that describes the underlying use-case we're talking about, rather than a particular proposed solution to it, but keep the original title text in there to make this stay searchable.