hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Configuring one provider with a dynamic attribute from another (was: depends_on for providers) #2430

Closed dupuy closed 4 months ago

dupuy commented 9 years ago

This issue was inspired by this question on Google Groups.

I've got some Terraform code that doesn't work because the EC2 instance running the Docker daemon doesn't exist yet, so I get "* Error pinging Docker server: Get http://${aws_instance.docker.public_ip}:2375/_ping: dial tcp: lookup ${aws_instance.docker.public_ip}: no such host" when I run plan or apply.
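
For reference, a minimal sketch of the kind of configuration that fails this way (the aws_instance name follows the error message above; the AMI, image, and port are placeholders):

resource "aws_instance" "docker" {
  ami           = "ami-00000000" # hypothetical AMI
  instance_type = "t2.micro"
}

provider "docker" {
  # Configured from an attribute that is only known after the instance is created.
  host = "tcp://${aws_instance.docker.public_ip}:2375/"
}

resource "docker_container" "app" {
  name  = "app"
  image = "nginx:latest" # hypothetical image
}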

There are providers (docker and consul - theoretically also openstack, but that's a stretch) whose backing services can be implemented with Terraform itself using other providers like AWS. If a Terraform deployment contains other resources that use the (docker or consul) provider, those resources cannot be provisioned or managed in any way until and unless the resources that implement the docker server or consul cluster have been successfully provisioned.

If there were a depends_on clause for providers like docker and consul, this kind of dependency could be managed automatically. In the absence of this, it may be possible to add depends_on clauses to all the resources using the docker or consul provider, but that does not fully address the problem: Terraform will attempt (and fail, if they are not already provisioned) to discover the state of the docker/consul resources during the planning stage, long before it has finished computing dependencies. Multiple plan/apply runs may be able to work around that specific problem, but a depends_on clause for providers would allow everything to be managed in a single pass.

bitglue commented 8 years ago

fleet is another example of a service for which this is a problem.

That provider used to work: if the IP address was an empty string, it would use a mock API that failed on everything. But that solution no longer works in Terraform 0.6.3.

apparentlymart commented 8 years ago

I think this is describing the same issue I wrote up in #2976, in which case unfortunately the problem is a bit more subtle than supporting depends_on on the provider.

Terraform is actually already able to correctly handle provider instantiation in the dependency graph, correctly understanding that (in your example) the docker provider instantiation depends on the completion of the EC2 instance.

The key issue here is that providers need to be instantiated for all operations, not just apply. Thus when terraform plan is run, Terraform will run the plan for the AWS instance, noting that it needs to create it, and then it will try to instantiate the Docker provider to plan the docker_container resource, which of course it can't do because we won't know the AWS instance results until apply.

When I attempted to define this problem in #2976 I was focused on working with resources that don't really have a concept of "creation", like consul_keys or template_file, rather than things like aws_instance, etc. There really isn't a good way to make that EC2 instance and docker example work as long as we preserve Terraform's strict separation between plan and apply.

The workaround for this problem is to explicitly split the problem into two steps: make one Terraform config that creates the EC2 instance and produces the instance IP address as an output, publish the state from that configuration somewhere, and then use the terraform_remote_state resource to reference that from a separate downstream config that sets up the Docker resources.
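
A sketch of that split, using the modern data source form of terraform_remote_state (the backend settings, bucket, and output name here are assumptions):

# Upstream configuration: create the instance and publish its address.
output "docker_host_ip" {
  value = aws_instance.docker.public_ip
}

# Downstream configuration: read the published state and configure the provider.
data "terraform_remote_state" "infra" {
  backend = "s3"
  config = {
    bucket = "example-tf-state" # hypothetical bucket
    key    = "infra/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "docker" {
  host = "tcp://${data.terraform_remote_state.infra.outputs.docker_host_ip}:2375/"
}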

Unfortunately if you follow my above advice, you will then run into the issue that I described in #2976: the terraform_remote_state resource also won't get instantiated during plan. That issue seems solvable, however; terraform_remote_state just reads some data from elsewhere and doesn't actually create anything, so it should be safe to refresh it during plan and get the data necessary to populate the provider configuration before the provider is instantiated.

bitglue commented 8 years ago

@apparentlymart: In that issue, you are describing a case "which is very handy when parts of the config are set dynamically from outside the configuration." And you propose to get around the issue by making those resources that represent the outside configuration "pre-refreshed", meaning you skip the create step and immediately go to the read step.

But I'm describing a case where parts of the config are set dynamically from inside the configuration. For example, I want to manipulate a Docker or Fleet service that exists on the EC2 instance I just made. Would pre-refreshing help in this case?

apparentlymart commented 8 years ago

@bitglue no, as my (rather verbose) comment described, having a "true" resource from one provider be used as input to another is not compatible with Terraform's model of separating plan from apply. The only way to solve that, without changing Terraform's architecture considerably, is to break the problem into two separate configurations and then use terraform_remote_state (which can potentially be pre-refreshed, but isn't yet) to pass resource data between the two.

bitglue commented 8 years ago

edited a bit to clarify the plan/apply stages

@apparentlymart I don't think it's impossible, because terraform-provider-fleet used to do it successfully. When the configuration is being planned for the first time, the provider doesn't actually need to do anything, which is why it can use a Fleet API object which fails all the time. None of the provider methods get called because it's not necessary: if there's no prior state, then the plan is trivially to create everything, and you don't need to call any provider methods to know that.

After the plan is made and it's time to apply, then the provider can be initialized after having created the EC2 instance, and now it has a proper endpoint and can actually work and create the fleet resources.

On subsequent planning runs, the public IP address of the EC2 instance is already known, so planning can happen as usual. Bonus points for refreshing the EC2 instance before initializing the fleet provider to do its refreshing.

I'd also think it's not the separation of plan and apply that's really the issue here, but more specifically refresh. You can always terraform plan -refresh=false and that will work even if the providers can't connect to anything, right? Assuming the state file is accurate, of course.

You can run into a little trouble if you delete the EC2 instance to which fleet was connecting after the fleet resources have been created. Now plan can't work. But there are a few ways to resolve that situation:

  1. Give the fleet provider the IP address of another EC2 instance in the same fleet cluster which hasn't been deleted.
  2. If the entire fleet cluster has been deleted and so there is no IP address you could give it, then all the fleet units have been deleted, too. So you can delete all the fleet units from the state file.
  3. Assuming you have an accurate (enough) state file, create the plan without refreshing and apply it. Then the missing EC2 instance will exist again and things will be back to normal.

Granted, these resolutions require a little hackish manual action, but it's not a situation I ever hit in practice. I'm sure with a little refinement it could be made less hackish.

apparentlymart commented 8 years ago

@bitglue it sounds like you're saying that in principle the providers could tolerate their configurations being incomplete until they are asked to do something. That is certainly theoretically true... while today most of them verify their config during instantiation and fail hard if the config is incomplete (as you saw the Docker provider do), they could potentially let an incomplete config pass and then fail only when a later operation actually needs the configured client.

So one thing we could prototype is to revise how helper/schema handles provider configuration errors: in Configure, rather than returning an error when the ConfigureFunc returns an error, it could simply remember the error as internal state and return nil. The Apply and Refresh functions would then be responsible for checking for that saved error and returning it, so that Diff (which, as you noted, does not depend on the instantiated client) can complete successfully.

Having a prototype of that would allow us to try out the different cases and see what it fixes and when/how it fails. As you said, it should resolve the initial creation case because at that point the Refresh method won't be called. The case I'm less sure about -- which is admittedly an edge case -- is when a provider starts off with a totally literal configuration and is later changed to depend on the outcome of another resource; in that case Terraform will try to refresh the resources belonging to that provider, which will presumably fail.

bitglue commented 8 years ago

@apparentlymart That's more or less my thinking, yeah. Though from what I've observed trying to get terraform-provider-fleet working again, in many circumstances the failure happens before ConfigureFunc is even called, so we might need a different approach.

apparentlymart commented 8 years ago

@bitglue I guess the schema validation will catch cases where a field is required yet empty, so you're right that what I described won't entirely fix it unless we make all provider arguments optional and handle their absence inside the ConfigureFunc.

mtekel commented 8 years ago

This is an issue for the postgresql provider as well, e.g. when you want to create an AWS RDS instance and then use its port in the provider configuration. This fails because the provider initializes before the RDS instance is created: the port number is returned as "" and that doesn't convert to an int:

  * provider.postgresql: cannot parse '' as int: strconv.ParseInt: parsing "": invalid syntax

TF code:

 provider "postgresql" {
   host = "${aws_db_instance.myDB.address}"
   port = "${aws_db_instance.myDB.port}"
   username = "${aws_db_instance.myDB.username}"
   password = "abc"
 }
mtekel commented 8 years ago

Interestingly, in your case the TF graph does show the provider dependency, yet the provider still runs in parallel. This is especially problematic on destroy, as the RDS instance gets destroyed before the postgresql provider has a chance to destroy the resources it created, leaving an "undestroyable" state file behind. See https://github.com/hashicorp/terraform/issues/5340

BastienM commented 8 years ago

Hello there,

I got a similar problem with the Docker provider when it is used with an OpenStack instance (graph).

# main.tf
module "openstack" {
    source            = "./openstack"
    user-name         = "${var.openstack_user-name}"
    tenant-name       = "${var.openstack_tenant-name}"
    user-password     = "${var.openstack_user-password}"
    auth-url          = "${var.openstack_auth-url}"
    dc-region         = "${var.openstack_dc-region}"
    key-name          = "${var.openstack_key-name}"
    key-path          = "${var.openstack_key-path}"
    instance-flavor   = "${var.openstack_instance-flavor}"
    instance-os       = "${var.openstack_instance-os}"
}

module "docker" {
    source                  = "./docker"
    dockerhub-organization  = "${var.docker_dockerhub-organization}"
    instance-public-ip      = "${module.openstack.instance_public_ip}"
}

$ terraform plan

Error running plan: 1 error(s) occurred:
* Error initializing Docker client: invalid endpoint

Even though I'm using:

provider "docker" {
    host = "${var.instance-public-ip}:2375/"
}

Logically it should wait for the instance to be up, but sadly the provider is still initialized at the very beginning...


So as a workaround I split my project into modules (module.openstack & module.docker) and then apply them one at a time with the -target parameter, like this:

$ terraform apply -target=module.openstack && terraform apply -target=module.docker

It does the job but makes the whole process quite annoying, as we must always specify the modules in the right order for each step (plan, apply, destroy ...).

So until we get an option such as depends_on, I don't see another way to do it. Is there an update on this matter?

closconsultancy commented 8 years ago

I've submitted a similar question to the google group on this:

https://groups.google.com/forum/#!topic/terraform-tool/OhDdMrSoWK8

The workaround of specifying the modules separately didn't seem to work for me. Weirdly the docker provider was spinning up a second EC2 instance?! I've also noticed that terraform destroy didn't seem to take notice of the target module. See below:

#terraform destroy -target=module.aws

Do you really want to destroy?
  Terraform will delete the following infrastructure:
    module.aws

module.aws.aws_instance.my_ec2_instance: Refreshing state... (ID: i-69b034e5)
.....
Error refreshing state: 1 error(s) occurred:

* Error initializing Docker client: invalid endpoint

apparentlymart commented 8 years ago

Issue #4149 was my later proposal to alter Terraform's workflow to better support the situation of having a single Terraform config work at multiple levels of abstraction (a VM and the app running on it, as in this case).

It's not an easy fix but it essentially formalizes the use of -target to apply a config in multiple steps and uses Terraform's knowledge of the dependency graph to do it automatically.

dbcoliveira commented 8 years ago

@apparentlymart I don't fully understand why the dependency graph doesn't actually count during the plan step. Intuitively I would guess that the dependency graph would be applied at all stages (including provider instantiation at each step); at least in all these cases it would avoid a bunch of problems. In other words, the dependency graph should indicate whether instantiation should wait for a certain resource. This kind of issue limits the eventual capabilities of the tool. It's a bit silly that it can be fixed with a series of POSIX commands while programmatically (using TF logic) it can't.

CloudSurgeon commented 7 years ago

So, if I understand this correctly, there is no way for me to tell a provider not to configure itself until after certain resources have been created. For example, this won't work because custom_provider will already be initialized before my_machine is built:

provider "custom_provider" {
  url      = "${aws_instance.my_machine.public_ip}"
  username = "admin"
  password = "password"
}

The only option would be to run an apply with a -target option for my_machine first, then run the apply again after the dependency has been satisfied.

derFunk commented 7 years ago

+1 for depends_on for providers. I want to be able to depend on having all actions from another provider applied first.

My use case: I want to create another database and roles/schema inside this database in PostgreSQL.

To do so, I have to connect as the "root" user first, create the new role with appropriate permissions, and then connect again with the new user to create the database and schema in it.

So I need two providers with aliases, one with root and one for the application db. The application postgresql provider depends on the finished actions from the root postgresql provider.

My current workaround is to comment out the second part, apply the first part, then comment the second part back in and apply it as well. :(

# ====================================================================
# Execute first
# ====================================================================

provider "postgresql" {
  alias           = "root"
  host            = "${var.db_pg_application_host}"
  port            = "${var.db_pg_application_port}"
  username        = "root"
  password        = "${lookup(var.rds_root_pws, "application")}"
  database        = "postgres"
}

resource "postgresql_role" "application" {
  provider        = "postgresql.root"
  name            = "application"
  login           = true
  create_database = true
  password        = "${lookup(var.rds_user_pws, "application")}"
}

# ====================================================================
# Execute second
# ====================================================================

provider "postgresql" {
  alias           = "application"
  host            = "${var.db_pg_application_host}"
  port            = "${var.db_pg_application_port}"
  username        = "application"
  password        = "${lookup(var.rds_user_pws, "application")}"
  database        = ""
}

resource "postgresql_database" "application" {
  provider = "postgresql.application"
  name     = "application"
  owner = "${postgresql_role.application.name}"
}

resource "postgresql_schema" "myschema" {
  provider = "postgresql.application"
  name     = "myschema"
  owner      = "${postgresql_role.application.name}"

  policy {
    create = true
    usage  = true
    role   = "${postgresql_role.application.name}"
  }

  policy {
    create = true
    usage  = true
    role   = "root"
  }
}

apparentlymart commented 7 years ago

@derFunk your use case there (deferring a particular provider and its resources until after its dependencies are ready) is a big part of what #4149 is about. (Just mentioning this here to create the issue link, so I can find this again later!)

andylockran commented 6 years ago

I've managed to hit this same issue with the postgres provider depending on an aws_db_instance outside my module. Is there a workaround available now?

jjlakis commented 6 years ago

Any progress here?

johnmarcou commented 6 years ago

Hi all.

I had a similar issue where I am using Terraform to:

  1. deploy an infrastructure (Kubernetes Typhoon)
  2. then deploy resources on the freshly deployed infrastructure (Helm packages)

The helm provider was checking the connection file (kubeconfig) at terraform initialisation, i.e. before the file itself had been created (that happens during step 1). So the helm resource creation was bound to crash, because the provider was unable to contact the infrastructure.

A double terraform apply works, but here is how I managed to make it work with a single terraform apply, forcing the helm provider to wait for the infrastructure to be online and exporting the infrastructure config file to a temporary local_file:

resource "local_file" "kubeconfig" {
  # HACK: depends_on for the helm provider
  # Passing provider configuration value via a local_file
  depends_on = ["module.typhoon"]
  content    = "${module.typhoon.kubeconfig}"
  filename   = "./terraform.tfstate.helmprovider.kubeconfig"
}

provider "helm" {
  kubernetes {
    # HACK: depends_on via an another resource
    # config_path = "${module.typhoon.kubeconfig}", but via the dependency
    config_path = "${local_file.kubeconfig.filename}"
  }
}

resource "helm_release" "openvmtools" {
  count      = "${var.enable_addons ? 1 : 0}"
   # HACK: when destroy, don't delete the resource dependency before the resource
  depends_on = ["module.typhoon"]
  name       = "openvmtools"
  namespace  = "kube-system"
  chart      = "${path.module}/addons/charts/open-vm-tools"
}

NB: This hack works because the provider expects a file path as its config value.

Hope it can help.

ap1969 commented 6 years ago

Hi, similar issue with Rancher. The rancher provider requires the URL of the rancher host, but if the plan is to create the rancher host and some other hosts to run the containerized services, it's then impossible to:

1) create the rancher host, and
2) have the other hosts register with the rancher host.

This is because the rancher provider fails during the terraform plan step, as it can't reach the API.

baudday commented 6 years ago

Given this issue is almost three years old, are there plans to implement depends_on for providers?

apparentlymart commented 6 years ago

As I mentioned earlier in the thread, depends_on is not the missing feature here, and would not actually help.

Something like the proposal in #4149 is what will address the underlying problem here. The Terraform team at HashiCorp is planning to implement something like that (subject to further prototyping/design work, since we need to figure out the exact details of how it will work), but we must first complete the current work in progress to fix several issues and limitations in the configuration language, which will come in the next major release.

whazor commented 5 years ago

I think the issue is that a provider's ConfigureFunc is called at planning time when it should not be. It should follow the graph: only once the resources it depends on are available should the provider be configured.

syst0m commented 5 years ago

@johnmarcou Your solution doesn't work for me.

provider "kubernetes" {
  version     = ">= 1.4.0"
  config_path = "${local_file.kubeconfig.filename}"
}

resource "local_file" "kubeconfig" {
  depends_on = ["module.habito-eks"]
  content    = "${module.habito-eks.kubectl_config}"
  filename   = "${module.habito-eks.kubeconfig_filename}"
terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

data.template_file.map_accounts: Refreshing state...
data.template_file.map_roles: Refreshing state...
data.template_file.map_users[1]: Refreshing state...
data.template_file.map_users[0]: Refreshing state...
data.aws_region.current: Refreshing state...
data.aws_caller_identity.current: Refreshing state...
data.aws_availability_zones.available: Refreshing state...
data.aws_iam_policy_document.cluster_assume_role_policy: Refreshing state...
data.aws_iam_policy_document.workers_assume_role_policy: Refreshing state...
data.aws_ami.eks_worker: Refreshing state...

------------------------------------------------------------------------

Error: Error running plan: 1 error(s) occurred:

* module.habito-eks.provider.kubernetes: Failed to load config (; default context): invalid configuration: no configuration has been provided

chad-barensfeld-exa commented 5 years ago

Will the provider dependencies work in version 0.12? Getting around this by using -target for now.

ulm0 commented 5 years ago

@syst0m have you found any solution yet? i'm facing the same issue.

TylerWanner commented 5 years ago

Trying to build a GKE cluster on GCP and set that cluster as the context for the k8s provider, with several state dependencies. Going to just separate them and use terraform_remote_state, but it would be nice to have more workflow control, in this case specifically by provider. I find this generally to be my primary letdown/source of frustration with Terraform.
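
For reference, a sketch of the GKE-to-Kubernetes-provider wiring being described, in 0.12-style syntax (the cluster name, location, and node count are assumptions):

data "google_client_config" "current" {}

resource "google_container_cluster" "primary" {
  name               = "example" # hypothetical cluster
  location           = "us-central1"
  initial_node_count = 1
}

provider "kubernetes" {
  # Configured from attributes of the cluster resource, which is what
  # creates the provider-depends-on-resource situation described above.
  host                   = "https://${google_container_cluster.primary.endpoint}"
  cluster_ca_certificate = base64decode(google_container_cluster.primary.master_auth[0].cluster_ca_certificate)
  token                  = data.google_client_config.current.access_token
}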

syst0m commented 5 years ago

> @syst0m have you found any solution yet? i'm facing the same issue.

@ulm0 My workaround is to comment out the provider before provisioning the cluster. Then comment it back in, after the cluster is provisioned.

athak commented 5 years ago

> Trying to build a GKE cluster on GCP and set that cluster as the context for the k8s provider, with several state dependencies. Going to just separate them and use terraform_remote_state, but it would be nice to have more workflow control, in this case specifically by provider. I find this generally to be my primary letdown/source of frustration with Terraform.

Just my 2c, this is my current solution. Keep GKE separate from the actual K8s config in different projects. If you absolutely want to do everything in one plan/apply, there is always terragrunt.

s-u-b-h-a-k-a-r commented 5 years ago

@johnmarcou's local_file kubeconfig workaround above can help, but this will fail while destroying the cluster.

s-u-b-h-a-k-a-r commented 5 years ago

@tdmalone In my case, since it was not able to find the kubeconfig while destroying, I am getting the error below. Please let me know if I am doing anything wrong here.

Error: Error refreshing state: 1 error occurred:

resource "local_file" "kubeconfig" {
  depends_on = ["module.eks"]
  content    = "${module.eks.kubeconfig}"
  filename   = "./kubeconfig_demo-cloud"
}

provider "kubernetes" {
  load_config_file = true
  config_path      = "${local_file.kubeconfig.filename}"
}

resource "kubernetes_service_account" "tiller" {
  metadata {
    name      = "tiller"
    namespace = "kube-system"
  }

  automount_service_account_token = true
  depends_on                      = ["module.eks"]
}

resource "kubernetes_cluster_role_binding" "tiller" {
  metadata {
    name = "tiller"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "tiller"
    namespace = "kube-system"
  }

  depends_on = [
    "kubernetes_service_account.tiller",
  ]
}

provider "helm" {
  install_tiller  = true
  tiller_image    = "gcr.io/kubernetes-helm/tiller:v2.14.0"
  service_account = "${kubernetes_service_account.tiller.metadata.0.name}"
  namespace       = "${kubernetes_service_account.tiller.metadata.0.namespace}"

  kubernetes {
    load_config_file = true
    config_path      = "${local_file.kubeconfig.filename}"
  }
}

data "helm_repository" "incubator" {
  name       = "incubator"
  url        = "https://kubernetes-charts-incubator.storage.googleapis.com"
  depends_on = ["kubernetes_service_account.tiller", "kubernetes_cluster_role_binding.tiller"]
}

data "helm_repository" "stable" {
  name       = "stable"
  url        = "https://kubernetes-charts.storage.googleapis.com/"
  depends_on = ["kubernetes_service_account.tiller", "kubernetes_cluster_role_binding.tiller"]
}

resource "helm_release" "mydatabase" {
  name  = "mydatabase"
  chart = "stable/mariadb"

  set {
    name  = "mariadbUser"
    value = "foo"
  }

  set {
    name  = "mardiadbPassword"
    value = "qux"
  }

  depends_on = [
    "kubernetes_cluster_role_binding.tiller",
  ]
}

nunofernandes commented 5 years ago

This feature would help a lot because the postgresql provider tries to connect to the database at the provider initialization step, but the database may not be present at that time. I would like to have the possibility to add depends_on = ["rds.instance"] to the postgresql provider definition.

quinont commented 5 years ago

I have the same problem here (https://github.com/terraform-providers/terraform-provider-influxdb/issues/9)

cofyc commented 5 years ago

hi, all

@johnmarcou's solution works around the issue in the apply phase but will fail in the refresh phase, because if a resource is used in a provider argument, the argument will not be resolved and its default value is used (unintuitive...). However, in certain scenarios you can work around this issue in current terraform (0.12.x). If your kubeconfig path can be pre-computed, you can use a pre-computed constant value in config_path and defer helm provider initialization via other arguments. Here is an example:

data "local_file" "kubeconfig" {
  depends_on = [resources to generate the content of kubeconfig file]
  filename   = local.kubeconfig
}

# local file resource used to delay helm provider initialization
resource "local_file" "kubeconfig" {
  content    = data.local_file.kubeconfig.content
  filename   = local.kubeconfig
}

provider "helm" {
  alias          = "gke"
  insecure       = true
  install_tiller = false
  kubernetes {
    config_path = local.kubeconfig // must be a constant or value which can be computed with variables
    load_config_file = local_file.kubeconfig.filename != "" ? true : false // its default value is true, works in refresh state
  }
}

Hope this helps.

j94305 commented 5 years ago

I have mentioned this topic here as well: https://github.com/terraform-providers/terraform-provider-postgresql/issues/42

A mere depends_on in combination with dummy failures does not suffice, because it would violate the principle of showing the user, in the planning phase, the kinds of modifications Terraform will make. However, anything managed by a dependent provider whose pre-conditions for becoming active are not yet met cannot be shown. Take the example of a Terraform script creating a database instance and then using the PostgreSQL provider to create databases, or of creating a cluster with Kubernetes in it and then setting ConfigMaps and Secrets. Therefore, this will - by the nature of this setup - have to be a multi-stage setup that involves a number of modules with not only dependent resources, but also dependent providers.

Missing pre-conditions do not always indicate absence. Take the example of running a script to build a VPC with all kinds of services, users and passwords, and a second script specifying one of these users and passwords to authenticate against PostgreSQL (which just happens to use the same identity provider). The database may already exist - it's just the credentials creating a dependency.

Trying to be consistent with the "plan" = "show all we're going to do" and "apply" = "actually do what we have to do", one would want to run terraform to apply any modules whose pre-conditions are all met already. Then, "terraform apply" could output a list of remaining modules whose pre-conditions are now met, and of those which still cannot be executed. A new "terraform plan" would display the additional actions that have to be taken in the new situation. The next "apply" run will get a bit further... and so on.

Unless the unwanted destruction of resources can be completely ruled out, Terraform plans should be checked prior to execution.

Both the implicit dependency from resource references and the explicit depends_on declaration will be needed.

The absence of this feature from Terraform prompts a number of ugly workarounds, or splitting Terraform executions into separate sets of scripts, which potentially poses problems when updates occur.

As far as I can see, an easy way to implement this feature without major impact on the flow of control in Terraform could be to "suspend" providers whose pre-conditions are not all met at the time of planning. The "plan" and "apply" run would indicate any suspended providers, effectively asking the user to re-run "plan" and "apply". The depends_on declaration should be supported by providers as well, so hidden dependencies can be considered.

Right now, I am doing this with a script wrapper around terraform that checks previous outputs to determine which sets of .tf files to consider for this run. Unfortunately, terraform only supports one directory with scripts, so I have to move files according to executability into one "tf" directory and run them from there.

However, it would be nice if this conditionality could be more tightly integrated into the flow of Terraform, as the separation into components does add complexity to updates and tear-downs. It looks like the conditionality of modules has been discussed before (e.g., here: https://github.com/hashicorp/terraform/issues/12906), but it was never resolved into a feature.

oogali commented 5 years ago

@j94305 I'm in a similar boat -- I have all of my .tf files in a base directory, then a directory per environment, which contains symbolic links to the files within base.

Particularly in database scenarios, I symlink everything except for the PostgreSQL provider and its resources, run the first plan + apply, then symlink in the remaining bits and do a final plan + apply.

I don't particularly enjoy doing this.

mleonhard commented 4 years ago

Without this feature, every deployment requires many separate Terraform configs:

  A. Cloud VMs
  B. Database servers
  C. Databases, database users, and applications that use them

Without this feature, we must use three manual plan inspections:

  1. terraform plan A
  2. manually inspect plan and apply
  3. terraform plan B
  4. manually inspect plan and apply
  5. terraform plan C
  6. manually inspect plan and apply

Each inspection comes with the risk of missing something. With many inspections, the operator will get into the habit of quickly scanning them, increasing risk of missing an unintentional change and subsequent catastrophe.

Without this feature, we cannot use create_before_destroy to replace VMs without downtime. The only ways to use Terraform to perform such updates are to use "--target" (which is error-prone) or to split the configs per-VM.

If this feature were implemented, we could use a short deployment procedure:

  1. terraform plan
  2. manually inspect plan and apply

Put another way: I want Terraform to be smart enough to know that when docker_container.api_server10 depends on aws_instance.host10 and host10 is getting replaced, it must replace api_server10. And if host10 doesn't exist, it must assume that api_server10 also doesn't exist and must not try to refresh it.
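
For concreteness, a sketch of the relationship being described (the resource names come from the comment above; the provider wiring, AMI, and image are assumptions):

resource "aws_instance" "host10" {
  ami           = "ami-00000000" # hypothetical AMI
  instance_type = "t3.micro"

  lifecycle {
    create_before_destroy = true
  }
}

provider "docker" {
  # The provider itself depends on an attribute of the instance, which is
  # what Terraform cannot currently resolve in a single plan/apply pass.
  host = "tcp://${aws_instance.host10.public_ip}:2375/"
}

resource "docker_container" "api_server10" {
  name  = "api-server"
  image = "example/api:latest" # hypothetical image
}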

zdelanofw commented 4 years ago

I'm in need of a depends_on for a provider as well, in the case of AWS EKS and Kubernetes. I have the creation of a cluster and a node group set up; however, my Terraform configuration tries to configure my Kubernetes provider as soon as the EKS cluster is done, in parallel with the node group. This causes an error telling me that there is no such host, because the node group does not exist yet.
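
For reference, a sketch of the kind of configuration that hits this, in 0.12-style syntax (all names, ARNs, and subnet IDs are hypothetical):

resource "aws_eks_cluster" "this" {
  name     = "example"
  role_arn = "arn:aws:iam::123456789012:role/eks-cluster" # hypothetical role
  vpc_config {
    subnet_ids = ["subnet-00000000", "subnet-11111111"] # hypothetical subnets
  }
}

resource "aws_eks_node_group" "this" {
  cluster_name    = aws_eks_cluster.this.name
  node_group_name = "default"
  node_role_arn   = "arn:aws:iam::123456789012:role/eks-node" # hypothetical role
  subnet_ids      = ["subnet-00000000", "subnet-11111111"]
  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }
}

data "aws_eks_cluster_auth" "this" {
  name = aws_eks_cluster.this.name
}

provider "kubernetes" {
  # Configured from attributes of the cluster, so Terraform instantiates this
  # provider as soon as the cluster is ready, without waiting for the node group.
  host                   = aws_eks_cluster.this.endpoint
  cluster_ca_certificate = base64decode(aws_eks_cluster.this.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.this.token
}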

brpaz commented 4 years ago

This issue is a blocker for me (https://github.com/terraform-providers/terraform-provider-kubernetes/issues/708).

I want to provision a DigitalOcean Kubernetes cluster inside a module and then use its configuration as values in the "kubernetes" provider.

dpoetzsch commented 4 years ago

We face the same problems as many users here. For example, when setting up app services and Keycloak authentication management, the keycloak provider depends on the app service already being up and running.

Until this issue is handled by terraform, we created a workaround that basically allows managing multiple dependent terraform projects by executing them sequentially: https://github.com/mondata-dev/terraform-stages

For now this is still work in progress, but we already use it in small production projects. Maybe it has the potential to help your processes as well.

galindro commented 4 years ago

I ran into an interesting situation that makes me wonder whether depends_on is really necessary or whether this should be a problem solved by each provider.

I'm setting up a grafana AWS RDS instance and I need to create a grafana database and set up some user roles to use it. First, I tried to use the postgresql RDS variant with the following code (I'm omitting the vars declaration).

provider "aws" {
  version = "~> 2.30"
  region  = var.aws_region
}

provider "postgresql" {
  version  = "~> 1.4"
  host     = aws_db_instance.grafana.address
  port     = aws_db_instance.grafana.port
  database = var.app
  superuser       = false
  username        = var.app
  password        = var.app_db_password
  sslmode         = "verify-full"
  connect_timeout = 15
}

resource "aws_db_instance" "grafana" {
  name       = var.app
  identifier = local.full_app_name

  engine         = var.engine.name
  engine_version = var.engine.version
  instance_class = var.instance_class

  storage_type          = var.storage.type
  allocated_storage     = var.storage.allocated
  max_allocated_storage = var.storage.max_allocated
  iops                  = var.storage.iops

  username = var.admin_username
  password = var.admin_password

  db_subnet_group_name   = data.terraform_remote_state.databases_vpc.outputs.database_vpc.database_subnet_group
  vpc_security_group_ids = [aws_security_group.grafana.name]

  apply_immediately     = true

}

resource "postgresql_role" "grafana" {
  name  = var.app
  login = true
}

resource "postgresql_database" "grafana" {
  name  = var.app
  owner = postgresql_role.grafana.name
}

If I try to execute a plan using it, I get the following error:

$ terraform plan
Error: Error initializing PostgreSQL client: error detecting capabilities: error PostgreSQL version: dial tcp :0: connect: connection refused

  on providers.tf line 6, in provider "postgresql":
   6: provider "postgresql" {

Now, if I change the provider to mysql, my plan is executed with no errors.

provider "mysql" {
  version  = "~> 1.9"
  endpoint = "${aws_db_instance.grafana.address}:${aws_db_instance.grafana.port}"
  username = var.app
  password = var.app_db_password
  tls      = "true"
}

resource "mysql_database" "grafana" {
  name     = var.app
}

resource "mysql_user" "grafana" {
  user               = "grafana"
  host               = "%"
  plaintext_password = "mypwd"
}

resource "mysql_grant" "grafana" {
  user       = mysql_user.grafana.user
  host       = mysql_user.grafana.host
  database   = mysql_database.grafana.name
  privileges = ["SELECT", "UPDATE", "INSERT"]
}

This means that the postgres provider isn't respecting the dependency graph built by Terraform but the mysql one is. So maybe this is a provider problem, not a Terraform "global" problem. @j94305 are you aware of this situation? Have you faced this before?

I would like to see the opinion of @apparentlymart on this post as well.

j94305 commented 4 years ago

If you define properties of a resource (e.g., a database schema or a set of permissions) in one provider A and the resource (e.g., the instance the resource runs on) is created in another provider B, the question is what provider A should check, and when.

Currently, the PostgreSQL provider will check at planning time, i.e., it will notice if the instance where the database is intended to run does not yet exist. Therefore, any properties of the instance can be taken into account for the database creation and setup (e.g., the max. number of concurrent connections or the sizing of buffer spaces in relation to the RAM available on the instance). This can be done at planning time.

I have never used the MySQL provider myself, but it seems this provider does its checks lazily at execution time. Of course, this is a possible approach, but anything affecting the nature of plans cannot be a consequence of what the provider detects once it starts verifying the environment.

So, while providers certainly have a choice between planning time and execution time verifications and checks, only steps at planning time can influence the actual execution plan. My guess is that lazy providers checking preconditions only at execution time would work well (like in your MySQL example), except when there are more dependencies on the resources described for the database, i.e. when the dependency graph is not that simple and conditions checked by the provider in question will/may influence the plan.

Terraform has a strict separation between (first) "plan" and (then) "apply". Providers typically check their prerequisites in the plan phase because they need to obtain environment characteristics to further determine what needs to be done, i.e., what their plan would look like. The MySQL provider seems to have an opportunistic approach to such checks, i.e., it does not care. This may work well in some cases; however, in others it does not, because portions of the plan depend on the result of those checks.

Example: you set up a VPC and a jumphost. Only through the jumphost can it be determined which services need to be set up. Unless you have the jumphost, you won't be able to determine the plan for what is available behind it - i.e., an execution plan requires the jumphost. In essence, we end up having phases of infrastructure definitions, with resources in one phase being required to plan the next phase. Currently, I don't think the plan/apply separation of Terraform can handle this just by using lazy providers.

galindro commented 4 years ago

Agree @j94305

J7mbo commented 4 years ago

If it wasn't already an explicit example here, I have a very simple use-case: I want to provision a server with Terraform and then use the Docker provider to run containers on it.

I can't tell the docker provider to wait for that server to exist.

This means I can't (easily, without 'hacks') use terraform to both provision a server which is supported by terraform, and then use the docker provider, which is also supposedly supported by terraform.

T00mm commented 4 years ago

Got the same problem with AWS: creating a postgresql db, but Terraform already tries to configure the provider up front.

fimbulwint commented 4 years ago

Same problem. Ran into this on my first attempt at creating some simple production infrastructure with terraform.

I was trying to provision an EC2 instance on AWS and then start a Docker container on it, which I imagine is not such an uncommon use case. Now it seems I'll have to jump through some hacky hoops to get there, which is kind of a letdown :(

J7mbo commented 4 years ago

@fimbulwint For now, don't bother using docker at all with terraform. I added docker-machine calls which set up docker on the machines provisioned by terraform, using the IPv4 address and cert, so it all works well that way.

stephencweiss commented 4 years ago

I read through this and several other threads and I think my use case is not (yet) supported, but that folks might have a workaround. I'm pretty new to terraform, so would love any guidance the community might have.

I have existing infrastructure in multiple AWS regions that I'm trying to tie together.

The goal of the terraform is:

  1. Create an SNS topic and cloud watch event monitoring the s3 bucket (us-west)
  2. Create a lambda that subscribes to the SNS topic (us-east)

I thought I'd be able to do this:

provider "aws" {
  profile = "default"
  region  = "us-east-1"
}
provider "aws" {
  profile = "default"
  region  = "us-west-1"
  alias = "west"
}
resource "aws_sns_topic" "s3-file-create" {
  name = "s3-file-create-topic"
}
resource "aws_lambda_function" "s3-file-replicate" {
  /*...*/
  depends_on = [aws_sns_topic.s3-file-create, west]
}

where I'm saying that the lambda both needs the sns topic to exist and that it be created in the west... but the west in particular doesn't seem to work...

Thoughts?

apparentlymart commented 4 years ago

Hi @stephencweiss,

If I'm understanding what you are asking correctly, you can specify that the aws_lambda_function resource should be handled by your "west" provider configuration using the provider meta-argument:

resource "aws_lambda_function" "s3-file-replicate" {
  # ...
  provider = aws.west
  depends_on = [aws_sns_topic.s3-file-create]
}

If that isn't what you were asking, I'd invite you to ask this question in the community forum where I can hopefully help you without creating too much notification noise for the folks who are following this issue. Thanks!