hashicorp / terraform-provider-template

Terraform template provider
https://www.terraform.io/docs/providers/template/
Mozilla Public License 2.0

Directory management with the template or other provider #39

Open greg-szabo opened 5 years ago

greg-szabo commented 5 years ago

Terraform Version

0.12-dev

Affected Resource(s)

template provider, template_dir data source

Expected behavior

This is a proposal to improve the way user files are managed in Terraform.

One user story

User story: "I would like to deploy static websites to S3, using only Terraform." Problem description:

  1. Fact nr.1: the current aws_s3_bucket_object resource can upload a single, explicitly named file to S3
  2. Fact nr.2: Using count (or the soon-to-be for_each) it is possible to create a set of aws_s3_bucket_objects to upload a set of files. This requires a list-type variable that can be iterated (see the sketch after this list).
  3. Problem nr.1: there is no Terraform-native way to read a list of files from a directory. (As mentioned in hashicorp/terraform/issues/16697 and worked around with https://github.com/saymedia/terraform-s3-dir)
  4. Problem nr.2: lists are not supported as Terraform input parameters (at least -var didn't support them a few versions ago).
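
As a minimal sketch of Fact nr.2 (the file list, the website/ directory, and var.s3_bucket_name below are assumptions for illustration), count over a list-type variable creates one aws_s3_bucket_object per file:

variable "files" {
  type    = list(string)
  default = ["index.html", "error.html"]   # placeholder file names
}

resource "aws_s3_bucket_object" "file" {
  count = length(var.files)

  bucket = var.s3_bucket_name               # assumed input variable
  key    = var.files[count.index]
  source = "${path.module}/website/${var.files[count.index]}"
}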

So if I have a directory with a bunch of files (whose names are not individually known to the Terraform configuration), I have no way of iterating through them. The one exception is if they are templates and I need them rendered on the same machine; in that case the template_dir resource in the template provider will do it for me.
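
For context, the existing template_dir resource renders every file under a source directory and writes the results to a destination directory on the machine running Terraform, roughly like this (the directory names and variables are illustrative):

resource "template_dir" "website" {
  source_dir      = "${path.module}/website"   # templates to render
  destination_dir = "${path.cwd}/rendered"     # rendered copies written to local disk

  vars = {
    name = "mywebsite"
  }
}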

Generic user story

It seems that there are a few cases where people want to iterate through a directory. An additional one is hashicorp/terraform/issues/6065 where the user wants to use the file provisioner to keep a directory up-to-date but the file provisioner doesn't keep track of changes. It seems that a provider iterating through directories would be useful.

Originally, I thought this should be a custom directory provider and just write it for my use-case. Then I thought why not try to do it in a Terraform-native-way and try to integrate with the current providers. The closest provider that iterates through directories is the already mentioned template_dir resource. Unfortunately, if I generally want to read the folders but not write them, I'm out of luck. As it was raised in #34, there is no template_dir data source, only resource.

Proposed solution nr.1 (already done)

New data source: template_dir. This can be considered a non-breaking improvement since it didn't exist before. Based on how the template_dir resource is defined (and trying to keep in line with it), here's what it would look like:

data "template_dir" "solution1" {
  source_dir = "website"   # Same as in the resource
  exclude = "\.tmp$"       # Improvement: regular expression applied to the relative path
  vars {                   # Same as in the resource
    "name": "mywebsite"
  }
  render: true             # Improvement: details below (consider files as templates)
}

The output attribute rendered (same as in the resource) would be a map whose keys are the file names; the values would be the rendered file contents (when render = true, treating the files as templates) or empty strings (when render = false). (You can create a list of files using keys(data.template_dir.solution1.rendered) and iterate through it, for example to upload them to S3.)
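
As a hypothetical usage sketch of the proposed data source, the rendered map could feed aws_s3_bucket_object via count (var.s3_bucket_name is an assumed input variable):

locals {
  site_files = keys(data.template_dir.solution1.rendered)
}

resource "aws_s3_bucket_object" "site" {
  count = length(local.site_files)

  bucket  = var.s3_bucket_name
  key     = local.site_files[count.index]
  content = data.template_dir.solution1.rendered[local.site_files[count.index]]
}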

So, the idea is to stuff this functionality into the template provider, even though the template provider is a bit more than just a file-system reader.

I already programmed this solution for 0.11 because I needed it for my use-case. Unfortunately, because of a bug in 0.11 (hashicorp/terraform/issues/19258), this will only work correctly in 0.12. I'm in the process of upgrading it and I'll share afterwards.

Pro: in line with current expectations of how Terraform works; a non-breaking change.
Con: it abuses the template provider to add functionality.

Proposed solution nr.2

New provider: io or built-in functions. It seems that generic file reads (and writes) are in demand. In hashicorp/terraform/issues/16697 there is a discussion about a set of builtin functions to manage files in a directory. It seems that those discussions didn't amount to anything, but they might still be good ideas.

So the request: is it possible to get a bit of brainstorming together for an overarching solution? Is it better to use providers or built-in functions? I'm willing to look into whichever makes sense, because I would rather (eventually) have a Terraform-native solution to this problem than a custom provider.

Until a better idea emerges, I'll keep working on solution nr.1.

jakexks commented 5 years ago

In case anyone comes across this thread and is desperate for a way to iterate over a directory, I threw together https://github.com/jakexks/terraform-provider-glob

greg-szabo commented 5 years ago

(Edit:) I should've started with this: thanks for your contribution; I'm sure it will help a lot of people struggling with this problem.

One of the reasons this didn't get very far is that the Terraform state file can get very big if a bunch of file contents are listed in it. I'm guessing that's the case with your solution too, based on the glob_contents_list property.

Did you try it with, say, 5 MB worth of files? How big did the Terraform state file get?

jakexks commented 5 years ago

It'll certainly get massive, and probably won't work well (or at all) for binary files! Definitely a "use at your own risk" thing.

My use case is Terraform Enterprise, where runs happen in a container that throws away any local files. A module I wanted to use relies on template_dir to write some templates to disk that I expect to be there on the next run, but they aren't. HashiCorp support recommended refactoring to outputs 🤷‍♂

I imagine you could do something with count = length(...) and the file() interpolation function to avoid storing too much in the state file, but you'd have to stay aware of the issue.
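
To make that idea concrete, here is a hedged sketch that assumes a list of relative file names is available from somewhere (a placeholder local below; in practice e.g. the glob provider mentioned above) and that var.s3_bucket_name exists. Only the names need to be tracked, while file() reads each file's content where it is used:

locals {
  # Placeholder list of relative paths, standing in for whatever produces the file names.
  file_names = ["index.html", "css/site.css"]
}

resource "aws_s3_bucket_object" "file" {
  count = length(local.file_names)

  bucket  = var.s3_bucket_name
  key     = local.file_names[count.index]
  # file() reads the content at plan/apply time instead of keeping it in a data source's state.
  content = file("${path.module}/htdocs/${local.file_names[count.index]}")
}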

apparentlymart commented 5 years ago

Sorry for the long silence here, @greg-szabo and everyone else. Thanks for documenting this use-case!

This use-case is close to my heart because in my previous job (before I was on the Terraform team at HashiCorp) I had this very same need and used it as the motivation for a proposal I opened at the time, in hashicorp/terraform#3310.

We've been gradually making incremental progress towards a different approach on this than I originally made there, using a combination of different smaller features. The Terraform 0.12 release has laid some groundwork, but there are still some parts to fill in. Once we get there, the configuration might look something like this, assuming the goal is to just upload files from disk as-is, without any local processing:

locals {
  source_dir = "${path.module}/htdocs"
  source_files = {
    for fn in filelist("${local.source_dir}/**") :
    pathrel(fn, local.source_dir) => fn
  }
}

resource "aws_s3_bucket_object" "file" {
  for_each = local.source_files

  bucket = var.s3_bucket_name
  key    = each.key
  source = each.value
}

This is a combination of the for_each feature discussed in hashicorp/terraform#17179 and the filelist and pathrel functions we were discussing in hashicorp/terraform#16697. The first of these needed some internal redesign to support, which we've already completed in master, so work on this should be unblocked after the 0.12.0 release (though, given that we've been focused on configuration language improvements for so long, we are likely to need to take a break to catch up on some other Terraform subsystems first). I'd also asked that we hold off on implementing new built-in functions until after 0.12.0, but they are a lot more straightforward than for_each, so hopefully they won't take long to get done.

Combining this with template rendering would of course make life a little more complex, since a subset of the files would need to have their content passed through templatefile rather than just read directly from disk by aws_s3_bucket_object. It should still be doable with a suitable file-naming convention, so that the files that need to be rendered as templates can be recognized and dealt with separately:

locals {
  source_dir = "${path.module}/htdocs"
  source_files = {
    for fn in filelist("${local.source_dir}/**") :
    pathrel(fn, local.source_dir) => fn
  }
  template_files = {
    for k, fn in local.source_files : k => fn
    if length(fn) >= 5 && substr(fn, length(fn) - 5, 5) == ".tmpl"
  }
  rendered_files = {
    for k, fn in local.template_files :
    k => templatefile(fn, local.template_vars)
  }

  template_vars = { /* whatever variables the templates expect */ }
}

resource "aws_s3_bucket_object" "file" {
  for_each = local.source_files

  bucket  = var.s3_bucket_name
  key     = each.key
  source  = contains(keys(local.rendered_files), each.key) ? null : each.value
  content = contains(keys(local.rendered_files), each.key) ? local.rendered_files[each.key] : null
}

The templatefile function in Terraform 0.12 is aiming to supersede the template_file data source by allowing templates from files to be rendered directly where they are needed. template_dir was created primarily as a workaround to allow inserting dynamic data from Terraform into a zip file before uploading it to AWS Lambda, but environment variables are now a better option and so its original purpose is no longer relevant either (though I know that some folks have found other uses for it). The main idea here is to make templates first-class in the Terraform language, because manipulation of template strings is such a common operation when combining different components into a working system.
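
For example, a small sketch of templatefile() rendering a template inline, right where its value is needed (the paths and variables are made up for illustration):

resource "aws_s3_bucket_object" "index" {
  bucket = var.s3_bucket_name   # assumed input variable
  key    = "index.html"

  content = templatefile("${path.module}/htdocs/index.html.tmpl", {
    name = "mywebsite"
  })
}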

caquino commented 5 years ago

Another use case for a filelist() function is the kubernetes_config_map resource.

Currently, if you want to add multiple files to a ConfigMap, you need to add each file manually, which is error-prone and time-consuming.

Example:

resource "kubernetes_config_map" "nginx-cfgmap" {
  metadata {
    name = "nginx-cfgmap"
  }

  data = {
    "nginx.conf" = file("configs/nginx/nginx.conf")
    "fastcgi.conf" = file("configs/nginx/fastcgi.conf")
    ...
  }
}

What it could look like:

resource "kubernetes_config_map" "nginx-cfgmap" {
  metadata {
    name = "nginx-cfgmap"
  }

  data = {
    for fn in filelist("${path.root}/configs/nginx/**") :
    pathrel(fn, "${path.root}/configs/nginx/") => file(fn)
  }
}