gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.01k stars 971 forks source link

Feature Request: Shared and overridable variables (globals?) #814

Open apottere opened 5 years ago

apottere commented 5 years ago

Background

For most of our terraform projects, we prefer to keep the remote state in the same region as the resources that are being created. This allows us to manage resources in other regions when S3/DynamoDB is down in a particular region.

We also have a pretty standard directory structure for services:

terraform
├── terragrunt.hcl
└── <account>
    └── <region>
        └── <environment>
            └── terrgrunt.hcl

With the addition of terraform functions, it was simple to get the region from the directory name and use that in the backend configuration.

However, we also have a separate structure containing all of the infrastructure that doesn't belong to a single application - and this structure is much less predictable. Some resources have multiple directories before the region, and some don't have a region directory at all. Since many of these resources don't (and shouldn't) conform to the structure above, it would be very convenient to specify a region variable in the parent terragrunt.hcl that can be overridden in the child terragrunt.hcl.

There are certainly ways to work around this right now, but what I would love to be able to do is:

globals {
  region = "us-east-1"
}
terraform {
  source = ...
}
include {
  path = find_in_parent_folders()
}

Proposed Solution

I would like to implement something similar to locals, except keys in the block would be merged with the parent config (with child definitions overwriting parent definitions). Thus, the config resolution order would be:

  1. locals
  2. include
  3. globals
  4. Everything else

This would accomplish three things:

  1. Defining variables in parent config that can be used in every child config
  2. Defining variables that the child configs can use to affect the behavior of the parent config
  3. Allowing for intermediate variables that would otherwise go in locals but need the relative path to the parent config

Questions

  1. Is this something that seems like a good idea? Should I create a PR for this?
  2. Is globals the correct name, or is there something better?

This is a result of https://github.com/gruntwork-io/terragrunt/pull/802#discussion_r310202282 and possibly related to #132.

yorinasub17 commented 5 years ago

Since I suggested this, I am on board with this idea! Although, I am curious to hear others' opinions about separating globals (temporary common vars that are auto merged across include) from locals (temporary common vars that are scoped to the current config).

I am a bit confused by what you mean by the config resolution order in the proposed config. If globals works exactly like locals, then the access to it should be explicit (e.g global.region). Can you elaborate a bit more on this?

apottere commented 5 years ago

By the config resolution order, I mean the order that the blocks in the terragrunt HCL file are are evaluated. I was assuming the include block depends on locals being interpolated (which I didn't verify), but regardless globals would need to depend on the include block being evaluated if globals are merged across includes.

Edit: Yes, access would be explicit (e.g. global.region)

apottere commented 5 years ago

If you wanted to make it even more complicated, in reality locals could depend on globals as long as those locals aren't used in the include block. I'm not sure if that confusion is worth it, though.

apottere commented 5 years ago

After some more thought, I have an alternate proposal:

Both globals and locals can reference each other, as long as they don't cause a dependency loop. A local referenced from a global would refer to the local from the file that the global expression is in - i.e. if a child overrides a global with an expression that includes a local, that would be the child's local.

This strategy would accomplish a few things:

  1. It would remove the requirement of only being able to use the "include" functions (relative path, etc) in globals, simplifying the semantics of global and local ("should this variable be shared or not shared?")
  2. It would disambiguate the "include" functions - right now they're available everywhere, but their behavior changes depending on which part of the config they're called from.
  3. It would make the config a lot more terraform-like, which would probably be more intuitive for new terragrunt users coming from terraform. Instead of defined resolution phases, any variable can be used anywhere as long as it doesn't cause a loop.

As far as implementation goes, looking through the terraform code it doesn't seem incredibly complicated to copy the variable discovery, graph creation, graph verification, and evaluation. Some code from the terraform project could even be imported and re-used.

Thoughts?

jfunnell commented 5 years ago

One minor issue I have with the current implementation of locals is that I cannot use the path_relative_to_include in a local definition. This limits a chunk of my use cases and I guess the above proposal fixes that.. But I'm not entirely sure why it's necessary to change the well known and documented path-discovery function calls to an include group? Seems like a huge change.

As for the original proposal, a lot of my locals are being passed down to the underlying child modules using inputs, so yeah I would say this seems like a good idea in general.

yorinasub17 commented 5 years ago

One minor issue I have with the current implementation of locals is that I cannot use the path_relative_to_include in a local definition

Ah that is a good point and an oversight on my part. The original proposal for globals would indeed address that!

include is a new variable "group" that gets set when the include block is evaluated

While I understand where you are going with this, I think this could overcomplicate the implementation. Right now the path functions are implemented as simple go functions, which makes them fairly easy to maintain. Having to fold them into the include block requires adding additional custom parsing logic while the include is being created, that then needs to be passed down throughout the code to each execution contexts. I can see this being unintuitive and a pain to maintain.

I'd say that for now, start by implementing the basic version of globals, which are locals that get auto merged across include blocks. Since locals are relatively new, I'd wait for it to be tested in the wild for a little bit longer before we start optimizing the implementation with complex execution orders that allow interdependencies between locals, include, and globals. My hunch is that the basic globals will cover 90% of use cases that require referencing the include paths.

yorinasub17 commented 5 years ago

And it might turn out that we don't need locals once globals is implemented.

yorinasub17 commented 5 years ago

Thanks for going through this thought exercise by the way! Very helpful in understanding why you need it, and your thought process on approaching the implementation!

brikis98 commented 5 years ago

What's the difference between globals and inputs = { ... } in the root config?

yorinasub17 commented 5 years ago

globals can be referenced in the children inputs. E.g if you had:

globals {
  region = "us-west-1"
}

You can have a child config that is:

include {
  path = find_in_parent_folders()
}

inputs = {
  s3_url = "com.amazonaws.${global.region}.s3"
}
brikis98 commented 5 years ago

Would that be different than adding support for inputs referencing other inputs? E.g., s3_url = "com.amazonaws.${inputs.region}.s3"?

Or how does this compare with a get_input([<FILE>], <INPUT>) helper? We know we want to add get_output, and it seems like get_input is a natural analog. You can either specify an explicit a FILE to read inputs from, or if you omit it, it uses the current inputs, merged with parent inputs.

yorinasub17 commented 5 years ago

I think get_input makes sense, but independent of globals. In fact, we should probably also have get_local and get_global, or even a full import_input/local/global function that imports the full map to make it less verbose.

The benefit of globals is: What if you don't want it to be passed in as an input to terraform? globals gives you a space to specify variables that you don't intend to pass to any module, but use to build up the other inputs. It seems clean to have these temporary variables built up there instead of as inputs which will be converted to TF_VAR_xyz env vars.

brikis98 commented 5 years ago

Oh, I was thinking get_input could read values out of other .hcl or even .yml files. So then you could have vars.hcl and put a bunch of your locals/globals/etc in there. Perhaps get_input makes it sound like it only reads the inputs = { ... } block, but that's not really the goal.

BTW, I'm not 100% set on any approach here and am just thinking out loud. My main goal is to find the minimal set of tools we could expose for defining, sharing, reusing, and passing variables around. So the question is whether locals / globals provides a more flexible toolset than get_input (and get_output)? Or if some other option is even more effective? Might be worth listing out all use cases we want to support (e.g., local var usage, sharing vars from parent, getting vars from other modules, getting vars from var files, etc).

apottere commented 5 years ago

Sorry, I was away for a few days and didn't get a chance to respond.


Re: replacing the path functions with static variables:

On the contrary, I think using functions for this is actually the unintuitive way to do it. In terraform, functions are available and idempotent no matter where they're used. If globals are added, the behavior of these functions would differ between usage in the local and global blocks. In local, they would resolve to ., and in global they would be a real path. To me, that doesn't seem like an ideal solution. In terraform, dependencies are expressed by accessing properties of other resources, not with function calls. Assuming terragrunt users are terraform users first, this is probably the more intuitive way to do it.

We can still forego complicated dependency logic for now by just erroring if the include variables (or functions) are used in a local variable.


Re: globals replacing locals entirely:

It would definitely be possible to replace globals with locals, but I see two issues with that (assuming simple dependency logic):

  1. You wouldn't be able to use any variable in an include block, since that block would need to be interpreted before globals could be. Right now, you could use a local.
  2. Encapsulation is generally accepted as a good practice, i.e. you shouldn't expose information that you don't need to. This keeps your code simple and reduces the chance of bugs. Removal of locals would cause every variable you use in your parent config to be available from your child config, and you could accidentally override those variables from the child if you re-use the name accidentally. If something doesn't need to be global, it should be kept local.

Re: get_input:

What's the use-case for this? It seems to me all of those cases would be covered by good local and global system. Would this be a way to share variables between different terragrunt projects?

apottere commented 5 years ago

I'm definitely willing to whip up a PoC even if it needs significant changes before being merged.

yorinasub17 commented 5 years ago

On the contrary, I think using functions for this is actually the unintuitive way to do it.

I wasn't talking about the end user UX. I was talking about the code that actually implements it. So it isn't about whether or not it is intuitive to users of terragrunt or terraform, but rather whether or not it can be implemented in a way that is easy to maintain and has few bugs. In other words, will the implementation be intuitive to the golang developer?

This is why I mentioned this should be addressed independent of the globals implementation, because I expect globals would be fairly easy to implement (and thus can be released sooner), while the path parsing stuff might be difficult and tying the globals implementation to that might delay the implementation of the feature.

What's the use-case for this?

The main use case for this is if you want to fragment the globals across multiple HCL files that are nested across the tree. Right now terragrunt doesn't allow more than one include, so there is only one level of depth even for globals. get_input is a way to import and reference multiple HCL files, replacing the yaml based system of merging multiple variable files.

apottere commented 5 years ago

I wasn't talking about the end user UX. I was talking about the code that actually implements it.

Ah, sorry, I completely misunderstood that. I think it should be just as easy to implement - just remove those functions and add a variable group to the evaluation context when evaluating everything but locals and include. It definitely can be separated from the initial globals implementation if you think that would be better.

The main use case for this is if you want to fragment the globals across multiple HCL files that are nested across the tree.

Cool, thanks for the explanation. Is multi-include being considered, and if so would get_input just be a quick stopgap until that gets figured out? Or have you decided that multi-include isn't a good idea?

yorinasub17 commented 5 years ago

add a variable group

I think you are underestimating the complexity of interjecting something to the parsing pipeline to add this. But who knows! I might be missing some clever way to do this.

It definitely can be separated from the initial globals implementation if you think that would be better.

Yes let's keep that separate.

Is multi-include being considered, and if so would get_input just be a quick stopgap until that gets figured out? Or have you decided that multi-include isn't a good idea?

You can see https://github.com/gruntwork-io/terragrunt/issues/303 for past thoughts on this, as well as some current thoughts. In general, multi-include is not something we want to add and if possible would like to use other primitives to achieve the same effect. get_input is one of those proposed primitives.

yorinasub17 commented 5 years ago

By the way, you might want to hold off on implementation until I am done with https://github.com/gruntwork-io/terragrunt/pull/828. globals will almost certainly conflict with that feature. Also, the new partial parsing pipeline might help make the include path variable context implementation easier.

apottere commented 5 years ago

I think you are underestimating the complexity of interjecting something to the parsing pipeline to add this.

I don't think so, unless I'm missing something. It should be just as easy as adding locals to the evaluation context:

    if locals != nil {
        ctx.Variables = map[string]cty.Value{"local": *locals}
    }

In fact this function definition already has the value of the include block available to it:

func CreateTerragruntEvalContext(
    filename string,
    terragruntOptions *options.TerragruntOptions,
    include *IncludeConfig,
    locals *cty.Value,
) *hcl.EvalContext {

By the way, you might want to hold off on implementation until I am done with #828.

Thanks for the heads up, it definitely looks that way. I'll do some initial exploration but I won't spend too much time on it till that's complete.

yorinasub17 commented 5 years ago

Ok #828 is now merged. Also, I've refactored the eval context parameters into a struct EvalContextExtensions which should make it easier to pass through the include vars all the way down.

So @apottere , feel free to start working on this! Be sure to think through where globals fits in the parsing pipeline and update the docs accordingly. Here are two places where we document the parsing pipeline:

apottere commented 5 years ago

Sounds good, thanks!

apottere commented 5 years ago

@yorinasub17 I got really busy, but I finally had some time to put together a POC of this in #858. Let me know what you think.

I decided to go for the more complicated approach because I was pretty sure I could make it work and it was more fun to try and figure out. I can definitely implement a simpler approach if you want.

sebastianmacarescu commented 4 years ago

Any news on this one?

7nwwrkdnakht3 commented 2 years ago

Wanted to ask for any update on this, thanks!

remidebette commented 1 year ago

This would be nice to have

propyless commented 1 year ago

I do something a little like this, but with recursion hell. Not sure how interested people would be to use it.

terragrunt.hcl ( I don't use inputs, in terragrunt.hcl I generate a vars file since I hit the system limit for commandline length )

# This will load all the locals from root.hcl and inject them here.
include "root" {
  path   = find_in_parent_folders("root.hcl")
  expose = true
}

locals {
  merged_locals = include.root.locals.merged_locals
}

terraform {
  source = include.root.locals.merged_locals.MODULE_PATH
}

root.hcl - I read the top most level here

vars = read_terragrunt_config("${local.og_tg_dir}/instance.hcl")

In instance.hcl I read and merge all lower levels

locals {
  merged_locals = merge(
    local.global_vars.locals,
    local.environment_vars.locals,
    local.subscription_vars.locals,
    local.region_vars.locals,
    local.service_vars.locals,
  )
  og_tg_dir = get_original_terragrunt_dir()

  service_vars      = read_terragrunt_config("${local.og_tg_dir}/../service.hcl")
  region_vars       = read_terragrunt_config("${local.og_tg_dir}/../../region.hcl")
  subscription_vars = read_terragrunt_config("${local.og_tg_dir}/../../../subscription.hcl")
  environment_vars  = read_terragrunt_config("${local.og_tg_dir}/../../../../environment.hcl")
  global_vars       = read_terragrunt_config("${local.og_tg_dir}/../../../../../globals.hcl")
} 

Note: This is just snippets so it doesn't tell the whole story or describe the entire solution, but hopefully you get the gist of it.

And I do the same for service, region, sub, env. While also in each lower namespace I subtract those files that are higher up. In service.hcl I do not load instance.hcl for example.

I do this because I want to gain access to all the variables and use them in a higher namespace. It takes like 40 seconds to parse everything.

An alternative could have been to merge it all the end instead of this in every single level. Then it goes much much faster, but I as I want to have all the vars from below.. its a bit of a tradeoff. But honestly, is speed such a big deal for infra if I'm going to wait for an azure virtual gateway to come up in 25 minutes.. :zzz:

I use the same approach for modules, so that I can override terraform module versions per instance, service, region, sub, env.. etc..

I've done this for quite a few of my projects.. if you know a better way or if you think I'm crazy for doing it this way, I'm always open to learn.

edit: I guess its a bit like phatcher gave an example on in the mentioned issue here: https://github.com/gruntwork-io/terragrunt/issues/1942

gw0 commented 1 year ago

I came up with something very similar, but instead of using locals {} am using inputs = {} everywhere. This causes a bit less cognitive load and makes specifying inputs is a bit less verbose. The root module then merges all inputs from all layers.

Example of root module with this and a few more ideas:

locals {
  path_parts         = concat(split("/", path_relative_to_include()), ["", "", "", ""])
  common_config      = read_terragrunt_config("${get_parent_terragrunt_dir()}/common_config.hcl", { inputs = {} })
  environment_config = read_terragrunt_config("${get_parent_terragrunt_dir()}/${local.path_parts[0]}/environment_config.hcl", { inputs = {} })
  ...
  inputs = merge(
    {
      terraform_tag    = path_relative_to_include()
      environment_name = local.path_parts[0]
      ....
    },
    local.common_config.inputs,
    local.environment_config.inputs,
    ...
  )
}
inputs = local.inputs
withnale commented 11 months ago

@gw0 - is the above possible using yaml files instead of hcl files? We've already got the necessary configuration in yaml files which is already being used by other things? Making a hcl specific config file just for terraform would be a pain

gw0 commented 11 months ago

@gw0 - is the above possible using yaml files instead of hcl files? We've already got the necessary configuration in yaml files which is already being used by other things? Making a hcl specific config file just for terraform would be a pain

Sure, a similar approach could be used with YAML. For YAML there is yamldecode() which is much more straightforward to use (kind-of-related https://terragrunt.gruntwork.io/docs/reference/built-in-functions/#sops_decrypt_file).

propyless commented 11 months ago

I came up with something very similar, but instead of using locals {} am using inputs = {} everywhere. This causes a bit less cognitive load and makes specifying inputs is a bit less verbose. The root module then merges all inputs from all layers.

Example of root module with this and a few more ideas:

locals {
  path_parts         = concat(split("/", path_relative_to_include()), ["", "", "", ""])
  common_config      = read_terragrunt_config("${get_parent_terragrunt_dir()}/common_config.hcl", { inputs = {} })
  environment_config = read_terragrunt_config("${get_parent_terragrunt_dir()}/${local.path_parts[0]}/environment_config.hcl", { inputs = {} })
  ...
  inputs = merge(
    {
      terraform_tag    = path_relative_to_include()
      environment_name = local.path_parts[0]
      ....
    },
    local.common_config.inputs,
    local.environment_config.inputs,
    ...
  )
}
inputs = local.inputs

The reason I don't use input is because of this issue: https://github.com/gruntwork-io/terragrunt/issues/2132

gw0 commented 11 months ago

The reason I don't use input is because of this issue: #2132

Instead of cramming everything into inputs (meaning env vars), have you considered:

propyless commented 11 months ago

I don't cram everything into inputs :) hence I said The reason I don't use input is because of this issue: https://github.com/gruntwork-io/terragrunt/issues/2132

The ticket I referered to has a solution proposed by denis which I use.