gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
8.05k stars 977 forks source link

[Feature request] Fetching data from remote state #1150

Open kromol opened 4 years ago

kromol commented 4 years ago

The idea is to be able to retrieve data from remote state without creating separate terraform file(s). Use case is pretty simple - sometimes module's input value is the output of another external terragrunt/terraform. An example could be creating Lambda function that must be placed into VPC. Subnets and security group are part of the output of another (external) terraform module. In order to solve it we need to create a new terragrunt module that will basically consist of data.tf file to fetch remote state and then we need to reference that new terragrunt module in lambda module:

|--lambda
|   |--terragrunt.hcl
|-remote_state
|   |--terragrunt.hcl
...

Then in lambda/terragrunt.hcl it will look like this:

...
dependency "remote_state" {
  config_path = "../remote_state"
}

inputs = {
  ...
  security_group_id = dependency.remote_state.outputs.security_group_id
}

It would be great to have a dedicated block to fetch data from remote state and use it later, similarly how it's done with dependency block. Or there is another "terragrunt" way of handling it?

brikis98 commented 4 years ago

The dependency block calls terraform output on whatever dependency you're using, and terraform output is reading from state, so it's already very similar to what you're describing.

To read from remote state directly, (a) you'd have to specify exactly where the state file is stored, (b) authenticate to it, and (c) Terragrunt would have to duplicate Terraform's state reading logic, which would be complicated and brittle.

What advantage would this offer over the existing dependency construct?

kromol commented 4 years ago

The advantage is in eliminating the need to create another terragrunt module to read remote state and make it a dependency to be able to read it. But instead do it in scope of the module that actually needs those values from the remote state. I mean something like this:

remote_state_data "main_vpc" {
  // for example it could inherit configuration from `remote_state` by default with an option to override any field.
  // assuming it uses s3 backend
  config = {
    key = "/path/to/remote/state.tfstate"
  }
  output_key = public_subnets // `public_subnets` is output key name
}

inputs = {
  subnets = remote_state_data.main_vpc.value
}

With existing dependency block we need to create a separate terragrunt.hcl and a couple of tf files to read remote state and then reference that terragrunt.hcl as a dependency in the module that needs data.

kromol commented 4 years ago

To read from remote state directly, (a) you'd have to specify exactly where the state file is stored, (b) authenticate to it, and (c) Terragrunt would have to duplicate Terraform's state reading logic, which would be complicated and brittle.

I imagined it could be done in a similar way like dependency block does it - call terraform output to let terraform read from state. Assuming that remote_state_data config (from the example above) is the same like config for terraform_remote_state my understanding was that it should be possible to generate remote_state.tf file that looks like this:

data "terraform_remote_state" "my_remote_state" {
  backend = "s3"
  config = {
    bucket   = "my-bucket"
    encrypt  = true 
    key      = "path/to/terraform.tfstate"
    role_arn = "role if needed"
    region   = "us-east-1"
  }
}

output "output_name" {
  value = data.terraform_remote_state.my_remote_state.outputs.output_name
}

And then only run terraform commands in the background to get values.

This is basically what I am doing manually in order to be able to read from remote state and pass that data into the module as an input. I thought it's possible to automate it.

brikis98 commented 4 years ago

The advantage is in eliminating the need to create another terragrunt module to read remote state and make it a dependency to be able to read it.

But if this is remote state that we're reading, then there is already some module that wrote it, right? Why would you need to write another one?

kromol commented 4 years ago

It's for the use case when the module that wrote to the remote state is "external".

For example we have platform team which is responsible for setting up VPC. Then we need to get subnets or security group of that VPC from remote state, but the module that wrote to the remote state is not part of the project.

Another example would be this: we have a separate pipeline that does some basic setup for our AWS account like creating a star certificate that is later used by different services. It's a separate project and if later we need to get ARN of the certificate, we can read it from remote state even though the module that wrote to the state is not part of the project.

I am not sure if there is a more proper way to do it, but this is how we are handling it at the moment

brikis98 commented 4 years ago

Ah, I gotcha.

I suspect that having Terragrunt generate a module with a terraform_remote_state data source will be relatively complicated to build/maintain in a way that works with all use cases. Perhaps you can use the generate block to do this for your specific use case instead?

kromol commented 4 years ago

I tried and it didn't work. I don't remember the exact error, but maybe I will try one more time and update this thread.

kromol commented 4 years ago

By using generate block I can generate terraform files, but there seems to be no way to access data from those files in terragrunt.hcl. I need to use that data as an input for a module.

Edit: I also tried to use run_cmd to run terraform output on generated files, but it also did not work.

brikis98 commented 4 years ago

Ah, you're right. It would be all applied together as one module, whereas what you want is to run the data sources first as one module, read their outputs, and pass them as inputs to the other module.

Hm, I certainly get the need for your use case, but something about first class support for this in Terragrunt strikes me as a lot of complexity and expense to maintain. I'm certainly open to other ideas though. @yorinasub17 Curious to get your take too.

yorinasub17 commented 4 years ago

I agree that this is going to add enough complexity to terragrunt that it may not be worth implementing. Ultimately, this is one step away from allowing terragrunt to dynamically read terraform data sources, and I am not sure that is something we should do in the current form of terragrunt.

With that said, in the new version we've been discussing (https://github.com/gruntwork-io/terragrunt/issues/759), this feature would be supported out of the box since the engine for terragrunt context will now be in terraform (assuming you don't need those data source lookups in the preprocessor).


One other potential workaround: if all you need is the ability to read remote state files, could you create a bash script that implements this functionality with run_cmd?

terraform output has the ability to read data out of local state file, so in theory you can have a script that:

You can then read the output data using jsondecode to work with it in terragrunt.

NasAmin commented 3 years ago

I am in the same boat, I have recently started migrating our terraform stacks to terragrunt and we have different resources for different environments in different statefiles for example: VPC is in a separate state file and other modules like EKS or documentdb accesses the vpc_id and private_subnet_ids from the remote state as a data source from the vpc module.

However, the problem I am facing is when I run terragrunt plan on a module that relies on the remote state or the VPC module, I am getting an error that the state doesn't exist this is when I am trying to plan the entire environment for the first time using terragrunt plan-all

Is there a way to mock outputs from a remote-state like we can do the dependency input mocking?

Kind Regards,

Nas

masabiGonz commented 3 years ago

this would be great to have... for example we're also currently having to either hardcode some inputs that are actually values coming from externally located terragrunt modules or merging those together to be able to properly set up dependencies.

Being able to somehow specify dependency but for an external terragrunt+terraform module would be amazing

s1mark commented 3 years ago

@kromol Till the https://github.com/gruntwork-io/terragrunt/issues/759 is implemented maybe you could consider the following. I guess in your terragrunt.hcl file you specify the

terraform {
  source = ...
}

I suppose you are referencing directly to the module that you want to supply the attributes from the inputs block in your terragrunt.hcl. Maybe you could do the following restructure: Like @brikis98 suggested use the generate block to create a main.tf. It could contain the terraform_remote_state datasource and the necessary module block with all the desired variables that you originally sourced in and configured in your terragrunt.hcl

This way you dont need to create another helper module... just wire out the module sourcing and parameter setting to the generated maint.tf

all4innov commented 3 years ago

1240 same topic

gtazzoli-uala commented 2 years ago

+1 to this. We have more than 100 aws accounts in our company with dozens of products. Each product will go to internet through a centralized transit gateway/vpns. When we deploy a new product we need to fetch the transit gateway tfstate and vpc from the centralized account in order to create the routes and create transit gateway attachments.

We are using a custom terraform module that fetches any tfstate outputs and we use that as a terragrunt dependency block. It will be very useful to be able to read remote states outputs that were created from others projects.

garrington commented 2 years ago

+1 also, we use a similar structure to the above and have encountered this limitation... the workarounds are hackish and make me feel dirty, like when I use Windows.

naviat commented 2 years ago

+1 to this. We have more than 100 aws accounts in our company with dozens of products. Each product will go to internet through a centralized transit gateway/vpns. When we deploy a new product we need to fetch the transit gateway tfstate and vpc from the centralized account in order to create the routes and create transit gateway attachments.

We are using a custom terraform module that fetches any tfstate outputs and we use that as a terragrunt dependency block. It will be very useful to be able to read remote states outputs that were created from others projects.

I have the same architecture with you but smaller with VPC peering only, could you pls share the custom terraform module which you mentioned?

kamaz commented 2 years ago

Although not the solution but just a workaround I have implemented a script that pulls the state from a remote state.

If that helps anyone this is an example:

#!/bin/bash

while getopts 'k:b:' options
do
  case $options in
    b) 
      bucket=$OPTARG 
      ;;
    k) 
      key=$OPTARG 
      ;;
  esac
done
shift $((OPTIND -1))

key=${key:?'Key is required'}
bucket=${bucket:?'Bucket is required'}

aws s3 cp s3://${bucket}/${key} ${key} > /dev/null 2>&1
cat ${key} | jq '.outputs| with_entries(.value = (.value.value))'

The example below will make it a little bit nicer as well removing all the outputs and values.

Example output:

{ 
  "datadog": {                                                                   
    "policy_arn": "arn:aws:iam::xxxx:policy/monitoring/DatadogGetSecret",
    "secret_arn": "arn:aws:secretsmanager:xxxx:xxxx:secret:xxxx"
  }
}

Terragrunt example:

  platform_output        = jsondecode(run_cmd("./config/state.sh", "-b", "<replace_with_bucket_name>", "-k", "<replace_with_path_to_state_object>"))

# later in the code

inputs = {
  policy_arns = [
    local.platform_output.datadog.policy_arn,
  ]
}
matteo-ssys commented 1 year ago

It would be great with any data source, not only remote state: since terragrunt relies on modules, if a module requires some input from AWS you will need to implement your way to fetch it (separate data source module which requires then to be referenced with dependency and dependencies, or run-cmd with some bash script/make file). Examples: when you need to lookup the availability zones deploying subnets, or if you need to lookup an IAM role for a resource, well basically any "lookup"..

Kristin0 commented 1 year ago

any updates?

jmreicha commented 8 months ago

Just ran into this one, looks like it has been awhile since there was activity. Has anyone found a clean workaround for this outside of run_cmd or using another module output as a dependency?

FlorinAndrei commented 1 month ago

It would be great with any data source

Exactly. Terragrunt needs a native way to query data sources, just like Terraform does. That would be very, very helpful.