Open avnerv opened 1 year ago
We are encountering a similar issue with the aws_region data resource. We're on version v3.76.1 of the AWS provider, and have tested this on both v1.0.11 and v1.3.9 of Terraform.
We have two modules that are used together to create a Lambda function. I've simplified the examples to keep this concise.
# main.tf
module "iam_role" {
}

module "lambda_function" {
  iam_role = module.iam_role.arn
}
Inside the lambda_function module, we look up the aws_region and join it with a few user values to create the function name.
# lambda_function module
data "aws_region" "current" {}

resource "random_id" "main" {
  byte_length = 2
}

resource "aws_lambda_function" "main" {
  function_name = join("-", [var.input_1, data.aws_region.current.name, var.input_2, random_id.main.hex])
}
The problem we're encountering is that when an inconsequential input to the iam_role module is changed, it triggers the data resource inside the lambda module to be re-read, and the function is then recreated (with the same name after the apply completes). We also use a random string in the name, and that resource does NOT get changed.
For instance, if we simply add a new tag to the iam_role, the data resource will get marked as needing to be read during apply due to a dependency.
# terraform plan output
# module.lambda_function.data.aws_region.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_region" "current" {
+ description = (known after apply)
+ endpoint = (known after apply)
+ id = (known after apply)
+ name = (known after apply)
}
# module.lambda_function.aws_lambda_function.main must be replaced
-/+ resource "aws_lambda_function" "main" {
# redacted additional changes
~ function_name = "redacted" -> (known after apply) # forces replacement
}
# module.iam_role.aws_iam_role.main will be updated in-place
~ resource "aws_iam_role" "main" {
id = "redacted"
name = "redacted"
~ tags = {
+ "DummyTag" = "anyvalue"
# (7 unchanged elements hidden)
}
~ tags_all = {
+ "DummyTag" = "anyvalue"
# (7 unchanged elements hidden)
}
# (9 unchanged attributes hidden)
# (1 unchanged block hidden)
}
Same issue with data "aws_region"
Same issue with data "aws_caller_identity"
Terraform version: v1.4.5
AWS provider version: v4.58.0
Hey @avnerv 👋 Thank you for taking the time to raise this (and for everyone else for the ongoing discussion). The behavior you're experiencing here is described in the Terraform Data Source documentation under Data Resource Behavior.
If the data source depends on another managed resource (including modules) that has changes, the data source cannot be read until apply time. When the data from the data source is then used to set the value of an argument on an additional resource, that value will in turn not be known until apply time. For certain arguments on certain resources, this is seen as a reason to replace the resource; for example, changing the function_name argument of the aws_lambda_function resource requires that the resource be replaced. This is why seemingly inconsequential changes are leading to resources being replaced in each of the scenarios listed in the comments.
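A common way to sidestep this (a sketch, not an official recommendation; the `region` variable name is illustrative) is to read the region in the root module, where nothing defers the read, and pass the value in as a plain string, which stays known at plan time:

```hcl
# Root module: this data source has no pending dependencies,
# so it is read during planning.
data "aws_region" "current" {}

module "lambda_function" {
  source   = "./lambda_function"
  iam_role = module.iam_role.arn
  region   = data.aws_region.current.name # plain string, known at plan time
}

# Inside the lambda_function module, replace references to
# data.aws_region.current.name with this variable:
variable "region" {
  type = string
}
```

With the value supplied as a variable, function_name can be computed during planning, so the tag-only change to the iam_role module no longer forces replacement of the function.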
I am experiencing the same behaviour.
We created a Terraform module that uses the aws_region and aws_availability_zones data sources. Resources are created based on the output of those two data sources.
We recently updated the module and created a new version release. When the input to the module is changed, the data sources are read again, resulting in the behaviour justinretzolk described above.
While I understand this is Terraform behaviour, combined with the fact that certain attributes of AWS resources are immutable (which leads to resources being replaced), this behaviour remains undesirable and largely defeats the purpose of using data sources.
Is there a known solution / workaround to the data source problem, or is there a way to force data sources to be "pre-applied" prior to the resources getting applied?
I had to remove almost all of the data providers in the project because of this. My understanding is that Hashicorp wants things to be more deterministic but even in cases where the data provider is always the same like with availability zones, the data provider causes the resources to be re-created. What exactly is Hashicorp trying to achieve here? This is too big of an issue for this to be just a bug.
Hi all,
Looking at the original example I see that the data block contains a depends_on argument which refers to aws_subnet.infra. This tells Terraform that if there are any changes to anything about aws_subnet.infra then they must be completed before reading this data source. Terraform doesn't have any further information about why you've declared that dependency, so to ensure correct behavior it has no option but to delay reading the data source until the apply step.
I would suggest solving this by making use of the data Terraform already has in memory as a result of planning and applying the resource "aws_subnet"
blocks. Terraform already knows the IDs of those objects as a result of having created them, so there's no reason to re-read the same information again from the remote API.
For example (assuming all three of those aws_subnet resources use the count argument, and are therefore represented as lists of subnet objects):
locals {
  subnets_by_name = {
    for subnet in concat(aws_subnet.public, aws_subnet.infra, aws_subnet.public) :
    subnet.tags_all["Name"] => subnet
  }
}

resource "aws_route_table_association" "this" {
  for_each       = { for key in var.route_table_routes : key.name => key }
  subnet_id      = local.subnets_by_name[each.value.subnet_name].id
  route_table_id = aws_route_table.this[each.key].id
}
Any situation where you use a data block to read exactly the same object that's managed by a resource block elsewhere in the same configuration will typically run into this situation, because Terraform needs to wait to see what effect all of the changes to the managed resources will have before re-checking the data sources. data blocks are primarily useful for declaring dependencies on data that is managed by resource blocks in some other configuration, where you then wouldn't be able to just refer directly to the resource data as I showed in the example above.
For any situation with these symptoms that isn't caused by trying to re-read the same object another part of the configuration is already managing, the usual answer will be to specify your dependencies more precisely. depends_on is a very blunt instrument and should be used as a last resort, particularly with module blocks, because it gives Terraform very little information about what the dependency represents and therefore causes Terraform to be more cautious to make sure it will always respect the order of operations that you've described.
In particular, there's no reason for an empty data "aws_region", data "aws_availability_zones", or data "aws_caller_identity" block to be subject to depends_on, because no resource you can declare in a Terraform configuration can ever affect what regions or availability zones are present in AWS, or what IAM principal you're authenticating as. Therefore any configuration where such a block either directly has a depends_on argument or is in a module with a depends_on argument has over-specified dependency declarations, and the solution would be to remove those redundant declarations.
Like I said, data providers are nearly useless now. So many rules and expectations from developers just to make it behave is a really poor developer experience. There is no reason a data provider could not run at plan time and determine what if any changes would be made to the state.
Literally no one on my team would be able to understand these rules and still get value out of data providers. It's too obscure of a design requirement.
At the moment we are working around this issue with a lifecycle policy on resources that consume this problematic AWS data source. It seems adequate for preventing resources from being destroyed, though we're still exploring the implications of this policy.
lifecycle {
  ignore_changes = [
    availability_zone
  ]
}
I'm having a similar issue in modules that have explicit dependencies. If I give a module a depends_on, this triggers as well. In those cases, does aws_region inherit the dependency and get deferred even though it's effectively a constant within the same region? I've had to work around this by injecting the region as a constant into modules instead of being able to query it, but it's a real pain.
for example:
resource "some_resource_type" "some_resource_name" {
  # ...
}

module "that_contains_a_region_data_source" {
  depends_on = [some_resource_type.some_resource_name]
  # ...
}
^ this ends with a persistent diff on anything that relies on region - I'm assuming because of this issue - ideally I'd be able to use aws_region in this case without having to pass region through.
I'm facing the same issue with the following data types:
In particular, there's no reason for an empty data "aws_region", data "aws_availability_zones", or data "aws_caller_identity" block to be subject to depends_on because no resource you can declare in a Terraform configuration can ever affect what regions or availability zones are present in AWS, or what IAM principal you're authenticating as. Therefore any configuration where such a block either directly has a depends_on argument or is in a module with a depends_on argument has over-specified dependency declarations, and the solution would be to remove those redundant declarations.
@apparentlymart not sure if what you write here applies to my case. I don't use the depends_on
property and still have massive problems.
I created a submodule which should create backup vaults. I'm passing my provider and want to determine the KMS key to use. Simplified and redacted example:
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
      configuration_aliases = [
        aws.backup,
      ]
    }
  }
}

data "aws_region" "this" {
  provider = aws.backup
}

resource "aws_backup_vault" "backup" {
  provider    = aws.backup
  name        = var.source_account.id
  kms_key_arn = data.aws_region.this.name == "us-east-1" ? var.aws_kms_key.arn_home_region : var.aws_kms_key.arn_backup_region
}
I get the same message for the aws_region data source: "depends on a resource or a module with changes pending".
This results in the key not being known to Terraform, and therefore Terraform wants to replace (destroy and recreate) the backup vault. So in order to make this work I'd have to pass the region separately as a variable. This would work, but from my perspective it doesn't make sense, because I already know which region I want to operate in based on my provider. It also introduces another problem: I could have a mismatch in region between my provider and the variable that I pass.
@woodcockjosh wrote:
I had to remove almost all of the data providers in the project because of this. My understanding is that Hashicorp wants things to be more deterministic but even in cases where the data provider is always the same like with availability zones, the data provider causes the resources to be re-created. What exactly is Hashicorp trying to achieve here? This is too big of an issue for this to be just a bug.
Same. In a new module I had to remove all of them except two. I'm not remembering this wrong, am I? The behavior changed, right? Is it possible that the work on memory consumption caused this change as one of the "smaller optimizations"? That is, if it has actually changed and it's not just me remembering it wrong.
@apparentlymart Just checking in on this, curious about the folks reporting this with aws_region, which shouldn't depend on any resources.
Terraform v1.7.4
Similar issue, where re-create happens on no change with data retrieved from data-source:
data "aws_route53_zone" "enviroment" {
  zone_id = var.hosted_zone_id
}

data "aws_iam_policy_document" "route53" {
  statement {
    sid = "AllowTrexRoute53"
    actions = [
      "route53:ChangeResourceRecordSets"
    ]
    resources = [
      "arn:aws:route53:::hostedzone/${data.aws_route53_zone.enviroment.zone_id}"
    ]
    condition {
      test     = "ForAllValues:StringEquals"
      values   = ["loki.${local.dns_name}"]
      variable = "route53:ChangeResourceRecordSetsNormalizedRecordNames"
    }
  }
}

resource "aws_iam_policy" "route53" {
  name        = "<redacted>-route53"
  path        = "/"
  description = "route53 access to ${data.aws_route53_zone.enviroment.name}"
  policy      = data.aws_iam_policy_document.route53.json
}
results in:
-/+ resource "aws_iam_policy" "route53" {
~ arn = "<redacted>" -> (known after apply)
~ description = "route53 access to <redacted>" -> (known after apply) # forces replacement
~ id = "<redacted>" -> (known after apply)
name = "<redacted>"
+ name_prefix = (known after apply)
~ policy = jsonencode(
{
- Statement = [
- {
- Action = "route53:ChangeResourceRecordSets"
- Condition = {
- "ForAllValues:StringEquals" = {
- "route53:ChangeResourceRecordSetsNormalizedRecordNames" = "<redacted>"
}
}
- Effect = "Allow"
- Resource = "arn:aws:route53:::hostedzone/<redacted>"
- Sid = "AllowRoute53"
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
~ policy_id = "ANPA2FEP2ECCOJ3AVI3RH" -> (known after apply)
- tags = {} -> null
# (2 unchanged attributes hidden)
}
This is just one example. Another case is where I use count based on a data source:
data "aws_security_groups" "agent_access" {
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }
  tags = {
    agent = "true"
  }
}

data "aws_security_group" "agent" {
  count = length(data.aws_security_groups.agent_access.ids)
  id    = data.aws_security_groups.agent_access.ids[count.index]
}
Results in:
The "count" value depends on resource attributes that cannot be determined until apply, so Terraform cannot predict how many instances will be created. To work around this, use the -target argument to first apply only the resources that the count depends on.
The number of security groups is the same; they are static in this case. Why can't they be retrieved and counted during the plan phase?
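One workaround (a sketch, assuming the caller can supply the IDs; the variable name is illustrative) is to drive the per-group lookups from a plain variable rather than from another data source. A variable's length is known at plan time, so count can always be resolved during the plan:

```hcl
# Hypothetical variable supplied by the caller; its length is known at
# plan time, so count is never deferred to apply.
variable "agent_security_group_ids" {
  type = list(string)
}

data "aws_security_group" "agent" {
  count = length(var.agent_security_group_ids)
  id    = var.agent_security_group_ids[count.index]
}
```

This trades the tag-based discovery for an explicit input, which is the usual price of keeping count/for_each plannable.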
I wrote a small Python script that runs the plan, finds the resources that need to be read, and outputs the apply command targeting those particular resources.
import json
import subprocess

# Generate the plan and export it as JSON.
subprocess.Popen('terraform plan -out=./plan.out', shell=True).wait()
subprocess.Popen('terraform show -json plan.out > plan.json', shell=True).wait()

# Load the plan JSON and collect every resource that would be read during apply.
with open("plan.json", "r") as plan:
    state = json.load(plan)

reads = []
for item in state["resource_changes"]:
    actions = item["change"]["actions"]
    if "read" in actions:
        reads.append(item["address"])

# Build the target list outside the f-string to avoid nested-quote syntax
# errors on Python versions before 3.12.
targets = " ".join(f'--target "{resource}"' for resource in reads)
print(f"terraform apply {targets}")
This fixes the issue for my particular case.
Run the script above, run the resulting targeted apply, then run a normal apply -> all green, no changes.
It's ugly, but it works :/
Edit:
Some good news, at least during apply it won't re-create resources that are matching:
terraform apply
<redacted plan>
Plan: 0 to add, 11 to change, 0 to destroy.
<apply output>
Apply complete! Resources: 0 added, 8 changed, 0 destroyed.
That aws_region example above includes a required_providers block that declares that the module expects to be passed a provider configuration from the calling module, so I assume this must be a child module rather than a root module.
All of the resources in a child module must inherit whatever dependencies the module itself has, because Terraform must resolve the module call before it can resolve anything inside the module. Modules acquire their own dependencies through use of depends_on, count, or for_each in the module block. Therefore I would have to guess that the module block uses one of those arguments and refers to something else that's changing in the same plan, and so Terraform must wait for those changes to be resolved before reading the data source in order to honor those declared dependencies, but I can't be sure about that without seeing the whole plan.
@apparentlymart
Does anyone have a solution for this? We are also having the same issue.
When did this change exactly? We have a module that is in use across multiple applications (though varying versions of the AWS provider and Terraform to be sure) and this problem only just hit us on our latest application which is generally using more current versions of each.
Basically the module accepts a list of subnets and, in the process of creating a security group, fetches the subnet data from AWS to get the associated vpc_id:
data "aws_subnet" "main" { id = var.subnets_ids[0] }
On each plan now, Terraform wants to destroy and recreate this security group because it thinks the vpc_id might change (it doesn't). If I move this data resource to the parent module and pass the VPC ID in directly as a variable, Terraform is happy not to recreate the underlying resource, because it doesn't actually change. The workaround is to update the module to accept a vpc_id directly, but it still seems odd that this worked before and has apparently stopped.
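The workaround described above might look like this (a sketch; the module and variable names are illustrative):

```hcl
# Parent module: look up the VPC once, where the data source
# read is not deferred.
data "aws_subnet" "main" {
  id = var.subnet_ids[0]
}

module "service" {
  source = "./service"
  vpc_id = data.aws_subnet.main.vpc_id # known at plan time in the root module
}

# Child module: accept the value directly instead of re-reading it
# behind the module's inherited dependencies.
variable "vpc_id" {
  type = string
}
```

Hoisting the lookup to the root module means the child module no longer has a data source that can be deferred by the module's own dependencies.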
It was already mentioned, but I'm also seeing aws_caller_identity trip up the plan, which is annoying, but at least in this case doesn't cause anything more than a no-op update instead of a destroy that cascades to other resources.
Versions: hashicorp/aws v4.67.0, terraform v1.7.2
ETA: Checked another project with a very similar setup. Uses an older version of the same module, though on comparison nothing substantially different stands out. We just plan/applied that project and the data resource for the subnet did not trigger a diff. Versions: hashicorp/aws v3.76.0, terraform v1.3.7.
@JohnKeippel
In our case we create the broker first, then use the data "dns_a_record_set" data source to get the IP, which is passed to the NLB. Terraform assumes that any update to the broker will change the IP address, but in our case it does not.
Also running into the same issue for various resources, some of which can be updated in-place, while others need to be recreated entirely from scratch. After putting some thought into this issue, I'm pretty sure this isn't AWS's fault; rather, this is a HashiCorp/Terraform issue. As others have mentioned in this thread, it doesn't make any logical sense for a data block *not* to be queried while developing the execution plan (just like how other state is pulled to determine whether resources need to be created in the first place).
I see this "known after apply" for even a trivial case with a child resource:
resource "aws_iam_policy" "dynamodb_fetch" {
  name        = "${var.project}-dynamodb-fetch-policy"
  description = "Allow dynamodb fetch with overrides"
  policy      = data.aws_iam_policy_document.dynamodb_fetch.json
}

data "aws_iam_policy_document" "dynamodb_fetch" {
  statement {
    sid    = "DynamoDb"
    effect = "Allow"
    actions = [
      "dynamodb:Describe*",
      "dynamodb:Get*",
      "dynamodb:BatchGetItem"
    ]
    resources = ["*"]
  }
}
Running terraform plan with no changes in the code results in the following:
~ resource "aws_iam_policy" "dynamodb_fetch" {
id = "arn:aws:iam::123456789012:policy/myproject-dynamodb-fetch-policy"
name = "myproject-dynamodb-fetch-policy"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = [
- "dynamodb:Get*",
- "dynamodb:Describe*",
- "dynamodb:BatchGetItem",
]
- Effect = "Allow"
- Resource = "*"
- Sid = "DynamoDb"
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
tags = {}
# (5 unchanged attributes hidden)
}
How can this simple policy not be known at plan time?? Many other resources don't have this problem, but aws_iam_policy and lambda resources seem particularly problematic with regard to unnecessary terraform plan noise.
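One way to cut this particular noise (a sketch, not an official recommendation) is to drop the aws_iam_policy_document data source and build the JSON inline with jsonencode. jsonencode is a pure function evaluated during planning, so it can never be deferred the way a data source read can:

```hcl
resource "aws_iam_policy" "dynamodb_fetch" {
  name        = "${var.project}-dynamodb-fetch-policy"
  description = "Allow dynamodb fetch with overrides"

  # jsonencode() is computed at plan time, so the policy never shows up
  # as (known after apply).
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid      = "DynamoDb"
      Effect   = "Allow"
      Action   = ["dynamodb:Describe*", "dynamodb:Get*", "dynamodb:BatchGetItem"]
      Resource = "*"
    }]
  })
}
```

The trade-off is losing the validation and composition helpers of aws_iam_policy_document; for simple static policies the inline form is often the quieter choice.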
I think @justinretzolk already answered the reason. I am putting an example here.
Here is a simple directory structure. Assume there are two modules, alarm and cw, and a terraform.tf in the root module (just keeping it to one simple file).
.
├── alarm
│ ├── main.tf
│ └── versions.tf
├── cw
│ ├── main.tf
│ └── versions.tf
└── terraform.tf
cw/main.tf
variable "name_prefix" {
  type = string
}

data "aws_region" "current" {}

resource "aws_cloudwatch_log_group" "this" {
  name = "${var.name_prefix}/application/sample-${data.aws_region.current.name}" # depends on data resource
}
cw/versions.tf
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
  }
}
terraform.tf
terraform {
  required_version = "~> 1.8.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.47"
    }
  }
  backend "s3" {
    region         = "eu-north-1"
    bucket         = "terraform-state-blah-eu-north-1"
    key            = "mock-test.tfstate"
    profile        = "aws-dev"
    dynamodb_table = "terraform-state-lock"
  }
}

provider "aws" {
  region  = "eu-north-1"
  profile = "aws-dev"
}

module "alarm" {
  source = "./alarm"
}

# NOTE: I am calling the same module twice, once without depends_on and once with depends_on
module "cw_without_depends_on" {
  source      = "./cw"
  name_prefix = "without-depends-on"
}

module "cw_with_depends_on" {
  source      = "./cw"
  name_prefix = "with-depends-on"
  depends_on  = [module.alarm] # <------- See this
}
For this, the following plan is generated.
# module.cw_with_depends_on.data.aws_region.current will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_region" "current" {
+ description = (known after apply)
+ endpoint = (known after apply)
+ id = (known after apply)
+ name = (known after apply) <--- due to depends_on
}
# module.cw_with_depends_on.aws_cloudwatch_log_group.this will be created
+ resource "aws_cloudwatch_log_group" "this" {
+ arn = (known after apply)
+ id = (known after apply)
+ log_group_class = (known after apply)
+ name = (known after apply) <--- name NOT resolved, due to aws_region resolution is deffered
+ name_prefix = (known after apply)
+ retention_in_days = 0
+ skip_destroy = false
+ tags_all = (known after apply)
}
# module.cw_without_depends_on.aws_cloudwatch_log_group.this will be created
+ resource "aws_cloudwatch_log_group" "this" {
+ arn = (known after apply)
+ id = (known after apply)
+ log_group_class = (known after apply)
+ name = "without-depends-on/application/sample-eu-north-1" <--- name resolved.
+ name_prefix = (known after apply)
+ retention_in_days = 0
+ skip_destroy = false
+ tags_all = (known after apply)
}
Resolution of the aws_region data resource is deferred if we introduce depends_on on the module. Similar behaviour will happen with for_each and count.
Hope this helps.
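Following the advice earlier in the thread, the deferral in this example goes away if the blanket module-level depends_on is replaced with a reference to the specific value the module actually needs (a sketch; the alarm_name output of ./alarm is hypothetical):

```hcl
# cw/main.tf: accept only the value that creates the real dependency.
variable "alarm_name" {
  type    = string
  default = null
}

# terraform.tf: reference a specific output instead of depending on the
# whole alarm module. Only resources that use var.alarm_name now depend on
# module.alarm, so data.aws_region.current inside ./cw is read at plan time.
module "cw_with_precise_dependency" {
  source      = "./cw"
  name_prefix = "with-precise-dependency"
  alarm_name  = module.alarm.alarm_name # hypothetical output of ./alarm
}
```

The expression-level reference carries exactly the dependency information Terraform needs, instead of forcing every resource and data source in the module to wait on the entire alarm module.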
I was able to replicate the issue with the data aws_eks_cluster resource. Any time I manually change anything about the EKS cluster config, ALL data resources are recalculated at apply time.
This is a huge issue. I have a workaround at hand for my case, but a single resource requiring a refresh should not cause ALL data sources in the module to refresh.
This is a bug.
Same issue with data "aws_ec2_transit_gateway_attachment"
Terraform version: v1.5.4 AWS provider version: v5.52.0
I see this issue with changes in default_tags which can be annoying if you have a lot of policy documents in a single directory.
# data.aws_iam_policy_document.default will be read during apply
# (depends on a resource or a module with changes pending)
<= data "aws_iam_policy_document" "default" {
+ id = (known after apply)
+ json = (known after apply)
+ minified_json = (known after apply)
+ statement {
+ actions = [
+ "events:PutEvents",
]
+ effect = "Allow"
+ resources = [
+ "arn:aws:events:us-east-1:snip:event-bus/guardduty",
]
+ sid = "snip"
+ principals {
+ identifiers = [
+ "snip",
]
+ type = "AWS"
}
}
}
# aws_cloudwatch_event_bus_policy.default will be updated in-place
~ resource "aws_cloudwatch_event_bus_policy" "default" {
id = "guardduty"
~ policy = jsonencode(
{
- Statement = [
- {
- Action = "events:PutEvents"
- Effect = "Allow"
- Principal = {
- AWS = "arn:aws:iam::snip:root"
}
- Resource = "arn:aws:events:us-east-1:snip:event-bus/guardduty"
- Sid = "snip"
},
]
- Version = "2012-10-17"
}
) -> (known after apply)
# (1 unchanged attribute hidden)
}
Terraform Core Version
1.3.6
AWS Provider Version
3.76.1
Affected Resource(s)
Expected Behavior
In this scenario, when I create a new subnet (such as aws_subnet.infra, which is referred to as management), I anticipate that Terraform will only create the resources that are related to it, such as:
Actual Behavior
What occurs is that Terraform replaces existing aws_route_table_association resources, even if they haven't been modified, when a new subnet is created.
Relevant Error/Panic Output Snippet
No response
Terraform Configuration Files