ghost opened this issue 4 years ago
I'm having the same issue. I think it's the ordering of the custom environment variables, plus the provider adding some default configurations if you don't already have them in your task definition. In my case I had to add these default options and reorder the environment variables according to the diff output.
It'd be nice if Terraform could handle this normalization itself.
Going off of @moyuanhuang, I also suspect the issue is the ordering of environment variables. I do NOT see the issue with secrets. One thing to note for my use case is that I am changing the image of the task definition, so I do expect a new task definition to be created with the new image, but I do not expect to see a diff for unchanged environment variables.
This makes evaluating diffs for task definitions extremely difficult.
I notice that the AWS API + CLI do return these arrays in a consistent order (from what I can see), so perhaps this is something that Terraform or the provider itself is doing.
Hi, I'm having the same issue, but without mount points.
In addition to the reordering of custom environment variables, I have some variables set to null, which makes Terraform recreate the task definition.
Example with the Docker health check:
~ healthCheck = {
        command     = [
            "CMD-SHELL",
            "agent health",
        ]
        interval    = 15
        retries     = 10
        startPeriod = 15
      - timeout     = 5 -> null
    }
@LeComptoirDesPharmacies That probably means there's a default value of 5 for this particular setting. However, because you don't specify it in your task definition, Terraform thinks you're trying to set it to null (which is never going to happen, since there is an enforced default). Add timeout = 5 to your task definition and you should be able to avoid Terraform recreating the task.
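Concretely, taking the values from the diff above, the health check block in the container definition would end up like this (the 5-second timeout mirrors what the diff reports as the enforced default):
healthCheck = {
  command     = ["CMD-SHELL", "agent health"]
  interval    = 15
  retries     = 10
  startPeriod = 15
  timeout     = 5 # spell out the enforced default so Terraform no longer sees "5 -> null"
}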
@moyuanhuang Yes, thanks. But I have the problem with custom environment variables too. I found this fix, which is waiting to be merged:
I found that alphabetizing my env variables by name seems to keep it out of the plan. I noticed that the ECS task definition stores them that way in the JSON output in the console.
I can confirm that alphabetizing, as @jonesmac mentioned, and adding in the items with the default values that Terraform thinks have changed, will resolve this as a workaround.
I also got hit by this. The workaround suggested in this thread (1. ensure environment variables are in alphabetical order, 2. ensure all the default values are explicitly filled in with their null/empty values) worked for us for now. I still believe this is a Terraform AWS provider issue and a bug that should be addressed in the future.
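As a sketch of what that workaround looks like in practice (the container name, image, and variable names here are made up for illustration):
{
  name      = "app"                    # hypothetical container
  image     = "example.com/app:latest" # hypothetical image
  essential = true

  # 1. environment variables pre-sorted by name, matching how ECS stores them
  environment = [
    { name = "APP_ENV",   value = "production" },
    { name = "APP_PORT",  value = "8080" },
    { name = "LOG_LEVEL", value = "info" },
  ]

  # 2. defaults that ECS fills in on its side, stated explicitly so the plan stays empty
  cpu          = 0
  mountPoints  = []
  portMappings = []
  volumesFrom  = []
}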
The same thing happened using FluentBit log router. After adding its container definition, Terraform was forcing a new plan each time, even without touching anything:
~ {
      - cpu          = 0 -> null
      - environment  = [] -> null
      - mountPoints  = [] -> null
      - portMappings = [] -> null
      - user         = "0" -> null
      - volumesFrom  = [] -> null
        # (6 unchanged elements hidden)
    },
After setting explicitly these values in the code, no more changes. Here's the FluentBit container definition:
{
  essential             = true
  image                 = var.fluentbit_image_url
  name                  = "log_router"
  firelensConfiguration = {
    type = "fluentbit"
  }
  logConfiguration = {
    logDriver = "awslogs"
    options = {
      awslogs-group         = "firelens-container"
      awslogs-region        = var.region
      awslogs-create-group  = "true"
      awslogs-stream-prefix = "firelens"
    }
  }
  memoryReservation = var.fluentbit_memory_reservation
  cpu               = 0
  environment       = []
  mountPoints       = []
  portMappings      = []
  user              = "0"
  volumesFrom       = []
}
Hey y'all 👋 Thank you for taking the time to file this issue, and for the continued discussion around it. Given that there have been a number of AWS provider releases since the last update here, can anyone confirm whether you're still experiencing this behavior?
@justinretzolk I can tell you this is still an issue. We've been hitting it for almost a year now, and I've made a concerted effort over the last week or so to address it. These are our versions (should be the latest at the time of posting this):
Terraform v1.0.11
on linux_amd64
+ provider registry.terraform.io/hashicorp/aws v3.64.2
+ provider registry.terraform.io/hashicorp/null v3.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/hashicorp/time v0.7.2
+ provider registry.terraform.io/hashicorp/tls v3.1.0
Still actively going through the workarounds above trying to get this to work correctly, but so far no dice.
@justinretzolk This issue still persists in the latest AWS provider version and definitely needs to be addressed.
It's been a little while since I last posted. I've attempted to fix this a few times since my last post, but I've had no success so far. To provide a little more information, this is one of the ECS tasks that's being affected (sorry for the sanitization):
-/+ resource "aws_ecs_task_definition" "task" {
      ~ arn                   = "arn:aws:ecs:AWS_REGION:AWS_ACCOUNT_ID:task-definition/AWS_ECS_TASK_NAME:317" -> (known after apply)
      ~ container_definitions = jsonencode(
            [
              - {
                  - cpu                   = 0
                  - dockerLabels          = {
                      - traefik.frontend.entryPoints    = "https"
                      - traefik.frontend.passHostHeader = "true"
                      - traefik.frontend.rule           = "Host:MY_DNS_NAME"
                      - traefik.protocol                = "https"
                    }
                  - environment           = [
                      - {
                          - name  = "APP_PORT"
                          - value = "54321"
                        },
                    ]
                  - essential             = true
                  - image                 = "DOCKER_REPOR_URL:DOCKER_TAG"
                  - logConfiguration      = {
                      - logDriver = "awslogs"
                      - options   = {
                          - awslogs-group         = "AWS_LOGS_GROUP"
                          - awslogs-region        = "AWS_REGION"
                          - awslogs-stream-prefix = "AWS_STREAM_PREFIX"
                        }
                    }
                  - mountPoints           = [
                      - {
                          - containerPath = "/PATH/IN/CONTAINER/"
                          - sourceVolume  = "EFS_NAME"
                        },
                      - {
                          - containerPath = "/PATH/IN/CONTAINER"
                          - sourceVolume  = "EFS_NAME"
                        },
                    ]
                  - name                  = "SERVICE_NAME"
                  - portMappings          = [
                      - {
                          - containerPort = 443
                          - hostPort      = 443
                          - protocol      = "tcp"
                        },
                    ]
                  - repositoryCredentials = {
                      - credentialsParameter = "arn:aws:secretsmanager:AWS_REGION:AWS_ACCOUNT_ID:secret:SECRET_VERSION"
                    }
                  - startTimeout          = 120
                  - stopTimeout           = 120
                  - volumesFrom           = []
                },
            ]
        ) -> (known after apply) # forces replacement
This is part of the plan output immediately after an apply. One attempt I made recently was focused just on getting 'cpu' above to stop appearing: adding "cpu": 0 into the JSON for the container definition and reapplying has zero effect on the diff for future plans/applies.
Not sure what I'm doing wrong, but at this point we've begun dancing around the issue by using -target= during terraform apply so that it doesn't update everything all the time.
Hi Everyone,
While upgrading from TF version 0.11.x to version 1.0.10, we ran into a similar issue. Though alphabetizing and setting the defaults with null/empty values worked, it's a cumbersome process to refactor task definitions with so many parameters. I believe the revision number of the task definition is responsible for this behavior when the task is provisioned without a revision number. Consequently, I made a few changes to how the ECS service references the task definition so that it captures the revision number, which resolved the issue.
ECS Task Definition JSON File
[
  {
    "secrets": [
      {
        "name": "NRIA_LICENSE_KEY",
        "valueFrom": "arn:aws:ssm:${xxxxxx}:${xxxxxx}:xxxxxx/portal/${xxxxx}/NewrelicKey"
      }
    ],
    "portMappings": [],
    "cpu": 200,
    "memory": ${ram},
    "environment": [
      {
        "name": "NRIA_OVERRIDE_HOST_ROOT",
        "value": "/host"
      },
      {
        "name": "ENABLE_NRI_ECS",
        "value": "true"
      },
      {
        "name": "NRIA_PASSTHROUGH_ENVIRONMENT",
        "value": "ECS_CONTAINER_METADATA_URI,ENABLE_NRI_ECS"
      },
      {
        "name": "NRIA_VERBOSE",
        "value": "0"
      },
      {
        "name": "NRIA_CUSTOM_ATTRIBUTES",
        "value": "{\"nrDeployMethod\":\"downloadPage\"}"
      }
    ],
    "mountPoints": [
      {
        "readOnly": true,
        "containerPath": "/host",
        "sourceVolume": "host_root_fs"
      },
      {
        "readOnly": false,
        "containerPath": "/var/run/docker.sock",
        "sourceVolume": "docker_socket"
      }
    ],
    "volumesFrom": [],
    "image": "${image}",
    "essential": true,
    "readonlyRootFilesystem": false,
    "privileged": true,
    "name": "${name}",
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "${awslogs_group}",
        "awslogs-region": "${aws_region}",
        "awslogs-stream-prefix": "${name}"
      }
    }
  }
]
resource "aws_ecs_task_definition" "newrelic_infra_agent" {
family = "${var.workspace}-newrelic-infra-${var.env}"
requires_compatibilities = ["EC2"]
network_mode = "host"
cpu = "256"
memory = "512"
execution_role_arn = var.ecs_task_role_arn
container_definitions = data.template_file.newrelic_infra_agent.rendered
#tags = "${local.tags}"
volume {
name = "host_root_fs"
host_path = "/"
}
volume {
name = "docker_socket"
host_path = "/var/run/docker.sock"
}
resource "aws_ecs_task_definition" "newrelic_infra_agent" {
family = "${var.workspace}-newrelic-infra-${var.env}"
requires_compatibilities = ["EC2"]
network_mode = "host"
cpu = "256"
memory = "512"
execution_role_arn = var.ecs_task_role_arn
container_definitions = data.template_file.newrelic_infra_agent.rendered
volume {
name = "host_root_fs"
host_path = "/"
}
volume {
name = "docker_socket"
host_path = "/var/run/docker.sock"
}
}
data "aws_ecs_task_definition" "newrelic_infra_agent" {
task_definition = "${aws_ecs_task_definition.newrelic_infra_agent.family}"
depends_on = [aws_ecs_task_definition.newrelic_infra_agent]
}
resource "aws_ecs_service" "newrelic_infra_agent" {
name = "${var.workspace}-newrelic-infra-${var.env}"
cluster = aws_ecs_cluster.ecs-cluster.id
task_definition = "${aws_ecs_task_definition.newrelic_infra_agent.family}:${max("${aws_ecs_task_definition.newrelic_infra_agent.revision}", "${data.aws_ecs_task_definition.newrelic_infra_agent.revision}")}"
scheduling_strategy = "DAEMON"
#tags = "${local.tags}"
propagate_tags = "TASK_DEFINITION"
depends_on = [aws_ecs_task_definition.newrelic_infra_agent]
}
Recently re-tested on the latest AWS provider 4.10. Issue still seems to be present.
I did find another workaround for this though. It's not great, but I think it's better than what we'd been doing. Essentially it boils down to this: a standard plan/apply never causes the containers to restart (unless some other attribute changed), and if you need to force a restart/redeploy, you taint the task definition and then reapply.
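A minimal sketch of a workaround along these lines, assuming it is implemented with ignore_changes (the resource and local names here are hypothetical, not necessarily the commenter's exact setup):
resource "aws_ecs_task_definition" "app" {
  family                = "app"
  container_definitions = jsonencode(local.container_definitions)

  lifecycle {
    # Normal plan/apply ignores the perpetual container_definitions diff.
    # To actually roll out a new revision, taint this resource
    # (terraform taint aws_ecs_task_definition.app) and apply again.
    ignore_changes = [container_definitions]
  }
}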
This is by no means a feature request for this issue, but I'm beginning to wish there was a way to override the check for resource replacement, so you could provide a sha or something, and only if that changes would it update. That would make this a lot easier.
Replying to the FluentBit comment above: for me, adding just user = "0" to the container definition resolved this. Here's the full container definition:
{
  essential = true
  image     = "public.ecr.aws/aws-observability/aws-for-fluent-bit:stable"
  name      = "log-router"
  firelensConfiguration = {
    type = "fluentbit"
    options = {
      enable-ecs-log-metadata = "true"
      config-file-type        = "file"
      config-file-value       = "/fluent-bit/configs/parse-json.conf"
    }
  }
  logConfiguration = {
    logDriver = "awslogs"
    options = {
      awslogs-group         = aws_cloudwatch_log_group.api_log_group.name
      awslogs-region        = local.aws_region
      awslogs-create-group  = "true"
      awslogs-stream-prefix = "firelens"
    }
  }
  memoryReservation = 50
  user              = "0"
}
I have the same issue with the latest AWS provider: 4.27.0.
We also have this issue on Terraform 1.2.7 and AWS provider 4.31.0. The plan output only marks arn, container_definitions, id and revision with ~ ('known after apply'), but after container_definitions it says # forces replacement, even though the content has not changed at all. We tried sorting the JSON keys and adding default parameters, to no avail. Do we also need to format it exactly like the plan is saying? Because in the container_definitions diff it's saying to delete all the JSON keys.
With redactions:
~ container_definitions = jsonencode(
      [
        - {
            - cpu               = 64
            - environment       = [
                - ...
              ]
            - essential         = true
            - image             = ...
            - logConfiguration  = {
                - logDriver = ...
                - options   = {
                    - ...
                  }
              }
            - memoryReservation = 64
            - mountPoints       = []
            - name              = ...
            - portMappings      = [
                - ...
              ]
            - volumesFrom       = []
          },
        - {
            - cpu               = 512
            - environment       = [
                - ...
              ]
            - essential         = true
            - image             = ...
            - logConfiguration  = {
                - logDriver = ...
                - options   = {
                    - ...
                  }
              }
            - memoryReservation = ...
            - mountPoints       = []
            - name              = ...
            - portMappings      = [
                - ...
              ]
            - secrets           = [
                - ...
              ]
            - volumesFrom       = []
          },
      ]
No env variables changed, no secrets changed, and no other configuration keys have changed. If there is any way for me to help with debugging, please let me know.
A follow-up to my previous comment: the replacement was actually not caused by Terraform getting the diff wrong. It was caused by a variable that depended on a file (data resource) which had a depends_on on a null_resource. Even the documentation for depends_on states that Terraform is more conservative and plans to replace more resources than may be needed. So in the end, the ordering plus filling in all values with their defaults did work.
Our null_resource had its trigger set to always run. That of course makes the null_resource 'dirty' on every Terraform run, and I suspect the dependent resources then also get flagged as dirty transitively.
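For context, this is roughly the shape of configuration that can trigger that behavior (the names and the local_file data source here are illustrative, not the original ones):
resource "null_resource" "always_run" {
  triggers = {
    # timestamp() changes on every run, so this resource is always "dirty"
    run = timestamp()
  }
}

data "local_file" "container_definitions" {
  filename = "${path.module}/container-definitions.json"

  # depends_on makes Terraform plan conservatively: when the null_resource is
  # dirty, resources downstream of this data source may be planned for
  # replacement as well.
  depends_on = [null_resource.always_run]
}

resource "aws_ecs_task_definition" "task" {
  family                = "example"
  container_definitions = data.local_file.container_definitions.content
}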
Still an issue with 4.63.0. Setting values to the defaults or null helps.
In my case I had:
portMappings = [
  {
    containerPort = "27017"
    protocol      = "TCP"
    hostPort      = "27017"
  },
]
"TCP" was being stored as "tcp", and on the second run Terraform was not smart enough to recognize that "TCP" would again be evaluated as "tcp", so it kept trying to replace the definition. I changed "TCP" to "tcp" and it stopped trying to replace it.
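For reference, the block that stopped the churn, with only the protocol value changed:
portMappings = [
  {
    containerPort = "27017"
    hostPort      = "27017"
    protocol      = "tcp" # lowercase, matching what the ECS API stores
  },
]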
Still an issue with 5.16.1.
In my case, the recreation was caused by the healthcheck definition. I added the default values for interval, retries and so on to the config block, and the problem was solved:
healthcheck = {
  command     = ["CMD-SHELL", "curl -f http://127.0.0.1/ || exit 1"]
  interval    = 30
  retries     = 3
  startPeriod = 5
  timeout     = 5
}
I've submitted a PR for the healthcheck defaults normalization specifically: https://github.com/hashicorp/terraform-provider-aws/pull/38872
This issue was originally opened by @im-lcoupe as hashicorp/terraform#23780. It was migrated here as a result of the provider split. The original body of the issue is below.
Summary
Hi there,
So this only seems to have become a problem since upgrading my code to the latest version - and strangely it only seems to happen on the task definitions with mount points (however, their format hasn't changed...)
Terraform will constantly try and replace the two task definitions regardless of whether any changes have been made to them...
Any guidance on this would be greatly appreciated, as it means the task definition revision is changing on every run (which of course is not ideal)...
Terraform Version
Terraform Configuration Files - 1st problematic task definition - followed by the second
Second Task definition
Example plan output - to me it isn't clear what exactly needs to change that would require a forced replacement. It looks like it removes vars that are already there and then re-adds them, and in other places it seems to reorder them. It also looks to be adding the network mode (awsvpc), which is already defined in the task definition.
Expected Behavior
Terraform should not try and replace the task definitions on every plan.
Actual Behavior
Terraform forces replacement of the task definition on every plan.
Steps to Reproduce
Terraform plan