hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

aws_route53_health_check does not synchronize configuration after an apply #29251

Open Manouchehri opened 1 year ago

Manouchehri commented 1 year ago

Terraform Core Version

v1.3.7

AWS Provider Version

v4.52.0

Affected Resource(s)

Expected Behavior

We have an aws_route53_health_check that is based on an aws_cloudwatch_metric_alarm. The alarm checks the health of an EC2 instance. When we update the instance ID in the aws_cloudwatch_metric_alarm, we expect the aws_route53_health_check to start using the alarm's updated configuration.

Actual Behavior

We update the instance ID in the aws_cloudwatch_metric_alarm, but the aws_route53_health_check still uses the OLD instance ID. You have to go to the AWS console manually and click the "Synchronize Configuration" button.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_route53_record" "walstream-secondary" {
  name = "walstream.ccodb.foobar.com"
  zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
  type = "A"
  set_identifier = "SECONDARY"
  failover_routing_policy = {
    type = "SECONDARY"
  }
  alias {
    name = "write.ccodb.foobar.com"
    zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "walstream" {
  lifecycle { create_before_destroy = true }
  name    = "walstream.ccodb.foobar.com"
  zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
  type = "A"
  set_identifier = "PRIMARY"
  health_check_id = "${aws_route53_health_check.walstream_health_check.id}"
  failover_routing_policy = {
    type = "PRIMARY"
  }
  records = ["${var.failover_ip}"]
  ttl     = "60"
}

resource "aws_cloudwatch_metric_alarm" "failover_instance_health" {
    alarm_name          = "${var.environment}-failover-instance-health"
    alarm_description   = "The healthyness of the failover instance"
    namespace           = "AWS/EC2"
    metric_name         = "StatusCheckFailed"
    dimensions {
      InstanceId = "${var.failover_id}"
    }
    statistic           = "Average"
    period              = "60"
    evaluation_periods  = "1"
    comparison_operator = "GreaterThanOrEqualToThreshold"
    threshold           = "1"
    # Trigger Alarm if data is missing:
    treat_missing_data  = "breaching"
    # Alarm when triggered:
    alarm_actions       = ["${data.terraform_remote_state.vpc.db_topic_arn}"]
    ok_actions          = ["${data.terraform_remote_state.vpc.db_topic_arn}"]
    actions_enabled     = "${var.alerting_enabled}"
}

resource "aws_route53_health_check" "walstream_health_check" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = "${aws_cloudwatch_metric_alarm.failover_instance_health.alarm_name}"
  cloudwatch_alarm_region         = "${var.region}"
  insufficient_data_health_status = "Unhealthy"
  tags = {
    Name = "${var.environment}-failover-instance-health"
  }
}

Steps to Reproduce

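Update the CloudWatch alarm that a Route 53 health check is based on (for example, change its InstanceId dimension), run terraform apply, and notice that the health check still uses the old alarm configuration.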

Debug Output

No response

Panic Output

No response

Important Factoids

I did manage to find a very hacky workaround: abuse replace_triggered_by with a null_resource that always runs, and use create_before_destroy to safely swap out the aws_route53_health_check.

resource "null_resource" "always_run" {
  triggers = {
    timestamp = "${timestamp()}"
  }
}

resource "aws_route53_health_check" "arm" {
  lifecycle {
    replace_triggered_by = [
      null_resource.always_run
    ]
    create_before_destroy = true
  }
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = trimprefix(data.aws_arn.arm.resource, "alarm:")
  cloudwatch_alarm_region         = data.aws_arn.arm.region
  insufficient_data_health_status = "Healthy"
}
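
A possibly less disruptive variant of the workaround above, offered as an untested sketch: when the alarm is managed in the same configuration as the health check, replace_triggered_by can reference the alarm resource directly, so the health check should be replaced only when a plan updates or replaces the alarm rather than on every apply. This assumes Terraform 1.2 or later (where replace_triggered_by is available) and reuses the resource names from the configuration in this report. It still replaces the health check instead of synchronizing it in place.

```hcl
resource "aws_route53_health_check" "walstream_health_check" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = aws_cloudwatch_metric_alarm.failover_instance_health.alarm_name
  cloudwatch_alarm_region         = var.region
  insufficient_data_health_status = "Unhealthy"

  lifecycle {
    # Replace the health check only when a plan updates or replaces the alarm,
    # instead of forcing a replacement on every apply via timestamp().
    replace_triggered_by = [
      aws_cloudwatch_metric_alarm.failover_instance_health
    ]
    # Create the new health check before destroying the old one so that
    # records pointing at it are never left without a health check.
    create_before_destroy = true
  }
}
```

Because the health check ID changes on replacement, anything that references it (such as health_check_id on the PRIMARY record) is updated in the same apply.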

References

This is a straight copy of https://github.com/hashicorp/terraform-provider-aws/issues/3489, as it was never addressed.

Would you like to implement a fix?

No

Manouchehri commented 1 year ago

cc @jmaitrehenry, @mutt13y, @JeremieCharest, @grayaii

nathanloyer commented 1 year ago

I just ran into this bug today when trying to use this resource type. Thanks for the suggested workaround. I'll try it out, since it seems like this isn't getting prioritized. This may just be an issue with AWS itself, as I have the same problem if I update the CloudWatch alarm manually. The provider probably needs to call the API to kick off synchronization after an alarm update.
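
One way the "hit the API after an alarm update" idea could be approximated from configuration today is sketched below. It is a guess, not a confirmed fix: it assumes that re-submitting the alarm identifier through the Route 53 UpdateHealthCheck API (here via the AWS CLI's update-health-check command) makes Route 53 re-read the alarm, which nothing in this thread verifies. It also assumes the AWS CLI is installed and authenticated wherever Terraform runs. The resource name sync_health_check is hypothetical; the other names come from the configuration in the report.

```hcl
resource "null_resource" "sync_health_check" {
  # Re-run the provisioner whenever the alarm's dimensions change.
  triggers = {
    alarm_dimensions = jsonencode(aws_cloudwatch_metric_alarm.failover_instance_health.dimensions)
  }

  # ASSUMPTION: re-sending the alarm identifier causes Route 53 to pick up
  # the alarm's current configuration. Verify this before relying on it.
  provisioner "local-exec" {
    command = "aws route53 update-health-check --health-check-id ${aws_route53_health_check.walstream_health_check.id} --alarm-identifier Region=${var.region},Name=${aws_cloudwatch_metric_alarm.failover_instance_health.alarm_name}"
  }
}
```

If this works, it avoids replacing the health check, at the cost of depending on a local CLI call that Terraform cannot track in state.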

dinukarajapaksha commented 2 months ago

+1

Facing the same issue in hashicorp/aws v5.57.0