Manouchehri commented 1 year ago

Terraform Core Version

v1.3.7

AWS Provider Version

v4.52.0

Affected Resource(s)

aws_route53_health_check

Expected Behavior

We have an aws_route53_health_check that is based on an aws_cloudwatch_metric_alarm. The aws_cloudwatch_metric_alarm checks the health of an EC2 instance. When we update the instance ID of the aws_cloudwatch_metric_alarm, we expect the aws_route53_health_check to start using the updated configuration of the alarm.

Actual Behavior

We update the instance ID of the aws_cloudwatch_metric_alarm, but the aws_route53_health_check still uses the OLD instance ID. You have to manually go to the AWS console and click the "Synchronize Configuration" button.

Relevant Error/Panic Output Snippet

No response

Terraform Configuration Files

resource "aws_route53_record" "walstream-secondary" {
  name = "walstream.ccodb.foobar.com"
  zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
  type = "A"
  set_identifier = "SECONDARY"
  failover_routing_policy = {
       type = "SECONDARY"
   }
  alias {
    name = "write.ccodb.foobar.com"
    zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "walstream" {
  lifecycle { create_before_destroy = true }
  name    = "walstream.ccodb.foobar.com"
  zone_id = "${data.terraform_remote_state.route53.foobar_zone_id}"
  type = "A"
  set_identifier = "PRIMARY"
  health_check_id = "${aws_route53_health_check.walstream_health_check.id}"
  failover_routing_policy = {
       type = "PRIMARY"
   }
  records = ["${var.failover_ip}"]
  ttl     = "60"
}

resource "aws_cloudwatch_metric_alarm" "failover_instance_health" {
    alarm_name          = "${var.environment}-failover-instance-health"
    alarm_description   = "The healthyness of the failover instance"
    namespace           = "AWS/EC2"
    metric_name         = "StatusCheckFailed"
    dimensions {
      InstanceId = "${var.failover_id}"
    }
    statistic           = "Average"
    period              = "60"
    evaluation_periods  = "1"
    comparison_operator = "GreaterThanOrEqualToThreshold"
    threshold           = "1"
    # Trigger Alarm if data is missing:
    treat_missing_data  = "breaching"
    # Alarm when triggered:
    alarm_actions       = ["${data.terraform_remote_state.vpc.db_topic_arn}"]
    ok_actions          = ["${data.terraform_remote_state.vpc.db_topic_arn}"]
    actions_enabled     = "${var.alerting_enabled}"
}

resource "aws_route53_health_check" "walstream_health_check" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = "${aws_cloudwatch_metric_alarm.failover_instance_health.alarm_name}"
  cloudwatch_alarm_region         = "${var.region}"
  insufficient_data_health_status = "Unhealthy"
  tags = {
    Name = "${var.environment}-failover-instance-health"
  }
}

Steps to Reproduce

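Update a CloudWatch alarm that a Route 53 health check is based on (for example, change its InstanceId dimension), apply, and notice that the health check still uses the old alarm configuration.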

Debug Output

No response

Panic Output

No response

Important Factoids

I did manage to find a very hacky workaround: abuse replace_triggered_by with a null_resource that changes on every run, and use create_before_destroy to safely swap out the aws_route53_health_check.

resource "null_resource" "always_run" {
  triggers = {
    timestamp = "${timestamp()}"
  }
}

resource "aws_route53_health_check" "arm" {
  lifecycle {
    replace_triggered_by = [
      null_resource.always_run
    ]
    create_before_destroy = true
  }
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = trimprefix(data.aws_arn.arm.resource, "alarm:")
  cloudwatch_alarm_region         = data.aws_arn.arm.region
  insufficient_data_health_status = "Healthy"
}
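
A narrower variant of the same idea, as an untested sketch: if the alarm is managed in the same configuration as the health check (as in the configuration under "Terraform Configuration Files", but not necessarily in the workaround above, which looks the alarm up through data.aws_arn), replace_triggered_by can point at the alarm resource itself. The health check is then replaced only when Terraform plans an update or replacement of the alarm, rather than on every apply. The resource names are taken from the configuration above; the HCL2 expression syntax and the requirement of Terraform 1.2 or later for replace_triggered_by are assumptions about the environment.

```hcl
# Sketch only: assumes the alarm and the health check live in the same
# configuration and that Terraform >= 1.2 is in use (replace_triggered_by).
resource "aws_route53_health_check" "walstream_health_check" {
  type                            = "CLOUDWATCH_METRIC"
  cloudwatch_alarm_name           = aws_cloudwatch_metric_alarm.failover_instance_health.alarm_name
  cloudwatch_alarm_region         = var.region
  insufficient_data_health_status = "Unhealthy"

  lifecycle {
    # Replace the health check whenever the referenced alarm is updated
    # or replaced, instead of forcing a replacement on every apply.
    replace_triggered_by = [
      aws_cloudwatch_metric_alarm.failover_instance_health
    ]
    # Create the new health check before destroying the old one so the
    # failover record always has a health check to point at.
    create_before_destroy = true
  }
}
```

Like the timestamp-based workaround, this still replaces the health check rather than synchronizing it in place, so the health check ID changes and any aws_route53_record that references it is updated in the same apply.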

References

This is a straight copy of https://github.com/hashicorp/terraform-provider-aws/issues/3489, as it was never addressed.

Would you like to implement a fix?

No

github-actions[bot] commented 1 year ago

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

Manouchehri commented 1 year ago

cc @jmaitrehenry, @mutt13y, @JeremieCharest, @grayaii

nathanloyer commented 1 year ago

I just ran into this bug today when trying to use this resource type. Thanks for the suggested workaround. I'll try it out, since it seems like this isn't getting prioritized. This may just be an issue with AWS itself, as I have the same problem if I manually update the CloudWatch alarm. The provider probably needs to call the API to kick off synchronization after an alarm update.

dinukarajapaksha commented 2 weeks ago

+1

Facing the same issue in hashicorp/aws v5.57.0

hashicorp / terraform-provider-aws

aws_route53_health_check does not synchronize configuration after an apply #29251
