hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

Cannot find Target Groups by ARN pulled from 0.12 state file #9034

Open npc-adrian opened 5 years ago

npc-adrian commented 5 years ago

While upgrading an AWS load balancer from Terraform 0.11 to Terraform 0.12, I get an error when I run terraform plan against 0.12 for the first time.

Error: Error retrieving Target Group: ValidationError: 'arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688' is not a valid target group ARN
    status code: 400, request id: 6f8ab670-910b-11e9-83dd-efb1ea6a619f

Everything appears to be in order - the resources exist, the state files are not corrupted etc. Details below.

I originally raised this issue against Terraform core but the team asked me to move it to here instead.

Terraform Version

Terraform v0.12.2
+ provider.aws v2.15.0

Affected Resource(s)

Two ALB target groups from the upgraded state cannot be found using the ARNs pulled from the newly upgraded state, even though the ARNs are valid.

Terraform Configuration Files

resource "aws_lb_target_group" "arthr" {
  name     = "${var.stage}-${var.environment}-arthr"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id

  health_check {
    healthy_threshold   = 5
    unhealthy_threshold = 2
    path                = "/check"
    matcher             = "200"
  }
}

resource "aws_lb_target_group" "blog" {
  name     = "${var.stage}-${var.environment}-blog"
  port     = 80
  protocol = "HTTP"
  vpc_id   = var.vpc_id
}

Debug Output

$ tf12 plan > /dev/null

Error: Error retrieving Target Group: ValidationError: 'arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688' is not a valid target group ARN
    status code: 400, request id: 6f8ab670-910b-11e9-83dd-efb1ea6a619f

Error: Error retrieving Target Group: ValidationError: 'arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-arthr/55ccc73b8681221d' is not a valid target group ARN
    status code: 400, request id: 6f96eb40-910b-11e9-a808-67e629eff967

Full debug trace here

Panic Output

N/A

Expected Behavior

The plan should succeed.

Actual Behavior

The plan failed.

Steps to Reproduce

Important Factoids

I followed the instructions to upgrade from 0.11 to 0.12. It has worked for other terraform projects so I'm pretty sure I did everything right.

The plan fails to find 2 AWS Load Balancer target groups but they are definitely correct when I look in the console. Here are the entries from the state file...

$ tf12 state show module.sunset_environment.aws_lb_target_group.blog | grep arn
    arn                  = "arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688"
    arn_suffix           = "targetgroup/dev-dev2-blog/a3f1958ff217c688"
    id                   = "arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688"

and

$ tf12 state show module.sunset_environment.aws_lb_target_group.arthr | grep arn
    arn                  = "arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-arthr/55ccc73b8681221d"
    arn_suffix           = "targetgroup/dev-dev2-arthr/55ccc73b8681221d"
    id                   = "arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-arthr/55ccc73b8681221d"

I confirmed I could look them up using the AWS CLI. Here's one example.

$ aws elbv2 describe-target-groups --target-group-arns 'arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688'
  {
      "TargetGroups": [
          {
              "TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-blog/a3f1958ff217c688",
              "TargetGroupName": "dev-dev2-blog",
              "Protocol": "HTTP",
              "Port": 80,
              "VpcId": "vpc-xxxxxxxxxxxxxxxxx",
              "HealthCheckProtocol": "HTTP",
              "HealthCheckPort": "traffic-port",
              "HealthCheckEnabled": true,
              "HealthCheckIntervalSeconds": 30,
              "HealthCheckTimeoutSeconds": 5,
              "HealthyThresholdCount": 5,
              "UnhealthyThresholdCount": 2,
              "HealthCheckPath": "/",
              "Matcher": {
                  "HttpCode": "200"
              },
              "LoadBalancerArns": [
                  "arn:aws:elasticloadbalancing:eu-west-1:000000000000:loadbalancer/app/dev-dev2/9e3252cff6a19475"
              ],
              "TargetType": "instance"
          }
      ]
  }

I also checked that I can re-initialize and run terraform plan against 0.11 successfully, and that it finds no changes to make. Here are the original entries from the 0.11 state.

$ tf state show module.sunset_environment.aws_lb_target_group.arthr | grep arn
id                                 = arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-arthr/55ccc73b8681221d
arn                                = arn:aws:elasticloadbalancing:eu-west-1:000000000000:targetgroup/dev-dev2-arthr/55ccc73b8681221d
arn_suffix                         = targetgroup/dev-dev2-arthr/55ccc73b8681221d

References

I originally raised this issue against Terraform core but the team asked me to move it to here instead.

bflad commented 5 years ago

Hi @npc-adrian 👋 Thanks for submitting this and sorry you are running into trouble here.

At first glance, I'm curious if you are utilizing provider configurations across multiple regions for your overall configuration and if maybe the selected provider for this particular resource inside a module changed during the switch to Terraform 0.12 (unrelated to any configuration changes). Would you be able to provide a brief overview of your overall configuration outlining places where provider "aws" is configured, if any environment variables such as AWS_DEFAULT_REGION are used, and specifically your module invocations? Thanks so much.

For example, I'm able to coerce TargetGroupNotFound errors from the AWS CLI if the ARN matches the configured region and ValidationError if the ARN does not match the configured region:

# 123456789012 is used as a placeholder account ID
$ aws --region eu-west-1 elbv2 describe-target-groups --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/dev-dev2-blog/a3f1958ff217c688

An error occurred (TargetGroupNotFound) when calling the DescribeTargetGroups operation: One or more target groups not found

$ aws --region us-west-2 elbv2 describe-target-groups --target-group-arn arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/dev-dev2-blog/a3f1958ff217c688

An error occurred (ValidationError) when calling the DescribeTargetGroups operation: 'arn:aws:elasticloadbalancing:eu-west-1:123456789012:targetgroup/dev-dev2-blog/a3f1958ff217c688' is not a valid target group ARN
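
The region comparison described above can be sketched as a small shell helper. This is only an illustrative sketch: the function names arn_region and check_arn_region are hypothetical (they are not part of the AWS CLI or Terraform), and it assumes the standard ARN layout in which the region is the fourth colon-separated field.

```shell
# Hypothetical helpers for illustration only; not part of the AWS CLI or Terraform.

# Print the region field of an ARN (arn:partition:service:region:account:resource).
arn_region() {
  printf '%s\n' "$1" | cut -d: -f4
}

# Succeed when the ARN's region equals the region the client is configured for.
# Otherwise print a warning to stderr and return a non-zero status.
check_arn_region() {
  arn=$1
  configured=$2
  found=$(arn_region "$arn")
  if [ "$found" = "$configured" ]; then
    return 0
  fi
  echo "region mismatch: ARN is in '$found' but client is configured for '$configured'" >&2
  return 1
}
```

For example, `check_arn_region "$TG_ARN" "$AWS_DEFAULT_REGION"` (with TG_ARN set to a target group ARN taken from the state) would flag the case where the ARN belongs to one region while the environment points at another. It does not tell you which provider configuration Terraform actually selected for the resource; that still has to be checked in the configuration itself.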

npc-adrian commented 5 years ago

Hi @bflad. Thanks for getting back to me. All good questions.

We supply our AWS settings through environment variables as follows and the region is set to eu-west-1 as you can see.

$ env | grep AWS
AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxx
AWS_SECRET_ACCESS_KEY=yyyyyyyyyyyyyyyyyyyyyyyy
AWS_DEFAULT_REGION=eu-west-1
AWS_PROFILE=zzzzz

I have also confirmed I can describe the target group using the command you supplied...

$ aws --region eu-west-1 elbv2 describe-target-groups --target-group-arn arn:aws:elasticloadbalancing:eu-west-1: <my account>:targetgroup/dev-dev2-blog/a3f1958ff217c688
{
    "TargetGroups": [
        {
            "TargetGroupArn": "arn:aws:elasticloadbalancing:eu-west-1:653789070130:targetgroup/dev-dev2-blog/a3f1958ff217c688",
            "TargetGroupName": "dev-dev2-blog",
            "Protocol": "HTTP",
            "Port": 80,
            "VpcId": "vpc-013c64c2fbb574ac2",
            "HealthCheckProtocol": "HTTP",
            "HealthCheckPort": "traffic-port",
            "HealthCheckEnabled": true,
            "HealthCheckIntervalSeconds": 30,
            "HealthCheckTimeoutSeconds": 5,
            "HealthyThresholdCount": 5,
            "UnhealthyThresholdCount": 2,
            "HealthCheckPath": "/",
            "Matcher": {
                "HttpCode": "200"
            },
            "LoadBalancerArns": [
                "arn:aws:elasticloadbalancing:eu-west-1:653789070130:loadbalancer/app/dev-dev2/9e3252cff6a19475"
            ],
            "TargetType": "instance"
        }
    ]
}

The code is in a module that needs a CloudFront certificate from us-east-1.

resource "aws_acm_certificate" "cloudfront" {
  provider          = aws.us_east_1
  domain_name       = local.site_cdn_fqdn
  validation_method = "DNS"
}

So we declared 2 providers in the module...

provider "aws" {
  version = "~> 2.15"
}

provider "aws" {
  alias = "us_east_1"
}

The former isn't strictly needed but I found I had to init the module in order to run the 0.12 upgrade utility. I tried taking it out but unfortunately the problem persists.

We define the providers in the calling HCL...

provider "aws" {
  # Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  # in your environment to configure user
  region  = "eu-west-1"
  version = "~> 2.15.0"
}

provider "aws" {
  alias   = "us_east_1"
  region  = "us-east-1"
  version = "~> 2.15.0"
}

npc-adrian commented 5 years ago

Hi. Has there been any progress on this? It's blocking my 0.12 upgrade. Thanks

GNSunny commented 4 years ago

Hi, I am also receiving the same error when upgrading from 0.11.3 to 0.12.29.

Error: Error retrieving Target Group: ValidationError: 'arn:aws:elasticloadbalancing:eu-west-1:01234567890:targetgroup/dev1-web-to-http/5e7a9da3057bc30c' is not a valid target group ARN
        status code: 400, request id: 62d7fc52-0636-4c79-a43e-ab2444e8a29d

/// etc

Looking forward to a solution for this.

nateww commented 3 years ago

I am seeing this randomly as well, but it's not related to pulling data from the state file (unless the state file was updated on the fly while building infrastructure).

I'm using the latest version of Terraform (0.14.6), and I'm creating the target group. Even more concerning, it gets the ARN from the newly created target group but then fails to treat it as valid.

module.websocket.aws_lb_target_group.app[0]: Creating...
...
[ ~12 minutes elapsed while I'm spinning up more infrastructure, but NO other messages regarding the TG is output. ]
Error: error updating LB Target Group (arn:aws:elasticloadbalancing:us-west-2:1233456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7) tags: error tagging resource (arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7): TargetGroupNotFound: Target groups 'arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/websocket-nate-test/adb48fa8a765cbd7' not found
    status code: 400, request id: 7a37c60b-1b74-40ca-84b4-3c71e9e81e3e

Another TG was created a few lines after I started creating the TG above, and these are the relevant log lines from that run.

module.turforsurf.aws_lb_target_group.app[0]: Creating...
module.turforsurf.aws_lb_target_group.app[0]: Creation complete after 0s [id=arn:aws:elasticloadbalancing:us-west-2:123456789:targetgroup/turforsurf-nate-test/f4f0c91946300de9]

It sure smells like some weird race condition in the AWS provider code.
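
If the failure really is a timing issue of the kind suspected above, one way to probe it from outside Terraform is to poll for the target group and see whether it becomes visible after a short delay. The following is a minimal sketch under that assumption; retry is a hypothetical helper name, not something provided by Terraform or the AWS CLI, and it is a diagnostic aid rather than a fix for the provider.

```shell
# Hypothetical helper for illustration only.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    # Give up once the attempt budget is exhausted.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

For example, `retry 10 3 aws elbv2 describe-target-groups --target-group-arns "$TG_ARN"` would attempt the lookup up to ten times, three seconds apart (TG_ARN is assumed to hold the ARN from the error message). If the lookup only succeeds after several attempts, that would be consistent with a propagation delay rather than a bad ARN.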

nateww commented 3 years ago

My issue may be more closely related to https://github.com/hashicorp/terraform-provider-aws/issues/16860