hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: AWS Batch 'panic: interface conversion: interface {} is nil, not map[string]interface {}' #38710

Closed justingodden closed 3 months ago

justingodden commented 3 months ago

Terraform Core Version

1.9.3

AWS Provider Version

5.61.0

Affected Resource(s)

aws_batch_job_definition

Expected Behavior

I am using the AWS Batch TF module: https://registry.terraform.io/modules/terraform-aws-modules/batch/aws/latest

But I believe the problem is with the underlying aws_batch_job_definition resource.

I expect to be able to create the resources with the batch module.

Actual Behavior

Creating the resources initially with terraform apply works fine. However, running terraform apply a second time crashes the provider, even when the code is completely unchanged.

It looks like it's coming from the /internal/service/batch.needsJobDefUpdate function.

Not the same issue, but looks similar: #22660, #17284

Relevant Error/Panic Output Snippet

Stack trace from the terraform-provider-aws_v5.61.0_x5 plugin:

panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 3020 [running]:
github.com/hashicorp/terraform-provider-aws/internal/service/batch.needsJobDefUpdate(0xc00324e000)
        github.com/hashicorp/terraform-provider-aws/internal/service/batch/job_definition.go:569 +0x1074
github.com/hashicorp/terraform-provider-aws/internal/service/batch.jobDefinitionCustomizeDiff({0x13f75fa0?, 0x208b4340?}, 0xc00324e000, {0xe?, 0xc0026548f0?})
        github.com/hashicorp/terraform-provider-aws/internal/service/batch/job_definition.go:464 +0x3a
github.com/hashicorp/terraform-provider-aws/internal/service/batch.ResourceJobDefinition.Sequence.func23({0x170b2cc8, 0xc003980b10}, 0xc00324e000, {0x14e571c0, 0xc0026548f0})
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.34.0/helper/customdiff/compose.go:69 +0x84
github.com/hashicorp/terraform-provider-aws/internal/provider.New.(*wrappedResource).CustomizeDiff.func5({0x170b2cc8?, 0xc00386a7e0?}, 0xc00324e000, {0x14e571c0, 0xc0026548f0})
        github.com/hashicorp/terraform-provider-aws/internal/provider/intercept.go:186 +0x63
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.schemaMap.Diff(0xc000f34060, {0x170b2cc8, 0xc00386a7e0}, 0xc003902c30, 0xc004722280, 0xc00022b9f8, {0x14e571c0, 0xc0026548f0}, 0x0)
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.34.0/helper/schema/schema.go:698 +0x4b4
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*Resource).SimpleDiff(0x170b30f8?, {0x170b2cc8?, 0xc00386a7e0?}, 0xc003902c30, 0xc00386a810?, {0x14e571c0?, 0xc0026548f0?})
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.34.0/helper/schema/resource.go:990 +0xdb
github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema.(*GRPCProviderServer).PlanResourceChange(0xc00426f068, {0x170b2cc8?, 0xc00386a6f0?}, 0xc003ca5540)
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.34.0/helper/schema/grpc_provider.go:858 +0xbe8
github.com/hashicorp/terraform-plugin-mux/tf5muxserver.(*muxServer).PlanResourceChange(0xc0013f8f50, {0x170b2cc8?, 0xc00386a420?}, 0xc003ca5540)
        github.com/hashicorp/terraform-plugin-mux@v0.16.0/tf5muxserver/mux_server_PlanResourceChange.go:73 +0x2ad
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.(*server).PlanResourceChange(0xc000619c20, {0x170b2cc8?, 0xc00384fb60?}, 0xc004562600)
        github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov5/tf5server/server.go:825 +0x3f0
github.com/hashicorp/terraform-plugin-go/tfprotov5/internal/tfplugin5._Provider_PlanResourceChange_Handler({0x14c3fb20, 0xc000619c20}, {0x170b2cc8, 0xc00384fb60}, 0xc004562580, 0x0)
        github.com/hashicorp/terraform-plugin-go@v0.23.0/tfprotov5/internal/tfplugin5/tfplugin5_grpc.pb.go:500 +0x1a6
google.golang.org/grpc.(*Server).processUnaryRPC(0xc001296000, {0x170b2cc8, 0xc00384fad0}, {0x17116620, 0xc0002ba300}, 0xc00385eb40, 0xc002700120, 0x208231e0, 0x0)
        google.golang.org/grpc@v1.63.2/server.go:1369 +0xdf8
google.golang.org/grpc.(*Server).handleStream(0xc001296000, {0x17116620, 0xc0002ba300}, 0xc00385eb40)
        google.golang.org/grpc@v1.63.2/server.go:1780 +0xe8b
google.golang.org/grpc.(*Server).serveStreams.func2.1()
        google.golang.org/grpc@v1.63.2/server.go:1019 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 65
        google.golang.org/grpc@v1.63.2/server.go:1030 +0x125

Error: The terraform-provider-aws_v5.61.0_x5 plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.
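The panic message itself can be reproduced outside the provider. The following is a minimal sketch, not the provider's code: `assertMap` is a hypothetical helper that performs an unchecked type assertion on an `interface{}` value and recovers the panic so the message can be inspected. It assumes only that the failing line asserts a nil value to `map[string]interface{}`, which is what the panic text reports.

```go
package main

import "fmt"

// assertMap is a hypothetical helper (not from the provider source).
// It performs an unchecked type assertion and recovers any panic so
// that the panic message can be returned as a string.
func assertMap(v interface{}) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = v.(map[string]interface{}) // panics when v is nil or another type
	return "ok"
}

func main() {
	// A populated map passes the assertion.
	fmt.Println(assertMap(map[string]interface{}{"attempts": 3}))
	// A nil value triggers the same kind of panic seen in the stack trace.
	fmt.Println(assertMap(nil))
}
```

The second call prints the recovered runtime error rather than crashing, which makes it easy to compare against the first line of the stack trace above.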

Terraform Configuration Files

resource "aws_security_group" "this" {
  name   = "aws_batch_compute_environment_security_group"
  vpc_id = var.vpc_id
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

module "batch" {
  source = "terraform-aws-modules/batch/aws"

  create_instance_iam_role      = true
  instance_iam_role_name        = "batch-role"
  instance_iam_role_path        = "/batch/"
  instance_iam_role_description = "IAM instance role/profile for AWS Batch ECS instance(s)"
  instance_iam_role_additional_policies = [
    "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  ]

  create_service_iam_role      = true
  service_iam_role_name        = "batch-service-role"
  service_iam_role_path        = "/batch/"
  service_iam_role_description = "IAM service role for AWS Batch"

  compute_environments = {
    ec2_gpu = {
      name_prefix = "ec2_gpu"

      compute_resources = {
        type           = "EC2"
        min_vcpus      = 4
        max_vcpus      = 4
        desired_vcpus  = 4
        instance_types = ["g4dn.xlarge"]

        ec2_configuration = {
          image_type = "ECS_AL2_NVIDIA"
        }

        security_group_ids = [aws_security_group.this.id]
        subnets            = var.private_subnets
      }
    }
  }

  job_queues = {
    batch_queue = {
      name                     = "BatchQueue"
      state                    = "ENABLED"
      priority                 = 1
      create_scheduling_policy = false
    }
  }

  job_definitions = {
    nginx = {
      name = "nginx"
      type = "container"

      container_properties = jsonencode({
        image = "nginx"

        resourceRequirements = [
          { type = "VCPU", value = "4" },
          { type = "MEMORY", value = "15000" },
          { type = "GPU", value = "1" }
        ]
      })
    }
  }
}

Steps to Reproduce

terraform apply
yes

terraform apply

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None

justingodden commented 3 months ago

Update: Adding EITHER a retry strategy OR a timeout with the following code solved it.

attempt_duration_seconds = 60
retry_strategy = {
  attempts = 3
  evaluate_on_exit = {
    retry_error = {
      action       = "RETRY"
      on_exit_code = 1
    }
    exit_success = {
      action       = "EXIT"
      on_exit_code = 0
    }
  }
}

I'm no Go expert, but I think the problem is an unchecked type assertion on a nil value on this line.
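If that diagnosis is right, the usual Go remedy is the two-value ("comma-ok") form of the type assertion, which reports failure instead of panicking. The sketch below is an assumption about the shape of the data, not the actual fix shipped by the provider: `firstBlock` is a hypothetical helper, and it assumes a nested block arrives as a `[]interface{}` whose first element may be nil when the block is unset.

```go
package main

import "fmt"

// firstBlock is a hypothetical helper (not from the provider source).
// It returns the first element of a block list as a map and reports
// ok=false when the value is not a list, the list is empty, or the
// first element is nil or not a map.
func firstBlock(raw interface{}) (map[string]interface{}, bool) {
	list, ok := raw.([]interface{})
	if !ok || len(list) == 0 {
		return nil, false
	}
	m, ok := list[0].(map[string]interface{}) // comma-ok form: no panic on nil
	return m, ok
}

func main() {
	// Unset block: the single element is nil, so ok is false.
	_, ok := firstBlock([]interface{}{nil})
	fmt.Println(ok)

	// Populated block: the map is returned and ok is true.
	m, ok := firstBlock([]interface{}{map[string]interface{}{"attempts": 3}})
	fmt.Println(m["attempts"], ok)
}
```

A caller can then skip the comparison for that block when `ok` is false, rather than crashing during the plan.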

github-actions[bot] commented 3 months ago

Warning: This issue has been closed, meaning that any additional comments are hard for our team to see. Please assume that the maintainers will not see them.

Ongoing conversations amongst community members are welcome, however, the issue will be locked after 30 days. Moving conversations to another venue, such as the AWS Provider forum, is recommended. If you have additional concerns, please open a new issue, referencing this one where needed.

github-actions[bot] commented 3 months ago

This functionality has been released in v5.62.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

github-actions[bot] commented 2 months ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.