hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: (logging) `fatal error: out of memory` (`Error: Plugin did not respond`) #29153

Closed PratteekB closed 1 year ago

PratteekB commented 1 year ago

Previously titled: "The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call."

I'm changing the title because at this stage I believe it is misleading, since any (?) error may cause this crash.

See also #29236

Terraform Core Version

1.3.4

AWS Provider Version

4.52.0

Affected Resource(s)

Unclear from the error message; however, the two resources that errored were:

aws_rds_cluster_instance
aws_lambda_function

Expected Behavior

The main.tf file was expected to apply as it had previously and provision several AWS resources.

Actual Behavior

aws_rds_cluster_instance.test_rds_instance: Creating...
aws_lambda_function.prisma_data_source: Creating...
╷
│ Error: Plugin did not respond
│
│   with aws_rds_cluster_instance.test_rds_instance,
│   on main.tf line 174, in resource "aws_rds_cluster_instance" "test_rds_instance":
│  174: resource "aws_rds_cluster_instance" "test_rds_instance" {
│
│ The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call.
│ The plugin logs may contain more details.
╵
╷
│ Error: Plugin did not respond
│
│   with aws_lambda_function.prisma_data_source,
│   on main.tf line 266, in resource "aws_lambda_function" "prisma_data_source":
│  266: resource "aws_lambda_function" "prisma_data_source" {
│
│ The plugin encountered an error, and failed to respond to the plugin.(*GRPCProvider).ApplyResourceChange call.
│ The plugin logs may contain more details.
╵

Relevant Error/Panic Output Snippet

Stack trace from the terraform-provider-aws_v4.52.0_x5.exe plugin:

fatal error: out of memory

goroutine 381 [running]:
runtime.throw({0xb689bab, 0xd})
        runtime/panic.go:1047 +0x4d fp=0x24091bc4 sp=0x24091bb0 pc=0x758b0d
runtime.(*mcache).allocLarge(0x13a40528, 0x57d22b5, 0x1)
        runtime/mcache.go:235 +0x208 fp=0x24091bec sp=0x24091bc4 pc=0x734ad8
runtime.mallocgc(0x57d22b5, 0x0, 0x0)
        runtime/malloc.go:1029 +0x446 fp=0x24091c34 sp=0x24091bec pc=0x72baf6
runtime.slicebytetostring(0x0, 0x77800000, 0x57d22b5)
        runtime/string.go:114 +0x58 fp=0x24091c48 sp=0x24091c34 pc=0x7707c8
fmt.Sprintf({0xb6ffc32, 0x17}, {0x24091cc8, 0x1, 0x1})
        fmt/print.go:220 +0x6f fp=0x24091c70 sp=0x24091c48 pc=0x80ba8f
log.Printf({0xb6ffc32, 0x17}, {0x24091cc8, 0x1, 0x1})
        log/log.go:354 +0x60 fp=0x24091c90 sp=0x24091c70 pc=0x823e70
github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2.debugLogger.Log({}, {0x270be508, 0x1, 0x1})
        github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2@v2.0.0-beta.22/logger.go:19 +0x1db fp=0x24091cd4 sp=0x24091c90 pc=0x74f8b1b
github.com/hashicorp/aws-sdk-go-base/v2/awsv1shim/v2.(*debugLogger).Log(0x12a8b4c4, {0x270be508, 0x1, 0x1})
        <autogenerated>:1 +0x40 fp=0x24091ce4 sp=0x24091cd4 pc=0x74fa8c0
github.com/aws/aws-sdk-go/aws/client.logRequest(0x2460f8c0)
        github.com/aws/aws-sdk-go@v1.44.186/aws/client/logger.go:84 +0x4db fp=0x24091d64 sp=0x24091ce4 pc=0xf8b05b
github.com/aws/aws-sdk-go/aws/request.(*HandlerList).Run(0x2460f9d4, 0x2460f8c0)
        github.com/aws/aws-sdk-go@v1.44.186/aws/request/handlers.go:267 +0x66 fp=0x24091dac sp=0x24091d64 pc=0xf7fb26
github.com/aws/aws-sdk-go/aws/request.(*Request).sendRequest(0x2460f8c0)
        github.com/aws/aws-sdk-go@v1.44.186/aws/request/request.go:602 +0xaf fp=0x24091dfc sp=0x24091dac pc=0xf82e2f
github.com/aws/aws-sdk-go/aws/request.(*Request).Send(0x2460f8c0)
        github.com/aws/aws-sdk-go@v1.44.186/aws/request/request.go:555 +0x13a fp=0x24091e4c sp=0x24091dfc pc=0xf8273a
github.com/aws/aws-sdk-go/service/lambda.(*Lambda).CreateFunctionWithContext(0x27059340, {0xca0c0a0, 0x26ad9da0}, 0x2704c580, {0x0, 0x0, 0x0})
        github.com/aws/aws-sdk-go@v1.44.186/service/lambda/api.go:730 +0x143 fp=0x24091e74 sp=0x24091e4c pc=0x5495283
github.com/hashicorp/terraform-provider-aws/internal/service/lambda.resourceFunctionCreate.func1()
        github.com/hashicorp/terraform-provider-aws/internal/service/lambda/function.go:502 +0x4d fp=0x24091ea0 sp=0x24091e74 pc=0x84ed79d
github.com/hashicorp/terraform-provider-aws/internal/tfresource.RetryWhen.func1()
        github.com/hashicorp/terraform-provider-aws/internal/tfresource/retry.go:29 +0x2e fp=0x24091ecc sp=0x24091ea0 pc=0x752f4de
github.com/hashicorp/terraform-provider-aws/internal/tfresource.Retry.func1()
        github.com/hashicorp/terraform-provider-aws/internal/tfresource/retry.go:204 +0x6c fp=0x24091ef0 sp=0x24091ecc pc=0x75301ac
github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext.func1()
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/resource/state.go:110 +0x255 fp=0x24091ff0 sp=0x24091ef0 pc=0x7237b35
runtime.goexit()
        runtime/asm_386.s:1326 +0x1 fp=0x24091ff4 sp=0x24091ff0 pc=0x785471
created by github.com/hashicorp/terraform-plugin-sdk/v2/helper/resource.(*StateChangeConf).WaitForStateContext
        github.com/hashicorp/terraform-plugin-sdk/v2@v2.24.1/helper/resource/state.go:83 +0x1de

goroutine 1 [select]:
runtime.gopark(0xb8ca694, 0x0, 0x9, 0x18, 0x1)
        runtime/proc.go:363 +0xff fp=0x27353acc sp=0x27353ab8 pc=0x75b96f
runtime.selectgo(0x27353d38, 0x259fdc5c, 0x0, 0x0, 0x2, 0x1)
        runtime/select.go:328 +0xbe8 fp=0x27353bc0 sp=0x27353acc pc=0x76a588
github.com/hashicorp/go-plugin.Serve(0x25dc00f0)
        github.com/hashicorp/go-plugin@v1.4.8/server.go:471 +0x1530 fp=0x27353e4c sp=0x27353bc0 pc=0xc2a4f0
github.com/hashicorp/terraform-plugin-go/tfprotov5/tf5server.Serve({0xb78e489, 0x23}, 0x254ccd20, {0x0, 0x0, 0x0})
        github.com/hashicorp/terraform-plugin-go@v0.14.3/tfprotov5/tf5server/server.go:312 +0xc3f fp=0x27353f84 sp=0x27353e4c pc=0xc851ff
main.main()
        github.com/hashicorp/terraform-provider-aws/main.go:32 +0x1f9 fp=0x27353fc4 sp=0x27353f84 pc=0x8e38ec9
runtime.main()
        runtime/proc.go:250 +0x231 fp=0x27353ff0 sp=0x27353fc4 pc=0x75b5b1
runtime.goexit()
        runtime/asm_386.s:1326 +0x1 fp=0x27353ff4 sp=0x27353ff0 pc=0x785471

goroutine 2 [force gc (idle)]:
runtime.gopark(0xb8ca668, 0x12a5ff10, 0x11, 0x14, 0x1)
        runtime/proc.go:363 +0xff fp=0x24089fdc sp=0x24089fc8 pc=0x75b96f
runtime.goparkunlock(...)
        runtime/proc.go:369
runtime.forcegchelper()
        runtime/proc.go:302 +0xc3 fp=0x24089ff0 sp=0x24089fdc pc=0x75b7f3
runtime.goexit()
        runtime/asm_386.s:1326 +0x1 fp=0x24089ff4 sp=0x24089ff0 pc=0x785471
created by runtime.init.5
        runtime/proc.go:290 +0x23

goroutine 3 [GC sweep wait]:
runtime.gopark(0xb8ca668, 0x12a60de0, 0xc, 0x14, 0x1)
        runtime/proc.go:363 +0xff fp=0x2408afd4 sp=0x2408afc0 pc=0x75b96f
runtime.goparkunlock(...)
        runtime/proc.go:369
runtime.bgsweep(0x24052200)
        runtime/mgcsweep.go:297 +0xf3 fp=0x2408afe8 sp=0x2408afd4 pc=0x745473
runtime.gcenable.func1()
        runtime/mgc.go:178 +0x27 fp=0x2408aff0 sp=0x2408afe8 pc=0x737077
runtime.goexit()
        runtime/asm_386.s:1326 +0x1 fp=0x2408aff4 sp=0x2408aff0 pc=0x785471
created by runtime.gcenable
        runtime/mgc.go:178 +0x7c

goroutine 4 [sleep]:
runtime.gopark(0xb8ca668, 0x12a64780, 0x13, 0x13, 0x2)
        runtime/proc.go:363 +0xff fp=0x2408bf5c sp=0x2408bf48 pc=0x75b96f
runtime.goparkunlock(...)
        runtime/proc.go:369
runtime.(*scavengerState).sleep(0x12a64780, 0x4131170000000000)
        runtime/mgcscavenge.go:468 +0x167 fp=0x2408bfcc sp=0x2408bf5c pc=0x743147
runtime.bgscavenge(0x24052200)
        runtime/mgcscavenge.go:626 +0xa5 fp=0x2408bfe8 sp=0x2408bfcc pc=0x743605
runtime.gcenable.func2()
        runtime/mgc.go:179 +0x27 fp=0x2408bff0 sp=0x2408bfe8 pc=0x737037
runtime.goexit()
        runtime/asm_386.s:1326 +0x1 fp=0x2408bff4 sp=0x2408bff0 pc=0x785471
created by runtime.gcenable
        runtime/mgc.go:179 +0xc1

goroutine 5 [finalizer wait]:

Error: The terraform-provider-aws_v4.52.0_x5.exe plugin crashed!

This is always indicative of a bug within the plugin. It would be immensely
helpful if you could report the crash with the plugin's maintainers so that it
can be fixed. The output above should help diagnose the issue.

Terraform Configuration Files

terraform {
  required_version = ">=1.2.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.49"
    }
  }
}

provider "aws" {
  region = "eu-west-2"
}

resource "aws_vpc" "vpc" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    name = "rds_vpc"
  }
}

data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "rds_private_subnet" {
  count             = 2
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "10.0.${count.index}.0/24"
  availability_zone = data.aws_availability_zones.available.names[count.index]

  tags = {
    name = "rds_private_subnet_${count.index}"
  }
}

resource "aws_subnet" "lambdas_public_subnet_1" {
  vpc_id     = aws_vpc.vpc.id
  cidr_block = "10.0.101.0/24"

  tags = {
    name = "lambdas_public_subnet_1"
  }
}

resource "aws_db_subnet_group" "rds_subnet_group" {
  subnet_ids = [for subnet in aws_subnet.rds_private_subnet : subnet.id]
}

resource "aws_security_group" "lambdas_sg" {
  name   = "lambdas_security_group"
  vpc_id = aws_vpc.vpc.id

  ingress {
    description = "allow all traffic, hopefully from Appsync"
    protocol    = -1
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "allow all outbound"
    protocol    = -1
    from_port   = 0
    to_port     = 0
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    name = "lambdas_sg"
  }
}

resource "aws_security_group" "rds_sg" {
  name   = "rds_security_group"
  vpc_id = aws_vpc.vpc.id

  ingress {
    description     = "allow prisma traffic, only from lambdas"
    protocol        = "tcp"
    from_port       = 5432
    to_port         = 5432
    security_groups = [aws_security_group.lambdas_sg.id]
  }

  tags = {
    name = "rds_sg"
  }
}

resource "random_password" "rds_master_password" {
  length           = 40
  special          = true
  min_special      = 5
  override_special = "!#$%^&*()-_=+[]{}<>:?"
}

resource "random_id" "id" {
  byte_length = 8
}

resource "random_string" "random" {
  length           = 16
  special          = true
  override_special = "/@£$"
}

resource "aws_cognito_user_pool" "test_pool" {
  name = "test_pool"
}

resource "aws_cognito_user" "test_debugging_user" {
  user_pool_id = aws_cognito_user_pool.test_pool.id
  username     = "XXX+dev@XXX.io"

  attributes = {
    email          = "XXXX+dev@XXXX.io"
    email_verified = true
  }
}

resource "aws_cognito_user_pool_client" "userpool_client" {
  name                                 = "cognito_userpool_client"
  user_pool_id                         = aws_cognito_user_pool.test_pool.id
  callback_urls                        = ["https://example.com"]
  allowed_oauth_flows_user_pool_client = true
  allowed_oauth_flows                  = ["code", "implicit"]
  allowed_oauth_scopes                 = ["email", "openid"]
  supported_identity_providers         = ["COGNITO"]
}

resource "aws_cognito_user_pool_domain" "user_pool_domain" {
  user_pool_id = aws_cognito_user_pool_client.userpool_client.user_pool_id
  domain       = "XX-tech-appsync"
}

resource "aws_rds_cluster" "test_rds" {
  cluster_identifier     = "aurora-cluster-demo"
  engine_mode            = "provisioned"
  engine                 = "aurora-postgresql"
  database_name          = "test_rds_serverless"
  master_username        = "test_username"
  master_password        = random_password.rds_master_password.result
  db_subnet_group_name   = aws_db_subnet_group.rds_subnet_group.name
  vpc_security_group_ids = [aws_security_group.rds_sg.id]
  skip_final_snapshot    = true

  serverlessv2_scaling_configuration {
    max_capacity = 1.0
    min_capacity = 0.5
  }
}

resource "aws_rds_cluster_instance" "test_rds_instance" {
  cluster_identifier = aws_rds_cluster.test_rds.id
  instance_class     = "db.serverless"
  engine             = aws_rds_cluster.test_rds.engine
  engine_version     = aws_rds_cluster.test_rds.engine_version
}

resource "null_resource" "build_prisma" {
  triggers = {
    lambdas = filebase64("${path.module}/prisma_appsync/schema.prisma")
  }

  provisioner "local-exec" {
    command = "cd ./prisma_appsync && npm run build"
  }
}

data "local_file" "path_to_schema" {
  filename = "${path.module}/prisma_appsync/generated/prisma-appsync/schema.gql"

  depends_on = [
    null_resource.build_prisma
  ]
}

resource "aws_iam_role" "iam_for_lambda" {
  name               = "iam_for_lambda"
  assume_role_policy = file("${path.module}/lambdas/iam_role.json")
}

resource "aws_iam_policy" "iam_policy_for_lambda" {
  name   = "aws_iam_policy_for_terraform_aws_lambda_role"
  path   = "/"
  policy = file("${path.module}/lambdas/iam_policy.json")
}

resource "aws_iam_role_policy_attachment" "attach_iam_policy_to_iam_role" {
  role       = aws_iam_role.iam_for_lambda.name
  policy_arn = aws_iam_policy.iam_policy_for_lambda.arn
}

data "archive_file" "zip_prisma" {
  type        = "zip"
  source_dir  = "${path.module}/prisma_appsync/dist/"
  output_path = "${path.module}/prisma_appsync/seeder.zip"

  depends_on = [
    null_resource.build_prisma
  ]
}

resource "aws_lambda_function" "prisma_data_source" {
  filename         = "${path.module}/prisma_appsync/seeder.zip"
  source_code_hash = data.archive_file.zip_prisma.output_sha
  function_name    = "prisma-appsync_fn"
  role             = aws_iam_role.iam_for_lambda.arn
  handler          = "index.main"
  runtime          = "nodejs18.x"
  timeout          = 40
  memory_size      = 180
  depends_on = [
    aws_iam_role_policy_attachment.attach_iam_policy_to_iam_role,
    data.archive_file.zip_prisma
  ]

  vpc_config {
    security_group_ids = [aws_security_group.lambdas_sg.id]
    subnet_ids = [
      aws_subnet.lambdas_public_subnet_1.id
    ]
  }

  environment {
    variables = {
      # secretArn   = aws_secretsmanager_secret.test_rds_password.arn
      dbHost       = aws_rds_cluster.test_rds.endpoint
      dbName       = aws_rds_cluster.test_rds.database_name
      dbUsername   = aws_rds_cluster.test_rds.master_username
      dbPassword   = aws_rds_cluster.test_rds.master_password
      DATABASE_URL = ""
    }
  }
}

resource "aws_cloudwatch_log_group" "loggroup" {
  name              = "/aws/appsync/apis/${aws_appsync_graphql_api.graphql_api.id}"
  retention_in_days = 14
}

data "aws_iam_policy_document" "iam_appsync_role_document" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["appsync.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "iam_appsync_role" {
  name               = "iam_appsync_role"
  assume_role_policy = data.aws_iam_policy_document.iam_appsync_role_document.json
}

data "aws_iam_policy_document" "iam_invoke_lambda_policy_document" {
  statement {
    actions   = ["lambda:InvokeFunction"]
    resources = ["*"]
  }
}

resource "aws_iam_policy" "iam_invoke_lambda_policy" {
  name   = "iam_invoke_lambda_policy"
  policy = data.aws_iam_policy_document.iam_invoke_lambda_policy_document.json
}

resource "aws_iam_role_policy_attachment" "appsync_invoke_lambda" {
  role       = aws_iam_role.iam_appsync_role.name
  policy_arn = aws_iam_policy.iam_invoke_lambda_policy.arn
}

resource "aws_appsync_graphql_api" "graphql_api" {
  authentication_type = "AMAZON_COGNITO_USER_POOLS"
  name                = "test_graphql_api"
  schema              = file(data.local_file.path_to_schema.filename)

  depends_on = [
    null_resource.build_prisma
  ]

  user_pool_config {
    aws_region     = "eu-west-2"
    default_action = "DENY"
    user_pool_id   = aws_cognito_user_pool.test_pool.id
  }

  log_config {
    cloudwatch_logs_role_arn = aws_iam_role.iam_appsync_role.arn
    field_log_level          = "ALL"
  }
}

resource "aws_appsync_datasource" "lambda_datasource" {
  api_id           = aws_appsync_graphql_api.graphql_api.id
  name             = "lambda_datasource"
  service_role_arn = aws_iam_role.iam_appsync_role.arn
  type             = "AWS_LAMBDA"

  lambda_config {
    function_arn = aws_lambda_function.prisma_data_source.arn
  }
}

output "log" {
  value = yamldecode(file("${path.module}/prisma_appsync/generated/prisma-appsync/resolvers.yaml"))
}

Steps to Reproduce

Run `terraform apply -auto-approve`

Debug Output

No response

Panic Output

No response

Important Factoids

No response

References

Would you like to implement a fix?

None


ewbankkit commented 1 year ago

Off the top of my head, the provider is running out of memory because it is attempting to log the binary contents of the Lambda function's ZIP file.
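
For anyone following the stack trace above, here is a minimal, self-contained Go sketch (not provider or SDK code; the payload size and endpoint are invented) of the allocation pattern the trace shows: with HTTP-body debug logging enabled, the dumped request, which embeds the base64-encoded ZIP, is copied in full several times on its way through fmt.Sprintf and log.Printf, and on the 32-bit (386) build shown in the trace one of those large allocations fails.

// Sketch only: demonstrates the repeated full-size copies made when a large
// request body is dumped and formatted for debug logging. Sizes and URL are
// made up; this is not the provider's or the SDK's actual code.
package main

import (
	"bytes"
	"encoding/base64"
	"log"
	"net/http"
	"net/http/httputil"
)

func main() {
	// Stand-in for a Lambda deployment ZIP of a few tens of megabytes.
	payload := bytes.Repeat([]byte{0xAB}, 50<<20)

	// The ZIP travels base64-encoded in the request body (~33% larger). Copy #1.
	body := base64.StdEncoding.EncodeToString(payload)

	req, _ := http.NewRequest(http.MethodPost, "https://lambda.eu-west-2.amazonaws.com/", bytes.NewBufferString(body))

	// With body logging enabled, the entire outgoing request is dumped. Copy #2.
	dump, _ := httputil.DumpRequestOut(req, true)

	// The dump is then pushed through log.Printf/fmt.Sprintf, matching the
	// fmt.Sprintf -> runtime.slicebytetostring frames in the trace. More copies.
	log.Printf("[DEBUG] Request: %s", dump)
}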

YakDriver commented 1 year ago

I suspect that ~any~* error the AWS Provider returns will cause Terraform/SDK to crash in a similar way. This only occurs as of v4.53.0. The only change in v4.53.0 was #29223 (logging). (* Random golang errors do not seem to cause this. Perhaps only AWS errors?)

I can reproduce the issue 100% in situations where the plugin returns an error. Whether you get `fatal error: out of memory` or `fatal error: concurrent map writes` seems to depend largely on whether the goroutines trip up first or memory runs out first. These may not be useful clues.

In case it is helpful, here is a complete (~4100 line) trace: https://gist.github.com/YakDriver/66b508dab3342da50b2da75f4b17a73a
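
As a side note on the second failure mode mentioned above: `concurrent map writes` is the Go runtime detecting unsynchronized writes to a shared map and aborting the whole process. A tiny standalone illustration, deliberately racy and unrelated to the provider's code:

// Illustration of the "fatal error: concurrent map writes" crash mode only:
// the Go runtime detects unsynchronized map writes and aborts the process.
package main

func main() {
	m := map[int]int{}
	for i := 0; i < 8; i++ {
		go func() {
			for j := 0; j < 1_000_000; j++ {
				m[j] = j // no lock: the runtime almost always detects this and aborts
			}
		}()
	}
	select {} // park main; the program dies from the runtime fatal error instead
}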

gdavison commented 1 year ago

@PratteekB, how large is the Zip file that you're sending? I'd like to test with something comparable.

gdavison commented 1 year ago

Hi @PratteekB, the latest version of the provider now truncates the log entries. This may address the out of memory error. If you are still seeing this error, please open a new issue and let us know the size of the Zip file that you're sending.
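
For reference, a minimal sketch of that mitigation (the cap value and helper name are hypothetical, not the provider's or aws-sdk-go-base's actual implementation): truncating each entry before it reaches the standard logger bounds the size of any single allocation, even when the request body embeds a large ZIP.

// Hypothetical sketch of log-entry truncation; the real truncation logic and
// its size limit differ. Shown only to illustrate the idea.
package main

import "log"

const maxLogEntryBytes = 4 << 20 // made-up 4 MiB cap, for illustration only

func logDebugTruncated(entry []byte) {
	if len(entry) > maxLogEntryBytes {
		// Keep only the leading bytes and note how much was dropped.
		log.Printf("[DEBUG] %s ... [truncated %d bytes]",
			entry[:maxLogEntryBytes], len(entry)-maxLogEntryBytes)
		return
	}
	log.Printf("[DEBUG] %s", entry)
}

func main() {
	logDebugTruncated([]byte("short entries are logged unchanged"))
}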

github-actions[bot] commented 1 year ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.