hashicorp / terraform-provider-aws

The AWS Provider enables Terraform to manage AWS resources.
https://registry.terraform.io/providers/hashicorp/aws
Mozilla Public License 2.0

[Bug]: Lambda function resource produces inconsistent final plan when adding environment variables #38755

Open nikki-t opened 1 month ago

nikki-t commented 1 month ago

Terraform Core Version

1.7.3

AWS Provider Version

4.67.0

Affected Resource(s)

aws_lambda_function

Expected Behavior

1) Run terraform plan and apply.
2) Completes successfully and Lambda function environment variables are set using the aws_kms_ciphertext.resource_name.ciphertext_blob value.

Actual Behavior

1) Run terraform plan and apply.
2) An error is encountered with an inconsistent final plan.
3) Run terraform plan and apply again.
4) Completes successfully with Lambda function environment variables set to aws_kms_ciphertext.resource_name.ciphertext_blob value.

Relevant Error/Panic Output Snippet

│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for aws_lambda_function.hydrocron_lambda_authorizer
│ to include new values learned so far during apply, provider
│ "registry.terraform.io/hashicorp/aws" produced an invalid new value for
│ .environment: block count changed from 0 to 1.
│ 
│ This is a bug in the provider, which should be reported in the provider's
│ own issue tracker.

Terraform Configuration Files

Terraform configuration files: https://github.com/podaac/hydrocron/tree/feature/issue-205/terraform

I have set up the Lambda authorizer with environment variables that hold API key values encrypted with a KMS key. This mimics what the console configures when you follow the documentation and enable encryption there. I don't think there is a direct way to do this in Terraform.
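For readers without access to the linked repository, here is a minimal sketch of that kind of setup, assuming the encrypted value comes from an aws_kms_ciphertext resource (as the expected behavior and reproduction steps describe). Every name in it (var.api_key, var.image_uri, var.role_arn, example-authorizer, API_KEY) is hypothetical and not taken from the hydrocron configuration.

```hcl
# Hypothetical inputs; not from the hydrocron repository.
variable "api_key" {
  type      = string
  sensitive = true
}

variable "image_uri" {
  type = string
}

variable "role_arn" {
  type = string
}

resource "aws_kms_key" "authorizer" {
  description = "Encrypts Lambda authorizer environment variables"
}

# Encrypts the plaintext with the key above. ciphertext_blob is a computed,
# base64-encoded attribute, so it is unknown until this resource is created.
resource "aws_kms_ciphertext" "api_key" {
  key_id    = aws_kms_key.authorizer.key_id
  plaintext = var.api_key
}

resource "aws_lambda_function" "authorizer" {
  function_name = "example-authorizer"
  role          = var.role_arn
  package_type  = "Image"
  image_uri     = var.image_uri

  environment {
    variables = {
      # Value is only known during apply on the first run.
      API_KEY = aws_kms_ciphertext.api_key.ciphertext_blob
    }
  }
}
```

At runtime the function code would decrypt API_KEY through KMS, so its execution role would also need kms:Decrypt on the key. Because ciphertext_blob is unknown at plan time when the ciphertext resource is first created, the environment map is not fully known during planning, which appears to be the situation in which the error shows up.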

Steps to Reproduce

1) Define a Lambda function environment variable with a value set to an aws_kms_ciphertext resource's ciphertext_blob attribute.
2) Run terraform plan and apply.
3) May encounter the "Provider produced inconsistent final plan" error shown above.

Debug Output

I cannot share the full debug logs because I think they contain sensitive information, but here are the logs with the error: https://github.com/podaac/hydrocron/actions/runs/10290555214/attempts/1.

Here is a sample of the debug logs that I hope is relevant:

2024-08-07T15:56:42.457-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Received request: tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed @module=sdk.proto tf_proto_version=5.3 @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov5/tf5server/server.go:770 tf_resource_type=aws_lambda_function tf_rpc=PlanResourceChange timestamp=2024-08-07T15:56:42.457-0400
2024-08-07T15:56:42.457-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Sending request downstream: tf_proto_version=5.3 tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed tf_resource_type=aws_lambda_function @module=sdk.proto tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=PlanResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov5/internal/tf5serverlogging/downstream_request.go:17 timestamp=2024-08-07T15:56:42.457-0400
2024-08-07T15:56:42.457-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: calling downstream server: @caller=github.com/hashicorp/terraform-plugin-mux@v0.10.0/internal/logging/mux.go:16 @module=sdk.mux tf_mux_provider="*schema.GRPCProviderServer" tf_rpc=PlanResourceChange timestamp=2024-08-07T15:56:42.457-0400
2024-08-07T15:56:42.460-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Calling downstream: tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed tf_resource_type=aws_lambda_function @caller=github.com/hashicorp/terraform-plugin-sdk/v2@v2.26.1/helper/schema/schema.go:698 tf_mux_provider="*schema.GRPCProviderServer" tf_provider_addr=registry.terraform.io/hashicorp/aws @module=sdk.helper_schema tf_rpc=PlanResourceChange timestamp=2024-08-07T15:56:42.460-0400
2024-08-07T15:56:42.460-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Called downstream: @module=sdk.helper_schema tf_mux_provider="*schema.GRPCProviderServer" tf_provider_addr=registry.terraform.io/hashicorp/aws @caller=github.com/hashicorp/terraform-plugin-sdk/v2@v2.26.1/helper/schema/schema.go:700 tf_rpc=PlanResourceChange tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed tf_resource_type=aws_lambda_function timestamp=2024-08-07T15:56:42.460-0400
2024-08-07T15:56:42.461-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Received downstream response: tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_duration_ms=4 tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed tf_rpc=PlanResourceChange diagnostic_error_count=0 diagnostic_warning_count=0 tf_resource_type=aws_lambda_function @module=sdk.proto tf_proto_version=5.3 @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov5/internal/tf5serverlogging/downstream_request.go:37 timestamp=2024-08-07T15:56:42.461-0400
2024-08-07T15:56:42.461-0400 [TRACE] provider.terraform-provider-aws_v4.67.0_x5: Served request: tf_proto_version=5.3 tf_req_id=02c184e4-cca3-1b19-f147-4f95a6919fed tf_resource_type=aws_lambda_function tf_rpc=PlanResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.15.0/tfprotov5/tf5server/server.go:796 @module=sdk.proto tf_provider_addr=registry.terraform.io/hashicorp/aws timestamp=2024-08-07T15:56:42.461-0400
2024-08-07T15:56:42.462-0400 [WARN]  Provider "registry.terraform.io/hashicorp/aws" produced an invalid plan for aws_lambda_function.hydrocron_lambda_authorizer, but we are tolerating it because it is using the legacy plugin SDK.
    The following problems may be the cause of any confusing errors from downstream operations:
      - .memory_size: planned value cty.NumberIntVal(128) for a non-computed attribute
      - .reserved_concurrent_executions: planned value cty.NumberIntVal(-1) for a non-computed attribute
      - .skip_destroy: planned value cty.False for a non-computed attribute
      - .runtime: planned value cty.StringVal("") for a non-computed attribute
      - .kms_key_arn: planned value cty.StringVal("") for a non-computed attribute
      - .description: planned value cty.StringVal("") for a non-computed attribute
      - .handler: planned value cty.StringVal("") for a non-computed attribute
      - .layers: planned value cty.ListValEmpty(cty.String) for a non-computed attribute
      - .publish: planned value cty.False for a non-computed attribute
      - .code_signing_config_arn: planned value cty.StringVal("") for a non-computed attribute
      - .image_config[0].entry_point: planned value cty.ListValEmpty(cty.String) for a non-computed attribute
      - .image_config[0].working_directory: planned value cty.StringVal("") for a non-computed attribute
      - .ephemeral_storage: block count in plan (1) disagrees with count in config (0)
      - .tracing_config: block count in plan (1) disagrees with count in config (0)
2024-08-07T15:56:42.462-0400 [TRACE] checkPlannedChange: Verifying that actual change (action Update) matches planned change (action Update)
2024-08-07T15:56:42.462-0400 [ERROR] vertex "aws_lambda_function.hydrocron_lambda_authorizer" error: Provider produced inconsistent final plan
2024-08-07T15:56:42.462-0400 [TRACE] vertex "aws_lambda_function.hydrocron_lambda_authorizer": visit complete, with errors

Panic Output

No response

Important Factoids

No response

References

No response

Would you like to implement a fix?

None


justinretzolk commented 1 month ago

Hey @nikki-t 👋 Thank you for taking the time to raise this! I noticed you're running a version of the provider that's a bit over a year old now. Since then, there've been a number of updates, including migrating the Lambda resources to use the newer AWS Go SDK v2. Are you able to test on a more recent version of the provider to see if the issue has already been resolved?

nikki-t commented 1 month ago

Hi @justinretzolk - Thank you for the response! I have upgraded both the AWS provider and Terraform and still get the same error message.

nikki-t commented 1 month ago

Digging a little deeper, I seem to be encountering an issue with the aws_lambda_function resource and setting environment variables.

For example, if I have a lambda function defined like this:

resource "aws_lambda_function" "hydrocron_lambda_authorizer" {
  package_type = "Image"
  image_uri    = "${aws_ecr_repository.lambda-image-repo.repository_url}:${data.aws_ecr_image.lambda_image.image_tag}"
  image_config {
    command = ["hydrocron.api.controllers.authorizer.authorization_handler"]
  }
  function_name = local.authorizer_function_name
  role          = aws_iam_role.hydrocron-lambda-authorizer-role.arn
  timeout       = 30
  vpc_config {
    subnet_ids         = data.aws_subnets.private_application_subnets.ids
    security_group_ids = data.aws_security_groups.vpc_default_sg.ids
  }
  tags    = var.default_tags
  publish = true
  environment {
    variables = {
      HACK_TO_FORCE_LAMBDA_PUBLISH = timestamp()
    }
  }
}

This produces the following error message:

│ Error: Provider produced inconsistent final plan
│ 
│ When expanding the plan for aws_lambda_function.hydrocron_lambda_authorizer to include new values learned so far during apply, provider "registry.terraform.io/hashicorp/aws" produced an invalid new value for .environment:
│ block count changed from 0 to 1.
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

The issue really seems to happen when an environment variable is set from a resource that has to be created during the apply: I have also tried pointing the environment variable at a null_resource that gets updated with another resource's value, and saw the same behavior.

When I run terraform plan, the environment variable addition does not show up, but when I run terraform apply, the provider seems to try to create and set the environment variable.

I have also updated Terraform to 1.9.3 and the AWS provider to 5.62.0.

justinretzolk commented 1 month ago

Thanks for the additional information here, @nikki-t! Unfortunately, I'm not able to dig into this too much more at the moment, so I'd like to leave this open and labeled as a bug for someone from the team/community to take a deeper look as well.

That said, I found a bit of information while looking that I'm hoping will help you find a workaround in the meantime.

It really seems the issue happens when defining an environment variable from a resource that requires creation as I have tried pointing the environment variable to a null_resource that gets updated with another resource value.

This comment piqued my interest, as I'd also noticed a note in the variables argument reference:

If provided at least one key must be present.

While I realize that says "key" and not "value", I also noticed that the example you provided has only one variable, and its value comes from timestamp(). The timestamp() function has historically been a bit finicky, with the following note in the reference material:

Due to the constantly changing return value, the result of this function cannot be predicted during Terraform's planning phase, and so the timestamp will be taken only once the plan is being applied.

I'm curious whether this is causing the strange behavior. There's an alternate function that helps with some of those issues: plantimestamp(). It may be worth testing that, or adding a second, more static environment variable, to see if that clears the issue up for you.
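A sketch of both suggestions applied to the configuration from the earlier comment. plantimestamp() is available in recent Terraform versions (including the 1.9.3 mentioned above) and returns a value that is known during planning, unlike timestamp(). STATIC_MARKER is a hypothetical extra variable whose value is a literal. Whether either change avoids the error has not been verified here.

```hcl
resource "aws_lambda_function" "hydrocron_lambda_authorizer" {
  package_type = "Image"
  image_uri    = "${aws_ecr_repository.lambda-image-repo.repository_url}:${data.aws_ecr_image.lambda_image.image_tag}"
  image_config {
    command = ["hydrocron.api.controllers.authorizer.authorization_handler"]
  }
  function_name = local.authorizer_function_name
  role          = aws_iam_role.hydrocron-lambda-authorizer-role.arn
  timeout       = 30
  vpc_config {
    subnet_ids         = data.aws_subnets.private_application_subnets.ids
    security_group_ids = data.aws_security_groups.vpc_default_sg.ids
  }
  tags    = var.default_tags
  publish = true
  environment {
    variables = {
      # plantimestamp() is evaluated at plan time; timestamp() is only
      # evaluated during apply. STATIC_MARKER (hypothetical) is a literal,
      # so at least one key has a value known during planning.
      HACK_TO_FORCE_LAMBDA_PUBLISH = plantimestamp()
      STATIC_MARKER                = "authorizer"
    }
  }
}
```

Note that plantimestamp() still changes on every plan, so the function would continue to be updated and republished on each apply, which appears to be the intent of the original timestamp() hack.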

You may also be able to discern some of this from debug logging without needing to test those suggestions -- particularly by looking for what the plan shows for the environment block. I hope that helps until we're able to look into it a bit more thoroughly!