output_base64sha256 often returns the hash of an empty file instead of the hash of the contents

samjgalbraith commented 6 years ago

In short, Terraform keeps wanting to re-deploy my Lambda functions at random (which functions are affected isn't even consistent between two consecutive plans with nothing applied in between). It turns out that in each case, the computed hash for the Lambda zip file in the plan is equal to the hash of an empty file:

touch /tmp/emptyfile
openssl dgst -binary -sha256 /tmp/emptyfile | openssl base64

Output (the same hash that output_base64sha256 returns at random):

47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=
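For comparison, the same value can be reproduced from inside Terraform, which makes it easy to spot in plan output. This is a minimal sketch assuming Terraform 0.12+ syntax; the output name `empty_file_hash` is purely illustrative. `base64sha256("")` hashes zero bytes, exactly like the `openssl` command above does for an empty file.

```hcl
# Illustrative only: the output name is hypothetical.
# base64sha256("") computes SHA-256 over zero bytes and base64-encodes the
# raw digest, so it equals the hash of an empty file.
output "empty_file_hash" {
  value = base64sha256("")
  # Evaluates to "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
}
```

Any source_code_hash in a plan that matches this value means the zip was hashed while it had no content.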

My guess is that there's a race condition whereby the hash is sometimes computed over an empty file.

Terraform Version

Terraform v0.11.8

provider.archive v1.1.0
provider.aws v1.40.0
provider.local v1.1.0
provider.null v1.0.0
provider.template v1.0.0

Affected Resource(s)

data.archive_file

Terraform Configuration Files

These are extracts from two submodules that declare the affected Lambda functions. The problem seems to affect every Lambda function in my stack, though.

data "archive_file" "notify_model_ready_lambda_code_zip" {
  type = "zip"
  source_file = "${path.module}/lambda/notify_model_ready.py"
  output_path = "${path.module}/lambda/notify_model_ready.zip"
}

resource "aws_lambda_function" "notify_model_ready" {
  function_name = "${var.service_name_short}-${var.environment_name}-notify-model-ready"
  description = "Notifies that the model for ${var.service_name_long} is ready by publishing to an SNS topic."
  role = "${module.notify_model_ready_lambda_role.role_arn}"
  runtime = "python2.7"
  handler = "notify_model_ready.lambda_handler"
  filename = "${data.archive_file.notify_model_ready_lambda_code_zip.output_path}"
  source_code_hash = "${data.archive_file.notify_model_ready_lambda_code_zip.output_base64sha256}"
  publish = true
  tags = "${local.all_tags}"
  environment {
    variables = {
      TOPIC_ARN = "${aws_sns_topic.model_ready.arn}"
    }
  }
  lifecycle {
    ignore_changes = ["filename"]
  }
}

...

data "archive_file" "model_refresher_lambda_code_zip" {
  type = "zip"
  source_file = "${path.module}/lambda/refresh_model.py"
  output_path = "${path.module}/lambda/refresh_model.zip"
}

resource "aws_lambda_function" "refresh_service_model" {
  function_name = "${var.service_name_short}-${var.environment_name}-refresh-model"
  description = "Takes a training output model and uses it to refresh the model for the service ${var.service_name_long}."
  role = "${module.model_refresher_iam_role.role_arn}"
  runtime = "python2.7"
  handler = "refresh_model.lambda_handler"
  filename = "${data.archive_file.model_refresher_lambda_code_zip.output_path}"
  source_code_hash = "${data.archive_file.model_refresher_lambda_code_zip.output_base64sha256}"
  publish = true
  tags = "${local.all_tags}"
  timeout = 300
  environment {
    variables = {
      ENABLED = "${var.enabled ? "true" : "false"}"
      TRAINING_OUTPUT_S3_BUCKET_NAME = "${var.training_output_s3_bucket_name}"
      SERVICE_MODEL_S3_BUCKET = "${var.service_model_s3_bucket_name}"
      ECS_CLUSTER_NAME = "${var.ecs_cluster_name}"
      ECS_SERVICE_NAME = "${var.service_name_long}-service"
      SERVICE_DOCKER_REPO_URI = "${var.service_docker_repository_url}"
    }
  }
  lifecycle {
    ignore_changes = ["filename"]
  }
}

Expected Behavior

Once the lambda functions are deployed, Terraform will not try to re-deploy them unless the source code has changed.

Actual Behavior

Terraform often (but not always) wants to re-deploy some of the Lambda functions because it thinks their source code hash has changed. Which functions it wants to re-deploy appears random, and the set changes even between consecutive plan operations with no apply in between. In every case where it wants to re-deploy, the new source code hash is the same value, even though the functions have different source code. Note that in the plan below, the supposedly new hash for both Lambda functions is "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=", the hash of an empty file.

Plan output

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  ~ update in-place
 <= read (data resources)

Terraform will perform the following actions:

 <= module.compute_shared.data.null_data_source.aws_batch_service_role_arn
      id:                   <computed>
      has_computed_default: <computed>
      inputs.%:             "1"
      inputs.arn:           "arn:aws:iam::695716229028:role/tm-ds/shared/aws-batch-service-role-prod-20180531031834797400000005"
      outputs.%:            <computed>
      random:               <computed>

  ~ module.data_science_services.module.sh_recommendations_trainer.aws_lambda_function.notify_model_ready
      last_modified:        "2018-10-08T00:26:01.608+0000" => <computed>
      qualified_arn:        "arn:aws:lambda:us-west-2:695716229028:function:sh-recs-prod-notify-model-ready:7" => <computed>
      source_code_hash:     "j3BpqsSkUvNFP2F3kRpoqCav+IlnEO6iFVwkGifFsqE=" => "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
      version:              "7" => <computed>

  ~ module.data_science_services.module.sh_recommendations_trainer.aws_lambda_function.submit_training_job
      last_modified:        "2018-08-30T04:48:15.435+0000" => <computed>
      qualified_arn:        "arn:aws:lambda:us-west-2:695716229028:function:sh-recs-prod-submit-training-job:8" => <computed>
      source_code_hash:     "25xwKLUPygqJcUnFK2Iv85GhPkvP2bU7KkO6Yc9J0+s=" => "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU="
      version:              "8" => <computed>

Plan: 0 to add, 2 to change, 0 to destroy.

pauldraper commented 3 years ago

Do you have multiple instances of this archive_file (like multiple instance of its module)?

If so, I suspect Terraform is recreating and hashing the same output file in parallel, leading to inconsistent results.

Workarounds:

Pass a unique parameter into each module instance and use it in the output path of the archive_file.
Create the archive once for the entire workspace (in a singleton module), then pass the resulting values into the module that creates the Lambda function.
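The two workarounds might look roughly like the sketch below. It uses Terraform 0.12+ syntax and has not been checked against the provider versions listed above; `instance_name`, `./modules/trainer`, `lambda_zip_path`, `lambda_zip_hash` and `shared_lambda_zip` are hypothetical names chosen for illustration.

```hcl
# ---- Workaround 1: a unique output path per module instance ----
# (inside the shared module; "instance_name" is a hypothetical input variable
# that the caller must set to a different value for each instance)
variable "instance_name" {
  type = string
}

data "archive_file" "notify_model_ready_lambda_code_zip" {
  type        = "zip"
  source_file = "${path.module}/lambda/notify_model_ready.py"
  # Each instance writes and hashes its own file, so instances evaluated in
  # parallel no longer overwrite each other's zip.
  output_path = "${path.module}/lambda/notify_model_ready-${var.instance_name}.zip"
}

# ---- Workaround 2: build the archive once, in the root module ----
# (root module; the module source path and input names are hypothetical)
data "archive_file" "shared_lambda_zip" {
  type        = "zip"
  source_file = "${path.root}/lambda/notify_model_ready.py"
  output_path = "${path.root}/lambda/notify_model_ready.zip"
}

module "recommendations_trainer" {
  source          = "./modules/trainer"
  lambda_zip_path = data.archive_file.shared_lambda_zip.output_path
  lambda_zip_hash = data.archive_file.shared_lambda_zip.output_base64sha256
}

# Inside ./modules/trainer, the function then uses only the passed-in values:
#   filename         = var.lambda_zip_path
#   source_code_hash = var.lambda_zip_hash
```

In the second pattern only one data source ever writes the zip, so there is no second writer to race with; the first pattern keeps per-module archives but gives each one its own file.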

pauldraper commented 3 years ago

That said, archive_file is problematic for other reasons as well, such as being platform-specific.

https://github.com/hashicorp/terraform-provider-archive/issues/34

chrisbloe commented 2 years ago

I have also had this problem for a long time. I tried both options in #34 but still saw the problem.

In my case, the zip is created within a module, so the file should be identical for every instance of the module; however, the plan shows different hashes for the same file across the module instances! Perhaps each module instance is recreating the zip, and a race condition causes the hash to be computed before the zip has been fully written?

So, I just tried to change my code from...

resource "aws_lambda_function" "my_function" {
  ...
  source_code_hash = data.archive_file.my_zip.output_base64sha256
}

...to...

resource "aws_lambda_function" "my_function" {
  ...
  source_code_hash = filebase64sha256("${path.module}/files/my_file.zip")
}

...to calculate the hash directly. Running the plan several times now consistently shows the correct hash value, so I'll switch all of mine over for now. I think it would be worth updating the documentation here to note that the output may not be consistent.

Note: the output_file_mode fix suggested in #34 may or may not have helped; I'm not sure!
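For reference, output_file_mode is set directly on the data source in newer releases of the archive provider. A minimal sketch, reusing the `my_zip` name from the comment above; the `my_file.py` source path is hypothetical. Note that this normalizes the file permissions stored in the archive (the platform-dependence discussed in #34); it does not by itself stop parallel module instances from writing the same output_path.

```hcl
data "archive_file" "my_zip" {
  type        = "zip"
  source_file = "${path.module}/files/my_file.py" # hypothetical source file
  output_path = "${path.module}/files/my_file.zip"
  # Store fixed permission bits for every archived file so the zip bytes,
  # and therefore output_base64sha256, do not vary between machines.
  output_file_mode = "0666"
}
```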

pauldraper commented 2 years ago

Perhaps because each module instance is recreating the zip and a race condition is causing the hash to be generated when the zip hasn't finished being filled?

Correct.

hashicorp / terraform-provider-archive