hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Use of filemd5() function results in large memory usage #23890

Closed: davidcallen closed this issue 3 years ago

davidcallen commented 4 years ago

A typical upload of a file to S3 results in large memory usage when the file is large. Example below:

resource "aws_s3_bucket_object" "object" {
  bucket = "your_bucket_name"
  key    = "new_object_key"
  source = "path/to/file"
  etag = "${filemd5("path/to/file")}"
}

When uploading a 3.5GB file, the terraform process's memory grew from the typical 85MB (resident set size) up to 4GB (resident set size). Memory usage remains high even while waiting at the "apply changes" prompt.

It looks like the filemd5() function generates the MD5 checksum by loading the entire file into memory and then does not release that memory when it finishes. I suspect the problem occurs with any use of filemd5(), not just with resource "aws_s3_bucket_object".

The problem is magnified when processing multiple files (using the fileset() function with for_each), e.g.:

resource "aws_s3_bucket_object" "files-store-ibm-ilmt-bigfix" {
  for_each = fileset("${path.module}/files-for-uploading/", "*")
  bucket  = "your_bucket_name"
  key     = "files-uploaded/${each.value}"
  source  = each.value
  etag    = "${filemd5("${path.module}/files-for-uploading/${each.value}")}"
}

Environment: Fedora Linux 29 on an 8GB RAM laptop (which runs out of RAM easily with this issue).

terraform -v
Terraform v0.12.13
provider.aws v2.45.0
provider.null v2.1.2

Expected outcome: the MD5 checksum should be computed by reading the file through a buffer of sensible size. For comparison, running the md5sum utility under Linux (Fedora 29) on the same 3.5GB file used only 1.3MB (resident set size). That code could be a good example of how to implement the checksum.
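
For illustration, here is a minimal sketch in Go (the language Terraform is implemented in) of a streaming checksum along those lines. This is only an assumption about what such a fix could look like, not the actual change that landed: io.Copy feeds the hash through a small internal buffer (32KB), so resident memory stays roughly constant regardless of file size.

package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"os"
)

// fileMD5 streams the file through the hash instead of reading it
// all into memory at once. io.Copy reads in small chunks, so only
// the copy buffer and the 16-byte digest state are held in memory.
func fileMD5(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := md5.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	sum, err := fileMD5("path/to/file")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(sum)
}

The same pattern works for any hash in Go's standard library, since every hash.Hash is an io.Writer.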

Workaround: my current workaround may help others with this problem. It's not ideal in that it uses a shell command, but it works well for now. Use the alternative below:

resource "null_resource" "files-store-upload" { triggers = { always_run = "${timestamp()}" } provisioner "local-exec" { command = "aws s3 sync ${path.module}/files-for-uploading/ s3://${aws_s3_bucket.files-store.id}/uploaded-files/" } }

apparentlymart commented 3 years ago

Thanks for reporting this @davidcallen, and sorry for the delay in responding.

I've improved this in #28681 and that change should be included in the next v0.15.x release.

davidcallen commented 3 years ago

@apparentlymart - fantastic :) many thanks for this

github-actions[bot] commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.