hashicorp / terraform-provider-http

Utility provider for interacting with generic HTTP servers as part of a Terraform configuration.
https://registry.terraform.io/providers/hashicorp/http/latest
Mozilla Public License 2.0
207 stars, 117 forks

Files are stored in the state file multiple times making the state huge #287

Open sashee opened 1 year ago

sashee commented 1 year ago

Terraform CLI and Provider Versions

Terraform v1.5.1 on linux_amd64

Terraform Configuration

provider "aws" {
}

resource "random_id" "id" {
  byte_length = 8
}

data "http" "image" {
  url = "https://unsplash.com/photos/F3rDBnQQbQU/download?force=true&w=1300"
}

resource "local_file" "image" {
  content_base64 = data.http.image.response_body_base64
  filename       = "/tmp/img-${random_id.id.hex}"
}

resource "aws_s3_object" "images" {
  key    = "testimage"
  source = local_file.image.filename
  bucket = aws_s3_bucket.bucket.bucket
  etag   = local_file.image.content_md5
}

resource "aws_s3_bucket" "bucket" {
  force_destroy = true
}

Expected Behavior

The state file stays reasonably small, with the response body stored at most once.

Actual Behavior

Downloading a ~400 kB image inflates the state file to about 4.6 MB:

$ ls -l
total 4528
-rw-r--r-- 1 sashee sashee     546 Jun 28 09:55 main.tf
-rw-r--r-- 1 sashee sashee 4625325 Jun 28 10:05 terraform.tfstate
-rw-r--r-- 1 sashee sashee     181 Jun 28 10:05 terraform.tfstate.backup

Looking into it, I see that the state file contains the entire contents of the downloaded image multiple times:

[Screenshot omitted]

It would be nice if the body were either left out of the state file entirely or at least not included multiple times.

Steps to Reproduce

  1. terraform apply

How much impact is this issue causing?

Medium

Logs

https://gist.github.com/sashee/ee2392c311a64ec0a1f5789b319528f0

Additional Information

What I'm trying to do is dynamically download a binary file (an image) and upload it to an S3 bucket. A way to specify a filename, so that the http data source downloads to that file without exposing the contents at all, would also solve this problem.
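
One possible workaround, sketched under stated assumptions and not tested against the setup above: run the download from a local-exec provisioner so the body is written straight to disk and never becomes a Terraform attribute. This assumes Terraform 1.4 or later (for terraform_data) and curl available on the machine running Terraform. The locals and the resource name download are illustrative; random_id.id and aws_s3_bucket.bucket refer to the configuration above.

```hcl
# Workaround sketch (assumption, not a provider feature): curl writes the file,
# so the response body is never stored as an attribute in the state.
locals {
  image_url  = "https://unsplash.com/photos/F3rDBnQQbQU/download?force=true&w=1300"
  image_path = "/tmp/img-${random_id.id.hex}"
}

resource "terraform_data" "download" {
  # Replace this resource, and so re-run the download, when the URL or path changes.
  triggers_replace = [local.image_url, local.image_path]

  provisioner "local-exec" {
    # -f fails on HTTP errors, -L follows redirects, -o writes to the target path.
    command = "curl -fsSL -o '${local.image_path}' '${local.image_url}'"
  }
}

resource "aws_s3_object" "images" {
  key    = "testimage"
  bucket = aws_s3_bucket.bucket.bucket
  source = local.image_path

  # The file only exists after the provisioner has run.
  depends_on = [terraform_data.download]
}
```

The trade-off is that the etag argument is dropped, because the file does not exist at plan time and cannot be hashed then. Terraform therefore will not notice when the remote image changes; only a change to the URL or the path triggers a new download.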

bendbennett commented 1 year ago

Hi @sashee 👋

The contents of the state file reflect the values associated with the resource or data source. This is a fundamental aspect of Terraform and represents a design feature that is used for tracking changes in state and for making the values available for use elsewhere, such as further Terraform configuration. Consequently, not storing the values in state would mean that these values were no longer accessible or available for use.

sashee commented 1 year ago

Hi @bendbennett,

Do you see a way to at least eliminate the repetition? Perhaps an attribute could declare which outputs are not needed. In the example above, I use response_body_base64 but neither response_body nor body, so I'd be happy if those two were set to null so that they don't swell the state. Something like expose_base64_response_only: true could work.
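
To make the proposal concrete, the configuration might look like the sketch below. The expose_base64_response_only argument is hypothetical: it is only the name suggested in this comment and is not implemented by the provider.

```hcl
data "http" "image" {
  url = "https://unsplash.com/photos/F3rDBnQQbQU/download?force=true&w=1300"

  # HYPOTHETICAL argument, proposed in this comment only; the provider does not
  # accept it. The intent: keep response_body_base64 and leave response_body
  # and body null so the body is stored in state once.
  expose_base64_response_only = true
}
```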

Alternatively, maybe a separate data source that writes the contents into a file would cover my use case as well. In that case, the state could omit the response body altogether.

bendbennett commented 1 year ago

Hi @sashee,

Thank you for the suggestion. We will consider your proposal in light of the level of community interest that this issue receives. As for the repetition, the body attribute is deprecated and will be removed in the future, which will reduce the size of the state file.

cunhafinrix commented 1 year ago

We found the same issue while developing our apps. It would be nice if the size of the state file were reduced.

et304383 commented 11 months ago

It would be nice if it was at least stored just once.

sfertman commented 5 months ago

In some cases it is a problem even if it's stored once. I have a ~20 MB binary that I download before packaging it with some other files for deployment. The resulting state is ~300 MB, which is completely unmanageable. Perhaps one way to go about it is to store only a hash of the body if it exceeds a certain size?