hashicorp / terraform-provider-http

Utility provider for interacting with generic HTTP servers as part of a Terraform configuration.
https://registry.terraform.io/providers/hashicorp/http/latest
Mozilla Public License 2.0

Add ability to store/cache result in state to avoid "will be read during apply" #208

Open idelsink opened 1 year ago

idelsink commented 1 year ago

Terraform CLI and Provider Versions

Terraform v1.3.6 on linux_amd64

Use Cases or Problem Statement

The data source is always doing an HTTP request even if I know the source will not change unless I change the URL.

For example, I fetch a remote JSON file with some configuration, but this file is static and will not change. In the following example I'm using a file from a Git repository, but it could equally be any other static file, for example a static API such as https://grafana.com/api/dashboards/9614/revisions/1/download.

locals {
  nginx_ingress_version = "4.4.0"
}

# Fetching static JSON where the response_body will only change when I change the URL
data "http" "grafana_nginx_ingress_controller" {
  request_headers = {
    Accept = "application/json"
  }
  url = "https://raw.githubusercontent.com/kubernetes/ingress-nginx/helm-chart-${local.nginx_ingress_version}/deploy/grafana/dashboards/nginx.json"
  lifecycle {
    postcondition {
      condition     = contains([200], self.status_code)
      error_message = "Error fetching Grafana Dashboard JSON file. Got HTTP Status code ${self.status_code}: ${self.response_body}"
    }
  }
}

The result from the above example is always:

Terraform will perform the following actions:                                      

  # data.http.grafana_nginx_ingress_controller will be read during apply                                                                         
  # (depends on a resource or a module with changes pending)                       
 <= data "http" "grafana_nginx_ingress_controller" {                               
      + body             = (known after apply)                                     
      + id               = (known after apply)                                     
      + request_headers  = {                                                       
          + "Accept" = "application/json"                                          
        }                                                                          
      + response_body    = (known after apply)                                     
      + response_headers = (known after apply)                                     
      + status_code      = (known after apply)                                     
      + url              = "https://raw.githubusercontent.com/kubernetes/ingress-nginx/helm-chart-4.4.0/deploy/grafana/dashboards/nginx.json"                         
    }

This happens even if the source data does not change and I've already applied that change, because the read is part of the "apply" phase. According to this comment (https://github.com/hashicorp/terraform/issues/25805#issuecomment-672071546), the data source should cache the result if the input does not change.

I understand why that is not happening here, but for some use cases it might be useful. (for example when using the above use case with a static API)

This is a problem when reviewing the plan, because it fills up with these kinds of messages. That makes it harder to validate the plan and to spot anything happening that should not. It also affects the resources that use this data result, as described in the following issue: https://github.com/hashicorp/terraform-provider-http/issues/101

Proposal

It would be good to have some way to cache the result in the state unless the input configuration changes, either through some kind of setting or by default (although the latter would be a breaking change).

How much impact is this issue causing?

Medium

Additional Information

No response


apparentlymart commented 1 year ago

Hi @idelsink,

From what I can see in your plan output, I think what you are seeing here is not caused by the provider but rather by your configuration itself.

We can see in the plan that Terraform already knows the value of url, but the message includes the note "(depends on a resource or a module with changes pending)", which means that although the configuration for this data resource is sufficient to make the request, there's a dependency relationship between this and something else in your configuration.

I don't see any direct references to resources in the data block you shared so I would guess that the cause here is that this data block is in a child module and you've written a depends_on argument inside the module block that's calling it. When a whole module depends on something else that means that everything in the module must wait until that something else has had all of its changes applied, because those changes could affect the outcome.
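To illustrate the kind of configuration being guessed at here, the following is a minimal sketch of a root module that calls a child module with a module-wide depends_on. The original configuration was not shared, so every name in it (kubernetes_namespace.monitoring, the dashboards module, and its source path) is a hypothetical stand-in.

```hcl
# Root module. All names here are hypothetical; the real configuration
# was not included in the issue.
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
  }
}

module "dashboards" {
  # Assumed to be the child module containing the data "http" block.
  source = "./modules/dashboards"

  # Module-wide dependency: everything inside the module, including
  # data.http.grafana_nginx_ingress_controller, has to wait whenever
  # kubernetes_namespace.monitoring has pending changes, so the data
  # source is reported as "will be read during apply".
  depends_on = [kubernetes_namespace.monitoring]
}
```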

If you declare your dependencies more precisely (rather than just declaring the whole module as depending on something else) then Terraform should be able to determine that it's okay to read from this data source during the plan step.
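One way to declare a dependency more precisely is to pass the specific value the module needs as an input variable instead of using depends_on on the module block. This is only a sketch under assumptions: the namespace variable, the kubernetes_config_map resource, and the module layout are hypothetical and stand in for whatever the real configuration depends on.

```hcl
# Root module (hypothetical names).
resource "kubernetes_namespace" "monitoring" {
  metadata {
    name = "monitoring"
  }
}

module "dashboards" {
  source = "./modules/dashboards"

  # No module-wide depends_on. Only resources that reference
  # var.namespace inside the module inherit this dependency.
  namespace = kubernetes_namespace.monitoring.metadata[0].name
}

# ./modules/dashboards/main.tf (hypothetical child module).
variable "namespace" {
  type = string
}

# Nothing here refers to var.namespace, and the URL is fully known,
# so Terraform should be able to read this during the plan step.
data "http" "grafana_nginx_ingress_controller" {
  url = "https://raw.githubusercontent.com/kubernetes/ingress-nginx/helm-chart-4.4.0/deploy/grafana/dashboards/nginx.json"
}

# Only this resource waits for the namespace.
resource "kubernetes_config_map" "dashboard" {
  metadata {
    name      = "nginx-ingress-dashboard"
    namespace = var.namespace
  }

  data = {
    "nginx.json" = data.http.grafana_nginx_ingress_controller.response_body
  }
}
```

With this shape the dependency on the namespace is carried by the reference in kubernetes_config_map.dashboard alone, rather than by every object in the module.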

Data sources are never cached between runs because they represent external data assumed to be managed elsewhere, but if you make the configuration fully known during the plan step and you don't have any depends_on arguments representing other "hidden dependencies" then Terraform will be able to read this data during the plan phase rather than during the apply phase.

As far as I can tell from what you shared, there's nothing that could change in this provider to avoid the problem you've encountered here. Terraform Core isn't even calling into the provider during the plan phase because the dependency relationships in the configuration block it from doing so.