gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
7.94k stars 965 forks

Issues when attempting to use air-gap/offline provider cache and private module (i.e. `provider_installation` possibly being ignored). #3244

IAXES closed 1 month ago

IAXES commented 2 months ago

Describe the bug

I am attempting to run `terragrunt run-all plan` in an air-gapped (offline-only, private-intranet) environment. I need to use a private/proprietary module, along with the public github provider, to communicate with a private GitHub Enterprise Server (GHES) stack. This work is somewhat similar to what is described in https://github.com/gruntwork-io/terragrunt/issues/3117.

I'm able to get the provider cache to work with various modules I can pull from registry.terraform.io, but not with private modules. For example, I have a private module, cheeseburger/pickle/3.0.0, hosted on a private (slow) server, terraform.food.lan. Since the private server is slow and unreliable, I'm attempting to manually install the provider into the cache. However, all attempts to get the private module to be sourced via my Terragrunt provider cache fail with the following error:

ERRO[0078] remote https://terraform.food.lan/v1/providers/cheeseburger/pickle/versions unreachable, could not forward: context canceled
...
...
...
Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
terraform.food.lan/cheeseburger/pickle: could not query provider registry for
terraform.food.lan/cheeseburger/pickle: the request failed after 2 attempts,
please try again later: Get
"http://127.0.0.1:5758/v1/providers//terraform.food.lan/cheeseburger/pickle/versions":
context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Steps To Reproduce

First, I set up the Terragrunt provider cache in my shell:

export TERRAGRUNT_PROVIDER_CACHE="1"
export TERRAGRUNT_PROVIDER_CACHE_REGISTRY_NAMES="registry.terraform.io,terraform.food.lan"
export TERRAGRUNT_PROVIDER_CACHE_HOST=127.0.0.1
export TERRAGRUNT_PROVIDER_CACHE_PORT=5758

I create a ~/.terraformrc file like so:

plugin_cache_dir = "$HOME/.cache/terragrunt/providers"
disable_checkpoint = true
provider_installation {
  filesystem_mirror {
    path    = "/home/owner/.cache/terragrunt/providers"
    include = [
      "terraform.food.lan/*/*",
    ]
  }
  direct {
    exclude = [
      "terraform.food.lan/*/*",
    ]
  }
}
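
For reference, Terraform's documented unpacked layout for a filesystem mirror is HOSTNAME/NAMESPACE/TYPE/VERSION/TARGET beneath the mirror path. The sketch below only prints where a manually installed copy of the private provider would be expected to sit for the `include` pattern above to match. The helper name `mirror_dir` is hypothetical, and the `linux_arm64` target is an assumption (an aarch64 Linux host).

```shell
#!/bin/sh
# Sketch: compute the directory of an unpacked provider inside a
# filesystem mirror, following HOSTNAME/NAMESPACE/TYPE/VERSION/TARGET.
# mirror_dir is a hypothetical helper; it does not touch the filesystem.
mirror_dir() {
  # $1 = mirror root, $2 = registry host, $3 = namespace,
  # $4 = provider type, $5 = version, $6 = target platform (os_arch)
  printf '%s/%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Values taken from the report above; linux_arm64 is an assumption.
mirror_dir "/home/owner/.cache/terragrunt/providers" \
  "terraform.food.lan" "cheeseburger" "pickle" "3.0.0" "linux_arm64"
```

If the provider binary is not found under the printed directory, the filesystem mirror has nothing to offer for that provider, regardless of how the `direct` block is configured.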

For additional context, my respective provider cache folders are located at the following paths (with the exception of this use case, where I'm trying to manually install a private provider into the Terragrunt provider cache, both the Terragrunt and vanilla Terraform provider caches appear to be working as expected):

Expected behavior

I expected `terragrunt run-all plan` to complete successfully without requiring any access to the private server, terraform.food.lan.

Nice to haves

Versions

$> uname -a
Linux darkstar 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 13:20:23 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux



Additional context

N/A. Happy to provide additional info as we go.

IAXES commented 2 months ago

After additional testing, it seems that aws and github providers, for example, are indeed being sourced from the local provider cache (can't get my private provider working this way, though).

However, all providers seem to be ignoring the `direct.exclude` statements and are still attempting to query versions from the remote registries, which prevents the air-gap use case from working (i.e. if the local provider cache is properly populated but the remote version check fails, then the overall `terragrunt run-all apply` operation also fails early on).


So, I think the public-versus-private provider topic I raised is a non-issue. Instead, this seems to be entirely a function of Terragrunt and/or Terraform reaching out to remote provider registries when we don't want the tools to do this (either due to a misconfig on my end, or some feature/code-change needed to support air-gap deployments).

I've also tried going as far as using the `dev_overrides` feature in the `provider_installation` block, but no luck. It seems like the entire issue hinges on the tools honoring the `direct.exclude` and `disable_checkpoint = true` statements in ~/.terraformrc, which is what an air-gapped use case needs in order to work.
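
One way to rule out the "misconfig on my end" possibility is to confirm which CLI configuration file is actually in effect. The sketch below assumes Terraform's documented lookup on Linux: the `TF_CLI_CONFIG_FILE` environment variable if set, otherwise `~/.terraformrc`. The helper name `effective_cli_config` is hypothetical, and the sketch says nothing about what Terragrunt itself does when the provider cache is enabled.

```shell
#!/bin/sh
# Sketch: report which CLI configuration file Terraform is expected to read.
# effective_cli_config is a hypothetical helper name.
effective_cli_config() {
  # TF_CLI_CONFIG_FILE wins when set and non-empty; otherwise fall back
  # to the default location in the home directory.
  printf '%s\n' "${TF_CLI_CONFIG_FILE:-$HOME/.terraformrc}"
}

cfg="$(effective_cli_config)"
if [ -f "$cfg" ]; then
  echo "CLI config in effect: $cfg"
  # List the lines relevant to provider installation, if any are present.
  grep -n -E 'provider_installation|filesystem_mirror|direct|include|exclude' "$cfg" || true
else
  echo "No CLI config found at: $cfg"
fi
```

Running this in the same shell that launches `terragrunt` shows whether the `direct.exclude` settings are in the file that shell points at, which helps separate a local configuration problem from a tooling one.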

levkohimins commented 1 month ago

Resolved in v0.63.3 release.