gruntwork-io / terragrunt

Terragrunt is a flexible orchestration tool that allows Infrastructure as Code written in OpenTofu/Terraform to scale.
https://terragrunt.gruntwork.io/
MIT License
7.95k stars 966 forks source link

Terragrunt is pratically unusable on Windows without WSL #1755

Open stefan-van-de-griendt-quandago opened 3 years ago

stefan-van-de-griendt-quandago commented 3 years ago

I've attempted to use Terragrunt, as another team in my company is already successfully using it. However, that team works with Mac OS or Linux. As soon as you use source, Terragrunt creates some lengthy paths and on top of that, Terraform does so as well (within the .terraform directory). I honestly think the issue is with Terraform, but I'm also reporting it here, as with only Terraform I have no issues. This could be related to #581 and https://github.com/hashicorp/terraform/issues/21173 See also https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation for more details on Windows and long path names (note: this is all setup properly on my machine).

At the moment of trying to initialize something with a decently sized path, the following error is thrown

Error: Failed to install provider

Error while installing hashicorp/aws v3.50.0: open .terraform\providers\registry.terraform.io\hashicorp\aws\3.50.0\windows_amd64\terraform-provider-aws_v3.50.0_x5.exe: The system cannot find the path specified.

I could work around the issue a tiny bit by specifying a download_dir, but that didn't fix it for all modules. I tried to copy the above file myself, and this works without any problems. Note that after the copying, still nothing works. It seems that an application is actively checking for the path length rather than trying.

To reproduce this, do the following

  1. Create D:\Development\Work\infra-processes\terraform\aws-codebuild\shared\ftp_parameters
  2. Create a random Terraform file
  3. Create D:\Development\Work\infra-processes\terragrunt\aws-codebuild
  4. Add in there terragrunt.hcl with the following contents (replace the variables to match what you've got in AWS)
    
    generate provider {
    path      = "provider.tf"
    if_exists = "overwrite_terragrunt"
    contents  = <<EOF
    provider aws {
    assume_role {
    role_arn = "arn:aws:iam::${local.aws_account_id}:role/${local.aws_role_name}"
    }
    region = "${local.aws_region}"
    }
    EOF
    }

terraform { source = "${path_relative_from_include()}/../../terraform/aws-codebuild//${path_relative_to_include()}" }

download_dir = "D:\terragrunt-caches\infra-processes\aws-codebuild\${path_relative_to_include()}"

5. Create D:\Development\Work\infra-processes\terragrunt\aws-codebuild\shared\ftp_parameters
6. Add in there `terragrunt.hcl` with the following contents 

include { path = find_in_parent_folders() }


7. Open a command line and enter D:\Development\Work\infra-processes\terragrunt\aws-codebuild
8. Run `terragrunt run-all plan`
9. See the error and see that D:\terragrunt-caches\infra-processes\aws-codebuild\shared\ftp_parameters\ey5C4i7O2JiVeuSQN6lEei-wVxk\uRJHvo_apGCN6rzm-h1wJXKKxgk\shared\ftp_parameters\.terraform\providers\registry.terraform.io\hashicorp\aws\3.50.0\windows_amd64 is an empty directory (it should contain terraform-provider-aws_v3.50.0_x5.exe)

So having a dynamic path of only `aws-codebuild\shared\ftp_parameters` is already too long in order to work with Terragrunt for me. Sure. I could shorten `D:\terragrunt-caches\infra-processes` in some fashion and resolve this specific case, but if we ever have a structure that's a tiny bit longer, it'll fail again.

Instead of having this fixed directly on Windows, a detailed guide on how to get this working under WSL could also work. Albeit only half of a solution in my opinion.
stefan-van-de-griendt-quandago commented 3 years ago

I think I've got a workaround for this issue. I don't want to run special commands before I can use Terragrunt, so all had to be done within terragrunt.hcl.

With the following terraform block I managed to force Terraform to store the .terraform directory outside the .terragrunt-caches directory. This way there's no amplification on the issue and paths are stull supposed to be unique. The only thing I don't really like, is that I'm forced to include the constant infra-processes\aws-codebuild: it would be nice if it could detect that somehow. The search continues...

locals {
  repository_name = "infra-processes"
  subpath = "aws-codebuild"
}

terraform {
  source = "${path_relative_from_include()}/../../aws-codebuild//${path_relative_to_include()}"
  extra_arguments terraform_path {
    commands = [
      "apply",
      "destroy",
      "init",
      "plan",
      "validate",
    ]
    env_vars = {
      TF_DATA_DIR = "D:\\terraform-caches\\${local.repository_name}\\${local.subpath}\\${path_relative_to_include()}"
    }
  }
}
download_dir = "D:\\terragrunt-caches\\${local.repository_name}\\${local.subpath}\\${path_relative_to_include()}"

Also, anyone got a clue to get all Terraform commands? I've only seen subsets, such as get_terraform_commands_that_need_vars(). Right now I included only the main/major commands listed on the Terraform site (https://www.terraform.io/docs/cli/commands/index.html).

phatcher commented 3 years ago

@stefan-van-de-griendt-quandago The trick on Windows is to set appropriate cache directories for both terragrunt and terraform; this will fix the long path problem you are encountering. I've actually set this as global environment variables but you can do this in VSCode via powershell for example

$Env:TF_PLUGIN_CACHE_DIR="C:\.tf-plugin-cache"
$Env:TERRAGRUNT_DOWNLOAD="C:\.tg-cache"

If you have control over your machine, also turn on Developer mode as this then allows the use of symbolic links and the cache directories for each run drop from ~150Mb to ~1.5Mb

stefan-van-de-griendt-quandago commented 3 years ago

Thank you for this hint, but this is not very user friendly: forcing environment variables to be set each time you open a terminal. I therefore managed to solve that problem with the previous code snippet and keeping it in code. As a matter of fact, locally I use locals for the root directories that can be overridden with environment variables.

Maybe I'm wrong, but I don't like all my repositories being mixed together into one directory. I may also have multiple Terragrunt modules within one repository and I don't want any clashes. Therefore it seems that a static directory won't cut it, hence I let Terragrunt inject a relative path as well.

If you have control over your machine, also turn on Developer mode as this then allows the use of symbolic links and the cache directories for each run drop from ~150Mb to ~1.5Mb

I can't find anything about this. Do you have any reference on how to do so? 100 times less disk space used sounds great!

brikis98 commented 3 years ago

Yes, unfortunately, we do not have an automated testing pipeline for Terragrunt on Windows, and we definitely have long path bugs as a result (if you search issues, you'll see a number of previous related issues). Suggestions and bug fixes are welcome, but the real solution would be setting up Windows CI / CD. Unfortunately, I'm not sure when we'll be able to get to that.

stefan-van-de-griendt-quandago commented 3 years ago

I think that when Terraform 'fixes' support for Windows Long Paths (https://github.com/hashicorp/terraform/issues/21173), this issue is mostly resolved as well. The problem typically isn't in the part of the path Terragrunt generates, but the path Terraform generates for the providers afterwards.

As shared before in https://github.com/gruntwork-io/terragrunt/issues/1755#issuecomment-885542362 this is for me the best and scalable solution that works for all. As also mentioned, I made those paths configurable through environment variables.

If you'd like to add a Windows test, I understand you want to run the tests on a Windows environment. However, I think it's not bad of an intermediate test to test the generated path lengths and check if they're below, let's say, 250 characters. But I think you'll expect those tests to fail and 'hide' (I don't mean it as negative as it looks like) behind the argument for a Windows environment?

I haven't checked the code of Terragrunt yet, maybe in a couple of weeks (busy times) I'll give it a go to add a test like this, if you think that's helpful?

brikis98 commented 3 years ago

We're actually just starting to add Windows testing in https://github.com/gruntwork-io/terragrunt/pull/1769! Still early days, but little by little...

avishnyakov commented 2 years ago

It's not uncommon to have engineering teams working on different platforms, e.g. Windows, mac-book or Linux. If this is not enough, CI/CD agents add more variations with Ubuntu 18/20 and others.

One way to solve this "works on my machine" problem is to standardize on docker. For example, I usually go with two images to version pin, bake dependencies, ensure execution runtime is alike everywhere:

Btw, of course TerraTest is used to test these TF/TG images, kudos to gruntwork!

You can use that with docker run --rm -v ${PWD}:/src, but it might be a good idea to add a small wrapper script to run TerraTest/TerragGrunt container. Anything that runs on all platforms can be used, say python or pwsh or similar. This runner script can add more smarts and can also be used under CI/CD agents.

Win-win, same execution approach, same runtime (docker image) that works consistently everywhere, less worries to troubleshoot - all runs looks and behave alike across all local laptops and CI/CD agents.