hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

reuse git checkout repo during "terraform get" when possible #14036

Closed timurb closed 6 years ago

timurb commented 7 years ago

We use a module source in an external Git repo, in this form:

module "host1" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//myhost?ref=HEAD"
  param1 = "foo"
}

module "host2" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//myhost?ref=HEAD"
  param1 = "bar"
}

(as recommended in https://blog.gruntwork.io/how-to-create-reusable-infrastructure-with-terraform-modules-25526d65f73d)

When I run terraform get, a new checkout is done from scratch for every module definition referencing myhost, even though they all reference exactly the same remote git ref.

This is very slow: in my setup, terraform get alone takes 7 minutes, even though all my module definitions refer to a single git repo (my whole plan expands into 120 checkouts). In addition, if I git push to the branch referenced in my TF code while terraform get is running, different parts of my Terraform code may end up using different git SHAs.

It would be nice if the checkout was done only once and after that was reused wherever it can be.
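
As a partial mitigation for the mixed-SHA risk described above (it does nothing about the repeated checkouts), the ref argument can point at something immutable instead of HEAD. The sketch below assumes a tag named v1.2.0 exists in the repo; that tag name is hypothetical and not taken from the original setup.

```hcl
# Both module calls pin the same tag, so a push to the branch during
# "terraform get" cannot make them resolve to different commits.
# NOTE: "v1.2.0" is a hypothetical tag used only for illustration.
module "host1" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//myhost?ref=v1.2.0"
  param1 = "foo"
}

module "host2" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//myhost?ref=v1.2.0"
  param1 = "bar"
}
```

Each module block still triggers its own checkout here; only the consistency problem is addressed.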

holmesb commented 7 years ago

+1, this would be a massive productivity booster. Fetching the same git repo multiple times during terraform get is nonsensical. During initial development, when changes are frequent, this behavior makes modules with a git source almost unusable. We've reverted to a local file source for the time being.

in4mer commented 7 years ago

I don't think this is a "me too", but I want to bump this with another impact report of real import.

I've come up with a module tree that lets us leverage a lot of existing boilerplate while keeping good flexibility. One consequence of this design is that everything is heavily modular. A consequence of that, in turn, is that a single directory describing three AWS instances using count, comprising only 85 lines of *.tf, ends up with 27,376 files in total, because of .git litter everywhere.

I have ~20 machine type definition directories. One set in a stage directory, one set in a prod directory, for 40 total. So this is actually causing some rather difficult system problems for me at this point. Just the machine definitions themselves should result in roughly 1.1M files. From 9,000 lines of TF, total. Counting prod and stage together.

Addendum: I think this is closely related to https://github.com/hashicorp/terraform/issues/10703, which is a request to allow the git checkout to specify the depth. There's no sense downloading the entire repo history for every single module.
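
For reference, later Terraform documentation describes a depth URL argument for git module sources that requests a shallow clone. Whether it is available depends on the Terraform version in use, so treat the following as a sketch under that assumption rather than something that works on the versions discussed in this thread. The tag v1.2.0 is hypothetical.

```hcl
# Shallow clone: fetch only the most recent commit of the given ref.
# ASSUMPTION: the installed Terraform version supports the "depth" argument.
# With depth set, ref is expected to be a branch or tag name,
# not a raw commit SHA.
module "host1" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//myhost?depth=1&ref=v1.2.0"
  param1 = "foo"
}
```

This reduces the history downloaded per checkout but does not reduce the number of checkouts.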

holmesb commented 6 years ago

What's the workaround here? Is anyone actually using git as source when there's a significant number of modules? Are you waiting for a git clone for each and every module? We're using local files as source, but lose the benefits of versioned modules. If git is unusable, is there a non-git source that is:

  1. Fast
  2. Versioned (needed for integration with an ARA pipeline)?

in4mer commented 6 years ago

I've come up with a workaround of sorts, but it requires the use of nested modules.

In your main .tf files, call your modules using git::ssh. Ideally, if you're running a pretty segmented setup, you'll call as few modules as possible at the top level. Then, in every module that references other modules, use:

source = "../../modules/whatever"

This relative path is resolved from the module's own directory inside the checked-out repo. It works because each git::ssh module reference downloads the entire repo, so all the other modules' files are already there.

Ideally, this results in only a single git checkout for init or get -update=true.
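
A minimal sketch of this layout, assuming the repo from the original report and a hypothetical directory structure (stacks/web and modules/whatever are illustrative names, as is the tag v1.0.0):

```hcl
# --- main.tf in the root configuration ---
# The only git-sourced module call. Terraform downloads the whole repo
# and enters the "stacks/web" subdirectory (hypothetical path).
module "web" {
  source = "git::ssh://git@bitbucket.org/org/org_modules//stacks/web?ref=v1.0.0"
}

# --- stacks/web/main.tf inside the org_modules repo ---
# A nested module referenced by relative path. It resolves inside the
# same checkout, so no additional git clone is needed for it.
module "whatever" {
  source = "../../modules/whatever"
}
```

The trade-off is that every nested module is tied to whatever ref the top-level call checked out; nested modules cannot be pinned to their own versions.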

ketzacoatl commented 6 years ago

This is especially painful if you have all of your TF modules in one git repo >.<

mikelindsey-okta commented 6 years ago

I solved a giant pile of configuration and abstraction issues with intermediary modules aaaaand now I have this problem.

$ du -hs .terraform/
3.0G    .terraform/

This is after a refactor to cut out half the versioned references, and this is just ONE implementation directory of dozens.

rismoney commented 6 years ago

Is there a real, workable solution to this? I don't see how modules actually scale, or how this has stayed under the radar for so long, given the extent to which HashiCorp has modularized the platform.

@in4mer your workaround gets a single copy of the files in place for use, but it gives up the benefits of git (branches, tags, etc.).

apparentlymart commented 6 years ago

Hi everyone! Thanks for reporting this, and for the ongoing discussion.

This seems to be the same thing being discussed over in #11435, so I'm going to close this just to consolidate the discussion over there.
