hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io

terraform modules source: variable support in source for git username #23948

Open gscuderi opened 4 years ago

gscuderi commented 4 years ago

Hello Terraform team, while working on a project I realized there is a feature that might be very useful for module sources: variable support in Git source URLs.

I know this has been discussed in the past and that it is not currently supported. I went through the various threads, but none mentioned the use case I'm going to describe, which is why I decided to open this feature request anyway.

Let's imagine I have a module on a Gerrit server, or any other Git service that requires you to specify your user account in the source URL.

To use such a module, I will need to do something like:

module "my_module" {
  source = "git::ssh://myuser@gerrit.server.com:29418/repo.git//modules/mymodule?ref=v1.0.0"

  variable = "..."
  # ...
}

Having to specify myuser in the source URL up front is what creates the issue here, since it is different for each user and cannot be generalized.

As I have no way to override the source URL, when I develop the scripts I need to put in my username, my colleagues have to change it to theirs, and since we use Jenkins for automation we ALL need to remember to change it back to the Jenkins user before submitting the code.

Ideally I should be able to use an override.tf file to specify my own username (or even the entire URL), so that we do not risk forgetting to change it back to the Gerrit CI user after working on the code (which happens way TOO often!).

So, in the end, being able to do something like:

module "my_module" {
  source = "git::ssh://${var.gerrit_user}@gerrit.server:29418/repo.git//modules/mymodule?ref=v1.0.0"

  variable = "..."
  # ...
}

Or maybe:

module "my_module" {
  source = "${var.my_module_git_source_url}"

  variable = "..."
  # ...
}

Is what I'm looking for.

Any other way to achieve the same objective is perfectly fine; I just need to stop changing it manually, since that is far too fragile and prone to human error. To be honest, this is exactly what I'm trying to prevent by using IaC and automation!

Thank you for your kind consideration & help, Giordano

apparentlymart commented 4 years ago

Hi @gscuderi! Thanks for sharing this use-case.

It might interest you to know that Git itself has a feature that addresses a variant of this use-case: turning references to unauthenticated URLs that might appear in locations like Terraform configuration, npm modules, Go modules, etc., into authenticated ones with a username of your choice.

For example, in my .gitconfig I have the following setting:

[url "git@github.com:"]
    insteadOf = https://github.com/

This tells Git that whenever I (or some other software, such as Terraform, acting on my behalf) run git clone https://github.com/..., it should instead use git@github.com:... as the remote address. This means that I can use Terraform modules, Go modules, npm modules, etc. that contain unpersonalized GitHub repository references like https://github.com/example/foo and make authenticated requests to them over SSH instead.
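To make the rewriting rule concrete, here is a minimal Python sketch of how a table of `url.<base>.insteadOf` entries maps a URL, assuming Git's documented behavior that when several `insteadOf` prefixes match, the longest one is used. The function `apply_instead_of` and the rule dictionaries are hypothetical illustrations, not part of Git or Terraform; Git performs this rewrite internally.

```python
def apply_instead_of(url: str, rules: dict[str, str]) -> str:
    """Rewrite a URL the way Git applies url.<base>.insteadOf settings.

    `rules` maps each insteadOf prefix to the base that replaces it, so
    {"https://github.com/": "git@github.com:"} mirrors the .gitconfig entry
    [url "git@github.com:"] insteadOf = https://github.com/
    """
    # Every configured prefix that the URL starts with is a candidate.
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        # No rule applies, so the URL is used unchanged.
        return url
    # Git picks the longest matching prefix when several match.
    best = max(matches, key=len)
    return rules[best] + url[len(best):]


github_rules = {"https://github.com/": "git@github.com:"}
print(apply_instead_of("https://github.com/example/foo", github_rules))
# prints: git@github.com:example/foo

# Hypothetical rule mapping a shared CI user to one developer's own username.
gerrit_rules = {
    "ssh://ci-user@gerrit.server.com:29418/": "ssh://gscuderi@gerrit.server.com:29418/",
}
print(apply_instead_of("ssh://ci-user@gerrit.server.com:29418/repo.git", gerrit_rules))
# prints: ssh://gscuderi@gerrit.server.com:29418/repo.git
```

Because the rewrite happens inside Git, a tool that shells out to `git` (as Terraform does for Git module sources) sees the rewritten address without any change to its own configuration.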

Perhaps, to smooth your current workflow, you could standardize on a particular placeholder user to commit in your configurations (the "gerrit CI user" you mentioned, maybe), and then each developer can add a rule like the above to tell Git to use their own username instead:

[url "ssh://gscuderi@gerrit.server.com:29418/"]
    insteadOf = ssh://ci-user@gerrit.server.com:29418/

I believe that would then allow you to work with your Terraform configurations without any direct modification, and let Git itself do the translation to a more appropriate username on your development systems.

gscuderi commented 4 years ago

I appreciate the workaround; unfortunately it won't work very well in my case...

I'm using on-demand Cloud Jenkins slaves, which are configured through a script when they are needed and destroyed when unused. A server-wide setup would require hardcoding the CI user in the auto-provisioning script, which is not good.

Doing it per repository is even worse, as it requires a setting in the Jenkins declarative pipeline that exposes the CI user in every single project repository. Then imagine that tomorrow I need to change the CI user: I'll have to ask every single project to make the change in their repository, and previous versions will no longer work, which is a bad thing!

Frankly speaking, it would be much better to have this feature in Terraform. I'm sure you'll find many other use cases in which custom setup of the Git repository won't work very well, especially since you always combine multiple tools to achieve full automation.

apparentlymart commented 4 years ago

Hi @gscuderi! Thanks for sharing that additional information.

In the interests of gathering as much context as possible about this problem so we can weigh various options, I have a further question:

Terraform is currently following the same practices as several other language ecosystems such as the ones I mentioned in my earlier comment (Go and npm) of allowing literal Git URLs for dependencies without any means to override them or customize them. I'm curious to know if the Gerrit server you mentioned here is used exclusively for Terraform, or if you are using it with some other ecosystems that also support direct Git URLs for dependencies, and if so if any of those systems have a good solution to the problem of swapping out different usernames that we could take inspiration from in Terraform.

In the ecosystems I'm aware of, it's a common constraint that dependencies are expressed totally statically because, as with Terraform today, dependency resolution and installation is a separate subsystem (or possibly even a separate system) that runs prior to "real" execution of the program. So I'd love to hear about any ecosystems you know of that you think have done a good job of supporting your use-case here, without relying on the Git feature I described in my previous comment.

If you don't have any such examples in mind, then no worries! I just think it's good to learn from prior art if possible, so we have a few different options to weigh.

mgrotheer commented 4 years ago

I'm struggling right now to pass specific credentials to a Terraform module source (a private repo) in our GitLab environment. One thing we have looked at is leveraging a GitLab deploy token, but I'm not sure how we could do this, since we wouldn't want to hard-code the credentials. Every other way I've tried results in an "access denied" error. Being able to pass variables into the module source would be helpful.

rlove commented 4 years ago

I am fighting this as well. We have several private modules with references to other modules. The private modules are stored in GitHub.

We use GitHub Workflow Actions to run terraform.

When I check out my module with actions/checkout@v2 and a PAT (Personal Access Token) that has access to all the other repositories containing the referenced modules, the git config for that specific repository is changed to allow future operations over HTTPS. It will even rewrite git submodule references from SSH to HTTPS.

When I run terraform init with references to a module over the HTTPS Git protocol, I get the following message:

Could not download module "tags" (tags.tf:1) source code from
"github.com/orgname/example-tags?ref=v0.2": error downloading
'https://github.com/orgname/example-tags.git?ref=v0.2': /usr/bin/git
exited with 128: Cloning into '.terraform/modules/tags'...
fatal: could not read Username for 'https://github.com': No such device or
address 

It's even more interesting when a referenced module in turn references other modules over both SSH and HTTPS, and those are sometimes out of your direct control.

None of this is typically noticed locally, because I have both SSH keys and a credential helper for HTTPS configured in Git. That is not an option on a self-hosted runner, as it would allow other builds to use these credentials when they should not have the rights to do so.

So my need would be the ability to optionally pass a PAT on the terraform CLI (or through a similar mechanism), which it would use when checking out any GitHub references over HTTPS.
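One way to approximate this today without storing credentials on a shared runner is to scope an `insteadOf` rule to a single `terraform init` process through environment variables. This is a minimal sketch, assuming Git 2.31 or later (which reads extra configuration from `GIT_CONFIG_COUNT`, `GIT_CONFIG_KEY_<n>` and `GIT_CONFIG_VALUE_<n>`) and assuming the `git` processes Terraform starts inherit its environment. The helper name `git_rewrite_env`, the `MODULES_TOKEN` variable, and the token-in-URL form are all hypothetical; check which URL form your Git host accepts for token authentication.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple


def git_rewrite_env(
    rules: Sequence[Tuple[str, str]],
    base_env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Return a copy of the environment with url.<base>.insteadOf rules added.

    Each rule is a (base, instead_of) pair, equivalent to the gitconfig entry
    [url "<base>"] insteadOf = <instead_of>. Nothing is written to disk, so
    the rules only apply to processes started with the returned environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Append after any GIT_CONFIG_* entries the caller already set.
    start = int(env.get("GIT_CONFIG_COUNT", "0"))
    for offset, (base, instead_of) in enumerate(rules):
        index = start + offset
        env[f"GIT_CONFIG_KEY_{index}"] = f"url.{base}.insteadOf"
        env[f"GIT_CONFIG_VALUE_{index}"] = instead_of
    env["GIT_CONFIG_COUNT"] = str(start + len(rules))
    return env


# Example usage (not executed here): inject a token only for `terraform init`.
#
#   import subprocess
#   token = os.environ["MODULES_TOKEN"]  # hypothetical CI secret
#   env = git_rewrite_env([(f"https://{token}@github.com/", "https://github.com/")])
#   subprocess.run(["terraform", "init"], env=env, check=True)
```

The token is still visible in the environment of `terraform init` and its child processes, so this narrows exposure on a shared runner rather than eliminating it.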

A workaround is to never use HTTPS and only use SSH.

Another option would be the ability to set custom headers for HTTPS URLs, so that modules could be downloaded with the token from a release page, or from another secure website protected by header tokens.

GaTechThomas commented 3 years ago

Same need here, but for Azure DevOps.

daveth commented 3 years ago

Hit a similar use case here too, but with a GCS bucket as the module source. Our CI environment owns such a bucket and is parameterised so that it can be deployed to a number of independent environments, but all other infrastructure that needs the TF modules in one of those registry buckets ends up with the GCS location hard-coded, since we can't use variables in module sources.

Could a registry block work for this? You could define it in the same place as a backend, tag any modules that need it with a registry attribute referring to the one you just defined, and when terraform init runs it goes and grabs the modules from the appropriate registries.

Pseudo-HCL:

terraform {
  registry "gcs" {
    name = "my-gcs-registry"
    storage_location = "gs://some-bucket/sub-dir"
  }

  backend "gcs" {
    // ...
  }
}

module "blah" {
  registry = my-gcs-registry    // sorta like specifying a provider to use?
  source = "some-path/blah"     // for a gcs registry, would just append the path (gs://some-bucket/sub-dir/some-path/blah)

  // ...
}
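The resolution step this proposal implies is small. Here is a minimal Python sketch of it, assuming the semantics given in the pseudo-HCL comments: a module's `source` is appended to the `storage_location` of the registry it names. `Registry` and `resolve_source` are hypothetical; Terraform has no `registry` block today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registry:
    """Hypothetical stand-in for the proposed `registry` block."""

    name: str
    storage_location: str  # e.g. "gs://some-bucket/sub-dir"


def resolve_source(registries: list[Registry], registry_name: str, source: str) -> str:
    """Append a module's relative source to its registry's storage location."""
    by_name = {registry.name: registry for registry in registries}
    if registry_name not in by_name:
        # An unknown registry name would be a configuration error at init time.
        raise KeyError(f"module refers to undeclared registry {registry_name!r}")
    base = by_name[registry_name].storage_location.rstrip("/")
    return f"{base}/{source.lstrip('/')}"


registries = [Registry("my-gcs-registry", "gs://some-bucket/sub-dir")]
print(resolve_source(registries, "my-gcs-registry", "some-path/blah"))
# prints: gs://some-bucket/sub-dir/some-path/blah
```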

ncdmr commented 2 years ago

Same need here: we'd like to have our GitLab URI as a variable so we have more flexibility in case of domain changes.

Eugene-Trufanov commented 2 years ago

Agreed, this would be very useful for many purposes. Currently we have to use Terragrunt or sed in buildspec files.
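For readers weighing that workaround, here is a minimal Python equivalent of the `sed` step: render a template into Terraform configuration text before `terraform init`, so that `source` is already a literal string by the time Terraform reads it. The `@@NAME@@` placeholder convention and the function name are hypothetical; the marker is chosen so it cannot be confused with Terraform's own `${...}` interpolation, which passes through untouched.

```python
import re

# Hypothetical placeholder syntax such as @@REGION@@ or @@MODULES_REF@@,
# chosen so it cannot clash with Terraform's own ${...} interpolation.
PLACEHOLDER = re.compile(r"@@([A-Z][A-Z0-9_]*)@@")


def render_tf_template(template: str, values: dict[str, str]) -> str:
    """Replace every @@NAME@@ placeholder, failing loudly if one has no value."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


# Template modelled on a regional S3 module source like the one discussed in this issue.
template = '''module "common" {
  source = "s3::https://s3-@@REGION@@.amazonaws.com/artifacts-@@REGION@@-dev/common-aws.1.0.0.tar.xz"
}
'''
rendered = render_tf_template(template, {"REGION": "eu-west-1"})
print(rendered)
```

The rendered output would then be written to a `.tf` file by a CI step before `terraform init`, and regenerated whenever the inputs change, so the committed template stays generic.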

apparentlymart commented 2 years ago

Hi all,

The current status of this issue is that we're looking for examples of other language ecosystems that have solved this problem in a different way than Terraform has and thus can better meet the use-case. Currently Terraform is consistent with various other language ecosystems we know of which support installation directly from Git repositories, and so the git configuration approach I shared above is one that is typically recommended for other similar systems like npm in the NodeJS ecosystem.

We understand that there is friction here but in order to make further progress we need to understand what makes Terraform different than the other systems with the same design (that is: dependencies are specified statically rather than dynamically, and are installed prior to runtime), why the git configuration solution can work for those ecosystems but not for Terraform, and ideally examples of other ecosystems which have a different solution to this problem.

We don't expect to implement something in Terraform that is entirely different from any other programming language ecosystem, because we aim to be consistent with other languages so that as much as possible the same processes and practices that work for other languages can work with Terraform too.

rlisnoff commented 2 years ago

Hey all, I wanna add a +1 here and my current reasoning for wanting this feature.

Our Terraform modules are stored in S3, but to meet some compliance standards our system has to tolerate a region outage in AWS. Though S3's namespace is global, the actual data is stored regionally, so we have a replicated bucket in another region that also contains our Terraform modules. In the event of a disaster, we want the Terraform files that consume these modules to be able to deploy into the disaster recovery region, but since we can't reference variables in the source argument, we are stuck creating a duplicate module call with the source pointing to the other S3 bucket and coalescing the values later. It would be a heck of a lot more DRY to have one module defined that pulls its source in a disaster-resilient way.

If there are alternate solutions here, I'm interested in hearing them; we've just been unable to come up with any that fit our needs.

geoffo-dev commented 2 years ago

Apologies, @apparentlymart, I only just saw that you had responded when issue #30546 was closed! Thank you for taking the time to reply!

Sadly, I think the approach you suggested will not work for our use case. That said, I am also not sure how best to attack it when you compare it to other languages.

I have been trying to wrap my head around the issue, as I didn't really understand why the source couldn't just be a string... but I forgot that, as part of the initial validation/init, Terraform needs to resolve these sources, which I guess it has to do before any variables are resolved.

The only ones I am familiar with resolve these up front and then use them for the build... which I guess is what Terraform is doing! Unless we could specify dependencies/sources in separate files/maps.

apottere commented 2 years ago

@apparentlymart I know in this quote you're specifically talking about how terraform handles git authentication and not all variables in the source, but per your comment on #30546 I was redirected here and wanted to highlight how this doesn't hold for all use cases:

The current status of this issue is that we're looking for examples of other language ecosystems that have solved this problem in a different way than Terraform has and thus can better meet the use-case. Currently Terraform is consistent with various other language ecosystems we know of which support installation directly from Git repositories, and so the git configuration approach I shared above is one that is typically recommended for other similar systems like npm in the NodeJS ecosystem.

A huge point of friction for my current org and my past org is that there's no way to specify a module dependency for an entire project/module, and if we're using git refs as a module version it needs to be copied into every single module.source we write. We have a monorepo for all of our shared terraform modules that we tag with semver, so this version gets updated pretty frequently.

For our use-case, terraform differs significantly from other languages - for example take a simple NodeJS project. In NodeJS versions are declared once, in package.json, and then the dependencies can be referenced without a version later (import { ... } from '@scope/pkg/subpkg'). Imagine if you had to declare the dependency in each import in each file (import { ... } from '@scope/pkg/subpkg@1.4.1'), it would make maintaining a NodeJS project with dependencies a nightmare.
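Without a `package.json`-style manifest, one pragmatic stopgap is to at least detect drift in a monorepo's ref across module blocks. Here is a minimal Python sketch that scans configuration text and reports which `?ref=` values each Git repository is pinned to. It uses a simplified regular expression rather than a real HCL parser and a simplified reading of the `//subdir` convention; `refs_by_repo`, `repo_address` and the sample URLs are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import parse_qs

# Simplified matcher for `source = "..."` arguments; not a full HCL parser.
SOURCE_ARG = re.compile(r'\bsource\s*=\s*"([^"]+)"')


def repo_address(address: str) -> str:
    """Strip a trailing //subdir from a module address, keeping the repo part."""
    scheme = address.find("://")
    # Skip past "scheme://" so its slashes are not mistaken for //subdir.
    start = scheme + 3 if scheme != -1 else 0
    subdir = address.find("//", start)
    return address if subdir == -1 else address[:subdir]


def refs_by_repo(hcl_text: str) -> dict[str, set[str]]:
    """Map each repository to the set of ?ref= values used for it."""
    found: dict[str, set[str]] = defaultdict(set)
    for match in SOURCE_ARG.finditer(hcl_text):
        base, _, query = match.group(1).partition("?")
        refs = parse_qs(query).get("ref")
        if refs:  # sources without ?ref= are not pinned to a Git ref
            found[repo_address(base)].add(refs[0])
    return dict(found)


config = '''
module "network" {
  source = "git::https://example.com/org/modules.git//network?ref=v1.4.0"
}
module "dns" {
  source = "git::https://example.com/org/modules.git//dns?ref=v1.4.1"
}
'''
for repo, refs in refs_by_repo(config).items():
    if len(refs) > 1:
        print(f"{repo} is pinned to several refs: {sorted(refs)}")
```

The sketch reports rather than rejects, since Terraform deliberately lets one configuration call several versions of the same module.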

Edit: Note that I'm not suggesting that variables in the source are the only solution to this problem, but it would be one of the solutions.

apparentlymart commented 2 years ago

Thanks for sharing that difference, @apottere.

My understanding is that in the NodeJS ecosystem each package has one package.json file which specifies in a single location which version of each dependency to use. In that model, each package can specify only a single version constraint for each other package it depends on. Furthermore, in the case of dependencies that are not published in the registry the package.json file also serves to create a local mapping table from registry-like names to other sources such as Git URLs.

Terraform intentionally allows a single module to call multiple versions of the same other module, and maintainers make use of that capability when they want to roll out a new version over multiple steps: add a new module block using the new version while keeping the old one, run terraform apply to temporarily use both, then remove the old module block and run terraform apply again to remove the old one.

From this NodeJS example I think we can learn two main things:

Terraform currently has no direct analog to package.json; as you observed, each module block is totally self-contained today and does not rely on any other information declared in the module.

There are some things that NodeJS and Terraform seem to have in common, though:

Thanks for sharing this example!

ayashjorden commented 2 years ago

Hi @apparentlymart, similar to @rlisnoff, our platform is distributed and we're evaluating different solutions.
Here is my comment on another issue:

Hi all, in my use-case I want to pull modules from a configurable location, most likely in the same region, to avoid cross-region traffic.

So something like source = "s3::https://s3-${var.region}.amazonaws.com/artifacts-${var.region}-dev/common-aws.1.0.0.tar.xz" makes sense to me, and it seems like it should be supported.

Can anyone point me to the relevant areas in the code:

  • Where can I specify input arguments? (I guess that's in the main TF binary, not a provider; I'd like to experiment.)
  • Where does the init functionality happen, so I can try to support -var or -var-file?

My logic tells me that, in most cases, input variables or var-files for init would be similar to, if not identical to, the inputs for the rest of the configuration.

Should that not be fruitful:

In any case, even if not, experimenting with that would support the discussion... Best, Jordan