hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Restrict terraform plugins(providers/modules) from downloading modules from unverified sources #33698

Open sahilsk opened 1 year ago

sahilsk commented 1 year ago

Terraform Version

NA

Use Cases

We cater to various internal developers across different teams, making it challenging to oversee their internet downloads. While we can confine downloads to the terraform registry source, the HashiCorp Terraform registry often redirects to GitHub for certain plugins. Consequently, we lack precise control over external access limitations.

One approach is to rely on tf-plan output to detect external access calls by modules or data sources. However, this approach might be too late, especially for data sources where code executes during tf-plan.

Attempted Solutions

We wrote a custom wrapper on top of Terraform that parses every *.tf file and decides whether to allow each module or provider by matching it against our allowlist. But when a request goes to the HashiCorp registry and the registry then downloads the module from GitHub, this defeats our purpose.

Terraform will download the module named "module" from the "hashicorp/example" namespace on the Terraform Registry.

However, it's worth noting that some modules from the Terraform Registry might have dependencies or plugins that are hosted on other platforms like GitHub. In such cases, Terraform will also download those dependencies from their respective sources.

Proposal

I propose implementing an allowlist or trusted module list. This would enable the Terraform binary to adhere to such a list no matter how deep in the dependency chain the plugin is used.

References

NA

apparentlymart commented 1 year ago

Hi @sahilsk,

From what you've written I have the impression that you are primarily concerned about controlling which provider plugins can be used, and that you are only worried about modules because depending on a module can potentially introduce an indirect dependency on another provider.

If that is true, I think the CLI configuration file's provider installation settings already provide a potential solution.

Although that mechanism is primarily intended for customizing where providers get installed from rather than controlling what gets installed, it is possible to write a provider installation configuration that only specifies how to install a subset of the available providers. Any provider that doesn't match a provider installation rule will immediately fail installation, because there would be no installation method configured for it.

For example, if you were to write the following in your CLI configuration then Terraform would be unable to install any provider that doesn't belong to the "hashicorp" namespace on the public Terraform Registry:

provider_installation {
  direct {
    include = [
      "registry.terraform.io/hashicorp/*",
    ]
  }
}

If the HashiCorp namespace (which contains only Official provider plugins) is not sufficiently constrained then you can instead enumerate a fixed set of exact providers that are allowed:

provider_installation {
  direct {
    include = [
      "registry.terraform.io/hashicorp/aws",
      "registry.terraform.io/hashicorp/null",
      "registry.terraform.io/hashicorp/tls",
      # ...
    ]
  }
}

There is no comparable mechanism for overriding how modules are installed because module source addresses are typically direct physical locations, but if you constrain which providers are allowed as I showed above then it will effectively block the use of any modules that use providers outside of those allowed, because terraform init will fail to install those additional providers.

sahilsk commented 1 year ago

@apparentlymart Thanks for sharing this information. This does help close one of the possible gaps, but the core of the problem still persists: unless we have a way of restricting the sources used in modules, this security gap remains.

If I may ask, please let me know what you think could be done in Case 3.

Case 1: Using source directly

# downloaded directly, i.e. via `terraform get`
module "malicious-module" {
    source = "github.com/malicious-module.xx"
}

The solution in this case is simple: scan the root module's *.tf files and extract every source starting with "github.com/xxx".
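A minimal sketch of that scan, assuming a plain regular expression is good enough for `source = "..."` lines (it is not a real HCL parser). The function names are made up for illustration, and other spellings such as `git::https://github.com/...` would need extra patterns:

```python
import re
from pathlib import Path

# Matches lines of the form:  source = "..."
# A regex is only an approximation of HCL parsing.
SOURCE_RE = re.compile(r'^\s*source\s*=\s*"([^"]+)"', re.MULTILINE)


def github_sources_in_text(text):
    """Return every source value in `text` that starts with github.com/."""
    return [s for s in SOURCE_RE.findall(text) if s.startswith("github.com/")]


def scan_root_module(root_dir="."):
    """Map each root-module *.tf file name to the github.com sources it uses."""
    findings = {}
    for tf_file in sorted(Path(root_dir).glob("*.tf")):
        hits = github_sources_in_text(tf_file.read_text(encoding="utf-8"))
        if hits:
            findings[tf_file.name] = hits
    return findings
```

Only the files directly in `root_dir` are read, so this covers the root module and nothing downloaded beneath it.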

Case 2: Using module indirectly via dependency

In this case, scanning all the child modules would take ages. Instead, we can scan the "modules.json" file generated at the terraform init stage. Downside: the malicious module has already been downloaded into our infrastructure, though not executed, which is a fair compromise for speed.

# conceptual nesting only, not valid Terraform syntax: each module calls the next
module "awesome-module" {
  module "awesome-module-child-1" {
    module "awesome-module-child-n" {
      # ......
      source = "github.com/malicious-module.xx"
    }
  }
}

Case 3: hashicorp registry

Things get complicated when the HashiCorp registry is involved. In this case the module is downloaded from the registry; however, the registry can, behind the scenes, fetch the module from anywhere (GitHub, Bitbucket, etc.), and we won't know where.

# downloaded via registry or via terraform-get
module "awesome-module" {
    source = "hashicorp/malicious-module-sugar-coated.xx" # -> github.com/malicious-module
}

apparentlymart commented 1 year ago

As currently implemented, modules from the registry.terraform.io registry are always from GitHub repositories, and the relationship between the source address and the underlying GitHub repository is systematic.

For example, hashicorp/dir/template is a shorthand for registry.terraform.io/hashicorp/dir/template, and corresponds to the GitHub repository https://github.com/hashicorp/terraform-template-dir.

The general rule is that registry.terraform.io/NAMESPACE/NAME/TARGETSYSTEM maps to https://github.com/NAMESPACE/terraform-TARGETSYSTEM-NAME.
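As a rough illustration of that rule, here is a small Python sketch. The function name is hypothetical, and it assumes the address is either the three-part shorthand or prefixed with registry.terraform.io, optionally followed by a `//subdir` suffix:

```python
def registry_module_to_github(address):
    """Map NAMESPACE/NAME/TARGETSYSTEM to its GitHub repository URL.

    Applies only to the public registry, following the rule stated above.
    """
    host = "registry.terraform.io"
    # Drop an optional //subdir suffix, then split into path segments.
    parts = address.split("//", 1)[0].strip("/").split("/")
    if len(parts) == 4:
        if parts[0] != host:
            raise ValueError(f"not a public registry address: {address!r}")
        parts = parts[1:]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected NAMESPACE/NAME/TARGETSYSTEM: {address!r}")
    namespace, name, target_system = parts
    return f"https://github.com/{namespace}/terraform-{target_system}-{name}"


print(registry_module_to_github("hashicorp/dir/template"))
# prints https://github.com/hashicorp/terraform-template-dir
```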

I will concede that this is something that could potentially change in future, but I think it's very unlikely to change in the foreseeable future, because the namespace of Terraform Registry is tightly coupled to GitHub's namespace and it would be a big project to change that.

Using that information, perhaps you can adapt your approach for Case 2 to also support Case 3. Either you could prohibit using registry addresses altogether -- and always use the corresponding GitHub URL directly -- or you could reject entries from modules.json that refer to registry URLs not on your allowlist.
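The second option could look roughly like the following Python sketch. It makes several assumptions: that the manifest is at `.terraform/modules/modules.json`, that it has a top-level `Modules` list whose entries carry `Key` and `Source` fields, and that registry sources are recorded in the full `registry.terraform.io/...` form. This is an internal file whose layout may differ between Terraform versions, so check it against your own before relying on it. The function names and the glob-style allowlist are made up for illustration.

```python
import json
from fnmatch import fnmatchcase


def load_manifest(path=".terraform/modules/modules.json"):
    """Read the module manifest written by `terraform init` (assumed path)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def disallowed_modules(manifest, allowlist):
    """Return (key, source) pairs whose source matches no allowlist pattern.

    Patterns are shell-style globs, e.g. "registry.terraform.io/hashicorp/*".
    The root module (empty source) and local paths are skipped.
    """
    violations = []
    for entry in manifest.get("Modules", []):
        source = entry.get("Source", "")
        if not source or source.startswith(("./", "../")):
            continue
        if not any(fnmatchcase(source, pattern) for pattern in allowlist):
            violations.append((entry.get("Key", ""), source))
    return violations
```

A wrapper could run this right after `terraform init` and abort before `terraform plan` whenever the returned list is non-empty.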


I forgot to mention in my previous comment that we already have #29362 open representing the possibility of custom installation methods for modules, with similar capabilities to the provider_installation block.

Do you think that something similar to the provider_installation block for the module installer would be an acceptable solution? If so, would it be sufficient for it to work only for module registry addresses (where we'd be able to follow a similar wildcard matching strategy as for provider source addresses), assuming that you could solve non-registry addresses as you described in case 2?

sahilsk commented 1 year ago

@apparentlymart Thank you for taking the time to address my concern. I REALLY APPRECIATE IT.

As far as I can see, it (provider_installation) does sound like a promising solution.

We considered blocking the HashiCorp registry entirely and allowing only github.com, but the added friction hampered developer efficiency. The other way around, blocking all github.com sources except the HashiCorp ones, sounds more promising.

With that said, I'd like to raise a few additional points of concern while we are at it:

  1. Version whitelist/blacklist: AWS recently introduced their new 5.x provider, which unfortunately lacks backward compatibility. It's important that we establish a method for excluding version 5.x from our system (provider_installation should have helped here). Additionally, in cases where problematic packages are identified, a mechanism should be in place to prevent their integration.

  2. Exclusive use of checksums/signed packages: This is particularly relevant for modules sourced from platforms like GitHub (indirectly through HashiCorp). Many teams typically don't require third-party modules unless they're working on significant projects like EKS/Kubernetes. However, in situations where they are necessary, implementing a process involving checksums or digital signatures would enable us to compare downloaded modules against verified standards. This step will guarantee that only recognized and validated modules become part of our enterprise infrastructure.


On #29362 I've added our solution to the problem raised there, along with a suggestion to improve it further.

sahilsk commented 1 year ago

@apparentlymart any thoughts? Is module_installation in the pipeline already? I am willing to offer help (coding/testing) if it helps expedite it.