hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

Feature request: Generate .terraform.lock.hcl including zh and h1 hash values for given platforms from required_providers block without downloading providers and modules #27264

Open minamijoyo opened 3 years ago

minamijoyo commented 3 years ago

Current Terraform Version

$ terraform version
Terraform v0.14.2

Use-cases

I have 200+ root modules and I'm automating a provider version upgrade workflow in CI with tfupdate, which recursively updates all version constraints in Terraform configurations. My laptop runs macOS and CI runs Linux, so I want to pre-populate hash values for all the platforms I need in the workflow to avoid a checksum mismatch error.

I'm looking for an efficient way to maintain .terraform.lock.hcl across multiple root modules and platforms.

Attempted Solutions

For downloading providers:

As you know, there are two hash formats, zh and h1, and zh is recorded only when a provider zip package is downloaded from the Terraform Registry. I want to avoid redundant downloads because I have a lot of root modules, so I tried creating a local filesystem mirror with the terraform providers mirror command at the git repository root, and then generated lock files in each subdirectory with the terraform providers lock command using the mirror. This recorded only h1 hash values, which is fine.

I wrote a script to generate .terraform.lock.hcl for multiple directories and platforms.

#!/bin/bash
set -eo pipefail

# create a plugin cache dir
export TF_PLUGIN_CACHE_DIR="/tmp/terraform.d/plugin-cache"
mkdir -p "${TF_PLUGIN_CACHE_DIR}"

# create a local filesystem mirror to avoid duplicate downloads
FS_MIRROR="/tmp/terraform.d/plugins"
terraform providers mirror -platform=linux_amd64 -platform=darwin_amd64 "${FS_MIRROR}"

# update the lock file
ALL_DIRS=$(find . -type f -name '*.tf' | xargs -I {} dirname {} | sort | uniq | grep -v 'modules/')
for dir in ${ALL_DIRS}
do
  pushd "$dir"
  # always create a new lock to avoid duplicate downloads by terraform init -upgrade
  rm -f .terraform.lock.hcl
  # get modules to detect provider dependencies inside module
  terraform init -input=false -no-color -backend=false -plugin-dir="${FS_MIRROR}"
  # remove a temporary lock file to avoid a checksum mismatch error
  rm -f .terraform.lock.hcl
  # generate h1 hashes for all platforms you need
  # recording zh hashes requires downloading from the origin registry, so we intentionally skip them.
  terraform providers lock -fs-mirror="${FS_MIRROR}" -platform=linux_amd64 -platform=darwin_amd64
  # clean up
  rm -rf .terraform
  popd
done

However, with a lock file that records only h1 hash values, if I run terraform init without any mirror or cache, the init command adds zh hash values to the lock file. This causes an unexpected lock file change.

I expect terraform init without the -upgrade flag not to update an existing lock file, because terraform init is an essential command for all workflows and I don't think it's desirable for it to cause an unexpected git diff. Is this intentional by design, or a bug?

Pre-populating zh hash values requires redundant downloads, so I want to avoid it. My pain points are:

If my understanding is correct, it would be great if the Terraform Registry returned not only zh hash values but also h1 hash values for all platforms, without requiring a download.

For downloading modules:

If a root module depends on other modules, the terraform providers lock command requires terraform init. I understand selecting a correct version needs all module contents for completeness.

However, before Terraform v0.14 there was no lock file, so I've already declared all the provider dependencies I need in the required_providers block of each root module. In that case, it's not always necessary to fetch all modules, because I already know the dependencies. Running terraform init first works, but it's inefficient. It would be great to have a shallow check option that we can use at our own risk.

Proposal

Allow the terraform providers lock command to generate .terraform.lock.hcl including zh and h1 hash values for given platforms from required_providers block without downloading providers and modules.

References

Related to:

viceice commented 3 years ago

This would be very helpful for renovate too, see renovatebot/renovate#7895

apparentlymart commented 3 years ago

Hi @minamijoyo! Thanks for raising this.

We have a few different technical constraints to work through in order to achieve what you described here. I'm going to describe them here just to kick off a potential discussion about what technical solutions might change these constraints, and not meaning to say that we don't want to achieve these goals.


Terraform Registry only supports the "ziphash" (zh:) scheme

In Terraform v0.12 and earlier, Terraform always downloaded providers from the site releases.hashicorp.com, which is a system that just generically hosts .zip archives and other distribution archives for various HashiCorp products. Part of the design of that system is that each "directory" of archives includes a _SHA256SUMS file containing hashes of the files inside. Because that system is not Terraform-specific, it treats the files as opaque bytes and so captures the hash of the archive as a whole rather than of the content of the archive. There's also a cryptographic signature covering the _SHA256SUMS file as a whole, which Terraform has historically verified as an additional integrity check.
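As a rough illustration of "the hash of the archive as a whole", the sketch below prints a zh:-style value for a single .zip file. It assumes GNU sha256sum is available (on macOS, shasum -a 256 is the usual substitute), and the function name zh_hash is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: print a zh:-style hash for one provider archive.
# The value is the SHA-256 of the archive bytes, hex-encoded.
zh_hash() {
  local archive="$1"
  local digest
  # Read via stdin so the output is always "<hex>  -".
  digest="$(sha256sum < "$archive")" || return 1
  printf 'zh:%s\n' "${digest%% *}"
}
```

Because only the archive bytes are hashed, the result depends on how the .zip was built, not just on the files inside it.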

In order to make gradual progress towards the decentralized provider namespace, the official registry of providers on Terraform Registry was initially built essentially as just an index for content already on releases.hashicorp.com, and so the registry protocol was designed around the data that was available there. It therefore inherited the design of using checksums of the entire archive rather than of the contents.

Terraform v0.14 changed Terraform CLI's installation behavior but did not have any corresponding change to the registry protocol, so for the initial release we made the compromise of preserving the old hashing scheme as zh: in the lock file, and seeding the lock file with all of the zh: hashes that were signed with the provider developer's key.

Because the registry protocol currently has no awareness of the possibility of multiple hashing schemes, there is no way for it to return h1: hashes or for the provider developer to sign them. Fully supporting h1: hashes without downloading actual provider packages will therefore require an extension to the provider registry protocol.


Local Filesystem Mirrors don't support the zh: scheme

The original motivation for introducing this new h1: hashing scheme is that local filesystem mirrors typically contain unpacked provider packages -- the extracted contents of the .zip file -- rather than the original .zip file. Therefore there is no way to recover a zh: hash from such a mirror, and so Terraform in that case locks only the specific h1: hash for providers it finds via local mirrors.
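To make the contrast with zh: concrete, here is a sketch of hashing the unpacked contents of a package directory. It follows my understanding of the Go dirhash "Hash1" algorithm (per-file SHA-256 lines, sorted by path, hashed again and base64-encoded); I have not checked its output against Terraform's for real providers, and details such as symlink handling are ignored. The function name h1_hash is hypothetical, and bash with GNU coreutils is assumed.

```shell
#!/bin/bash
# Hypothetical helper: approximate an h1:-style hash for an unpacked
# provider directory.
#   1. list regular files by relative path, sorted bytewise
#   2. emit one line per file: "<sha256 hex>  <path>"
#   3. SHA-256 those lines and base64-encode the raw digest
h1_hash() {
  local dir="$1"
  local summary_hex escaped
  summary_hex="$(
    cd "$dir" || exit 1
    find . -type f | sed 's|^\./||' | LC_ALL=C sort |
      while IFS= read -r path; do
        file_hex="$(sha256sum < "$path")"
        printf '%s  %s\n' "${file_hex%% *}" "$path"
      done | sha256sum
  )" || return 1
  summary_hex="${summary_hex%% *}"
  # Turn the hex digest back into raw bytes before base64-encoding it.
  escaped="$(printf '%s' "$summary_hex" | sed 's/../\\x&/g')"
  printf 'h1:%s\n' "$(printf '%b' "$escaped" | base64)"
}
```

Since only file paths and file contents feed into the result, the same value can in principle be computed from either an extracted directory or the files inside a .zip, which is what makes this scheme usable with unpacked mirrors.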

For in-house providers that are only ever distributed by local mirrors, this is typically not a big problem: there is no real origin registry for those providers and so they will only have h1: hashes, and not zh: hashes. However, it is more bothersome when a mirror contains a provider that is also available from an origin registry, because the lock file can end up in a different state depending on which installation method was most recently used.

It's for that reason that we introduced the terraform providers lock command: it serves as a way to proactively populate all of the hashes you might need for a particular provider regardless of installation method, rather than gradually switching between them as a side-effect of running terraform init.

However, that command is only able to include checksums it has access to, which (because of the other constraints) is only zh: checksums signed by the original registry -- if installing from a registry -- and h1: checksums from packages Terraform already has on local disk and so can calculate the checksums itself.


The first registry download validates the others

An interesting detail of installing from an origin registry (rather than a mirror) is that Terraform will only download the package for the current platform but it will lock all of the zh: checksums across all platforms.

To achieve this safely -- that is, to avoid someone "poisoning" the hash file with invalid hashes -- Terraform requires the following conditions to hold in order to lock all of the checksums:

After seeding with the full set of zh: hashes, Terraform will also add any h1: hash it encounters where the package also matches one of the existing zh: hashes, allowing Terraform to gradually learn about the valid h1: hashes and add them automatically.

However, this mechanism assumes that the first download will be from a registry. If the first download is from a mirror then Terraform cannot rely on the signature to track the full set of hashes, and so it will conservatively only lock checksums it was able to compute itself, locally. That is why building your lock file from terraform providers lock requires downloading the packages, and why using a mirror with terraform providers lock produces an incomplete result.
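One ingredient of this kind of validation, checking that a downloaded archive matches an entry in a checksum list, can be sketched as follows. This is not Terraform's implementation: the function name archive_in_sums is hypothetical, the sums file is assumed to use the usual "<hex>  <filename>" layout, and verifying the signature over the sums file is left out entirely.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the SHA-256 of an archive appears in
# a SHA256SUMS-style file next to the archive's own file name.
# Signature verification of the sums file is deliberately not covered.
archive_in_sums() {
  local archive="$1" sums_file="$2"
  local actual expected name
  name="$(basename "$archive")"
  actual="$(sha256sum < "$archive")" || return 1
  actual="${actual%% *}"
  # Pick the checksum recorded for this exact file name, if any.
  expected="$(awk -v n="$name" '$2 == n { print $1; exit }' "$sums_file")"
  [ -n "$expected" ] && [ "$actual" = "$expected" ]
}
```

Only once one package has been tied to a trusted checksum list in this way does it make sense to trust the other entries in the same list, which is why a mirror-first install cannot seed the full set.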


With all of the above said, the current system obviously assumes that changes to the locked providers are driven by humans intentionally upgrading from the remote registry and submitting the result, and it isn't well suited to systems like tfupdate and Renovate whose goal is to pre-emptively propose upgrades.

There is a philosophical point here about whether constantly upgrading is better than staying on known-good versions that meet your needs, but I don't have a strong opinion on that point and I would like to support both approaches, if we can find a technical design that does so either by working within the existing technical constraints or by making backward-compatible changes to the existing design to loosen those constraints.

Making the registry protocol support other hashing schemes is an improvement we have already discussed and hope to make, but changes to an established protocol require far more coordination between systems, so we were unable to include that in the v0.14.0 scope. Extending the protocol to generalize the hashing was, however, a next step I had already discussed with the team that maintains the Terraform Registry, so they are aware of the need, though we will need to work with them to design a concrete protocol that they are comfortable implementing within their own technical constraints.

I don't think that protocol change will address everything here, but I think it will loosen the constraints enough to give us some leeway to better address the other concerns.

minamijoyo commented 3 years ago

@apparentlymart Thank you for your comprehensive explanation. Since the Terraform Registry cannot return h1 hashes for now, it's unavoidable to download each provider package at least once.

With that in mind, I would like to discuss a short term workaround without changing the registry protocol. My top priority is suppressing the unexpected git diff caused by terraform init.

I came up with some ideas. Let me share them:

(1) Use h1 hashes only in a lock file

When we run terraform init without any mirror or cache, we would verify the checksum and signature of the zip before unpacking, but compare the unpacked content only against the h1 hashes in the lock file and never add zh hashes. If a lock file contains h1 hashes and no zh hashes, I think adding new zh hashes doesn't make sense, at least in my case, but I'm not sure that holds for all cases. If they are meaningful in some cases, it might be better to add a new flag or option.

(2) Calculate zh hashes if a mirror contains a zip file

A local filesystem mirror supports both unpacked and packed layouts, and the terraform providers mirror command seems to save a provider package in the packed layout by default. That is, we potentially have a chance to calculate zh hashes if the package is packed. If we want to verify a checksum, we can store not only the zip file but also the checksum and signature files in the local filesystem mirror.

(3) Add zh hashes without download

Strictly speaking, checksums are immutable and can be verified by a signature. That is, we don't actually need to download a zip file to add zh hashes to the lock file. (Of course, a download is still needed to calculate h1 hashes.) It's a bit tricky, but we may be able to relax the constraints.

Do these conflict with the current security model or other constraints?
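Idea (2) could look roughly like the following sketch, which walks a mirror directory and prints a zh:-style value for each packed package it finds. It assumes packed packages are files named terraform-provider-*.zip somewhere under the mirror root; the function name mirror_zh_hashes is hypothetical, and nothing here verifies the archives against a signed checksum list.

```shell
#!/bin/bash
# Hypothetical helper: print "<file name> zh:<sha256 hex>" for every packed
# (.zip) provider package found under a mirror directory.
mirror_zh_hashes() {
  local mirror_dir="$1"
  local zip digest
  find "$mirror_dir" -type f -name 'terraform-provider-*.zip' | LC_ALL=C sort |
    while IFS= read -r zip; do
      digest="$(sha256sum < "$zip")" || return 1
      printf '%s zh:%s\n' "$(basename "$zip")" "${digest%% *}"
    done
}
```

A call such as mirror_zh_hashes "${FS_MIRROR}" would list one line per platform archive in the mirror.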

minamijoyo commented 3 years ago

(4) Add a new environment variable or CLI flag to suppress updating the lockfile

I found this discussion: https://github.com/hashicorp/terraform/issues/27241#issuecomment-748598577

Envvar or cli flag to disable updating the lockfile

Even though this doesn't resolve all problems described in my issue, I think it has the same effect as (1) in my workflow. If we could suppress the unintended git diff, it wouldn't conflict with the workflow which updates the lockfile in CI.

@apparentlymart Is it acceptable? If it's worth considering adding a new option to suppress updating the lockfile, I'll open a new feature request.

Thanks!

apparentlymart commented 3 years ago

Hi @minamijoyo,

Focusing on your latest comment only for now (I'll review the other ideas in more detail soon), I think it could be reasonable to add a new option to terraform init to force it to treat the lock file as read-only, similar to Go's -mod=readonly option for various Go toolchain commands.

My initial idea for that interface would be terraform init -lockfile=readonly, which would establish an entirely new option -lockfile which could potentially take other values in the future if we find use-cases for other behavior variants.

I think the main requirement for that option is that installation can succeed as long as all packages can be verified with information already recorded in the lock file. The option would disable Terraform from updating the file to record any new information it learned (such as a hash using a new scheme) but Terraform would still rely on and check against the information already recorded.

It seems to follow then that -upgrade would be incompatible with -lockfile=readonly, because terraform init should not install a new version without updating the lock file to reflect the checksums of what it installed. If it did so then any subsequent operation would fail because the lock file would not match the local cache in .terraform/providers. Using these two options together should therefore be an error, I think.
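Since -lockfile=readonly is only a proposal at this point in the thread, the intended effect can be approximated from outside Terraform with a wrapper like the sketch below. It runs a command and puts the original .terraform.lock.hcl back if the command changed it. The function name run_with_readonly_lock is hypothetical, and this is only an approximation: as noted above, combining it with -upgrade would leave the restored lock file out of step with what was installed.

```shell
#!/bin/bash
# Hypothetical helper: run a command in the current directory and restore
# .terraform.lock.hcl afterwards if the command modified or removed it.
# The command's own exit status is passed through unchanged.
run_with_readonly_lock() {
  local lock=".terraform.lock.hcl"
  local backup status=0
  [ -f "$lock" ] || { echo "no $lock in $(pwd)" >&2; return 1; }
  backup="$(mktemp)" || return 1
  cp "$lock" "$backup"
  "$@" || status=$?
  if ! cmp -s "$lock" "$backup"; then
    cp "$backup" "$lock"
    echo "restored $lock after: $*" >&2
  fi
  rm -f "$backup"
  return "$status"
}
```

For example, run_with_readonly_lock terraform init -input=false would still verify packages against the hashes already recorded, while discarding any hashes Terraform tried to add.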

minamijoyo commented 3 years ago

I've opened a new feature request for terraform init -lockfile=readonly to clarify the scope for the partial workaround: https://github.com/hashicorp/terraform/issues/27506

petur commented 2 years ago

Is there any progress on this issue? I find that terraform providers lock is failing very frequently with errors like this:

╷
│ Error: Could not retrieve providers for locking
│ 
│ Terraform failed to fetch the requested providers for darwin_arm64 in order
│ to calculate their checksums: some providers could not be installed:
│ - registry.terraform.io/hashicorp/external: could not query provider
│ registry for registry.terraform.io/hashicorp/external: the request failed
│ after 2 attempts, please try again later: Get
│ "https://registry.terraform.io/v1/providers/hashicorp/external/versions":
│ net/http: request canceled while waiting for connection (Client.Timeout
│ exceeded while awaiting headers).
╵

It may work on the next attempt, or it fails on a different provider.

Even if it's not possible to eliminate all the downloads, it would be a major improvement if only the missing providers were downloaded. Most of the time only one provider at a time is getting upgraded, so the hashes for the others are already in the lockfile. It shouldn't be necessary to download those again.
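Until downloads can be reduced, transient registry timeouts like the one above can at least be papered over with a generic retry wrapper. The sketch below is plain bash and not specific to Terraform; the function name retry and its argument order (attempt count, delay in seconds, then the command) are choices made for this example.

```shell
#!/bin/bash
# Hypothetical helper: run a command up to N times, sleeping between
# attempts. Usage: retry <attempts> <delay-seconds> <command...>
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}
```

For example: retry 5 10 terraform providers lock -platform=linux_amd64 -platform=darwin_arm64. This does not avoid the repeated downloads, it only makes a flaky run more likely to finish.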

minamijoyo commented 1 year ago

For those who struggle to update multiple .terraform.lock.hcl at scale

After more than two years of waiting for a Terraform Registry protocol change with no progress, I finally implemented lock file updates in tfupdate myself, without the Terraform CLI, knowing that this relies on Terraform implementation details.

tfupdate v0.7.0 introduced a new tfupdate lock command, which parses the required_providers block in your configuration, downloads provider packages, and calculates hash values under the hood. The most important point is that it caches calculated hash values in memory, which gives a huge performance advantage when updating multiple directories with the recursive option. For details, see https://github.com/minamijoyo/tfupdate/pull/90
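The benefit of caching can be illustrated with a small bash sketch. This is not how tfupdate is implemented; it only shows the general idea of computing a hash once per key (for instance provider, version, and platform) and reusing it across directories. The names HASH_CACHE, CACHED_VALUE, and cached_hash are hypothetical, and bash 4 or later is assumed for associative arrays.

```shell
#!/bin/bash
# Requires bash 4+ for associative arrays.
declare -A HASH_CACHE=()

# Hypothetical helper: look up a value by key, running the given command
# only on a cache miss. The result is stored in CACHED_VALUE instead of
# being printed, so the cache update happens in the current shell rather
# than in a command-substitution subshell.
cached_hash() {
  local key="$1"
  shift
  local value
  if [[ -z "${HASH_CACHE[$key]+set}" ]]; then
    value="$("$@")" || return 1
    HASH_CACHE[$key]="$value"
  fi
  CACHED_VALUE="${HASH_CACHE[$key]}"
}
```

With 200+ root modules that mostly share the same provider versions, nearly every lookup after the first directory would be a cache hit.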

I know this isn't the ideal solution, but it helps me get things done in practice. I'll keep this issue open, hoping that the Terraform Registry protocol will improve in the future.

jdolitsky commented 7 months ago

I just created this ~50-line bash script to update all h1:... hashes in the lockfile for the 4 platforms darwin/amd64, darwin/arm64, linux/amd64, and linux/arm64: https://gist.github.com/jdolitsky/dd100e362a0d722a0a423b4140ee8959