Open minamijoyo opened 3 years ago
This would be very helpful for renovate too, see renovatebot/renovate#7895
Hi @minamijoyo! Thanks for raising this.
We have a few different technical constraints to work through in order to achieve what you described here. I'm going to describe them here just to kick off a potential discussion about what technical solutions might change these constraints, and not meaning to say that we don't want to achieve these goals.
zh: scheme
In Terraform v0.12 and earlier, Terraform always downloaded providers from the site releases.hashicorp.com, which is a system that just generically hosts .zip archives and other distribution archives for various HashiCorp products. Part of the design of that system is that each "directory" of archives includes a _SHA256SUMS file containing hashes of the files inside. Because that system is not Terraform-specific, it treats the files as opaque bytes and so captures the hash of the archive as a whole rather than of the content of the archive. There's also a cryptographic signature covering the _SHA256SUMS file as a whole, which Terraform has historically verified as an additional integrity check.
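To make the "hash of the archive as a whole" point concrete, here is a minimal sketch. It assumes a zh: entry is the prefix "zh:" followed by the lowercase hex SHA-256 of the .zip bytes, as the _SHA256SUMS description suggests; the names zh_hash and make_zip and the file name terraform-provider-example are hypothetical and not part of Terraform.

```python
import hashlib
import io
import zipfile


def zh_hash(archive_bytes: bytes) -> str:
    """Hash an archive as opaque bytes (hypothetical helper).

    Assumption: a zh: entry is "zh:" plus the hex SHA-256 of the whole
    .zip file, i.e. the same value a _SHA256SUMS file would list.
    """
    return "zh:" + hashlib.sha256(archive_bytes).hexdigest()


def make_zip(content: bytes, compression: int) -> bytes:
    """Build a one-file archive in memory with a fixed timestamp."""
    buf = io.BytesIO()
    info = zipfile.ZipInfo("terraform-provider-example", date_time=(2020, 1, 1, 0, 0, 0))
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(info, content, compress_type=compression)
    return buf.getvalue()


# The same file content packed two ways gives two different archives,
# so the whole-archive hashes differ even though the content is identical.
payload = b"same provider binary" * 10
stored = make_zip(payload, zipfile.ZIP_STORED)
deflated = make_zip(payload, zipfile.ZIP_DEFLATED)
print(zh_hash(stored) != zh_hash(deflated))
```

This is why an unpacked copy of a package cannot be checked against a zh: value: the hash depends on how the archive was built, not only on what it contains.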
In order to make gradual progress towards the decentralized provider namespace, the official registry of providers on Terraform Registry was initially built essentially as just an index for content already on releases.hashicorp.com, and so the registry protocol was designed around the data that was available there. It therefore inherited the design of using checksums of the entire archive rather than of the contents.
Terraform v0.14 changed Terraform CLI's installation behavior but did not have any corresponding change to the registry protocol, so for the initial release we made the compromise of preserving the old hashing scheme as zh: in the lock file, and seeding the lock file with all of the zh: hashes that were signed with the provider developer's key.
Because the registry protocol currently has no awareness of the possibility of multiple hashing schemes, there is no way for it to return h1: hashes or for the provider developer to sign them. Fully supporting h1: hashes without downloading actual provider packages will therefore require an extension to the provider registry protocol.
zh: scheme
The original motivation for introducing this new h1: hashing scheme is that local filesystem mirrors typically contain unpacked provider packages -- the extracted contents of the .zip file -- rather than the original .zip file. Therefore there is no way to recover a zh: hash from such a mirror, and so Terraform in that case locks only the specific h1: hash for providers it finds via local mirrors.
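A content-based hash can be computed from an unpacked package because it only looks at file names and file bytes. The sketch below assumes the h1: scheme matches the Hash1 algorithm from Go's golang.org/x/mod/sumdb/dirhash package (a SHA-256 over a sorted list of per-file SHA-256 lines, base64-encoded); that correspondence is an assumption on my part and is not stated in this thread. The function name h1_hash and the dictionary input are hypothetical.

```python
import base64
import hashlib


def h1_hash(files: dict[str, bytes]) -> str:
    """Hash package contents rather than an archive (hypothetical helper).

    files maps slash-separated relative paths to file bytes, as they
    would appear in an unpacked provider package directory.
    """
    summary = hashlib.sha256()
    for name in sorted(files):
        if "\n" in name:
            raise ValueError("file names containing newlines are not supported")
        file_digest = hashlib.sha256(files[name]).hexdigest()
        # One summary line per file: hex digest, two spaces, file name.
        summary.update(f"{file_digest}  {name}\n".encode("utf-8"))
    return "h1:" + base64.b64encode(summary.digest()).decode("ascii")
```

Under that assumption, the result is the same whether the files came from a freshly extracted .zip or from a directory in a local mirror, which is the property the zh: scheme lacks.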
For in-house providers that are only ever distributed by local mirrors, this is typically not a big problem: there is no real origin registry for those providers and so they will only have h1: hashes, and not zh: hashes. However, it is more bothersome when a mirror contains a provider that is also available from an origin registry, because the lock file can end up in a different state depending on which installation method was most recently used.
It's for that reason that we introduced the terraform providers lock command: it serves as a way to proactively populate all of the hashes you might need for a particular provider regardless of installation method, rather than gradually switching between them as a side-effect of running terraform init.
However, that command is only able to include checksums it has access to, which (because of the other constraints) are only the zh: checksums signed by the original registry -- if installing from a registry -- and the h1: checksums of packages Terraform already has on local disk and so can calculate itself.
An interesting detail of installing from an origin registry (rather than a mirror) is that Terraform will only download the package for the current platform but it will lock all of the zh: checksums across all platforms.
To achieve this safely -- that is, to avoid someone "poisoning" the hash file with invalid hashes -- Terraform requires the following conditions to hold in order to lock all of the checksums:
After seeding with the full set of zh: hashes, Terraform will also add any h1: hash it encounters where the package also matches one of the existing zh: hashes, allowing Terraform to gradually learn about the valid h1: hashes and add them automatically.
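The learning rule just described can be sketched as follows. This is only an illustration of the rule as stated in this comment, not Terraform's implementation: the function name learn_h1, the set-of-strings representation of the lock entry, and the error raised on a mismatch are all assumptions.

```python
import hashlib


def learn_h1(locked: set[str], archive_bytes: bytes, h1: str) -> set[str]:
    """Return the hashes to record after installing one package (hypothetical).

    locked: hashes already in the lock file for this provider version.
    archive_bytes: the downloaded .zip, hashed as a whole for the zh: check.
    h1: the h1: hash computed locally from the package contents.
    """
    zh = "zh:" + hashlib.sha256(archive_bytes).hexdigest()
    if zh in locked:
        # The archive matches a signed zh: hash, so its h1: hash is trusted too.
        return locked | {h1}
    if h1 in locked:
        # Already known by content hash; nothing new to learn.
        return set(locked)
    raise ValueError("package does not match any locked hash")
```

The key design point is that an h1: hash is never trusted on its own the first time it is seen: it is added only when the same package has already been vouched for by one of the signed zh: hashes.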
However, this mechanism assumes that the first download will be from a registry. If the first download is from a mirror then Terraform cannot rely on the signature to track the full set of hashes, and so it will conservatively only lock checksums it was able to compute itself, locally. That is why building your lock file from terraform providers lock requires downloading the packages, and why using a mirror with terraform providers lock produces an incomplete result.
With all of the above said, the current system obviously assumes that changes to the locked providers are driven by humans intentionally upgrading from the remote registry and submitting the result, and it isn't well suited to systems like tfupdate and Renovate whose goal is to pre-emptively propose upgrades.
There is a philosophical point here about whether constantly upgrading is better than staying on known-good versions that meet your needs, but I don't have a strong opinion on that point and I would like to support both approaches, if we can find a technical design that does so either by working within the existing technical constraints or by making backward-compatible changes to the existing design to loosen those constraints.
Making the registry protocol support other hashing schemes is an improvement we already discussed and hope to do, but changes to an established protocol require far more coordination between systems and so we were unable to include that in the v0.14.0 scope. Extending the protocol to generalize the hashing was, however, a next step I already discussed with the team who maintains the Terraform Registry and so they are aware of the need though we will need to work with them to design a concrete protocol that they are comfortable to implement within their own technical constraints.
I don't think that protocol change will address everything here, but I think it will loosen the constraints enough to give us some leeway to better address the other concerns.
@apparentlymart Thank you for your comprehensive explanation. Since the Terraform Registry cannot return h1 hashes for now, it's unavoidable to download each provider package at least once.
With that in mind, I would like to discuss a short term workaround without changing the registry protocol. My top priority is suppressing the unexpected git diff caused by terraform init.
I came up with some ideas. Let me share them:
(1) Use h1 hashes only in a lock file
When we run terraform init without any mirror or cache, we verify the checksum and signature of the zip before unpacking it, but compare the unpacked content with only the h1 hashes in the lock file and never add zh hashes.
If a lock file contains h1 hashes and no zh hashes, I think adding new zh hashes doesn't make sense, at least in my case, but I'm not sure that holds for all cases. If adding them is meaningful in some cases, it might be better to add a new flag or option.
(2) Calculate zh hashes if a mirror contains a zip file
A local filesystem mirror supports both unpacked and packed layouts and the terraform providers mirror command seems to save a provider package as the packed layout by default. That is, we potentially have a chance to calculate zh hashes if packed. If we want to verify a checksum, we can store not only the zip file, but also the checksum and the signature file into the local filesystem mirror.
(3) Add zh hashes without download
Strictly speaking, checksums are immutable and can be verified by a signature. That is, we actually don't need to download a zip file to add zh hashes to the lock file. (Of course, calculating h1 hashes still requires a download.) It's a bit tricky but we may be able to relax the constraints.
Do these conflict with the current security model or other constraints?
(4) Add a new environment variable or CLI flag to suppress updating the lockfile
I found this discussion: https://github.com/hashicorp/terraform/issues/27241#issuecomment-748598577
Envvar or cli flag to disable updating the lockfile
Even though this doesn't resolve all problems described in my issue, I think it has the same effect as (1) in my workflow. If we could suppress the unintended git diff, it wouldn't conflict with the workflow which updates the lockfile in CI.
@apparentlymart Is it acceptable? If it's worth considering adding a new option to suppress updating the lockfile, I'll open a new feature request.
Thanks!
Hi @minamijoyo,
Focusing on your latest comment only for now (I'll review the other ideas in more detail soon), I think it could be reasonable to add a new option to terraform init to force it to treat the lock file as read-only, similar to Go's -mod=readonly option for various Go toolchain commands.
My initial idea for that interface would be terraform init -lockfile=readonly, which would establish an entirely new option -lockfile which could potentially take other values in the future if we find use-cases for other behavior variants.
I think the main requirement for that option is that installation can succeed as long as all packages can be verified with information already recorded in the lock file. The option would disable Terraform from updating the file to record any new information it learned (such as a hash using a new scheme) but Terraform would still rely on and check against the information already recorded.
It seems to follow then that -upgrade would be incompatible with -lockfile=readonly, because terraform init should not install a new version without updating the lock file to reflect the checksums of what it installed. If it did so then any subsequent operation would fail because the lock file would not match the local cache in .terraform/providers. Using these two options together should therefore be an error, I think.
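The rules proposed in this comment can be summarized in a short sketch. Everything here is hypothetical: the function names, the use of None for "option not given", and the error messages are illustrative only, and the sketch models the proposal as worded above rather than any released Terraform behavior.

```python
from typing import Optional


def check_options(lockfile_mode: Optional[str], upgrade: bool) -> None:
    """Reject the combination the proposal calls incompatible (hypothetical)."""
    if lockfile_mode == "readonly" and upgrade:
        raise ValueError("-upgrade cannot be combined with -lockfile=readonly")


def hashes_after_install(
    lockfile_mode: Optional[str], locked: set[str], computed: set[str]
) -> set[str]:
    """Return the hashes the lock file should hold after installing a package.

    locked: hashes already recorded for the selected provider version.
    computed: hashes Terraform calculated for the package it installed.
    """
    if not (locked & computed):
        # Verification still applies in both modes.
        raise ValueError("installed package matches no locked hash")
    if lockfile_mode == "readonly":
        # Verify only: keep the file exactly as it was.
        return set(locked)
    # Default behavior: also record any newly learned hashes.
    return locked | computed
```

In read-only mode the lock file is still the source of truth for verification; the only thing that changes is that newly learned hashes are discarded instead of written back.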
I've opened a new feature request for terraform init -lockfile=readonly to clarify the scope for the partial workaround: https://github.com/hashicorp/terraform/issues/27506
Is there any progress on this issue? I find that terraform providers lock is failing very frequently with errors like this:
Error: Could not retrieve providers for locking
│
│ Terraform failed to fetch the requested providers for darwin_arm64 in order
│ to calculate their checksums: some providers could not be installed:
│ - registry.terraform.io/hashicorp/external: could not query provider
│ registry for registry.terraform.io/hashicorp/external: the request failed
│ after 2 attempts, please try again later: Get
│ "https://registry.terraform.io/v1/providers/hashicorp/external/versions":
│ net/http: request canceled while waiting for connection (Client.Timeout
│ exceeded while awaiting headers).
╵
It may work on the next attempt, or it fails on a different provider.
Even if it's not possible to eliminate all the downloads, it would be a major improvement if only the missing providers were downloaded. Most of the time only one provider at a time is getting upgraded, so the hashes for the others are already in the lockfile. It shouldn't be necessary to download those again.
For those who struggle to update multiple .terraform.lock.hcl at scale
After more than two years of waiting for a Terraform Registry protocol change with no progress, I finally implemented lock file updates in tfupdate myself without Terraform CLI, knowing that this relies on implementation details of Terraform.
The tfupdate v0.7.0 introduced a new tfupdate lock command, which parses the required_providers block in your configuration, downloads provider packages, and calculates hash values under the hood. The most important point is that it caches calculated hash values in memory, giving us a huge performance advantage when updating multiple directories using the recursive option. For details, see https://github.com/minamijoyo/tfupdate/pull/90
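The caching idea can be illustrated with a small sketch. This is not tfupdate's code (tfupdate is written in Go, and its real design is in the linked pull request); the class name HashCache, the cache key, and the compute callback are hypothetical and only show why an in-memory cache helps when many directories pin the same provider version.

```python
from typing import Callable


class HashCache:
    """Memoize provider hashes per (provider, version, platform) (hypothetical)."""

    def __init__(self, compute: Callable[[str, str, str], list[str]]) -> None:
        # compute stands in for "download the package and calculate its hashes".
        self._compute = compute
        self._cache: dict[tuple[str, str, str], list[str]] = {}
        self.downloads = 0

    def get(self, provider: str, version: str, platform: str) -> list[str]:
        key = (provider, version, platform)
        if key not in self._cache:
            # Only the first request for a given key pays the download cost.
            self.downloads += 1
            self._cache[key] = self._compute(provider, version, platform)
        return self._cache[key]
```

With a cache like this, a recursive run over 200 root modules that all use the same provider version would trigger one download per platform rather than one per module per platform.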
I know this isn't the ideal solution, but it helps me get things done in practice. I'll keep this issue open, hoping that the Terraform Registry protocol will improve in the future.
I just created this ~50-line bash script to update all h1:... hashes in the lockfile for the 4 platforms darwin/amd64, darwin/arm64, linux/amd64, linux/arm64: https://gist.github.com/jdolitsky/dd100e362a0d722a0a423b4140ee8959
Current Terraform Version
Use-cases
I have 200+ root modules and I'm automating a provider version up workflow in CI with tfupdate, which updates all version constraints in Terraform configurations recursively. My laptop is macOS and CI is Linux, so I want to pre-populate hash values for all platforms I need in the workflow to avoid a checksum mismatch error.
I'm looking for an efficient way to maintain .terraform.lock.hcl for multiple root modules and platform environments.
Attempted Solutions
For downloading providers:
As you know, there are two hash formats, that is, zh and h1, and zh is recorded only when a provider zip package is downloaded from Terraform Registry. I want to avoid redundant downloads because I have a lot of root modules, so I tried to create a local filesystem mirror with the terraform providers mirror command at the git repository root, and then generated lock files in each sub directory with the terraform providers lock command using the mirror. It recorded only h1 hash values. It's ok.
I wrote a script to generate .terraform.lock.hcl for multiple directories and platforms.
However, with a lock file that records only h1 hash values, if I run terraform init without any mirror or cache, the init command adds zh hash values to the lock file. It causes an unexpected lock file change.
I expect terraform init without the -upgrade flag not to update the existing lock file, because terraform init is an essential command for all workflows and I think it's not desirable to cause a git diff unexpectedly. Is it intentional by design or a bug?
Pre-populating zh hash values requires redundant downloads, so I want to avoid it. My pain points are:
If my understanding is correct, it would be great if the Terraform Registry returns not only zh hash values, but also h1 hash values for all platforms without download.
For downloading modules:
If a root module depends on other modules, the terraform providers lock command requires terraform init. I understand selecting a correct version needs all module contents for completeness.
However, before Terraform v0.14, there was no lock file, so I've already defined all provider dependencies I need in the required_providers block at each root module. In that case, it's not always necessary to get all modules because I already know the dependencies. It works but it's inefficient. It would be great if we could have a shallow check option at our own risk.
Proposal
Allow the terraform providers lock command to generate .terraform.lock.hcl including zh and h1 hash values for given platforms from the required_providers block without downloading providers and modules.
References
Related to: