hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

"terraform providers" command must succeed even if the lock file or provider cache directory is invalid #31136

Open raffato opened 2 years ago

raffato commented 2 years ago

Current Terraform Version

1.2.1

Use-cases

My company uses Apple hardware and most of my team now have M1 MacBooks. I try to run terraform init but the codebase has 50+ instances of data "template_file".... Fixing this seems to be straightforward: replace template_file data sources with templatefile() function calls. But when I'm done, terraform init still fails because "something" still wants the template provider. I try to run terraform providers to identify what requires it, but this fails without giving any helpful information because the provider is missing. Duh! I wonder if sub-modules require it, so I unpin all module versions, but terraform init still fails with this kinda useless message:

Error: Required plugins are not installed
The installed provider plugins are not consistent with the packages selected in the dependency lock file:
 - registry.terraform.io/hashicorp/template: there is no package for registry.terraform.io/hashicorp/template 2.2.0 cached in .terraform/providers

Terraform uses external plugins to integrate with a variety of different infrastructure services. To download the plugins required for this configuration,
run:
  terraform init

TF_LOG=TRACE generates a ton of text but nothing that points to a root cause. After cursing Terraform for a day, the next morning (shower thoughts) it dawns on me that Terraform needs the provider because the current state references it. But I have to guess this because I can't run terraform providers.
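
Since terraform providers refuses to run here, one way to confirm that guess is to read the state directly. The sketch below is only an illustration. It assumes a state snapshot in the version 4 JSON layout, where each entry under "resources" carries "mode", "type", "name", "provider" and an optional "module" key. That layout is an internal format rather than a stable interface, and the function name providers_in_state and the sample data are made up for this example.

```python
import json
from collections import defaultdict


def providers_in_state(state):
    """Group resource addresses from a parsed state document by provider.

    Assumes the version 4 state layout (an assumption, not a stable API).
    """
    by_provider = defaultdict(list)
    for res in state.get("resources", []):
        # Data resources are addressed with a "data." prefix.
        prefix = "data." if res.get("mode") == "data" else ""
        addr = f"{prefix}{res['type']}.{res['name']}"
        module = res.get("module")
        if module:
            addr = f"{module}.{addr}"
        by_provider[res.get("provider", "<unknown>")].append(addr)
    return dict(by_provider)


# Hypothetical sample standing in for the contents of a real state file.
sample = json.loads("""
{
  "version": 4,
  "resources": [
    {"mode": "data", "type": "template_file", "name": "init",
     "provider": "provider[\\"registry.terraform.io/hashicorp/template\\"]",
     "instances": []}
  ]
}
""")

for provider, addrs in sorted(providers_in_state(sample).items()):
    print(provider)
    for addr in sorted(addrs):
        print("  " + addr)
```

To try it against a real snapshot, replace the sample with json.load() on a copy of a local state file. Any stale data "template_file" entries would then show up under the template provider even after they are gone from the configuration.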

Attempted Solutions

Tried to run terraform providers but it fails with this error:

╷
Error: Required plugins are not installed
The installed provider plugins are not consistent with the packages selected in the dependency lock file:
   - registry.terraform.io/hashicorp/template: there is no package for registry.terraform.io/hashicorp/template 2.2.0 cached in .terraform/providers

Terraform uses external plugins to integrate with a variety of different infrastructure services. To download the plugins required for this configuration, run:
   terraform init

Proposal

  1. Allow terraform providers to provide partial information! Surely it knows which providers are currently required, even if they're not installed? Should it not simply flag that it doesn't have specific version information because terraform init is required, instead of bailing out and providing nothing?
  2. If a plugin cannot be found, the same partial providers output would be useful in the terraform init error message. For example, a much more useful error message could look like this:
Error: Required plugins are not installed

The installed provider plugins are not consistent with the packages selected in the dependency lock file:
  - registry.terraform.io/hashicorp/template: there is no package for registry.terraform.io/hashicorp/template 2.2.0 cached in .terraform/providers

Providers required by configuration:
├── module.database
└── module.vault
    └── provider[registry.terraform.io/hashicorp/template]

Providers required by state:
    provider[registry.terraform.io/hashicorp/template]

Terraform uses external plugins to integrate with a variety of different infrastructure services. To download the plugins required for this configuration,
run:
   terraform init

References

https://github.com/hashicorp/terraform/issues/29993#issuecomment-1023787499

kmoe commented 2 years ago

Thanks for the issue report. Verified that the "Required plugins are not installed" error happens if you have a lockfile with providers from a non-matching architecture, then run terraform providers. It does seem that we can do better than this error and should be able to output a list of providers even in this case.

If you delete the lockfile, terraform providers will give the output you describe in the proposal, distinguishing between providers required by config and providers required by state. Is this a suitable workaround for your issue?

Tagged as bug because I think we can fix the output of terraform providers and make it more helpful in such cases.

apparentlymart commented 2 years ago

Yes, indeed it was an original design goal for terraform providers to make a best effort to describe the situation even if it's currently not totally valid, because its primary purpose is to help with debugging problems like this.

It looks like when we later retrofitted the dependency lock file it introduced some new error cases we didn't know to expect when originally implementing that command, and so it currently fails on those. I think we should be able to make it more resilient so that e.g. if it detects any problems related to the provider cache consistency with the lock file it will still proceed as if there had been no lock file at all, and then emit the error diagnostics only after it's displayed the subset of information it was able to determine.
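
The ordering described above (show what can be determined first, report lock file problems afterwards) could look roughly like this. This is a hypothetical Python sketch of the control flow, not Terraform's actual Go implementation. Every name in it (Report, check_lock_consistency, describe_providers) and the shape of the inputs are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    lines: list = field(default_factory=list)        # partial result, always filled in
    diagnostics: list = field(default_factory=list)  # problems, reported afterwards


def check_lock_consistency(locked, cached):
    """Return one message per locked provider whose exact version is not cached.

    `locked` and `cached` are hypothetical dicts mapping provider address to version.
    """
    return [
        f"{addr}: there is no package for {addr} {version} cached"
        for addr, version in sorted(locked.items())
        if cached.get(addr) != version
    ]


def describe_providers(config_reqs, state_reqs, locked, cached):
    """Build the provider listing first, then attach lock file diagnostics."""
    report = Report()
    report.lines.append("Providers required by configuration:")
    report.lines += [f"  provider[{addr}]" for addr in sorted(config_reqs)]
    report.lines.append("Providers required by state:")
    report.lines += [f"  provider[{addr}]" for addr in sorted(state_reqs)]
    # Lock file problems are collected instead of aborting the whole command.
    report.diagnostics = check_lock_consistency(locked, cached)
    return report


report = describe_providers(
    config_reqs=set(),
    state_reqs={"registry.terraform.io/hashicorp/template"},
    locked={"registry.terraform.io/hashicorp/template": "2.2.0"},
    cached={},
)
print("\n".join(report.lines))
for message in report.diagnostics:
    print("Error:", message)
```

The point of the sketch is only the ordering: the consistency check never prevents the listing from being produced, so the caller still sees which side (configuration or state) is asking for the provider.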

ezpuzz commented 2 years ago

Wasted a good amount of time today because grepping the repo returned no results for whatever was pulling in the problem dependency. It turned out terraform providers has a section, "Providers required by state:", that gave me the root of the problem.

This has to do with hashicorp/template in my case.

apparentlymart commented 2 years ago

I think the situation has changed a little since this issue was opened, but I don't think it's fully fixed.

What's changed is that the terraform init messaging should now recommend running terraform providers to see where all of the dependencies are coming from.

However, I don't think we've yet made any changes to prevent terraform providers from failing when the lock file is inconsistent with the cache. I suggest that we consider this issue to represent that bug, and therefore it's fixed once we're sure that terraform providers can always produce at least a partial result even if the provider cache directory is incomplete or the lock file is somehow invalid.


I think there is a broader question here about whether it would be viable to ignore data resources in the state when deciding what the state depends on. I don't think we'll be able to address that here because it's a more invasive thing to change, but it is interesting to note that installing providers for data resources that only exist in the state is only in service of some relatively unimportant situations:

In particular, terraform apply (and terraform plan) don't need to involve a provider to deal with a data resource that has been removed from the configuration already, because the only reasonable action in that case is to remove the stale object from the state entirely and Terraform Core can just do that itself without any need to inspect the provider-specific result data.

I wonder about somehow loosening the design requirements for terraform console and terraform show so that they can be permitted to just assume any data resource that isn't in the configuration doesn't exist at all, even if it does happen to still exist in the state. It seems relatively unlikely that someone would really need to look at the stale previous result from a data resource they've now removed, but that's just a hunch on my part.

I don't think we should block on resolving this question in order to fix the terraform providers bug, but it would be nice to avoid this problem in the first place by not even trying to install the hashicorp/template provider once all of the data blocks using it are removed from the configuration.
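
To make the idea of ignoring state-only data resources concrete, here is a rough Python sketch of the rule. It is not how Terraform Core computes requirements today. The function names, the dict-based resource records, and the short provider labels are all hypothetical.

```python
def resource_address(res):
    """Build an address such as data.template_file.init from a resource record."""
    prefix = "data." if res.get("mode") == "data" else ""
    addr = f"{prefix}{res['type']}.{res['name']}"
    module = res.get("module")
    return f"{module}.{addr}" if module else addr


def providers_needed_by_state(state_resources, configured_addrs):
    """Return the providers the state would require under the proposed rule.

    A data resource that is no longer in the configuration is skipped, since
    it can be dropped from the state without consulting its provider. Managed
    resources always count, because destroying them needs the provider.
    """
    needed = set()
    for res in state_resources:
        is_stale_data = (
            res.get("mode") == "data"
            and resource_address(res) not in configured_addrs
        )
        if is_stale_data:
            continue
        needed.add(res["provider"])
    return needed


# Hypothetical state: one leftover data resource and one managed resource.
resources = [
    {"mode": "data", "type": "template_file", "name": "init",
     "provider": "hashicorp/template"},
    {"mode": "managed", "type": "aws_instance", "name": "web",
     "provider": "hashicorp/aws"},
]
# The configuration no longer declares data.template_file.init.
print(providers_needed_by_state(resources, {"aws_instance.web"}))
```

Under this rule the leftover template_file entry would no longer pull hashicorp/template into the requirements, while a managed resource removed from the configuration would still keep its provider.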