hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

terraform validate plugin resolution breaks on huge TMPDIR #32787

Open terlar opened 1 year ago

terlar commented 1 year ago

Terraform Version

Terraform v1.3.9

Terraform Configuration Files

resource "local_file" "foo" {
  content  = "foo!"
  filename = "${path.module}/foo.bar"
}

Debug Output

2023-03-06T23:54:49.211+0100 [INFO]  Terraform version: 1.3.9
2023-03-06T23:54:49.211+0100 [DEBUG] using github.com/hashicorp/go-tfe v1.9.0
2023-03-06T23:54:49.211+0100 [DEBUG] using github.com/hashicorp/hcl/v2 v2.16.0
2023-03-06T23:54:49.211+0100 [DEBUG] using github.com/hashicorp/terraform-config-inspect v0.0.0-20210209133302-4fd17a0faac2
2023-03-06T23:54:49.211+0100 [DEBUG] using github.com/hashicorp/terraform-svchost v0.0.0-20200729002733-f050f53b9734
2023-03-06T23:54:49.211+0100 [DEBUG] using github.com/zclconf/go-cty v1.12.1
2023-03-06T23:54:49.211+0100 [INFO]  Go runtime version: go1.19.6
2023-03-06T23:54:49.211+0100 [INFO]  CLI args: []string{"./terraform", "validate"}
2023-03-06T23:54:49.211+0100 [TRACE] Stdout is a terminal of width 106
2023-03-06T23:54:49.211+0100 [TRACE] Stderr is a terminal of width 106
2023-03-06T23:54:49.211+0100 [TRACE] Stdin is a terminal
2023-03-06T23:54:49.211+0100 [DEBUG] Attempting to open CLI config file: /home/terje.larsen/.terraformrc
2023-03-06T23:54:49.211+0100 [DEBUG] File doesn't exist, but doesn't need to. Ignoring.
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory terraform.d/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /home/terje.larsen/.terraform.d/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /home/terje.larsen/.local/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /nix/var/nix/profiles/default/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /home/terje.larsen/.nix-profile/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/share/ubuntu/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/local/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /var/lib/snapd/desktop/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /nix/var/nix/profiles/default/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /home/terje.larsen/.nix-profile/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/share/ubuntu/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/local/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /usr/share/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /var/lib/snapd/desktop/terraform/plugins
2023-03-06T23:54:49.211+0100 [DEBUG] ignoring non-existing provider search directory /var/lib/snapd/desktop/terraform/plugins
2023-03-06T23:54:49.211+0100 [INFO]  CLI command args: []string{"validate"}
2023-03-06T23:54:49.212+0100 [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2023-03-06T23:54:49.212+0100 [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/local v2.3.0 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64
2023-03-06T23:54:49.212+0100 [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/local 2.3.0
2023-03-06T23:54:49.261+0100 [DEBUG] checking for provisioner in "."
2023-03-06T23:54:49.261+0100 [DEBUG] checking for provisioner in "/home/terje.larsen/terraform_test"
2023-03-06T23:54:49.262+0100 [TRACE] terraform.NewContext: starting
2023-03-06T23:54:49.262+0100 [TRACE] terraform.NewContext: complete
2023-03-06T23:54:49.262+0100 [DEBUG] Building and walking validate graph
2023-03-06T23:54:49.262+0100 [TRACE] building graph for walkValidate
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.ConfigTransformer
2023-03-06T23:54:49.262+0100 [TRACE] ConfigTransformer: Starting for path:
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.ConfigTransformer with new graph:
  local_file.foo - *terraform.NodeValidatableResource
  ------
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.RootVariableTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.RootVariableTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.ModuleVariableTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.ModuleVariableTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.LocalTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.LocalTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.OutputTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.OutputTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.OrphanResourceInstanceTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.OrphanResourceInstanceTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.StateTransformer
2023-03-06T23:54:49.262+0100 [TRACE] StateTransformer: pointless no-op call, creating no nodes at all
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.StateTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.AttachStateTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.AttachStateTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.OrphanOutputTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.OrphanOutputTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.AttachResourceConfigTransformer
2023-03-06T23:54:49.262+0100 [TRACE] AttachResourceConfigTransformer: attaching to "local_file.foo" (*terraform.NodeValidatableResource) config from main.tf:1,1-28
2023-03-06T23:54:49.262+0100 [TRACE] AttachResourceConfigTransformer: attaching provider meta configs to local_file.foo
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.AttachResourceConfigTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.graphTransformerMulti
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Executing graph transform *terraform.ProviderConfigTransformer
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Completed graph transform *terraform.ProviderConfigTransformer with new graph:
  local_file.foo - *terraform.NodeValidatableResource
  ------
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Executing graph transform *terraform.MissingProviderTransformer
2023-03-06T23:54:49.262+0100 [DEBUG] adding implicit provider configuration provider["registry.terraform.io/hashicorp/local"], implied first by local_file.foo
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Completed graph transform *terraform.MissingProviderTransformer with new graph:
  local_file.foo - *terraform.NodeValidatableResource
  provider["registry.terraform.io/hashicorp/local"] - *terraform.NodeApplyableProvider
  ------
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Executing graph transform *terraform.ProviderTransformer
2023-03-06T23:54:49.262+0100 [TRACE] ProviderTransformer: exact match for provider["registry.terraform.io/hashicorp/local"] serving local_file.foo
2023-03-06T23:54:49.262+0100 [DEBUG] ProviderTransformer: "local_file.foo" (*terraform.NodeValidatableResource) needs provider["registry.terraform.io/hashicorp/local"]
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Completed graph transform *terraform.ProviderTransformer with new graph:
  local_file.foo - *terraform.NodeValidatableResource
    provider["registry.terraform.io/hashicorp/local"] - *terraform.NodeApplyableProvider
  provider["registry.terraform.io/hashicorp/local"] - *terraform.NodeApplyableProvider
  ------
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Executing graph transform *terraform.PruneProviderTransformer
2023-03-06T23:54:49.262+0100 [TRACE] (graphTransformerMulti) Completed graph transform *terraform.PruneProviderTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.graphTransformerMulti with new graph:
  local_file.foo - *terraform.NodeValidatableResource
    provider["registry.terraform.io/hashicorp/local"] - *terraform.NodeApplyableProvider
  provider["registry.terraform.io/hashicorp/local"] - *terraform.NodeApplyableProvider
  ------
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.RemovedModuleTransformer
2023-03-06T23:54:49.262+0100 [TRACE] Completed graph transform *terraform.RemovedModuleTransformer (no changes)
2023-03-06T23:54:49.262+0100 [TRACE] Executing graph transform *terraform.AttachSchemaTransformer
2023-03-06T23:54:49.262+0100 [TRACE] terraform.contextPlugins: Initializing provider "registry.terraform.io/hashicorp/local" to read its schema
2023-03-06T23:54:49.262+0100 [DEBUG] created provider logger: level=trace
2023-03-06T23:54:49.262+0100 [INFO]  provider: configuring client automatic mTLS
2023-03-06T23:54:49.284+0100 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64/terraform-provider-local_v2.3.0_x5 args=[.terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64/terraform-provider-local_v2.3.0_x5]
2023-03-06T23:54:49.284+0100 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64/terraform-provider-local_v2.3.0_x5 pid=3018006
2023-03-06T23:54:49.284+0100 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64/terraform-provider-local_v2.3.0_x5
2023-03-06T23:54:49.288+0100 [ERROR] provider.terraform-provider-local_v2.3.0_x5: plugin init error: error="listen unix /tmp/tmp.bUsrhhhWdZ-this-is-a-very-very-very-very-very-very-very-very-very-very-long-long-path/plugin1330923084: bind: invalid argument" timestamp=2023-03-06T23:54:49.288+0100
2023-03-06T23:54:49.289+0100 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/local/2.3.0/linux_amd64/terraform-provider-local_v2.3.0_x5 pid=3018006
2023-03-06T23:54:49.289+0100 [TRACE] Completed graph transform *terraform.AttachSchemaTransformer (no changes)
╷
│ Error: failed to read schema for local_file.foo in registry.terraform.io/hashicorp/local: failed to instantiate provider "registry.terraform.io/hashicorp/local" to obtain schema: Unrecognized remote plugin message: 
│ 
│ This usually means that the plugin is either invalid or simply
│ needs to be recompiled to support the latest protocol.
│ 
│ 
╵
2023-03-06T23:54:49.289+0100 [WARN]  provider: plugin failed to exit gracefully

Expected Behavior

Validation should work without errors.

Actual Behavior

Plugin resolution fails:

╷
│ Error: failed to read schema for local_file.foo in registry.terraform.io/hashicorp/local: failed to instantiate provider "registry.terraform.io/hashicorp/local" to obtain schema: Unrecognized remote plugin message: 
│ 
│ This usually means that the plugin is either invalid or simply
│ needs to be recompiled to support the latest protocol.

Steps to Reproduce

  1. TMPDIR="$(mktemp --directory --suffix -this-is-a-very-very-very-very-very-very-very-very-very-very-long-long-path)"
  2. export TMPDIR
  3. export TF_LOG=TRACE
  4. terraform init -backend=false
  5. terraform validate

Additional Context

I discovered this when using a build environment that sets a unique TMPDIR for each build, based on the name of the build target, which in my case turned out to be quite long.

References

No response

apparentlymart commented 1 year ago

Hi @terlar! Thanks for reporting this.

Unfortunately the limit on the possible length of the path to a Unix domain socket is something decided by your operating system and not something Terraform can directly control. Therefore I'm not sure what we could change that would make this better, other than perhaps to handle this error better.

I think the reason this failed so oddly is that the problem is actually happening inside the hashicorp/local provider plugin, so the plugin fails to complete the handshake. Terraform Core therefore only knows that the plugin didn't speak the protocol correctly, and cannot determine why, because there's not yet any real communication channel between the two: the socket the plugin was trying to create is the communication channel.

Therefore, while this is unfortunate, I think there might not be anything we could do to improve this. While looking to confirm that I was understanding this problem correctly, I found https://github.com/golang/go/issues/6895, which seems to confirm that Go is just passing the path on to the bind function and the OS is the one declaring that it's an invalid argument. On Linux the maximum supported length is 108 bytes, which includes the terminating null byte.

There is a passing mention in that issue about the "abstract namespace" for Unix domain sockets, which is a Linux-specific extension. We could perhaps investigate that as an alternative protocol when running on Linux systems in particular, but since limitations on Unix domain socket path length exist on most (all?) Unix-derived OSes that would not be a complete solution.
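For reference, the abstract namespace can be tried from Python on Linux. An address whose first byte is a null byte is not tied to the filesystem, so TMPDIR no longer contributes to its length. This is a minimal sketch and not something the plugin protocol does today; `bind_abstract` and the `plugin<pid>` naming are made up for the example, and the name is still limited by the same 108-byte address buffer.

```python
import os
import socket
import sys


def bind_abstract(name=None):
    """Bind a listening Unix socket in the Linux abstract namespace.

    Hypothetical helper for illustration only.
    """
    if not sys.platform.startswith("linux"):
        raise OSError("abstract Unix socket addresses are Linux-specific")
    if name is None:
        # Assumed naming scheme; a real implementation would need
        # something collision-resistant.
        name = f"plugin{os.getpid()}"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    # A leading null byte selects the abstract namespace: no file is
    # created, so the length of TMPDIR is irrelevant.
    sock.bind(b"\0" + name.encode())
    sock.listen(1)
    return sock
```

Because no socket file exists, nothing needs to be unlinked afterwards; the address disappears when the socket is closed.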

terlar commented 1 year ago

Interesting, it was indeed very confusing and a bit tricky to track down. I did see the "bind: invalid argument" error, but I wasn't sure where it came from, and it was not easy to search for.

Perhaps the path exceeding 108 characters could be detected early and raise a warning/affect the final error message. Or would that limit other platforms?
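An early check of this kind could look roughly like the following Python sketch. It is not Terraform or go-plugin code; `socket_path_fits`, `sun_path_size`, and the limits table are names invented for illustration. The sizes are those of the `sun_path` buffer in `sockaddr_un` (108 bytes on Linux, 104 on macOS and the BSDs), and the path has to leave room for a terminating null byte, so the limit would need to be chosen per platform.

```python
import os
import sys

# Size in bytes of sockaddr_un.sun_path, keyed by sys.platform prefix.
SUN_PATH_SIZE = {"linux": 108, "darwin": 104, "freebsd": 104}


def sun_path_size(platform=sys.platform):
    """Return the socket path buffer size for the given platform."""
    for prefix, size in SUN_PATH_SIZE.items():
        if platform.startswith(prefix):
            return size
    # Assumption: fall back to the smaller, more conservative value.
    return 104


def socket_path_fits(path, platform=sys.platform):
    """Report whether path plus its null terminator fits in sun_path."""
    return len(os.fsencode(path)) + 1 <= sun_path_size(platform)


# The socket path from the debug output in the original report.
FAILING_PATH = (
    "/tmp/tmp.bUsrhhhWdZ-this-is-a"
    + "-very" * 10
    + "-long-long-path/plugin1330923084"
)

if __name__ == "__main__":
    if not socket_path_fits(FAILING_PATH, "linux"):
        print("warning: socket path is too long for a Unix domain socket")
```

The check counts encoded bytes rather than characters, since the limit applies to the raw path bytes.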

apparentlymart commented 1 year ago

I think the fact that this logic lives inside the provider plugins themselves will be a limiting factor on all solutions: until we have the communication channel available there isn't really any way for the plugin to tell Terraform Core why it's failing, aside from just writing arbitrary logs to stderr (which Terraform Core just copies verbatim into its own logs, because they are human-oriented logs rather than machine-oriented logs.)

I think one possible path here would be to extend the plugin handshake protocol with an explicit failure response. Today the only valid response from a provider during handshake is to announce which Unix socket it's listening on (along with some other session metadata), but we could potentially teach providers to respond to certain handshake-related errors by writing something we can unambiguously recognize as an error instead of the normal handshake message, and therefore let providers pass an arbitrary failure message hopefully saying a little more about why they couldn't initialize.

With that in place, the provider-side protocol implementation code could potentially notice when the socket bind returns "invalid argument" and return a more useful error message, which would hopefully include a hint about temporary directory path lengths if the plugin knows it was trying to create a Unix socket when it got that error.
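The idea in the two paragraphs above could be sketched as follows, in Python for brevity. The pipe-delimited field layout is an assumption modeled on the handshake line that go-plugin prints, and the `ERROR|` prefix is purely hypothetical: it is not part of any existing protocol. `describe_bind_failure` plays the plugin side, and `parse_handshake` plays the Terraform Core side.

```python
import errno
from dataclasses import dataclass

ERROR_PREFIX = "ERROR|"  # hypothetical marker, not an existing protocol feature


class PluginInitError(Exception):
    """Raised when the plugin reports an explicit handshake failure."""


@dataclass
class Handshake:
    network: str
    address: str


def describe_bind_failure(err, path):
    """Plugin side: turn a failed bind into an explicit error line."""
    msg = f"{ERROR_PREFIX}cannot listen on {path}: {err.strerror or err}"
    if err.errno == errno.EINVAL:
        msg += " (the socket path may be too long; try a shorter TMPDIR)"
    return msg


def parse_handshake(line):
    """Core side: parse one handshake line or raise a descriptive error."""
    line = line.strip()
    if line.startswith(ERROR_PREFIX):
        raise PluginInitError(line[len(ERROR_PREFIX):])
    # Assumed layout: core version | app version | network | address | ...
    fields = line.split("|")
    if len(fields) < 4:
        raise ValueError(f"unrecognized remote plugin message: {line!r}")
    return Handshake(network=fields[2], address=fields[3])
```

With something like this, the empty "Unrecognized remote plugin message" from the report would instead carry the plugin's own explanation.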

Another possibility would be for the plugins to try to fall back to using a TCP socket if they fail to bind to a Unix socket with this particular error. The plugin protocol supports both Unix sockets and TCP sockets at the option of the plugin, but today the rule is that TCP sockets only get used when running on Windows while Unix sockets get used on all other platforms. I think if we wanted to go this route we'd need to do some further research to think about whether there would be any unwanted consequences of that fallback -- for example, if running on a system where opening a TCP listen socket on localhost would be problematic for some reason -- but technically I think the protocol already has everything it needs to be able to support that automatic negotiation without any changes to Terraform Core.
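A rough Python sketch of that fallback is below. It is an illustration of the idea, not go-plugin's implementation: `listen_with_fallback` and the `plugin<pid>` file name are invented here, and the sketch falls back on any `OSError` from the Unix bind rather than on the one specific error, because Python rejects an over-long path itself before calling the OS.

```python
import os
import socket


def listen_with_fallback(sock_dir):
    """Listen on a Unix socket in sock_dir, else on a loopback TCP port.

    Returns (network, address, socket), mirroring the network type and
    address a plugin would announce in its handshake.
    """
    path = os.path.join(sock_dir, f"plugin{os.getpid()}")
    unix_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        unix_sock.bind(path)
        unix_sock.listen(1)
        return "unix", path, unix_sock
    except OSError:
        # Assumption: treat every bind failure as a reason to fall back.
        # A real implementation would probably match the specific error.
        unix_sock.close()

    tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Port 0 asks the OS for any free port on the loopback interface.
    tcp_sock.bind(("127.0.0.1", 0))
    tcp_sock.listen(1)
    host, port = tcp_sock.getsockname()
    return "tcp", f"{host}:{port}", tcp_sock
```

The caller only sees a different network type and address, which matches the point that the protocol already carries both pieces of information.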