hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/

`terraform init` timing out when installing AWS provider #30846

Open dmandyna opened 2 years ago

dmandyna commented 2 years ago

Terraform Version

Terraform v1.1.7 
Terraform v1.1.0 
Terraform v1.1.8 

Expected Behavior

Terraform installs all the providers during the terraform init.

Actual Behavior

terraform init fails to install AWS provider when using Terraform versions specified above but it works as expected when using Terraform v1.0.0.

Steps to Reproduce

We have a number of projects (8) that we validate at the beginning of our release pipeline by running terraform init and terraform validate against each of them. I'm not sure whether the issue is related to how many projects we try to init in a short amount of time.

Additional Context

I've tried adding an additional sleep between the terraform init runs for different projects. It increased our success rate, but about 30% of the runs were still failing.
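The pause-between-inits workaround described above could be scripted roughly as follows. This is only a sketch: `init_all`, its `pause` and `attempts` parameters, and the retry loop are illustrative assumptions (the original pipeline only added a sleep, not retries), and it presumes the `terraform` binary is on `PATH`.

```python
import subprocess
import time


def terraform_init(project_dir):
    """Run `terraform init` in one project directory; True on exit code 0."""
    result = subprocess.run(["terraform", "init", "-input=false"], cwd=project_dir)
    return result.returncode == 0


def init_all(project_dirs, run=terraform_init, pause=10.0, attempts=3, sleep=time.sleep):
    """Initialise each project in turn, pausing between runs.

    Returns a dict mapping each directory to the attempt number that
    succeeded, or None if every attempt failed. `run` and `sleep` are
    injectable so the loop can be exercised without Terraform installed.
    """
    results = {}
    for index, project in enumerate(project_dirs):
        if index:
            sleep(pause)  # spacing between different projects
        results[project] = None
        for attempt in range(1, attempts + 1):
            if run(project):
                results[project] = attempt
                break
            if attempt < attempts:
                sleep(pause)  # back off before retrying the same project
    return results
```

A retry loop like this only hides an intermittent failure; it does not explain why the download stalls.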

The error message is always the same - it is failing to install the AWS provider.

We've tried previous versions of Terraform (tested versions above), previous versions of AWS provider, but it doesn't seem to matter, unless we set our Terraform version to 1.0.0.

TRACE Logs

```
Initializing provider plugins...
- Finding hashicorp/time versions matching "0.7.1"...
2022-04-12T15:29:27.405Z [DEBUG] Service discovery for registry.terraform.io at https://registry.terraform.io/.well-known/terraform.json
2022-04-12T15:29:27.405Z [TRACE] HTTP client GET request to https://registry.terraform.io/.well-known/terraform.json
2022-04-12T15:29:27.421Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/time/versions
2022-04-12T15:29:27.421Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/time/versions
- Finding hashicorp/aws versions matching "~> 3.0, >= 3.50.0"...
2022-04-12T15:29:27.432Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/aws/versions
2022-04-12T15:29:27.432Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/aws/versions
- Finding latest version of hashicorp/local...
2022-04-12T15:29:27.447Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/local/versions
2022-04-12T15:29:27.447Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/local/versions
- Finding hashicorp/random versions matching ">= 2.2.0"...
2022-04-12T15:29:27.459Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/random/versions
2022-04-12T15:29:27.459Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/random/versions
- Finding latest version of hashicorp/tls...
2022-04-12T15:29:27.471Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/tls/versions
2022-04-12T15:29:27.471Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/tls/versions
2022-04-12T15:29:27.484Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:29:27.484Z [TRACE] getproviders.SearchLocalDirectory: failed to resolve symlinks for .terraform/providers: lstat .terraform/providers: no such file or directory
2022-04-12T15:29:27.484Z [TRACE] providercache.fillMetaCache: error while scanning directory .terraform/providers: cannot search .terraform/providers: lstat .terraform/providers: no such file or directory
2022-04-12T15:29:27.484Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/random/3.1.2/download/linux/amd64
2022-04-12T15:29:27.484Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/random/3.1.2/download/linux/amd64
2022-04-12T15:29:27.496Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_SHA256SUMS
2022-04-12T15:29:27.496Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_SHA256SUMS
2022-04-12T15:29:27.512Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_SHA256SUMS.72D7468F.sig
2022-04-12T15:29:27.512Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_SHA256SUMS.72D7468F.sig
- Installing hashicorp/random v3.1.2...
2022-04-12T15:29:27.514Z [TRACE] providercache.Dir.InstallPackage: installing registry.terraform.io/hashicorp/random v3.1.2 from https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_linux_amd64.zip
2022-04-12T15:29:27.514Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-random/3.1.2/terraform-provider-random_3.1.2_linux_amd64.zip
2022-04-12T15:29:27.563Z [DEBUG] Provider signed by 34365D9472D7468F HashiCorp Security (hashicorp.com/security)
2022-04-12T15:29:27.675Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:29:27.676Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64
2022-04-12T15:29:27.676Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.2
- Installed hashicorp/random v3.1.2 (signed by HashiCorp)
2022-04-12T15:29:27.718Z [TRACE] providercache.fillMetaCache: using cached result from previous scan of .terraform/providers
2022-04-12T15:29:27.718Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/tls/3.3.0/download/linux/amd64
2022-04-12T15:29:27.718Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/tls/3.3.0/download/linux/amd64
2022-04-12T15:29:27.731Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_SHA256SUMS
2022-04-12T15:29:27.731Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_SHA256SUMS
2022-04-12T15:29:27.743Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_SHA256SUMS.72D7468F.sig
2022-04-12T15:29:27.743Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_SHA256SUMS.72D7468F.sig
- Installing hashicorp/tls v3.3.0...
2022-04-12T15:29:27.745Z [TRACE] providercache.Dir.InstallPackage: installing registry.terraform.io/hashicorp/tls v3.3.0 from https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_linux_amd64.zip
2022-04-12T15:29:27.745Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-tls/3.3.0/terraform-provider-tls_3.3.0_linux_amd64.zip
2022-04-12T15:29:27.793Z [DEBUG] Provider signed by 34365D9472D7468F HashiCorp Security (hashicorp.com/security)
2022-04-12T15:29:27.900Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:29:27.900Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64
2022-04-12T15:29:27.901Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/tls v3.3.0 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64
2022-04-12T15:29:27.901Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/tls 3.3.0
2022-04-12T15:29:27.901Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.2
- Installed hashicorp/tls v3.3.0 (signed by HashiCorp)
2022-04-12T15:29:27.941Z [TRACE] providercache.fillMetaCache: using cached result from previous scan of .terraform/providers
2022-04-12T15:29:27.941Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/time/0.7.1/download/linux/amd64
2022-04-12T15:29:27.941Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/time/0.7.1/download/linux/amd64
2022-04-12T15:29:27.956Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_SHA256SUMS
2022-04-12T15:29:27.957Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_SHA256SUMS
2022-04-12T15:29:27.971Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_SHA256SUMS.72D7468F.sig
2022-04-12T15:29:27.971Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_SHA256SUMS.72D7468F.sig
- Installing hashicorp/time v0.7.1...
2022-04-12T15:29:27.975Z [TRACE] providercache.Dir.InstallPackage: installing registry.terraform.io/hashicorp/time v0.7.1 from https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_linux_amd64.zip
2022-04-12T15:29:27.975Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-time/0.7.1/terraform-provider-time_0.7.1_linux_amd64.zip
2022-04-12T15:29:28.068Z [DEBUG] Provider signed by 34365D9472D7468F HashiCorp Security (hashicorp.com/security)
2022-04-12T15:29:28.172Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:29:28.172Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64
2022-04-12T15:29:28.172Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/time v0.7.1 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64
2022-04-12T15:29:28.173Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/tls v3.3.0 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64
2022-04-12T15:29:28.173Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.2
2022-04-12T15:29:28.173Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/time 0.7.1
2022-04-12T15:29:28.173Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/tls 3.3.0
- Installed hashicorp/time v0.7.1 (signed by HashiCorp)
2022-04-12T15:29:28.210Z [TRACE] providercache.fillMetaCache: using cached result from previous scan of .terraform/providers
2022-04-12T15:29:28.210Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/aws/3.75.1/download/linux/amd64
2022-04-12T15:29:28.210Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/aws/3.75.1/download/linux/amd64
2022-04-12T15:29:28.224Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_SHA256SUMS
2022-04-12T15:29:28.224Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_SHA256SUMS
2022-04-12T15:29:28.235Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_SHA256SUMS.72D7468F.sig
2022-04-12T15:29:28.235Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_SHA256SUMS.72D7468F.sig
- Installing hashicorp/aws v3.75.1...
2022-04-12T15:29:28.237Z [TRACE] providercache.Dir.InstallPackage: installing registry.terraform.io/hashicorp/aws v3.75.1 from https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip
2022-04-12T15:29:28.237Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip
2022-04-12T15:45:07.220Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:45:07.220Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64
2022-04-12T15:45:07.220Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/time v0.7.1 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64
2022-04-12T15:45:07.221Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/tls v3.3.0 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64
2022-04-12T15:45:07.221Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/time 0.7.1
2022-04-12T15:45:07.221Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/tls 3.3.0
2022-04-12T15:45:07.221Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.2
2022-04-12T15:45:07.221Z [DEBUG] GET https://registry.terraform.io/v1/providers/hashicorp/local/2.2.2/download/linux/amd64
2022-04-12T15:45:07.221Z [TRACE] HTTP client GET request to https://registry.terraform.io/v1/providers/hashicorp/local/2.2.2/download/linux/amd64
2022-04-12T15:45:07.235Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_SHA256SUMS
2022-04-12T15:45:07.235Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_SHA256SUMS
2022-04-12T15:45:07.250Z [DEBUG] GET https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_SHA256SUMS.72D7468F.sig
2022-04-12T15:45:07.250Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_SHA256SUMS.72D7468F.sig
- Installing hashicorp/local v2.2.2...
2022-04-12T15:45:07.251Z [TRACE] providercache.Dir.InstallPackage: installing registry.terraform.io/hashicorp/local v2.2.2 from https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_linux_amd64.zip
2022-04-12T15:45:07.251Z [TRACE] HTTP client GET request to https://releases.hashicorp.com/terraform-provider-local/2.2.2/terraform-provider-local_2.2.2_linux_amd64.zip
2022-04-12T15:45:07.290Z [DEBUG] Provider signed by 34365D9472D7468F HashiCorp Security (hashicorp.com/security)
2022-04-12T15:45:07.394Z [TRACE] providercache.fillMetaCache: scanning directory .terraform/providers
2022-04-12T15:45:07.394Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/local v2.2.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/local/2.2.2/linux_amd64
2022-04-12T15:45:07.394Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/random v3.1.2 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64
2022-04-12T15:45:07.394Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/time v0.7.1 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64
2022-04-12T15:45:07.394Z [TRACE] getproviders.SearchLocalDirectory: found registry.terraform.io/hashicorp/tls v3.3.0 for linux_amd64 at .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64
2022-04-12T15:45:07.394Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/time/0.7.1/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/time 0.7.1
2022-04-12T15:45:07.394Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/tls/3.3.0/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/tls 3.3.0
2022-04-12T15:45:07.394Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/local/2.2.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/local 2.2.2
2022-04-12T15:45:07.394Z [TRACE] providercache.fillMetaCache: including .terraform/providers/registry.terraform.io/hashicorp/random/3.1.2/linux_amd64 as a candidate package for registry.terraform.io/hashicorp/random 3.1.2
- Installed hashicorp/local v2.2.2 (signed by HashiCorp)
╷
│ Error: Failed to install provider
│
│ Error while installing hashicorp/aws v3.75.1: read tcp
│ [2600:1f18:1a7b:f003:d587:aba5:cd6:4659]:56878->[2a04:4e42:79::439]:443:
│ read: connection timed out
╵
```
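To see how long the installer sat waiting on the AWS provider download, one can look for the largest pause between consecutive timestamps in the trace. The following is a minimal sketch; the helper name `largest_gap` is made up for illustration, and the sample contains only the two entries from the log above that surround the stall.

```python
import re
from datetime import datetime

# Matches the UTC timestamps that prefix Terraform's TRACE/DEBUG log lines.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z")


def largest_gap(log_text):
    """Return (seconds, before, after) for the longest pause between
    consecutive timestamps, or None if fewer than two are present."""
    stamps = TIMESTAMP.findall(log_text)
    times = [datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%fZ") for s in stamps]
    best = None
    for i in range(1, len(times)):
        gap = (times[i] - times[i - 1]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, stamps[i - 1], stamps[i])
    return best


# The AWS provider download starts, then nothing is logged until the
# provider cache is rescanned after the failure.
sample = (
    "2022-04-12T15:29:28.237Z [TRACE] HTTP client GET request to "
    "https://releases.hashicorp.com/terraform-provider-aws/3.75.1/"
    "terraform-provider-aws_3.75.1_linux_amd64.zip\n"
    "2022-04-12T15:45:07.220Z [TRACE] providercache.fillMetaCache: "
    "scanning directory .terraform/providers\n"
)
gap, before, after = largest_gap(sample)
print(f"{gap:.3f}s between {before} and {after}")
```

For these two entries the gap is about 939 seconds, i.e. more than 15 minutes between the start of the download and the next log line.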

crw commented 2 years ago

Hi @dmandyna, just a quick question:

> terraform init fails to install AWS provider - works as expected on version Terraform v1.0.0

> We've tried previous versions of Terraform, previous versions of AWS provider, but it doesn't seem to matter.

Does downloading hashicorp/aws work consistently on 1.0.0? Or is that also intermittently failing?

dmandyna commented 2 years ago

Hey @crw,

Yes, it seems to be working consistently, I ran a total of around 10 tests yesterday with version 1.0.0 - each test validated around 10 projects, and each of them uses the AWS provider, all of them succeeded.

For comparison, when we used version 1.1.7, someone on my team tried to validate a project 11 times, and it only succeeded on the last run.

edit:

> Does downloading hashicorp/aws work consistently on 1.0.0? Or is that also intermittently failing?

I've noticed that inconsistency in the issue description, I'll get it updated 🙂

apparentlymart commented 2 years ago

From the trace log it seems like Terraform is trying to use IPv6 to install the provider, presumably because it detected that the server supports it and your client has sufficient IPv6 support to be able to at least initially reach the server.

Is it possible that your internet connection supports IPv6 but that the connectivity over that protocol is unreliable in comparison to IPv4? For example, if IPv6 is using an additional tunnel of some sort which could be adding delay or packet loss.

Although we didn't change anything about the installer's network behavior in v1.1, we did (as usual) upgrade to a newer version of the Go standard library for the v1.1 release and so it's possible that we inherited some changes to the IPv6 detection heuristics that are now misclassifying your IPv6 connection as good enough to use when it really is not.

apparentlymart commented 2 years ago

(one way to test this, if you have sufficient access on your system, would be to disable IPv6 on the interface which connects you to the internet and try again. If IPv6 is the problem then that should force using IPv4 and therefore work as expected, rather than hitting this timeout.)
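Where disabling IPv6 on the interface is not possible, a rough per-family check can be done from a script instead. The sketch below is an assumption-laden illustration, not something Terraform does: `split_by_family`, `resolve` and `can_connect` are hypothetical helper names, and a successful TCP connect only shows that a connection can be established, not that a long download over that family will complete.

```python
import ipaddress
import socket


def split_by_family(addresses):
    """Group textual IP addresses into IPv4 and IPv6 lists."""
    groups = {"ipv4": [], "ipv6": []}
    for text in addresses:
        ip = ipaddress.ip_address(text)
        groups["ipv6" if ip.version == 6 else "ipv4"].append(text)
    return groups


def resolve(host, port=443):
    """Resolve a hostname and group its addresses by family (needs DNS)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return split_by_family(sorted({info[4][0] for info in infos}))


def can_connect(address, port=443, timeout=5.0):
    """Try a plain TCP connection to one literal address (needs network)."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `resolve("releases.hashicorp.com")` and then `can_connect` on each returned address gives a quick picture of which families are reachable from the machine that runs `terraform init`.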

dmandyna commented 2 years ago

@apparentlymart, thanks for getting back to me so quickly, it makes sense. I'll definitely check this out and post a message here to let you know if that resolved the issue.

dmandyna commented 2 years ago

@crw / @apparentlymart I'm unable to disable IPv6 on the machine where the task runs, but I tested running curl from the same machine against the endpoint using both IPv4 and IPv6, and I was able to download the .zip package for AWS provider 3.75.1 successfully both times.

IPv4 curl

```
sh-4.2$ curl -v -4 https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip --output "testv4.zip"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Trying 146.75.37.183:443...
* Connected to releases.hashicorp.com (146.75.37.183) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2843 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=*.hashicorp.com
* start date: May 3 19:03:10 2021 GMT
* expire date: Jun 4 19:03:09 2022 GMT
* subjectAltName: host "releases.hashicorp.com" matched cert's "*.hashicorp.com"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2020
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x1c11750)
} [5 bytes data]
> GET /terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip HTTP/2
> Host: releases.hashicorp.com
> user-agent: curl/7.76.1
> accept: */*
>
{ [5 bytes data]
< HTTP/2 200
< cache-control: max-age=31536000, stale-while-revalidate=86400, stale-if-error=604800, public
< content-disposition: attachment
< last-modified: Thu, 24 Mar 2022 22:14:45 GMT
< etag: "9165ebea2ca3c1ea395394d61ebac925"
< content-type: application/zip
< access-control-allow-origin: *
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-terraform-protocol-version: 5
< x-terraform-protocol-versions: 5.0
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-frame-options: sameorigin
< accept-ranges: bytes
< age: 1109571
< date: Thu, 14 Apr 2022 13:12:14 GMT
< content-length: 52441916
<
{ [5 bytes data]
100 50.0M  100 50.0M    0     0   322M      0 --:--:-- --:--:-- --:--:--  322M
* Connection #0 to host releases.hashicorp.com left intact
```

IPv6 curl

```
sh-4.2$ curl -v -6 https://releases.hashicorp.com/terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip --output "testv6.zip"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* Trying 2a04:4e42:79::439:443...
* Connected to releases.hashicorp.com (2a04:4e42:79::439) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* Cipher selection: ALL:!EXPORT:!EXPORT40:!EXPORT56:!aNULL:!LOW:!RC4:@STRENGTH
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* CApath: none
* TLSv1.2 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2843 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [333 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [70 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=*.hashicorp.com
* start date: May 3 19:03:10 2021 GMT
* expire date: Jun 4 19:03:09 2022 GMT
* subjectAltName: host "releases.hashicorp.com" matched cert's "*.hashicorp.com"
* issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2020
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
} [5 bytes data]
* Using Stream ID: 1 (easy handle 0x16e9750)
} [5 bytes data]
> GET /terraform-provider-aws/3.75.1/terraform-provider-aws_3.75.1_linux_amd64.zip HTTP/2
> Host: releases.hashicorp.com
> user-agent: curl/7.76.1
> accept: */*
>
{ [5 bytes data]
< HTTP/2 200
< cache-control: max-age=31536000, stale-while-revalidate=86400, stale-if-error=604800, public
< content-disposition: attachment
< last-modified: Thu, 24 Mar 2022 22:14:45 GMT
< etag: "9165ebea2ca3c1ea395394d61ebac925"
< content-type: application/zip
< access-control-allow-origin: *
< strict-transport-security: max-age=31536000; includeSubDomains; preload
< x-terraform-protocol-version: 5
< x-terraform-protocol-versions: 5.0
< x-xss-protection: 1; mode=block
< x-content-type-options: nosniff
< x-frame-options: sameorigin
< accept-ranges: bytes
< age: 1109678
< date: Thu, 14 Apr 2022 13:14:02 GMT
< content-length: 52441916
<
{ [5 bytes data]
100 50.0M  100 50.0M    0     0   378M      0 --:--:-- --:--:-- --:--:--  376M
* Connection #0 to host releases.hashicorp.com left intact
```

If the connection through IPv6 was an issue I think the connection would be failing intermittently for other providers too - but it's failing only for AWS provider 🙁

Here's an example pipeline run: the first three projects ran init and validate successfully (and these projects use the AWS provider), but the 4th one timed out. All of these tasks were run on the same machine.

image

dmandyna commented 2 years ago

@crw I saw you re-added the waiting-response tag, do you need anything else from me?

apparentlymart commented 2 years ago

It is indeed interesting that this should affect only one provider. One possible difference here is that the AWS provider packages are likely larger than the other two listed, since that provider has a much larger number of resource types, and so perhaps all of the downloads are going slowly but only the AWS provider is large enough for "slowly" to hit the timeout.
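The size argument can be made concrete with a little arithmetic. The sketch below uses the `content-length` that curl reported for the AWS provider zip earlier in the thread; the throughput figures are hypothetical, chosen only to illustrate the effect, and nothing here reflects a known Terraform timeout value.

```python
# content-length reported by curl for terraform-provider-aws_3.75.1_linux_amd64.zip
AWS_PROVIDER_BYTES = 52_441_916


def download_seconds(size_bytes, bytes_per_second):
    """Time needed to transfer size_bytes at a steady throughput."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / bytes_per_second


print(f"package size: {AWS_PROVIDER_BYTES / 2**20:.1f} MiB")
# Hypothetical throughputs, from a healthy link down to a badly degraded one.
for label, rate in (("10 MiB/s", 10 * 2**20), ("1 MiB/s", 2**20), ("100 KiB/s", 100 * 2**10)):
    print(f"{label:>10}: {download_seconds(AWS_PROVIDER_BYTES, rate):8.1f} s")
```

A package of roughly 50 MiB is exposed to a slow or lossy path for far longer than a provider of a few megabytes, which is one way the same network condition could surface for only one provider.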

That is admittedly just a guess with no concrete proof. Unfortunately at this point I feel at a loss about how to investigate further; this sort of issue is classically hard to debug remotely since it probably involves something specific about your network or internet connection, which we cannot see. 😖

dmandyna commented 2 years ago

@apparentlymart thanks for the answer. I don't think the connection or the network setup is at fault here: all the containers that use Terraform live in an AWS VPC, with an egress-only internet gateway for IPv6 and a NAT/internet gateway for IPv4 connections.

Just to illustrate how weird it is, the other 3 projects that validated as part of the same run ran init successfully in seconds. All of these were run on the same machine within the same network: baseline r53 tgw

I understand it's not enough data to troubleshoot the issue with, I just thought that maybe a change was made on your side that potentially caused this.

We'll keep using Terraform version 1.0.0 as we're not seeing that issue there. Let me know if I can assist/answer any other questions, if not, we should be good to close this ticket.

apparentlymart commented 2 years ago

Hi @dmandyna,

Sorry for not being clearer; my intent with the previous comment was only to say that there can be valid reasons why a change in behavior like the one we were discussing earlier (using IPv6 instead of IPv4 now) would affect only a larger provider package like the hashicorp/aws provider, not to suggest that it was the size of the package alone that made it not work.

With that said, since you can see it working using Terraform v1.0 it might be useful to see if there's some difference in how that older version is installing the providers. I'm honestly not sure how best to inspect a working installation to see which IP version it is using, since Terraform's logging only includes that level of detail when reporting an error, but if you could share an equivalent trace from the working version as you previously shared for the failure then maybe we'll be able to notice a difference that I'm not thinking to ask about yet.

dmandyna commented 2 years ago

Hey @apparentlymart,

There was nothing useful in the TRACE logs generated by terraform init for a successful build, but I can upload that if needed. I've managed to dig up some flow logs related to the IPv6 address on the error message below: image

According to the flow logs, some of the requests from our machine get rejected by the registry service, e.g.:

| src-addr | dst-addr | src-port | dst-port | protocol | packets | bytes | start | end | action | log-status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2a04:4e42:d:0:0:0:0:439 | 2600:1f14:662:8b04:754d:2ad1:9ced:3063 | 443 | 54634 | 6 | 246 | 349860 | 1650548008 | 1650548038 | REJECT | OK |

I've attached our flow logs from the machine where terraform init was run. It ran terraform init for 7 projects, all of which use the AWS provider, and only one failed. flow-logs.csv

I hope this makes sense, but please let me know if there is any additional info I can provide to help you further.

apparentlymart commented 2 years ago

Thanks for that additional information, @dmandyna!

I must admit that it's been a while since I interacted with an EC2 flow log and so I'm trying not to make too many unchecked assumptions as I read this, but some things I notice are:

I did some research to try to understand what exactly "action=REJECT" means in a flow log. I found "Logging IP traffic using VPC flow logs", which defines action as having two possible values with the following definitions:

  • ACCEPT — The recorded traffic was permitted by the security groups and network ACLs.
  • REJECT — The recorded traffic was not permitted by the security groups or network ACLs.

This definition seems to suggest that it was something in the AWS VPC fabric that rejected this packet, rather than something at the remote Fastly server.

It isn't clear to me what would make this specific stream be rejected by your security group rules and network ACLs when others are succeeding, or why that would vary by which Terraform version you are using. It can't just be the IP addresses, because I can see other entries in the flow log which also relate to 2a04:4e42:d:0:0:0:0:439 that were not rejected.

The documentation is a little unclear as to what "permitted" means, and since security groups in part behave effectively as a stateful firewall I suppose that in principle "not permitted by a security group" could mean that the packet wasn't related to an active stream being tracked for security group purposes and so was rejected on that premise.
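The flow log record quoted earlier in the thread can be picked apart with a short script, which makes it easier to compare accepted and rejected entries for the same address. This is a sketch only: the field order is taken from the header posted in the thread, and `parse_record` and `actions_for_address` are hypothetical helper names.

```python
from collections import Counter

# Field order as posted in the thread alongside the sample record.
FIELDS = ("src-addr dst-addr src-port dst-port protocol packets bytes "
          "start end action log-status").split()
INT_FIELDS = ("src-port", "dst-port", "protocol", "packets", "bytes", "start", "end")


def parse_record(line):
    """Parse one space-separated flow log record into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    return record


def actions_for_address(records, address):
    """Count ACCEPT/REJECT actions for records that involve an address."""
    return Counter(
        r["action"] for r in records if address in (r["src-addr"], r["dst-addr"])
    )


sample = ("2a04:4e42:d:0:0:0:0:439 2600:1f14:662:8b04:754d:2ad1:9ced:3063 "
          "443 54634 6 246 349860 1650548008 1650548038 REJECT OK")
record = parse_record(sample)
print(record["action"], record["end"] - record["start"], "second window")
```

Note that the sample record has source port 443, so it describes packets travelling from the remote server towards the instance, and protocol 6 is TCP. Running `actions_for_address` over the full flow-logs.csv (once converted to this space-separated form) would show how many records for the same remote address were accepted versus rejected.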

I'm honestly running aground here and not sure what else to ask. It does seem like something fishy is going on with some participant in this request, and I agree with you that the fact that it seems to always work with Terraform v1.0 and to fail only with Terraform v1.1 leads to the conclusion that something changed in Terraform -- likely actually in the Go standard library, but materially the same for your purposes here. However, it seems to be something incredibly specific and so I'm not sure what to look for in the changes between the Go version we used in Terraform v1.0 and the Go version we use in Terraform v1.1. I'm going to leave my writeup about assumptions and partial conclusions here in the hope that someone else on our team, or even outside of our team, might have an idea of what to try next.

Thanks, and sorry there isn't a clearer answer here.

gemcdaniel commented 1 year ago

I've been having the same issue running IPv6 in AWS in us-east-1. Although it started off with just the hashicorp/aws provider, we are now seeing the same thing for various providers.

It looks like the registry/releases domains are now behind an AWS CloudFront distribution.

Out of curiosity, I temporarily pointed the hostnames registry.terraform.io and releases.hashicorp.com at one of the IPv6 addresses they resolve to from my local machine. After that, I've been able to run our jobs 50 times without issue, whereas before only 1-5% of jobs succeeded.

It seems like a regional issue with CloudFront, most likely us-east-1 given that is where we are requesting the DNS lookups.

dmandyna commented 1 year ago

Thanks a lot @gemcdaniel, I'll definitely try when I get a chance. We've decided just to keep using Terraform version 1.0.0, but it would be nice to get some of the new features!

apparentlymart commented 1 year ago

Indeed, it's true that registry.terraform.io and releases.hashicorp.com are now served through Amazon CloudFront rather than Fastly. This has changed since my earlier comment and is subject to change again in future; the infrastructure used to serve those is an implementation detail.

gemcdaniel commented 1 year ago

@apparentlymart I completely understand. I was debugging the issue and just noticed the symptoms are the same (more accurately worse) but the IP addresses were different. I was able to see they were AWS which initially confused me into thinking the issue was internal to our environment but didn't end up being that.

At this moment, I'm at a loss as to the cause, since this doesn't seem to be an issue with our networking: we can hit everything else, including an IP address for a different region served up by the CF distro. It makes it seem like the issue is with CF in that specific region.

guilhermeassad commented 1 year ago

Hello. I'm also facing the same issue. I've been investigating it for a few days, and in some cases I see connections that are established but no further packets are transferred; in other cases, after a while I get a lot of retransmissions. Similarly to @gemcdaniel, I'm running GitHub Actions from an AWS EKS cluster using IPv6, from eu-central-1 though.

gemcdaniel commented 1 year ago

To give an update: I've worked with AWS support on this issue, and it was determined that the network was hitting limits with IPv6 packets, which resulted in them being dropped at the ENI level.

There are rare use-cases (notably IPv6 with SACK option) in which the network packet headers exceed the default maximal supported size (96 bytes). Customers encountering such scenarios will experience network connectivity issues and packet drops. To mitigate this issue, customers are instructed to either disable SACK or enable the option of wide LLQ entries which increases the accepted header size to 224 bytes.

The option to enable wide LLQ entries is currently supported on EC2 4th and 5th generation instance-types and supported in ENA Linux github driver, Linux upstream (by E/APR 2022), DPDK, and FreeBSD. The option to enable wide LLQ entries on EC2 6th generation instance-types(cmr6i and c6gn) will be available by E/MAR 2022.

The solution they pointed me to is in the FAQ for the ENA driver:

Q: Part of my network traffic uses IPv6 header with extensions and also TCP header with options. I suspect my Tx packets are not sent out.

A: ENA LLQs in default mode support network headers size up to 96 bytes. If header size is larger, the packet will be dropped. To resolve this issue, we recommend to reload the ENA driver with module parameter force_large_llq_header=1. This will increase the supported header size to a maximum of 224 bytes. Please note that this option reduces the max Tx ring size form 1K to 512. An example of such use case is IPv6 protocol with TCP SACK enabled, which might result in the packet header exceeding 96 bytes. An alternative solution for this particular use-case would be to disable TCP SACK:

$ echo 0 > /proc/sys/net/ipv4/tcp_sack

Please also note that this feature is only supported by the GitHub version of ENA driver and by AL2 distro.

References:- [1] ENA_Linux_Best_Practices

I still experienced the issue with SACK disabled, but the force_large_llq_header parameter did seem to fix it.
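As a rough back-of-the-envelope check on the 96-byte figure, the Python sketch below adds up the standard header sizes for IPv4 and IPv6 with TCP timestamps and a varying number of SACK blocks. The individual header sizes are standard. Whether the LLQ limit counts exactly the Ethernet + IP + TCP headers, and the option padding layout used here, are my assumptions and not something AWS support confirmed.

```python
# Standard header sizes in bytes.
ETHERNET = 14
IPV4_BASE = 20   # IPv4 header without options
IPV6_BASE = 40   # IPv6 header without extension headers
TCP_BASE = 20    # TCP header without options

# Limits quoted in the ENA FAQ above.
LLQ_DEFAULT_LIMIT = 96
LLQ_WIDE_LIMIT = 224


def padded(size: int) -> int:
    """Round an option size up to a multiple of 4 bytes (assumed NOP padding)."""
    return (size + 3) // 4 * 4


def tcp_options(sack_blocks: int, timestamps: bool = True) -> int:
    """Bytes of TCP options: a 10-byte timestamp option plus 2 + 8*n bytes of SACK."""
    size = padded(10) if timestamps else 0
    if sack_blocks:
        size += padded(2 + 8 * sack_blocks)
    return size


def header_bytes(ip_header: int, options: int) -> int:
    """Total Ethernet + IP + TCP header size for one packet."""
    return ETHERNET + ip_header + TCP_BASE + options


for name, ip_header in (("IPv4", IPV4_BASE), ("IPv6", IPV6_BASE)):
    for blocks in range(4):  # at most 3 SACK blocks fit alongside timestamps
        total = header_bytes(ip_header, tcp_options(blocks))
        verdict = "fits" if total <= LLQ_DEFAULT_LIMIT else "exceeds default limit"
        print(f"{name}, {blocks} SACK block(s): {total} bytes, {verdict}")
```

Under these assumptions, IPv4 tops out at 94 bytes even with three SACK blocks, while IPv6 reaches 98 bytes with a single SACK block and 114 with three. That would be consistent with the problem only showing up over IPv6, and all of these stay well under the 224-byte wide-LLQ limit.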

To set force_large_llq_header=1 you need to modify /boot/grub2/grub.cfg (we edited the /etc/grub.d/10_linux file and ran grub2-mkconfig -o /boot/grub2/grub.cfg). The line that needed to be changed looks like:

linux /boot/vmlinuz-5.10.184-175.731.amzn2.x86_64 root=UUID=a9962bc2-1c87-41b5-ba10-7be5ba7cd663 ro console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0

All that is needed is to append ena.force_large_llq_header=1, so it should look like:

linux /boot/vmlinuz-5.10.184-175.731.amzn2.x86_64 root=UUID=a9962bc2-1c87-41b5-ba10-7be5ba7cd663 ro console=tty0 console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0 ena.force_large_llq_header=1
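If you need to script that change (for example when baking an image), here is a minimal Python sketch of appending the parameter to a kernel command line without adding it twice. `set_kernel_param` is a hypothetical helper, not part of GRUB or the ENA tooling, and it only transforms a string; reading and writing the actual GRUB files and re-running grub2-mkconfig is left out.

```python
def set_kernel_param(line: str, param: str) -> str:
    """Return `line` with `param` as its last argument.

    Any existing argument with the same key (the part before '=') is dropped
    first, so applying the function repeatedly gives the same result.
    Leading indentation is preserved; runs of spaces are collapsed.
    """
    key = param.split("=", 1)[0]
    indent = line[: len(line) - len(line.lstrip())]
    kept = [tok for tok in line.split() if tok.split("=", 1)[0] != key]
    return indent + " ".join(kept + [param])


# The kernel line from above, split only to keep the source readable.
original = (
    "linux /boot/vmlinuz-5.10.184-175.731.amzn2.x86_64 "
    "root=UUID=a9962bc2-1c87-41b5-ba10-7be5ba7cd663 ro console=tty0 "
    "console=ttyS0,115200n8 net.ifnames=0 biosdevname=0 "
    "nvme_core.io_timeout=4294967295 rd.emergency=poweroff rd.shell=0"
)

updated = set_kernel_param(original, "ena.force_large_llq_header=1")
print(updated)
```

Dropping any existing value for the same key before appending means a stale ena.force_large_llq_header=0 would be replaced rather than left to conflict with the new setting.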

mikesplain commented 1 year ago

Thanks for the details @gemcdaniel, we're running into this same issue.

I was able to get the ena.force_large_llq_header=1 fix working in Bottlerocket as well using:

[settings.boot]
reboot-to-reconcile = true

[settings.boot.kernel-parameters]
"ena.force_large_llq_header" = [
  "1"
]
guilhermeassad commented 10 months ago

An update on my issue as well, which might be useful for other people. We have GitHub runners executing Terraform tasks in an IPv6-only EKS cluster. On EKS the option to set llq_header is not available because pods are using the CNI driver. Together with AWS support, we fixed it by disabling SACK. For that to work properly, you need to enable unsafe sysctl parameters. Here are pages related to it:

https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/
https://itnext.io/of-kubernetes-unsafe-sysctls-performance-optimization-on-eks-d36cc0e3e894

And what you need to set on the pod is:

securityContext:
  sysctls:

atz commented 6 months ago

Also experiencing this failure, specifically in GitLab runners that resolve the CDN to an IPv6 address.

│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/kubernetes v2.27.0: read tcp
│ [our_ipv6_addr::21]:37578->[2600:9000:25f4:7c00:5:e2b6:b380:93a1]:443:
│ read: connection timed out
╵
╷
│ Error: Failed to install provider
│ 
│ Error while installing hashicorp/aws v4.67.0: read tcp
│ [our_ipv6_addr::21]:60600->[2600:9000:25f4:e00:5:e2b6:b380:93a1]:443:
│ read: connection timed out
╵

We also experienced this w/ one of the smallest providers, namely template.

terraform init succeeds in our other (local and server-based) environments, but it isn't directly comparable since those resolve to and connect via IPv4.
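To check which address family a failing job actually used, a small Python sketch like the following can pull the remote endpoint out of the "read tcp ...->..." part of the error. The regular expression is my own guess at the message layout based on the errors above, not a documented Terraform format, and the IPv4 message in the example is made up for illustration.

```python
import ipaddress
import re

# Matches the remote side of "local->remote", where the address is either a
# bracketed IPv6 literal or a dotted IPv4 address, followed by ":port".
REMOTE = re.compile(r"->(?:\[(?P<v6>[^\]]+)\]|(?P<v4>[0-9.]+)):(?P<port>\d+)")


def remote_endpoint(message: str):
    """Return (ip_address, port) for the remote endpoint, or None if not found."""
    match = REMOTE.search(message)
    if match is None:
        return None
    address = ipaddress.ip_address(match.group("v6") or match.group("v4"))
    return address, int(match.group("port"))


# Line taken from the error output above (local address is redacted there).
failing = "[our_ipv6_addr::21]:37578->[2600:9000:25f4:7c00:5:e2b6:b380:93a1]:443:"
# Hypothetical IPv4 equivalent, using a documentation-range address.
ipv4_example = "read tcp 10.0.0.5:41000->192.0.2.10:443: read: connection timed out"

for message in (failing, ipv4_example):
    address, port = remote_endpoint(message)
    print(f"remote {address} port {port} is IPv{address.version}")
```

Running something like this over collected job logs would show whether the timeouts are confined to IPv6 endpoints, which is what we suspect.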

We already use terraform lock files (to reduce churn) and are pursuing both provider mirroring and (GitLab) dependency caching to mitigate. Those are good things to do anyway, but this is somewhat inscrutable.

It is unclear if our security/compliance constraints will allow the low-level workarounds suggested. We will also investigate disabling IPv6 during job execution.