hashicorp / terraform

Terraform enables you to safely and predictably create, change, and improve infrastructure. It is a source-available tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.
https://www.terraform.io/
Other
42.73k stars 9.56k forks source link

Client.Timeout exceeded while awaiting headers (Linux) #27742

Closed sjpb closed 3 years ago

sjpb commented 3 years ago

Terraform Version

Terraform v0.14.6

Terraform Configuration Files

terraform {
  required_version = ">= 0.14"
  required_providers {
    openstack = {
      source = "terraform-provider-openstack/openstack"
    }
  }
}

Debug Output

https://gist.github.com/sjpb/5cd405c09fc2ef925250e65a0ae4cb8b

Crash Output

N/A

Expected Behavior

Terraform initialised.

Actual Behavior

(venv) [centos@sb5 terraform]$ TF_LOG=TRACE TF_LOG_PATH=./tflog.txt terraform init
Initializing the backend...

Initializing provider plugins...
- Finding latest version of hashicorp/local...
- Finding latest version of terraform-provider-openstack/openstack...

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/local: could not query provider registry for
registry.terraform.io/hashicorp/local: the request failed after 2 attempts,
please try again later: Get
"https://registry.terraform.io/v1/providers/hashicorp/local/versions":
net/http: request canceled while waiting for connection (Client.Timeout
exceeded while awaiting headers)

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
terraform-provider-openstack/openstack: could not query provider registry for
registry.terraform.io/terraform-provider-openstack/openstack: the request
failed after 2 attempts, please try again later: Get
"https://registry.terraform.io/v1/providers/terraform-provider-openstack/openstack/versions":
net/http: request canceled while waiting for connection (Client.Timeout
exceeded while awaiting headers)

Steps to Reproduce

TF_LOG=TRACE TF_LOG_PATH=./tflog.txt terraform init

Additional Context

I can wget the paths which the debug log shows are timing out:

(venv) [centos@sb5 terraform]$ wget -O - https://checkpoint-api.hashicorp.com/v1/check/terraform?arch=amd64&os=linux&signature=f82b4763-fb7b-15e0-4936-aafd71d74e9d&version=0.14.6
<snip>
{"product":"terraform","current_version":"0.14.6","current_release":1612461662,"current_download_url":"https://releases.hashicorp.com/terraform/0.14.6/","current_changelog_url":"https://github.com/hashicorp/terraform/blob/v0.14.6/CHANGELOG.md","project_website":"https://www.terraform.io","alerts":[]}

This is on Centos 8.3. As linked below this looks like #26532 to me but that was on OSX. Since DNS was the cause of that issue I'll note that:

(venv) [centos@sb5 terraform]$ cat /etc/resolv.conf 
# Generated by NetworkManager
search openstacklocal novalocal
nameserver 10.109.0.4
nameserver 10.109.0.2
nameserver 10.109.0.3

References

26532

Vetchu commented 3 years ago

I had a similar issue when in VPN, most probably DNS or something blocking the route for terraform. See referenced issue, "good old dns" mentioned here as well.

sjpb commented 3 years ago

But if its a generic DNS problem why does wget work?

Vetchu commented 3 years ago

wgetworked for me as well, so no idea: perhaps terraform has some crazy way of downloading the providers. It's still a problem worth solving, but workaround is to try again via different network.

jbardin commented 3 years ago

Thanks for filing the issue @sjpb. The link to your gist seems to be broken, if you could add the trace output it may help narrow down the issue.

If this isn't a transient problem with the remote server, it would appear this is a similar DNS issue to #26532, but I do not believe we have seen this on a linux system before. The difference you see with wget would be that wget is using the glibc resolver and tls libraries from your system, while terraform is not. One way to verify this would be to build terraform locally on your system, allowing it to use the host libraries for name resolution, and see if this resolves your issue.

sjpb commented 3 years ago

Thanks @jbardin have fixed the link, sorry about that. After more digging it appears we have other DNS problems on this system so this might not really be a terraform bug. The fact that the system could reach the url while TF couldn't was a confusing user experience though, it took some searching to find #26532 and start suspecting DNS. I'll report back if there's any useful info from this end.

apparentlymart commented 3 years ago

For Linux systems I think the way to observe a potential disagreement between the libc hostname resolving (which could be consulting arbitrary data sources depending on /etc/nsswitch.conf) vs. the direct DNS resolving Terraform is effectively doing would be to compare the results of the following two commands:

getent hosts registry.terraform.io
dig registry.terraform.io

The first of these queries the libc resolver, which will hopefully use DNS indirectly, while the second will always use DNS. The two different query strategies (along with the fact that registry.terraform.io is CNAMEd to a CDN whose results may vary between queries) mean that the two are unlikely to align exactly, but hopefully the results will seem similar, today both returning the IP addresses of hostnames ending in fastly.net..

If you see a significant difference between these -- for example, if getent seems to be returning a local IP address or some other hostname that belongs to your organization -- that would suggest to me that your system is relying on some custom nsswitch.conf configuration that Terraform doesn't take into account due to using DNS lookups directly.

When I tried this for myself to make sure I was sharing the correct commands, one other possible variant came to mind: resolving registry.terraform.io typically returns both IPv4 and IPv6 addresses under the assumption that the client will choose whichever is appropriate for its IP stack configuration, but sometimes this process doesn't work out right for one reason or another and e.g. a system will try to connect to an IPv6 IP address even though it only has LAN IPv6 connectivity. Terraform implements the so-called Happy Eyeballs algorithm to try to mitigate this sort of oddity, but it could be subverted if something unusual is happening in the resolver that causes either Terraform or wget to see a modified or incomplete version of the upstream DNS response.

If you do learn what exactly is going on it would be useful if you would share that with us. Hopefully the above will give you some ideas to help with debugging.

cehoffman commented 3 years ago

I am seeing this problem as well. I have a correctly configured dual stack network. I can force resolution on both stacks using curl successfully while terraform fails. Disabling dual stack and choosing one or the other for the computer running terraform does resolve the issue, but that should not be required. Plain golang does not have this issue in IPv4 only, IPv6 only, or dual stack configuration of host computer that is running terraform.

The issue came down to only one of the set of DNS servers with an IPv6 address was rejecting requests. The two IPv4 listening servers and other IPv6 listening were accepting requests. Terraform was not falling back on the other configured DNS servers when it failed to get a response from the one it tried. The IPv4 servers are the primary for the configuration, so thats why no timeouts were seen anywhere else. It is strange that terraform would force using IPv6 server since IPv4 has no issue returning the AAAA record necessary to resolve the IPv6 address of services.

$ terraform init
Initializing modules...

Initializing the backend...

Initializing provider plugins...
- Finding community-terraform-providers/ignition versions matching "2.1.1"...

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
community-terraform-providers/ignition: could not query provider registry for
registry.terraform.io/community-terraform-providers/ignition: the request
failed after 2 attempts, please try again later: Get
"https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions":
net/http: request canceled while waiting for connection (Client.Timeout
exceeded while awaiting headers)
$ time curl -4 https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions
{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}
curl -4   0.01s user 0.00s system 20% cpu 0.088 total
$ time curl -6 https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions
{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}
curl -6   0.02s user 0.00s system 29% cpu 0.067 total

For those who don't know the curl flags, from the man:

       -4, --ipv4
              This option tells curl to resolve names to IPv4 addresses only, and not for example try IPv6.

              See also --http1.1 and --http2. This option overrides -6, --ipv6.

       -6, --ipv6
              This option tells curl to resolve names to IPv6 addresses only, and not for example try IPv4.

              See also --http1.1 and --http2. This option overrides -4, --ipv4.

The plain golang test.

package main

import (
    "fmt"
    "net/http"
    "net/http/httputil"
)

func main() {
    resp, _ := http.Get("https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions")
    buf, _ := httputil.DumpResponse(resp, true)
    fmt.Printf("%s\n", string(buf))
}
$ go version
go version go1.15.8 darwin/amd64
$ go run main.go
HTTP/2.0 200 OK
Accept-Ranges: bytes
Age: 0
Cache-Control: public, max-age=30, stale-if-error=31536000
Content-Type: application/json
Date: Sun, 14 Feb 2021 05:25:36 GMT
Server: Cowboy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept-Encoding
Via: 1.1 vegur, 1.1 varnish, 1.1 varnish
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Served-By: cache-dca17753-DCA, cache-pdk17870-PDK

{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}

An external report of IPv6 connectivity.

Screen Shot 2021-02-13 at 11 11 45 PM
sjpb commented 3 years ago

@apparentlymart thank you for a very informative/helpful post. Upgrades to the virtual network stack appear to have solved my specific problem so I won't have debug info to share unfortunately. I suspect I'll run into it again elsewhere though so those tips will be very useful.

jbardin commented 3 years ago

This issue appears to be resolved, so I'm going to close it out. If any new information arises, feel free to reply here or file a new issue with a new reproduction case.

Thanks!

ghost commented 3 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.