Closed sjpb closed 3 years ago
I had a similar issue when in VPN, most probably DNS or something blocking the route for terraform. See referenced issue, "good old dns" mentioned here as well.
But if its a generic DNS problem why does wget
work?
wget
worked for me as well, so no idea: perhaps terraform has some crazy way of downloading the providers. It's still a problem worth solving, but workaround is to try again via different network.
Thanks for filing the issue @sjpb. The link to your gist seems to be broken, if you could add the trace output it may help narrow down the issue.
If this isn't a transient problem with the remote server, it would appear this is a similar DNS issue to #26532
, but I do not believe we have seen this on a linux system before. The difference you see with wget
would be that wget is using the glibc resolver and tls libraries from your system, while terraform is not. One way to verify this would be to build terraform locally on your system, allowing it to use the host libraries for name resolution, and see if this resolves your issue.
Thanks @jbardin have fixed the link, sorry about that. After more digging it appears we have other DNS problems on this system so this might not really be a terraform bug. The fact that the system could reach the url while TF couldn't was a confusing user experience though, it took some searching to find #26532
and start suspecting DNS. I'll report back if there's any useful info from this end.
For Linux systems I think the way to observe a potential disagreement between the libc hostname resolving (which could be consulting arbitrary data sources depending on /etc/nsswitch.conf
) vs. the direct DNS resolving Terraform is effectively doing would be to compare the results of the following two commands:
getent hosts registry.terraform.io
dig registry.terraform.io
The first of these queries the libc resolver, which will hopefully use DNS indirectly, while the second will always use DNS. The two different query strategies (along with the fact that registry.terraform.io
is CNAME
d to a CDN whose results may vary between queries) mean that the two are unlikely to align exactly, but hopefully the results will seem similar, today both returning the IP addresses of hostnames ending in fastly.net.
.
If you see a significant difference between these -- for example, if getent
seems to be returning a local IP address or some other hostname that belongs to your organization -- that would suggest to me that your system is relying on some custom nsswitch.conf
configuration that Terraform doesn't take into account due to using DNS lookups directly.
When I tried this for myself to make sure I was sharing the correct commands, one other possible variant came to mind: resolving registry.terraform.io
typically returns both IPv4 and IPv6 addresses under the assumption that the client will choose whichever is appropriate for its IP stack configuration, but sometimes this process doesn't work out right for one reason or another and e.g. a system will try to connect to an IPv6 IP address even though it only has LAN IPv6 connectivity. Terraform implements the so-called Happy Eyeballs algorithm to try to mitigate this sort of oddity, but it could be subverted if something unusual is happening in the resolver that causes either Terraform or wget
to see a modified or incomplete version of the upstream DNS response.
If you do learn what exactly is going on it would be useful if you would share that with us. Hopefully the above will give you some ideas to help with debugging.
I am seeing this problem as well. I have a correctly configured dual stack network. I can force resolution on both stacks using curl successfully while terraform fails. Disabling dual stack and choosing one or the other for the computer running terraform does resolve the issue, but that should not be required. Plain golang does not have this issue in IPv4 only, IPv6 only, or dual stack configuration of host computer that is running terraform.
The issue came down to only one of the set of DNS servers with an IPv6 address was rejecting requests. The two IPv4 listening servers and other IPv6 listening were accepting requests. Terraform was not falling back on the other configured DNS servers when it failed to get a response from the one it tried. The IPv4 servers are the primary for the configuration, so thats why no timeouts were seen anywhere else. It is strange that terraform would force using IPv6 server since IPv4 has no issue returning the AAAA record necessary to resolve the IPv6 address of services.
$ terraform init
Initializing modules...
Initializing the backend...
Initializing provider plugins...
- Finding community-terraform-providers/ignition versions matching "2.1.1"...
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider
community-terraform-providers/ignition: could not query provider registry for
registry.terraform.io/community-terraform-providers/ignition: the request
failed after 2 attempts, please try again later: Get
"https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions":
net/http: request canceled while waiting for connection (Client.Timeout
exceeded while awaiting headers)
$ time curl -4 https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions
{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}
curl -4 0.01s user 0.00s system 20% cpu 0.088 total
$ time curl -6 https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions
{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}
curl -6 0.02s user 0.00s system 29% cpu 0.067 total
For those who don't know the curl flags, from the man:
-4, --ipv4
This option tells curl to resolve names to IPv4 addresses only, and not for example try IPv6.
See also --http1.1 and --http2. This option overrides -6, --ipv6.
-6, --ipv6
This option tells curl to resolve names to IPv6 addresses only, and not for example try IPv4.
See also --http1.1 and --http2. This option overrides -4, --ipv4.
The plain golang test.
package main
import (
"fmt"
"net/http"
"net/http/httputil"
)
func main() {
resp, _ := http.Get("https://registry.terraform.io/v1/providers/community-terraform-providers/ignition/versions")
buf, _ := httputil.DumpResponse(resp, true)
fmt.Printf("%s\n", string(buf))
}
$ go version
go version go1.15.8 darwin/amd64
$ go run main.go
HTTP/2.0 200 OK
Accept-Ranges: bytes
Age: 0
Cache-Control: public, max-age=30, stale-if-error=31536000
Content-Type: application/json
Date: Sun, 14 Feb 2021 05:25:36 GMT
Server: Cowboy
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
Vary: Accept-Encoding
Via: 1.1 vegur, 1.1 varnish, 1.1 varnish
X-Cache: MISS, MISS
X-Cache-Hits: 0, 0
X-Served-By: cache-dca17753-DCA, cache-pdk17870-PDK
{"id":"community-terraform-providers/ignition","versions":[{"version":"2.1.0","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"linux","arch":"amd64"},{"os":"windows","arch":"amd64"}]},{"version":"2.1.1","protocols":["5.0"],"platforms":[{"os":"darwin","arch":"amd64"},{"os":"freebsd","arch":"386"},{"os":"freebsd","arch":"amd64"},{"os":"freebsd","arch":"arm"},{"os":"freebsd","arch":"arm64"},{"os":"linux","arch":"386"},{"os":"linux","arch":"amd64"},{"os":"linux","arch":"arm"},{"os":"linux","arch":"arm64"},{"os":"windows","arch":"386"},{"os":"windows","arch":"amd64"}]}],"warnings":null}
An external report of IPv6 connectivity.
@apparentlymart thank you for a very informative/helpful post. Upgrades to the virtual network stack appear to have solved my specific problem so I won't have debug info to share unfortunately. I suspect I'll run into it again elsewhere though so those tips will be very useful.
This issue appears to be resolved, so I'm going to close it out. If any new information arises, feel free to reply here or file a new issue with a new reproduction case.
Thanks!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Terraform Version
Terraform Configuration Files
Debug Output
https://gist.github.com/sjpb/5cd405c09fc2ef925250e65a0ae4cb8b
Crash Output
N/A
Expected Behavior
Terraform initialised.
Actual Behavior
Steps to Reproduce
Additional Context
I can wget the paths which the debug log shows are timing out:
This is on Centos 8.3. As linked below this looks like #26532 to me but that was on OSX. Since DNS was the cause of that issue I'll note that:
References
26532