hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Terraform not honouring OS IPv4 settings, using IPv6 dst to call *.googleapis.com #6782

Open mhanline opened 4 years ago

mhanline commented 4 years ago

Terraform Version

tf version
Terraform v0.12.28
+ provider.google v3.29.0
+ provider.google-beta v3.29.0

Affected Resource(s)

All resources, not specific to any one.

Terraform Configuration Files

While this happens intermittently and it's not specific to this config, it seems to happen with longer Terraform runs. You may need to apply / destroy 1-2 times before seeing this issue.

gist link to config

Debug Output

I see this output sporadically, and not on the same API call. Note the DST IP is an IPv6 address, but Cloud Shell does not enable IPv6 in the OS: Link to gist

Console output when issue occurs (Note the IPv6 address is being used):

Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c00::5f]:443: connect: cannot assign requested address
Error: Error retrieving available container cluster versions: Get "https://container.googleapis.com/v1beta1/projects/[project-id]/locations/asia-east1-c/serverConfig?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c04::5f]:443: connect: cannot assign requested address
Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c03::5f]:443: connect: cannot assign requested address

Expected Behavior

Terraform / Google provider should respect the OS network settings and use IPv4 addresses to call out to *.googleapis.com.

Actual Behavior

tf apply / tf destroy does not always successfully complete, and will return the errors above.

Steps to Reproduce

  1. Open Google Cloud Shell (no IPv6 stack)
  2. Run tf apply or tf destroy on the linked config
  3. Most times it will succeed, but about every second attempt it reports the above errors

Note, if I statically configure /etc/hosts to resolve to a specific IPv4 address - say 199.36.153.8, the above errors never occur.

Important Factoids

Authenticating using application default credentials, built into Cloud Shell.

Confirm IPv6 is not enabled on the OS:

myusername@cloudshell:~$ sudo sysctl -n net.ipv6.conf.all.disable_ipv6 && sysctl -n net.ipv6.conf.default.disable_ipv6
1
1
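
For anyone who wants to run the same check from Go rather than through sysctl, here is a minimal sketch. It assumes that a host with IPv6 disabled cannot bind a TCP socket on the IPv6 loopback address; the helper name supportsIPv6 is hypothetical and not part of Terraform or the provider.

```go
package main

import (
	"fmt"
	"net"
)

// supportsIPv6 is a hypothetical helper. It reports whether this host can
// bind a TCP socket on the IPv6 loopback address. The assumption is that
// the bind fails when IPv6 is disabled (for example with
// net.ipv6.conf.all.disable_ipv6=1), in which case it returns false.
func supportsIPv6() bool {
	ln, err := net.Listen("tcp6", "[::1]:0")
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

func main() {
	fmt.Println("IPv6 usable:", supportsIPv6())
}
```

A false result only says that the local stack cannot bind ::1. It says nothing about whether DNS still returns AAAA records for *.googleapis.com.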

References

Similar issue 1 (with Go) Similar issue 2 Workaround solution

danawillow commented 4 years ago

Here's what I know so far:

Based on https://github.com/golang/go/issues/25321 and https://github.com/hashicorp/terraform-provider-vsphere/issues/636, something that could fix it would be to compile with CGO enabled. The build script that I assume our release pipeline uses explicitly disables CGO. This was introduced in https://github.com/hashicorp/terraform/pull/7107 because it ensures the compiled binaries are statically linked (https://github.com/hashicorp/terraform/issues/6714). If I'm reading https://blog.madewithdrew.com/post/statically-linking-c-to-go/ right, then there should be a way to resolve this without having to explicitly disable CGO. It's also possible that things are different now than they were 4 years ago when the previous issues were brought up.

@megan07, is that indeed the build script that's used for the providers? If you don't mind, could you ask around to see if anyone at HashiCorp has any ideas on this? In the meantime, I'm marking it upstream since I think it'll be good to have open as a reference for people who run into this, but I don't expect there to be much we can do on the provider end.

shermanyin commented 3 years ago

I'm running into a similar issue intermittently as well in GCP Cloud Shell.

$ ~/bin/terraform --version
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/google v3.49.0
+ provider registry.terraform.io/hashicorp/google-beta v3.49.0
+ provider registry.terraform.io/hashicorp/http v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/time v0.6.0

In my particular case, my script uploads a file to a Windows Server. I first get this error:

module.dc.null_resource.upload-scripts: Still creating... [4m50s elapsed]
Error: timeout - last error: unknown error Post "https://35.236.28.181:5986/wsman": dial tcp 35.236.28.181:5986: i/o timeout

I checked that the firewalls are open to the IP of the Cloud Shell instance. When I run terraform apply again, I run into these "cannot assign requested address" errors while refreshing state. For example, on my first run I get:

module.cac.module.cac-regional[0].random_shuffle.zone: Refreshing state... [id=-]

Error: Error when reading or editing ComputeNetwork "projects/[project-id]/global/networks/vpc-cas": Get "https://compute.googleapis.com/compute/v1/projects/[project-id]/global/networks/vpc-cas?alt=json": dial tcp [2607:f8b0:400e:c09::5f]:443: connect: cannot assign requested address

Then I immediately run terraform apply again, and it fails in a different place.

google_compute_router_nat.nat: Refreshing state... [id=[project-id]/us-west2/router/nat]

Error: Error when reading or editing Storage Bucket "pcoip-scripts-7d731c": Get "https://storage.googleapis.com/storage/v1/b/pcoip-scripts-7d731c?alt=json&prettyPrint=false": dial tcp [2607:f8b0:400e:c07::80]:443: connect: cannot assign requested address

Finally, on the third run it lets me type "yes" to apply the changes, but it fails again, timing out while trying to upload the files. We run this same script a few times a week, and most of the time there are no issues.

$ sysctl  net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
$ sysctl  net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.default.disable_ipv6 = 1

bharathkkb commented 3 years ago

@c2thorn @rileykarson some of the team members have been running into this lately. Any possibilities for a fix?

/cc @daniel-cit

ocervell commented 3 years ago

Same here. Could you share steps to resolve this?

ferrarimarco commented 3 years ago

Hi there :)

We experienced this as well in a relatively long terraform apply (5-6 mins), running from Cloud Shell. Thanks for your support!

Error: Error creating service account: Post "https://iam.googleapis.com/v1/projects/[REDACTED_PROJECT_ID]/serviceAccounts?alt=json&prettyPrint=false": dial tcp [REDACTED_IP_V6_ADDRESS]:443: connect: cannot assign requested address

/cc @jbrook

isimluk commented 3 years ago

Quick and dirty plug:

# Workaround https://github.com/hashicorp/terraform-provider-google/issues/6782
    sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 net.ipv6.conf.lo.disable_ipv6=1 > /dev/null
    export APIS="googleapis.com www.googleapis.com storage.googleapis.com iam.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com"
    for name in $APIS
    do
      ipv4=$(getent ahostsv4 "$name" | head -n 1 | awk '{ print $1 }')
      grep -q "$name" /etc/hosts || ([ -n "$ipv4" ] && sudo sh -c "echo '$ipv4 $name' >> /etc/hosts")
    done
# Workaround end

lirlia commented 2 years ago

get all gcp api endpoints

gcloud services list --available --filter="name:googleapis.com" --format "csv[no-heading](ID)" --format "value(NAME)"

rhyas commented 1 year ago

I can't believe this is still an issue 2+ years after the bug was opened.

Jubblin commented 1 year ago

I have exactly the same issue when executing from a mac

sean9999 commented 1 year ago

+1

rubber-ant commented 1 year ago

any update on this ?

jlenuffgsoi commented 1 year ago

2023: this is still an issue. I am also encountering this problem.

liamstevens commented 1 year ago

Another confirmation that this is still occurring. Very painful.

kevin-dimichel commented 11 months ago

Error: Error retrieving available secret manager secret versions: Get "https://secretmanager.googleapis.com/v1/projects//secrets//versions/latest?alt=json": Post "https://oauth2.googleapis.com/token": dial tcp [2607:f8b0:400f:807::200a]:443: connect: no route to host

While on a different setup than the OP, I recently encountered a similar issue ^ on my macOS system. For me, the resolution was changing the DNS setting in the Wi-Fi network settings from my ISP's router to a public DNS server (like 1.1.1.1). After this change, terraform plan and terraform apply were successful. Maybe this will help other users too.

rpjeff commented 10 months ago

The workaround suggested by @kevin-dimichel (change DNS to 1.0.0.1 and 1.1.1.1) fixed this for me.

pspot2 commented 5 months ago

Can confirm this with Google CloudShell.

melinath commented 5 months ago

I've been looking into this and it looks like it should be possible for us to resolve on the provider side. We should be able to use nettest.SupportsIPv6 to detect whether the current environment supports IPv6 and then force the transport layer to use IPv4 if not. Something like adding the following after this line:

client.Transport = headerTransport

client.Transport.DialContext = func(ctx context.Context, network string, addr string) (net.Conn, error) {
    d := &net.Dialer{}
    if !nettest.SupportsIPv6() {
        return d.DialContext(ctx, "tcp4", addr)
    }
    return d.DialContext(ctx, network, addr)
}

However, I can't actually reproduce this bug on cloud shell, so I can't tell if the fix actually works. If anyone has a configuration that consistently and quickly causes this error in cloud shell, that would be extremely helpful!

EDIT: apparently the override isn't quite that simple. I'm continuing to dig, but reproducible cases would still be great. An alternative fix would be to force GODEBUG=netdns=cgo when initializing the config, but that is definitely hackier than I would prefer (and may also not work).

yaqs/47302089738551296

der-ali commented 3 months ago

I am facing a similar issue with api.cloudflare.com

nhairs commented 2 months ago

I had a similar issue and resolution to kevin-dimichel (above)

% terraform init

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
╷
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider hashicorp/aws: could not
│ query provider registry for registry.terraform.io/hashicorp/aws: the request failed after
│ 2 attempts, please try again later: Get
│ "https://registry.terraform.io/v1/providers/hashicorp/aws/versions": dial tcp
│ [2600:9000:2212:ee00:16:1aa3:1440:93a1]:443: connect: network is unreachable
╵

Version Info:

I resolved this by overriding the DNS servers for both IPv4/6 with the Quad9 servers.