hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.3k stars 1.73k forks source link

Transient `EOF` during project IAM binding #15350

Open duxbuse opened 1 year ago

duxbuse commented 1 year ago

Community Note

Terraform Version

Terraform v1.5.2

Affected Resource(s)

Debug Output

I know you want a gist or paste bin but there is well over 1 millions lines of output as this is a sizable terraform use case. I have isolated a few of the error blocks hoping to help diagnose the issue.

Error 1 ``` 2023-07-28T02:28:22.1775261Z 2023-07-28T12:24:44.368+1000 [DEBUG] provider.terraform-provider-google_v4.75.1_x5: 2023/07/28 12:24:43 [DEBUG] Retry Transport: Stopping retries, last request failed with non-retryable error: EOF ```
Error 2 ``` 2023-07-28T02:28:25.4431723Z 2023-07-28T12:25:18.533+1000 [DEBUG] created provider logger: level=debug 2023-07-28T02:28:25.4432078Z 2023-07-28T12:25:18.533+1000 [INFO] provider: configuring client automatic mTLS 2023-07-28T02:28:25.4433185Z 2023-07-28T12:25:18.543+1000 [DEBUG] provider: starting plugin: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5 args=[.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5] 2023-07-28T02:28:25.4433959Z 2023-07-28T12:25:18.544+1000 [DEBUG] provider: plugin started: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5 pid=3708 2023-07-28T02:28:25.4434688Z 2023-07-28T12:25:18.544+1000 [DEBUG] provider: waiting for RPC address: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5 2023-07-28T02:28:25.4435382Z 2023-07-28T12:25:18.744+1000 [INFO] provider.terraform-provider-null_v3.2.1_x5: configuring server automatic mTLS: timestamp=2023-07-28T12:25:18.743+1000 2023-07-28T02:28:25.4435711Z 2023-07-28T12:25:18.786+1000 [DEBUG] provider: using plugin: version=5 2023-07-28T02:28:25.4436374Z 2023-07-28T12:25:18.786+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: plugin address: address=/tmp/plugin787149281 network=unix timestamp=2023-07-28T12:25:18.786+1000 2023-07-28T02:28:25.4437936Z 2023-07-28T12:25:18.828+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined Provider Metadata: @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server_getproviderschema.go:39 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema timestamp=2023-07-28T12:25:18.827+1000 2023-07-28T02:28:25.4439597Z 2023-07-28T12:25:18.828+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined Provider Metadata: tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server_getproviderschema.go:41 @module=sdk.framework timestamp=2023-07-28T12:25:18.828+1000 2023-07-28T02:28:25.4441187Z 2023-07-28T12:25:18.828+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined Provider GetSchema: tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:260 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 timestamp=2023-07-28T12:25:18.828+1000 2023-07-28T02:28:25.4442654Z 2023-07-28T12:25:18.828+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined Provider GetSchema: @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:262 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema timestamp=2023-07-28T12:25:18.828+1000 2023-07-28T02:28:25.4444358Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined Provider Resources: tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:330 @module=sdk.framework timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4445840Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined Provider Resources: tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:332 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4447436Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined Resource GetSchema: @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:411 tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_resource_type=null_resource @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_rpc=GetProviderSchema timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4449136Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined Resource GetSchema: @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:413 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_resource_type=null_resource tf_rpc=GetProviderSchema timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4450623Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined Provider DataSources: @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:152 timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4452197Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined Provider DataSources: tf_provider_addr=registry.terraform.io/hashicorp/null tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:154 @module=sdk.framework timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4453802Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Calling provider defined DataSource GetSchema: tf_provider_addr=registry.terraform.io/hashicorp/null tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:233 @module=sdk.framework tf_data_source_type=null_data_source tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4455536Z 2023-07-28T12:25:18.829+1000 [DEBUG] provider.terraform-provider-null_v3.2.1_x5: Called provider defined DataSource GetSchema: tf_rpc=GetProviderSchema @caller=github.com/hashicorp/terraform-plugin-framework@v0.16.0/internal/fwserver/server.go:235 tf_data_source_type=null_data_source tf_req_id=e2296f9e-cc0f-d650-e185-af62650256d6 @module=sdk.framework tf_provider_addr=registry.terraform.io/hashicorp/null timestamp=2023-07-28T12:25:18.829+1000 2023-07-28T02:28:25.4455860Z 2023-07-28T12:25:18.830+1000 [DEBUG] No provider meta schema returned 2023-07-28T02:28:25.4456477Z 2023-07-28T12:25:18.833+1000 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF" 2023-07-28T02:28:25.4457312Z 2023-07-28T12:25:18.834+1000 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5 pid=3708 2023-07-28T02:28:25.4457699Z 2023-07-28T12:25:18.834+1000 [DEBUG] provider: plugin exited 2023-07-28T02:28:25.4458035Z 2023-07-28T12:25:21.011+1000 [INFO] backend/local: plan operation completed 2023-07-28T02:28:25.4458387Z 2023-07-28T12:25:21.058+1000 [INFO] backend/local: writing plan output to: myplan 2023-07-28T02:28:25.5069100Z ╷ 2023-07-28T02:28:25.5071966Z │ Error: Error when reading or editing Resource "project \"project-sit-acd5\"" with IAM Member: Role "projects/project-sit-acd5/roles/platform_automation_sa" Member "serviceAccount:nonprod-folder-admin-sa@pltfm-build-nonprod-7644.iam.gserviceaccount.com": Error retrieving IAM policy for project "project-sit-acd5": Post "https://cloudresourcemanager.googleapis.com/v1/projects/project-sit-acd5:getIamPolicy?alt=json&prettyPrint=false": EOF 2023-07-28T02:28:25.5072208Z │  2023-07-28T02:28:25.5072753Z │  with google_project_iam_member.project_sa-binding["project-sit"], 2023-07-28T02:28:25.5073637Z │  on custom-role-bindings.tf line 81, in resource "google_project_iam_member" "project_sa-binding": 2023-07-28T02:28:25.5074090Z │  81: resource "google_project_iam_member" "project_sa-binding" { 2023-07-28T02:28:25.5074236Z │  2023-07-28T02:28:25.5074388Z ╵ 2023-07-28T02:28:25.5075087Z 2023-07-28T12:25:35.103+1000 [DEBUG] provider.terraform-provider-google_v4.75.1_x5: 2023/07/28 12:25:35 [DEBUG] [transport] [server-transport 0xc000a76000] Closing: Server.Stop called 2023-07-28T02:28:25.5075703Z 2023-07-28T12:25:35.103+1000 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF" 2023-07-28T02:28:25.5077019Z 2023-07-28T12:25:35.112+1000 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/hashicorp/google/4.75.1/linux_amd64/terraform-provider-google_v4.75.1_x5 pid=3582 2023-07-28T02:28:25.5077316Z 2023-07-28T12:25:35.112+1000 [DEBUG] provider: plugin exited 2023-07-28T02:28:25.5078955Z time=2023-07-28T12:27:01+10:00 level=info msg=2023-07-28T12:07:21.725+1000 [INFO] Terraform version: 1.5.1 2023-07-28T02:28:25.5079322Z 2023-07-28T12:07:21.728+1000 [DEBUG] using github.com/hashicorp/go-tfe v1.26.0 2023-07-28T02:28:25.5079689Z 2023-07-28T12:07:21.728+1000 [DEBUG] using github.com/hashicorp/hcl/v2 v2.16.2 2023-07-28T02:28:25.5080217Z 2023-07-28T12:07:21.728+1000 [DEBUG] using github.com/hashicorp/terraform-svchost v0.1.0 2023-07-28T02:28:25.5080563Z 2023-07-28T12:07:21.728+1000 [DEBUG] using github.com/zclconf/go-cty v1.12.2 2023-07-28T02:28:25.5080845Z 2023-07-28T12:07:21.728+1000 [INFO] Go runtime version: go1.20 2023-07-28T02:28:25.5081199Z 2023-07-28T12:07:21.728+1000 [INFO] CLI args: []string{"terraform", "output", "-json"} ```

Expected Behavior

It works, detects no changes and completes the plan.

Also somewhat expect that EOF is a retryable error.

Actual Behavior

Occasionally (~1/4 runs fail) I get the above errors when running my TF plan. It seems to happen regardless of if there is lots of changes planned or if there is no changes detected. But everytime it fails its a different iam binding that fails its not the same one each time.

Steps to Reproduce

Its transient and I'm not really sure, this is a large terraform chunk managing 100~200 project iam bindings and occasionally we get that issue.

  1. terraform plan

Important Factoids

Don't see the issue as much when running on local mac book pros that are pretty beefy. The smaller cicd runners see this issue more regularly. So it seems that when either there is a network glitch and some packets are lost, the grpc has already received a 200 header but then doesn't get all the body and fails, or perhaps when some RAM limits are met, the body of the response is lost. I really don't know the error message is hard to debug.

b/304725227

edwardmedia commented 1 year ago

@duxbuse I noticed you said this doesn't happen on local. Do you see the difference in the env between your local and the pipeline? Maybe you can debug starting from there. Without the repro steps, not sure how much we can do.

duxbuse commented 1 year ago

@edwardmedia I still see it happening locally just a bit less frequently.

Don't see the issue as much when running on local mac book pros that are pretty beefy. The smaller cicd runners see this issue more regularly.

edwardmedia commented 1 year ago

@duxbuse can you provide the config that I can repro?

duxbuse commented 1 year ago

I can, but would prefer to not do so in a public forum.

But my suspicion is that run heaps of resources (300+ iam bindings) on a machine with say 2gig of ram and then you might see a bunch of these EOF errors

edwardmedia commented 1 year ago

@duxbuse yeah I doubt there is much we can do at the provider level. You might consider some workaround such as breaking it down to smaller number to groups, etc.

edwardmedia commented 1 year ago

@duxbuse does my explanation make sense to you?

duxbuse commented 1 year ago

Yeah I'm sure that breaking it down will hide the issue, but I guess I'm currently seeing a scaling issue. That the go code under the hood, probably in the transport layer somewhere isn't respecting memory limits.

@Edward sun do you have a non public place I can share our config. I'm afraid the task of anonymising it is quite large.

On Tue, 8 Aug 2023, 11:11 pm Edward Sun, @.***> wrote:

@duxbuse https://github.com/duxbuse does my explanation make sense to you?

— Reply to this email directly, view it on GitHub https://github.com/hashicorp/terraform-provider-google/issues/15350#issuecomment-1669587825, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABUNBC667REU5MFPXIOHFMTXUI3ILANCNFSM6AAAAAA27FRVKU . You are receiving this because you were mentioned.Message ID: @.***>

edwardmedia commented 1 year ago

@shuyama1 do you have any suggestions on this?