hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0

Failing test(s): TestAccDataprocCluster_withNodeGroupAffinity #13634

Closed SarahFrench closed 15 hours ago

SarahFrench commented 1 year ago

Impacted tests:

Affected Resource(s)

Nightly builds:

Message:

provider_test.go:307: Step 1/1 error: Error running apply: exit status 1
Error: Error waiting for creating Dataproc cluster: Error code 9, message: Instance could not be scheduled due to no matching node with property compatibility.
Explanation:
The matching node(s) <test-nodegroup-wk1cg8gvhx-252j, test-nodegroup-wk1cg8gvhx-t8g5, test-nodegroup-wk1cg8gvhx-z6bc> are not in READY status.
Potential fix:
- Please consider waiting for the matching node(s) to become READY.
- Please consider deleting and recreating the node or contacting Google Cloud Support, in case it is taking several minutes for the node to become READY.
with google_dataproc_cluster.basic,
on terraform_plugin_test.tf line 25, in resource "google_dataproc_cluster" "basic":

b/299683841

SarahFrench commented 1 year ago

This test is also affected by quota limits, possibly exacerbated by the increase in the number of parallel tests.

melinath commented 1 year ago

This seems to be flipping between quota issues and the other issue. I'm going to keep this as a service/dataproc issue; if it turns out we need to increase quota we can do that later.

melinath commented 12 months ago

Possibly the node groups here should also use a tf-test prefix.

melinath commented 8 months ago

Currently at a 50% failure rate (62 / 124).

roaks3 commented 7 months ago

@melinath I ran the labeler today and the service/dataproc label was re-added because of the Affected Resource(s) block. It looks like the service/compute-sole-tenancy label was added at one point because the formatting was off such that the rest of the description was included as Affected Resource(s), and google_compute_node_group was found.

I think we want to keep service/dataproc and remove service/compute-sole-tenancy, but wanted to double check if there was a reason for removing service/dataproc previously.

melinath commented 7 months ago

I must have clicked on the wrong label? +1 that this is clearly dataproc, not compute-sole-tenancy

NickElliot commented 5 days ago

To amend this ticket: the most common test error message is the following. The quota issue is relatively rare.

        Error: Error waiting for creating Dataproc cluster: Error code 9, message: Instance could not be scheduled due to no matching node with property compatibility.

        Explanation:
        The matching node group(s) <test-nodegroup-randomsuffix> do not match the intance's machine family type.