Open melinath opened 2 years ago
TestAccTPUNode_tpuNodeFullExample
Google Cloud - 35.2% failure
Google Cloud Beta - 36% failure
b/261834151
There are also failures like this that don't happen consistently:
provider_test.go:320: Step 1/2 error: After applying this test step, the plan was not empty.
stdout:
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # google_tpu_node.tpu will be updated in-place
  ~ resource "google_tpu_node" "tpu" {
        id                 = "projects/[PROJECT]/locations/us-central1-b/nodes/tf-test-test-tpubdrsxsbxw1"
        name               = "tf-test-test-tpubdrsxsbxw1"
      ~ tensorflow_version = "1.15.3" -> "1.15.4"
        # (10 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
In the config tensorflow_version is set using data.google_tpu_tensorflow_versions.available.versions[0], and I wonder if it's because the google_tpu_tensorflow_versions datasource is returning different values between the first plan+apply and then the second plan step.
Maybe by provisioning something in a given zone we affect the "zonal availability" of TPU resources in that zone, and that affects the values returned by projects.locations.tensorflowVersions/list?
This is now only failing with the error in @SarahFrench's comment. data.google_tpu_tensorflow_versions.available.versions[0] unfortunately selects the oldest available version of TensorFlow, which I don't think we want for these tests. It's possible that this is related to the inconsistency (the idea of the test impacting zonal availability seems plausible too). Note that data.google_tpu_tensorflow_versions.available.versions also includes versions that are not stable releases, so we can't just use the last version in the list, and IMO we will most likely need to change these tests back to using a hard-coded TensorFlow version.
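To illustrate why neither versions[0] nor the last entry is a safe pick, here is a rough Go sketch of the filtering that would be needed if the tests kept reading from the datasource. Everything in it is an assumption for illustration: latestStable and its helpers are hypothetical names, not provider code, and the sample list is made up apart from 1.15.3 and 1.15.4, which appear in the plan output above. It also assumes that "stable" means a plain MAJOR.MINOR or MAJOR.MINOR.PATCH string, which may not match what the API actually returns.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// stableVersion matches plain MAJOR.MINOR or MAJOR.MINOR.PATCH strings such as
// "1.15.4". Treating everything else as unstable is an assumption of this sketch.
var stableVersion = regexp.MustCompile(`^(\d+)\.(\d+)(?:\.(\d+))?$`)

// parseStable returns the numeric components of v, and true if v looks stable.
func parseStable(v string) ([3]int, bool) {
	var parts [3]int
	m := stableVersion.FindStringSubmatch(v)
	if m == nil {
		return parts, false
	}
	for i := 0; i < 3; i++ {
		if m[i+1] == "" {
			continue // a missing patch component counts as 0
		}
		n, err := strconv.Atoi(m[i+1])
		if err != nil {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// newer reports whether version a is strictly higher than version b.
func newer(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i]
		}
	}
	return false
}

// latestStable returns the highest stable-looking version, or "" if there is none.
// It does not rely on the order of the input list.
func latestStable(versions []string) string {
	best := ""
	var bestParts [3]int
	for _, v := range versions {
		parts, ok := parseStable(v)
		if !ok {
			continue
		}
		if best == "" || newer(parts, bestParts) {
			best, bestParts = v, parts
		}
	}
	return best
}

func main() {
	// Hypothetical datasource output; only 1.15.3 and 1.15.4 come from the report.
	versions := []string{"1.15.3", "1.15.4", "2.1.0", "nightly", "pytorch-nightly"}
	fmt.Println(latestStable(versions)) // prints "2.1.0"
}
```

Even with this kind of filtering the selected version would still change whenever the list returned for the zone changes between plan steps, so it would not remove the non-empty plan by itself. That is the main argument for a hard-coded version.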
This test failed at 32% in Mar 2023, and the same error comes up for some of the other TPU tests as well.
Affected Resource(s)
Failure rate: 100% since 2022-10-08
Failure rate: 32% in Mar 2023
Impacted tests:
Nightly builds:
Message:
Note: this is separate from https://github.com/hashicorp/terraform-provider-google/issues/10222 which is flakey due to capacity issues.