Open aditya-facets opened 1 year ago
@aditya-facets since we don't own modules, can you repro the issue with resources? If yes, can you share the config?
I'm also experiencing this issue. Here's my setup:
Main.tf:
resource "google_sql_database_instance" "main" {
for_each = var.sql_instances
name = each.value.sql_instance_name
database_version = each.value.db_version
root_password = each.value.root_password
deletion_protection = each.value.deletion_protection
settings {
tier = each.value.tier
availability_type = each.value.availability_type
disk_autoresize = each.value.disk_autoresize
disk_size = each.value.disk_size
ip_configuration {
ipv4_enabled = each.value.ipv4_enabled
private_network = each.value.ipv4_enabled == false ? "projects/${var.project_id}/global/networks/${var.vpc_name}" : null
allocated_ip_range = each.value.ipv4_enabled == false ? "${var.vpc_name}-private-ip" : null
require_ssl = each.value.require_ssl
}
user_labels = each.value.user_labels
backup_configuration {
enabled = each.value.backup_enabled
binary_log_enabled = each.value.binary_log_enabled
transaction_log_retention_days = each.value.transaction_log_retention_days
backup_retention_settings {
retention_unit = each.value.retention_unit
retained_backups = each.value.retained_backups
}
}
}
}
resource "google_sql_database_instance" "read_replica" {
depends_on = [google_sql_database_instance.main]
for_each = var.sql_replicas
name = each.value.replica_name
database_version = each.value.db_version
root_password = each.value.root_password
deletion_protection = each.value.deletion_protection
master_instance_name = each.value.master_instance
settings {
tier = each.value.tier
availability_type = each.value.availability_type
disk_autoresize = each.value.disk_autoresize
disk_size = each.value.disk_size
ip_configuration {
ipv4_enabled = each.value.ipv4_enabled
private_network = "projects/${var.project_id}/global/networks/${var.vpc_name}"
allocated_ip_range = "${var.vpc_name}-private-ip"
}
}
}
Variables.tf:
variable "project_id" {
description = "The name of GCP project."
type = string
default = "app-prod"
}
variable "region" {
description = "The name of GCP region."
type = string
default = "me-west1"
}
variable "vpc_name" {
description = "The name of the VPC in the project."
type = string
default = "app-prod-vpc"
}
variable "sql_instances" {
description = "A map of SQL instances to deploy"
type = map(object({
sql_instance_name = string
db_version = string
root_password = string
deletion_protection = bool
user_labels = map(string)
tier = string
availability_type = string
disk_autoresize = bool
disk_size = number
ipv4_enabled = bool
require_ssl = bool
backup_enabled = bool
binary_log_enabled = bool
transaction_log_retention_days = number
retention_unit = string
retained_backups = number
db_name = string
charset = string
collation = string
}))
default = {}
}
variable "sql_replicas" {
description = "A map of SQL read replicas to deploy"
type = map(object({
replica_name = string
master_instance = string
db_version = string
root_password = string
deletion_protection = bool
tier = string
availability_type = string
disk_autoresize = bool
disk_size = number
ipv4_enabled = bool
}))
default = {}
}
Terraform.tfvars:
sql_instances = {
app_test = {
sql_instance_name = "app-test"
db_version = "MYSQL_5_7" // MYSQL_5_6, MYSQL_5_7, MYSQL_8_0
root_password = "NewPass123"
deletion_protection = false
user_labels = {
"env" = "prod"
"app" = "test"
}
# Instance settings:
tier = "db-custom-1-3840"
availability_type = "REGIONAL" // REGIONAL or ZONAL. REGIONAL will make it High-availability.
disk_autoresize = true // Automatically scale up hard drive when space runs out
disk_size = 10 // Size in GB.
# IP config:
ipv4_enabled = false // Whether or not to create a public IP for this instance.
require_ssl = false
# Backup config:
backup_enabled = true
binary_log_enabled = true
transaction_log_retention_days = 7
retention_unit = "COUNT"
retained_backups = 7
# DB of the main instance:
db_name = "app_db"
charset = "utf8" // https://dev.mysql.com/doc/refman/5.7/en/charset-charsets.html
collation = "utf8_general_ci" // https://dev.mysql.com/doc/refman/5.7/en/charset-charsets.html
},
}
### Read replicas ###
sql_replicas = {
app-test-replica = {
replica_name = "app-test-replica"
master_instance = "app-test" // The source SQL instance to replicate. Must match the name of the main instance.
db_version = "MYSQL_5_7"
root_password = "NewPass123"
deletion_protection = false
tier = "db-custom-1-3840"
availability_type = "REGIONAL" // REGIONAL or ZONAL
disk_autoresize = true // Automatically scale up hard drive when space runs out
disk_size = 10 // Size in GB.
ipv4_enabled = false // Whether or not to create a public IP for this instance.
},
app-test-replica2 = {
replica_name = "app-test-replica2"
master_instance = "app-test"
db_version = "MYSQL_5_7"
root_password = "NewPass123"
deletion_protection = false
tier = "db-custom-1-3840"
availability_type = "REGIONAL"
disk_autoresize = true
disk_size = 10
ipv4_enabled = false
},
}
This applies successfully, but on destroy the second replica does get deleted while Terraform never seems to receive a success status, so it keeps polling until it can no longer find the resource:
Error: Error, failed to delete instance app-test-replica2: Error waiting for Delete Instance: couldn't find resource (21 retries)
When performing a 2nd destroy, all goes well and the master and the other replica are destroyed properly.
@aditya-facets I have tried the config like yours and it fails to run. Are you able to make it simpler?
@edwardmedia Were you referring to the example I provided (accidentally tagging the original poster instead of me)?
@legojesus you are right. My question was intended for you.
I have managed to set up your config, and my testing turns out to be fine. Yours does have the dependency on the master for both replicas. During the destroy steps, I do see both replicas deleted before the master. I do not know what happened in your case. Do you have a debug log to share so I can take a closer look?
Thanks for the info @edwardmedia .
I did not get the log, but a few days ago I happened to deploy a much simpler version, with which the issue occurred again:
1. Deploy the following main.tf:
resource "google_sql_database_instance" "main" {
  name                = "main-test"
  database_version    = "MYSQL_5_7"
  deletion_protection = false
  settings {
    tier = "db-f1-micro"
    backup_configuration {
      enabled                        = true
      binary_log_enabled             = true
      transaction_log_retention_days = 7
      backup_retention_settings {
        retention_unit   = "COUNT"
        retained_backups = 7
      }
    }
  }
}
resource "google_sql_database_instance" "read_replica" {
  depends_on           = [google_sql_database_instance.main]
  name                 = "replica1"
  database_version     = "MYSQL_5_7"
  master_instance_name = google_sql_database_instance.main.name
  deletion_protection  = false
  settings {
    tier = "db-f1-micro"
  }
}
resource "google_sql_database_instance" "read_replica2" {
  depends_on           = [google_sql_database_instance.main]
  name                 = "replica2"
  database_version     = "MYSQL_5_7"
  master_instance_name = google_sql_database_instance.main.name
  deletion_protection  = false
  settings {
    tier = "db-f1-micro"
  }
}
2. Destroy after deployment is completed.
You might want to change the machine type of the instances: that little deployment took me an hour, and I suspect the machine type has something to do with it.
The error will show up after about 10 minutes of trying to destroy the deployment. Another destroy action after the error will immediately delete the main instance without a problem.
Hello, had the same issue running Google provider 4.53.1 and terraform v1.3.7:
╷
│ Error: Error, failed to delete instance <replica1>: Error waiting for Delete Instance: couldn't find resource (21 retries)
│
│
╵
╷
│ Error: Error, failed to delete instance <replica2>: Error waiting for Delete Instance: couldn't find resource (21 retries)
│
Note that the replicas are deleted from our GCP project, but Terraform still returned this error.
Looking into
@legojesus Using your config, I tried 5 times and none of the runs hit the same error.
Based on the error and the behavior, it appears the API failed to return DONE properly. I am not sure what caused that in your case. Do you want to share the full debug log so I can take a look?
The default timeout for delete is 30 minutes. Just curious, did you see the error after 30 minutes? The error is different if timeout was hit.
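For reference, the delete timeout mentioned above can be set per resource through a `timeouts` block. The following is only a sketch: the resource and instance names are hypothetical, it assumes a primary instance named `main` is declared elsewhere in the same configuration, and the `45m` value is arbitrary. It only matters if a delete genuinely runs long; it would not change a failure that occurs well before the timeout.

```hcl
# Hypothetical replica, for illustration only.
# Assumes google_sql_database_instance.main is defined elsewhere.
resource "google_sql_database_instance" "read_replica" {
  name                 = "replica1"
  database_version     = "MYSQL_5_7"
  master_instance_name = google_sql_database_instance.main.name
  deletion_protection  = false

  settings {
    tier = "db-f1-micro"
  }

  # Override the provider's default delete timeout (arbitrary value).
  timeouts {
    delete = "45m"
  }
}
```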
@edwardmedia Thanks for testing again. My terraform is 1.3.7 and I can still reproduce this on demand. Here's the latest log (rather long): log.txt
The delete doesn't take 30 minutes. After around 3-4 minutes of destroying the replicas, it starts getting the following response (according to the log):
{
"error": {
"code": 404,
"message": "The Cloud SQL instance operation does not exist.",
"errors": [
{
"message": "The Cloud SQL instance operation does not exist.",
"domain": "global",
"reason": "operationDoesNotExist"
}
]
}
}
After 10 minutes of trying, it gives up and throws the error mentioned in this discussion.
@legojesus looking at the section below, it appears the API behaves a little weirdly. If the delete operation is complete, shouldn't it return DONE instead of operationDoesNotExist?
GET /sql/v1beta4/projects/test-prod/operations/d770bd46-c9cc-47c7-a06f-9c3900000053?alt=json&prettyPrint=false HTTP/1.1
Host: sqladmin.googleapis.com
User-Agent: google-api-go-client/0.5 Terraform/1.3.7 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/dev
X-Goog-Api-Client: gl-go/1.18.1 gdcl/0.82.0
Accept-Encoding: gzip
-----------------------------------------------------: timestamp=2023-04-10T09:35:32.094+0300
2023-04-10T09:35:32.560+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:32 [DEBUG] Google API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 200 OK
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Cache-Control: private
Content-Type: application/json; charset=UTF-8
Date: Mon, 10 Apr 2023 06:35:32 GMT
Server: ESF
Vary: Origin
Vary: X-Origin
Vary: Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0
{
"kind": "sql#operation",
"targetLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/test-prod/instances/test-reader2",
"status": "RUNNING",
"user": "test@test.com",
"insertTime": "2023-04-10T06:34:07.382Z",
"startTime": "2023-04-10T06:34:07.544Z",
"operationType": "DELETE",
"name": "d770bd46-c9cc-47c7-a06f-9c3900000053",
"targetId": "test-reader2",
"selfLink": "https://sqladmin.googleapis.com/sql/v1beta4/projects/test-prod/operations/d770bd46-c9cc-47c7-a06f-9c3900000053",
"targetProject": "test-prod"
}
-----------------------------------------------------: timestamp=2023-04-10T09:35:32.560+0300
2023-04-10T09:35:32.560+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:32 [DEBUG] Retry Transport: Stopping retries, last request was successful: timestamp=2023-04-10T09:35:32.560+0300
2023-04-10T09:35:32.560+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:32 [DEBUG] Retry Transport: Returning after 1 attempts: timestamp=2023-04-10T09:35:32.560+0300
2023-04-10T09:35:32.561+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:32 [DEBUG] Got RUNNING while polling for operation d770bd46-c9cc-47c7-a06f-9c3900000053's status: timestamp=2023-04-10T09:35:32.560+0300
2023-04-10T09:35:32.561+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:32 [TRACE] Waiting 10s before next try: timestamp=2023-04-10T09:35:32.560+0300
module.sql_db[0].google_sql_database_instance.read_replica["test-db-reader2"]: Still destroying... [id=test-reader2, 7m20s elapsed]
2023-04-10T09:35:42.564+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:42 [DEBUG] Waiting for state to become: [success]: timestamp=2023-04-10T09:35:42.564+0300
2023-04-10T09:35:42.565+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:42 [DEBUG] Retry Transport: starting RoundTrip retry loop: timestamp=2023-04-10T09:35:42.565+0300
2023-04-10T09:35:42.565+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:42 [DEBUG] Retry Transport: request attempt 0: timestamp=2023-04-10T09:35:42.565+0300
2023-04-10T09:35:42.565+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:42 [DEBUG] Google API Request Details:
---[ REQUEST ]---------------------------------------
GET /sql/v1beta4/projects/test-prod/operations/d770bd46-c9cc-47c7-a06f-9c3900000053?alt=json&prettyPrint=false HTTP/1.1
Host: sqladmin.googleapis.com
User-Agent: google-api-go-client/0.5 Terraform/1.3.7 (+https://www.terraform.io) Terraform-Plugin-SDK/2.10.1 terraform-provider-google/dev
X-Goog-Api-Client: gl-go/1.18.1 gdcl/0.82.0
Accept-Encoding: gzip
-----------------------------------------------------: timestamp=2023-04-10T09:35:42.565+0300
2023-04-10T09:35:45.010+0300 [INFO] provider.terraform-provider-google_v4.33.0_x5: 2023/04/10 09:35:45 [DEBUG] Google API Response Details:
---[ RESPONSE ]--------------------------------------
HTTP/2.0 404 Not Found
Alt-Svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
Cache-Control: private
Content-Type: application/json; charset=UTF-8
Date: Mon, 10 Apr 2023 06:35:44 GMT
Server: ESF
Vary: Origin
Vary: X-Origin
Vary: Referer
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 0
{
"error": {
"code": 404,
"message": "The Cloud SQL instance operation does not exist.",
"errors": [
{
"message": "The Cloud SQL instance operation does not exist.",
"domain": "global",
"reason": "operationDoesNotExist"
}
]
}
}
b/278307339
@edwardmedia You are correct, it should return DONE but for some reason it just retries until it throws the error.
Do you require any other info from me/my setup?
same error here
@legojesus Is there a workaround?
@SamuelMolling Unfortunately no. The only way around this is to perform a 2nd terraform destroy operation, which then works well.
That's what we've been doing, but is there any way to fix it? Is there a front of it?
I'm just a user like you, so I have no answer. @edwardmedia What does the "Upstream" label you've added do? Is this going to be addressed in the near future? Thank you.
I use Terragrunt and worked around it with its auto-retry feature. Just a tip 😀
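The commenter did not share their configuration, so the following is only a guess at what such a setup might look like. It is a minimal sketch assuming a Terragrunt version that still supports the top-level `retryable_errors`, `retry_max_attempts`, and `retry_sleep_interval_sec` attributes in `terragrunt.hcl` (newer releases may prefer a different error-handling syntax). The regular expression and the numeric values are assumptions chosen to match the error text quoted in this thread.

```hcl
# terragrunt.hcl (sketch; values are illustrative assumptions)

# Retry the command when the output contains the Cloud SQL delete error
# reported in this issue. Setting this list may replace Terragrunt's
# built-in default list of retryable errors.
retryable_errors = [
  "(?s).*Error waiting for Delete Instance: couldn't find resource.*",
]

# One initial attempt plus one retry, mirroring the manual "destroy twice" workaround.
retry_max_attempts = 2

# Seconds to wait before retrying.
retry_sleep_interval_sec = 10
```

This simply automates the second `terraform destroy` described above. It does not address the underlying API behavior.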
Community Note
If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.
Terraform Version
Affected Resource(s)
Expected Behavior
Destroying the Cloud SQL MySQL/Postgres resource should complete cleanly and update the Terraform state accordingly.
Actual Behavior
A Cloud SQL instance with 2 read replicas was destroyed, and the deletion of the replicas was reflected in the console. However, the status of the replica instances is not reflected in the Terraform logs; instead, the destroy fails with the error message
Error, failed to delete instance <XYZ>-replicareader-0: Error waiting for Delete Instance: couldn't find resource (21 retries)
The console no longer shows the replica instances; only the master instance shows up.
Steps to Reproduce
Using the module https://github.com/terraform-google-modules/terraform-google-sql-db/tree/v13.0.1/modules/postgresql for Cloud SQL PostgreSQL/MySQL initialisation.
Important Factoids
The error:
Error waiting for Delete Instance: couldn't find resource (21 retries)
is inconsistent; the behaviour shows up seemingly at random. Sometimes the destroy succeeds, but most of the time it fails.
References
b/299600745