hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.25k stars 1.7k forks source link

google_sql_database_instance can break refresh when activation_policy is set to NEVER #7810

Open paddycarver opened 3 years ago

paddycarver commented 3 years ago

Community Note

Terraform Version

v3.18.0 All Terraform versions

Affected Resource(s)

Terraform Configuration Files

resource "google_sql_database_instance" "break-all-the-things" {
  name = "morgoth-breaker-of-things"
  database_version = "POSTGRES_11"
  region = "us-central1"

  settings {
    tier = "db-f1-micro"
    activation_policy = "NEVER"
  }
}

resource "google_sql_user" "user" {
  name     = "user"
  instance = google_sql_database_instance.break-all-the-things.name
  host     = "me.com"
  password = "changeme"
}

resource "google_sql_database" "database" {
  name     = "database"
  instance = google_sql_database_instance.break-all-the-things.name
}

Debug Output

Error reading SQLDatabase "morgoth-breaker-of-things:database": googleapi: Error 400: Invalid request: Invalid request since instance is not running., invalid
Error reading SQL User "user" in instance "morgoth-breaker-of-things": googleapi: Error 400: Invalid request: Invalid request since instance is not running., invalid

Panic Output

N/A

Expected Behavior

Terraform should have created a plan.

Actual Behavior

Terraform returned an error.

Steps to Reproduce

  1. terraform apply
  2. terraform plan

Important Factoids

N/A

References

N/A

Proposals to Fix

So there are some options to fix this situation:

Option 1: Ignore the error, move on with our lives

We could update the Read methods in sql_user and sql_database to ignore the error coming back and just not update the state of the database or user if the instance is off, since the state is presumably unknowable at that point.

The benefit of this is it requires little work and is straightforward to implement and reason about.

The drawback is it lies to the user a bit; the values in state may not, in fact, be correct at all, and the user may try to interpolate them in their config in other places, getting inappropriate values.

Option 2: Update state based on the error

We could update the Read methods in sql_user and sql_database to check for the error message and set a specific state when it's encountered, e.g. empty values for everything.

The benefit of this is that users won't get correct-looking but out of date state values when interpolating into other resources.

The drawback is it's more work to implement, and user configs will break loudly, which means they may break in situations where it would have been harmless to let them work even though they were "incorrect".

Option 3: Disallow this state

We could disallow setting the activation_policy of sql_database_instance to NEVER, which would prevent users from getting Terraform into an un-refreshable state.

The benefit of this is it's basically no work to do.

The drawback is it disables reasonable user behaviors and isn't a full solution; users and tools outside Terraform can still stop the database instance, which will make Terraform unable to refresh the state.

Option 4: Special database_instance_state field

We could add a special field to sql_user and sql_database called "sql_database_instance_state" or something. When this error is encountered, it is set to "OFF" or "STOPPED" or something. When this error isn't encountered, it's set to "ON" or "ACTIVE" or something. Then we combine this with Option 1 or Option 2, allowing user configs to more easily detect the accuracy of the state information they're getting for these resources.

Option 5: Check the activation_policy before refreshing

I'm mentioning this as an honorable mention, because I don't think it'll actually work, but it's been suggested by everyone (including me) who has seen this issue. The theory goes that we could just prevent refreshing when the activation_policy is set to NEVER. This can't actually be done, because the issue isn't in the sql_database_instance's refresh, it's in the downstream resources that depend on the sql_database_instance being active, and they don't have access to activation_policy. So I think we have to try the refresh and handle the error, instead of not trying the request at all.

b/301412091

paddycarver commented 3 years ago

This was resolved in v3.19.0 by #6162, but that was meant for when the parent instance is deleted, and so it removes the sql_user and sql_database from state entirely. In this case, the instance is still there, but off, and so removing the user and database from state means when the database is turned back on, they'll need to be imported, which isn't ideal.

It sounds like the right thing to do is either to adopt option 1, 2, or 4 from above and replace the current workaround with them, or to complicate the current workaround and make it check whether the database instance still exists before removing from state, falling back to option 1, 2, or 4 if it still exists.

brendonrapp commented 1 year ago

Does anyone have a reasonable workaround to this issue? We want to be able to shut down database instances for idle apps to save on cost, but this completely blocks that.

TrieBr commented 11 months ago

The activation_policy argument seems like it has no use as long as this bug exists. Consider removing it in the meantime.

Setting it to anything other than the default breaks since 99% of use cases would also have a google_sql_database or google_sql_user attached which would fail on any subsequent terraform apply.

Is there any status update or plan to fix this issue?

My use case is also to attempt to "pause" my resources when I'm not using them. As a hacky workaround, I have to create a seperate bash script to stop/start the instance before/after terraform runs.

Pause:

terraform apply -var="paused=true"
gcloud sql instances patch $(terraform output db_instance_name) --activation-policy=NEVER

Unpause:

gcloud sql instances patch $(terraform output db_instance_name) --activation-policy=ALWAYS
terraform apply -var="paused=false"
lodotek commented 7 months ago

Still no solution? 😢