hashicorp / terraform-provider-google

Terraform Provider for Google Cloud Platform
https://registry.terraform.io/providers/hashicorp/google/latest/docs
Mozilla Public License 2.0
2.33k stars 1.73k forks source link

googleapi: Error 400: Invalid request: Failed to delete user root. #3820

Closed shaulishaked closed 4 years ago

shaulishaked commented 5 years ago

Community Note

Terraform Version

Terraform v0.11.13

Affected Resource(s)

Terraform Configuration Files

resource "google_sql_database_instance" "<name>-db-instance" {
  name = "<name>-db-instance"
  region = "${var.region}"
  database_version = "POSTGRES_9_6"

  settings {
    tier = "db-f1-micro"
...
  }
}

resource "google_sql_database" "<name>-db" {
  name = "<name>-db"
  instance = "${google_sql_database_instance.<name>-db-instance.name}"
}

resource "google_sql_user" "<name>-db-user" {
  instance = "${google_sql_database_instance.<name>-db-instance.name}"
  name = "root"
  password = "${random_string.<name>-db-password.result}"
}

Panic Output

* google_sql_user.<name>-db-user: Error, failed to deleteuser <name>-db-user in instance <name>-db-instance: googleapi: Error 400: Invalid request: Failed to delete user <name>-user. Detail: pq: role "<name>-db-user" cannot be dropped because some objects depend on it

Expected Behavior

Terraform should destroy the user and database successfully on the first run.

Actual Behavior

Database and user aren't deleted on the first run. Re-run (with the same code) deletes successfully (sometimes require more than one re-run).

Steps to Reproduce

  1. terraform destroy
emilymye commented 5 years ago

If you can repro and provide debug logs it helps a lot (i.e. TF_LOG=DEBUG terraform apply) but my guess is that the user owns the created database and the error is happening when the user delete call happens before the database deletion on a given run. Could you confirm this from your terraform logs or see if adding a explicit depends_on = ["google_sql_user.<name>-db-user"] to the database fixes the issue?

shaulishaked commented 5 years ago

I'm trying to re-produce while running with the debug enabled. Since it's not consistent behavior, it's harder.

(By the way, I tried upgrading Terraform to 0.12.1 and the Google provider, and the same thing happened).

I will attach with debug enabled as soon as it happens again.

Thanks,

shaulishaked commented 5 years ago

Added logs from Terraform.

I tried to remove all the resources that are not relevant in the log.

Hope this will help.

terraform_log.log

emilymye commented 5 years ago

Hi @shaulishaked, could you actually give the full logs, if possible? The debug logs you gave are missing some of the HTTP requests and logs I might need to debug.

If you could also give the config you are using or which users and databases are linked to which instances, that would be helpful.

shaulishaked commented 5 years ago

Hello @emilymye,

Could I send you the full logs and the config in private? I prefer not to share it publicly.

Thanks again,

shaulishaked commented 5 years ago

Hi,

Any update on this?

I sent the logs I could share, and I can send full logs privately. Upgrading both Terraform and the Provider didn't help.

We are unable to destroy a setup in one run and require to run Terraform twice, which isn't reliable.

We still get sporadic errors of:

Error: Error, failed to deleteuser root in instance server-db-instance: googleapi: Error 400: Invalid request: Failed to delete user root. Detail: pq: role "root" cannot be dropped because some objects depend on it
., invalid

Please let me know if any additional info can help.

Thanks,

lbornov2 commented 5 years ago

We have the same problem as well.

terraform --version
Terraform v0.12.5
emilymye commented 5 years ago

Hi @shaulishaked - so sorry for the late reply! If you are still encountering this issue, you can send your logs to me at my google.com address (emilyye).

JlUgia commented 5 years ago

I have encountered this issue too. I think this is the result of dependencies at the DB level, getting to be at conflict with others at the infra level that we define ourselves depending on how we layout our configuration files at Terraform.

At the DB level: Database -> User -> Instance, such that users can't be deleted if there are DBs that depend on them (eg.: they own), and users themselves can't be deleted if there are instances with dependencies to users.

At the same time, DBs will conflict at deletion time if there are resources using them (eg.: existing connections). With that, there is a deletion flow that allows for these dependencies to succeed: Delete Instances connected to the database instance -> Database -> Users -> DB instance

Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.

One way to circumvent that is to ignore users and DBs at destroy time by taking manually these out of the state (terraform state rm <resource>), but the need to pollute a pipeline with custom code like that is far from ideal.

Without getting into more fundamental discussions of who has the responsibility to provision infra and DB schemas, one pattern that I find appealing is to have Terraform take care of the infra, which includes the creation of an implicit root / admin / base user and use SW provisioning technologies (Spinnaker, Ansible, [local-remote]-exec) to take care of the rest. Cloud SQL already includes a default postgres user, though to set its password it is necessary to create an additional google_sql_user resource (that brings back the dependency issues at destroy time mentioned above).

A solution to this would be to add the ability to set the default password for the instance on the google_sql_database_instance resource itself, such that, creating and destroying the admin user and password is tightly bound to the provisioning of the instance, and hence something that happens synchronously in one block.

shaulishaked commented 4 years ago

@emilymye can what @JlUgia wrote help you? Can you work with that?

I'll try to reproduce again and send you the logs once this happens on my setup again.

Thanks,

jclarksnps commented 4 years ago

We're having lots of problems with the sql instance relating to the above feedback, basically when I say destroy sql instance, i expect it to be fully torn down (I don't care if there are connections open or if a user owns a database they shouldn't etc.. ) . Destroy means destroy... and from what I see, it currently does not because of the intertwined dependencies. Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet

Our config for our modules have the database instance -- users (depend on instance) -- databases (depend on users) FYI. I've seen random failures deleting the sql instance, some have come from a dev creating a new database with the user, and when terraform tries to teardown it fails as the user owns another database. If the database instance is being destroyed, i dont see why it would matter what the user owns :)

emilymye commented 4 years ago

Sorry this fell on the backlog, I think I still need some clarification before I actually start trying to fix this. I feel like there are multiple issues being presented by @JlUgia and since I'm not firsthand using Cloud SQL, please bear with me.

1) Postgres will prevent deletion of a user if a database depends on it, so deletion order must be enforced from database --> user. Terraform doesn't inherently have a dependency between user and database, so in some cases it tries to delete the user first and fails. This is solved by adding a dependency for the database on the user (i.e. database depends_on = [ google_sql_user.user ]).

2) Cloud SQL will prevent deletion of User if specifically the only root for a DBInstance? Because, google_sql_user relies on google_sql_dbinstance, Terraform when building the dependency graph (i.e. something managed by core, we can't change in the provider) forces the deletion of users before instances, but SQL demands you delete the instance (and the user will just get deleted during that). The real issue is then that we force you to create a new root user, i.e. delete the root user on creation of an master instance. This was because up until around summer of 2019, we couldn't set root_password on an instance and would have to modify the user, so rather than do this implicitly we just had the user do it on their own and make it one operation (i.e. apply instead of import post instance-creation). That means, the solution for this problem is to (as @JlUgia suggested) add a root_password field (currently in the beta-provider only) to DB Instance so TFers no longer have to create a separate sql_user for this purpose. On our end, we'll need to change the behavior to allow for root user to continue to exist

3) Here's where I got a bit lost:

Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.

Is this related to your issue @shaulishaked? This sounds like something I'm not sure how to manage this in Terraform, both because it's not an inherent dependency link in Terraform (or is it?) and becuase I'm not the most familiar with the proper deletion order for networking Cloud SQL instances. I guess I'm really confused by what this means. Do you have example config or do you mind giving a explaination, @JlUgia?

shaulishaked commented 4 years ago

Hi,

1) Explicit dependencies: I tried adding an explicit dependency. It did not resolve the issue. Probably because the error is related to "root" user, and not the user I created myself.

2) root_password field: I am using Google Postgres SQL. root_password is ignored in Postgres SQL: https://www.terraform.io/docs/providers/google/r/sql_database_instance.html#root_password.

I agree with @jclarksnps with the saying that "Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet". For now, I need to do pre-destroy commands manually instead of relying on the Terraform module, which I really want to remove.

Thanks again,

JlUgia commented 4 years ago

I think it's two separate issues we are discussing.

  1. @shaulishaked's issue relates to the fact that, when using Terraform, attempting to delete a Cloud SQL user fails if there are other objects making use of that user. I think I saw this issue in the past, and IIRC, it's not related to the instance but to the database objects created in Terraform. Have you tried adding a dependency to your databases such you users can only be deleted once the DBs have been destroyed (on your google_sql_database, add a depends_on = [ google_sql_user.user ])? We decided to remove any intra-application/service logic, like DB creation and management (google_sql_database), away from Terraform. After all, TF shines on infra/resource provisioning, and one can argue that databases inside of a database engine or instance is something beyond that purpose. In any case, @jclarksnps's point stands IMO: deleting a Cloud SQL instance probably should not look at application or software level dependencies (eg.: databases, connections, users), but only infrastructure related ones (eg.: IPs). Said so, the problem may automagically get resolved for those using the new root_password field introduced in the beta provider, since this will remove the dependency responsibility away from the TF user.

  2. The issue I experienced is also related to dependencies preventing deletion of a Cloud SQL instance, although in my case, it was VMs with active connections to the DB creating the dependency issue. @jclarksnps's suggestion would help in this case too. I'm happy to go into more detail about this one, though I feel it's beyond what's needed to address this case.

jclarksnps commented 4 years ago

@JlUgia Yeah our plan is to remove database creation from TF all-together, I agree that it does not belong there at all. I did try adding dependencies, although not reverse from the instance to the users. Most of my issues have been someone using the user to create another db or having connections open, which im not sure the dependencies would fix (unless I'm missing something)?

JlUgia commented 4 years ago

That's right per my understanding too. If there are other objects or connections open or created outside of the scope of Terraform, that'd conflict with the deletion of the resource.

kenmazaika commented 4 years ago

Any update on this? I'm running into this issue, too.

It seems the best option is to manually do terraform state rm on the postgres user outside of the terraform configuration.

Seems like a hack.

I tried to create a remote-exec or local-exec on the google_sql_user to automatically trigger the action when destroyed but it didn't seem to work.

IMO a good solution to this would be to allow me (the user) to execute custom SQL on destroy for the user.

Can't imagine terraform would be able to handle all use cases, but if we could drop into SQL to manually do what we need to during this sequence it would be extremely helpful.

I think if I could execute SQL that looks like this:

REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public FROM username;
DROP USER username;

It could work, but would need some sort of SQL hook.

jclarksnps commented 4 years ago

@kenmazaika that's what we did, we remove the state and we actually use gcloud to tear the instance down.

kenmazaika commented 4 years ago

@jclarksnps Thanks for the update!

How do you hook into SQL and drop into the database? I think that local-exec to call terraform commands directly causes lock errors.

Did you just make a separate script to combine the terraform destroy and the manual SQL?

Thanks for the reply! Really appreciate it!

jclarksnps commented 4 years ago

@kenmazaika we actually use gcloud commands to tear the instance down, prior to that call we use terraform to remove the state for the db and its databases and users. We basically gave up on terraform doing the teardown. So in our bash that calls terraform (we use concourse pipelines), we first use gcloud to delete the instance and then we remove all the state, then we go about regular TF activities.

JlUgia commented 4 years ago

FWIW, I'm taking the same approach as @kenmazaika, force removing the user from the state prior to destroying the Cloud SQL instance.

bharathkkb commented 4 years ago

I was able to fix a similar issue where I was receiving

Error: Error, failed to deleteuser foo in instance foo-db: 

based on comment https://github.com/terraform-providers/terraform-provider-google/issues/3820#issuecomment-573665424 part 1 by explicitly enforcing dependency.

emilymye commented 4 years ago

It seems like this issue is now related to many different things. The original issue seems to be related to adding explicit dependencies and managing the root user in Terraform (given that Terraform should delete the root user upon instance creation unless on a replica instance). I'm going to close this issue and ask that @JlUgia and other folks open a new issue to discuss ways in which we can manage , which is difficult - as you may have figured out, there isn't a way Terraform can force destroy objects that Terraform doesn't know about and I'm hesistant to make terraform handle SQL commands or similar.

ghost commented 4 years ago

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!