Closed shaulishaked closed 4 years ago
If you can repro and provide debug logs it helps a lot (i.e. TF_LOG=DEBUG terraform apply
) but my guess is that the user owns the created database and the error is happening when the user delete call happens before the database deletion on a given run. Could you confirm this from your terraform logs or see if adding a explicit depends_on = ["google_sql_user.<name>-db-user"]
to the database fixes the issue?
I'm trying to re-produce while running with the debug enabled. Since it's not consistent behavior, it's harder.
(By the way, I tried upgrading Terraform to 0.12.1 and the Google provider, and the same thing happened).
I will attach with debug enabled as soon as it happens again.
Thanks,
Added logs from Terraform.
I tried to remove all the resources that are not relevant in the log.
Hope this will help.
Hi @shaulishaked, could you actually give the full logs, if possible? The debug logs you gave are missing some of the HTTP requests and logs I might need to debug.
If you could also give the config you are using or which users and databases are linked to which instances, that would be helpful.
Hello @emilymye,
Could I send you the full logs and the config in private? I prefer not to share it publicly.
Thanks again,
Hi,
Any update on this?
I sent the logs I could share, and I can send full logs privately. Upgrading both Terraform and the Provider didn't help.
We are unable to destroy a setup in one run and require to run Terraform twice, which isn't reliable.
We still get sporadic errors of:
Error: Error, failed to deleteuser root in instance server-db-instance: googleapi: Error 400: Invalid request: Failed to delete user root. Detail: pq: role "root" cannot be dropped because some objects depend on it
., invalid
Please let me know if any additional info can help.
Thanks,
We have the same problem as well.
terraform --version
Terraform v0.12.5
Hi @shaulishaked - so sorry for the late reply! If you are still encountering this issue, you can send your logs to me at my google.com address (emilyye).
I have encountered this issue too. I think this is the result of dependencies at the DB level, getting to be at conflict with others at the infra level that we define ourselves depending on how we layout our configuration files at Terraform.
At the DB level: Database -> User -> Instance
, such that users can't be deleted if there are DBs that depend on them (eg.: they own), and users themselves can't be deleted if there are instances with dependencies to users.
At the same time, DBs will conflict at deletion time if there are resources using them (eg.: existing connections). With that, there is a deletion flow that allows for these dependencies to succeed:
Delete Instances connected to the database instance -> Database -> Users -> DB instance
Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.
One way to circumvent that is to ignore users and DBs at destroy time by taking manually these out of the state (terraform state rm <resource>
), but the need to pollute a pipeline with custom code like that is far from ideal.
Without getting into more fundamental discussions of who has the responsibility to provision infra and DB schemas, one pattern that I find appealing is to have Terraform take care of the infra, which includes the creation of an implicit root / admin / base user and use SW provisioning technologies (Spinnaker, Ansible, [local-remote]-exec
) to take care of the rest. Cloud SQL already includes a default postgres
user, though to set its password it is necessary to create an additional google_sql_user
resource (that brings back the dependency issues at destroy time mentioned above).
A solution to this would be to add the ability to set the default password for the instance on the google_sql_database_instance
resource itself, such that, creating and destroying the admin user and password is tightly bound to the provisioning of the instance, and hence something that happens synchronously in one block.
@emilymye can what @JlUgia wrote help you? Can you work with that?
I'll try to reproduce again and send you the logs once this happens on my setup again.
Thanks,
We're having lots of problems with the sql instance relating to the above feedback, basically when I say destroy sql instance, i expect it to be fully torn down (I don't care if there are connections open or if a user owns a database they shouldn't etc.. ) . Destroy means destroy... and from what I see, it currently does not because of the intertwined dependencies. Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet
Our config for our modules have the database instance -- users (depend on instance) -- databases (depend on users) FYI. I've seen random failures deleting the sql instance, some have come from a dev creating a new database with the user, and when terraform tries to teardown it fails as the user owns another database. If the database instance is being destroyed, i dont see why it would matter what the user owns :)
Sorry this fell on the backlog, I think I still need some clarification before I actually start trying to fix this. I feel like there are multiple issues being presented by @JlUgia and since I'm not firsthand using Cloud SQL, please bear with me.
1) Postgres will prevent deletion of a user if a database depends on it, so deletion order must be enforced from database --> user. Terraform doesn't inherently have a dependency between user and database, so in some cases it tries to delete the user first and fails. This is solved by adding a dependency for the database on the user (i.e. database depends_on = [ google_sql_user.user ]).
2) Cloud SQL will prevent deletion of User if specifically the only root
for a DBInstance? Because, google_sql_user
relies on google_sql_dbinstance
, Terraform when building the dependency graph (i.e. something managed by core, we can't change in the provider) forces the deletion of users before instances, but SQL demands you delete the instance (and the user will just get deleted during that). The real issue is then that we force you to create a new root user, i.e. delete the root user on creation of an master instance. This was because up until around summer of 2019, we couldn't set root_password
on an instance and would have to modify the user, so rather than do this implicitly we just had the user do it on their own and make it one operation (i.e. apply instead of import post instance-creation). That means, the solution for this problem is to (as @JlUgia suggested) add a root_password
field (currently in the beta-provider only) to DB Instance so TFers no longer have to create a separate sql_user for this purpose. On our end, we'll need to change the behavior to allow for root user to continue to exist
3) Here's where I got a bit lost:
Now, that's all great, unless one is in need of using the IPs for newly created machines to define the authorized nets / fw rules of the SQL instance. In that case, the machines need to be created first, hence deleted last, breaking the flow above.
Is this related to your issue @shaulishaked? This sounds like something I'm not sure how to manage this in Terraform, both because it's not an inherent dependency link in Terraform (or is it?) and becuase I'm not the most familiar with the proper deletion order for networking Cloud SQL instances. I guess I'm really confused by what this means. Do you have example config or do you mind giving a explaination, @JlUgia?
Hi,
1) Explicit dependencies: I tried adding an explicit dependency. It did not resolve the issue. Probably because the error is related to "root" user, and not the user I created myself.
2) root_password field: I am using Google Postgres SQL. root_password is ignored in Postgres SQL: https://www.terraform.io/docs/providers/google/r/sql_database_instance.html#root_password.
I agree with @jclarksnps with the saying that "Basically delete sql instance should be equivalent to: gcloud sql instances delete INSTANCE_NAME --quiet". For now, I need to do pre-destroy commands manually instead of relying on the Terraform module, which I really want to remove.
Thanks again,
I think it's two separate issues we are discussing.
@shaulishaked's issue relates to the fact that, when using Terraform, attempting to delete a Cloud SQL user fails if there are other objects making use of that user. I think I saw this issue in the past, and IIRC, it's not related to the instance but to the database objects created in Terraform. Have you tried adding a dependency to your databases such you users can only be deleted once the DBs have been destroyed (on your google_sql_database
, add a depends_on = [ google_sql_user.user ]
)?
We decided to remove any intra-application/service logic, like DB creation and management (google_sql_database
), away from Terraform. After all, TF shines on infra/resource provisioning, and one can argue that databases inside of a database engine or instance is something beyond that purpose.
In any case, @jclarksnps's point stands IMO: deleting a Cloud SQL instance probably should not look at application or software level dependencies (eg.: databases, connections, users), but only infrastructure related ones (eg.: IPs).
Said so, the problem may automagically get resolved for those using the new root_password
field introduced in the beta provider, since this will remove the dependency responsibility away from the TF user.
The issue I experienced is also related to dependencies preventing deletion of a Cloud SQL instance, although in my case, it was VMs with active connections to the DB creating the dependency issue. @jclarksnps's suggestion would help in this case too. I'm happy to go into more detail about this one, though I feel it's beyond what's needed to address this case.
@JlUgia Yeah our plan is to remove database creation from TF all-together, I agree that it does not belong there at all. I did try adding dependencies, although not reverse from the instance to the users. Most of my issues have been someone using the user to create another db or having connections open, which im not sure the dependencies would fix (unless I'm missing something)?
That's right per my understanding too. If there are other objects or connections open or created outside of the scope of Terraform, that'd conflict with the deletion of the resource.
Any update on this? I'm running into this issue, too.
It seems the best option is to manually do terraform state rm
on the postgres user outside of the terraform configuration.
Seems like a hack.
I tried to create a remote-exec
or local-exec
on the google_sql_user
to automatically trigger the action when destroy
ed but it didn't seem to work.
IMO a good solution to this would be to allow me (the user) to execute custom SQL on destroy for the user.
Can't imagine terraform would be able to handle all use cases, but if we could drop into SQL to manually do what we need to during this sequence it would be extremely helpful.
I think if I could execute SQL that looks like this:
REVOKE ALL PRIVILEGES ON ALL TABLES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public FROM username;
REVOKE ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public FROM username;
DROP USER username;
It could work, but would need some sort of SQL hook.
@kenmazaika that's what we did, we remove the state and we actually use gcloud to tear the instance down.
@jclarksnps Thanks for the update!
How do you hook into SQL and drop into the database? I think that local-exec
to call terraform commands directly causes lock errors.
Did you just make a separate script to combine the terraform destroy
and the manual SQL?
Thanks for the reply! Really appreciate it!
@kenmazaika we actually use gcloud commands to tear the instance down, prior to that call we use terraform to remove the state for the db and its databases and users. We basically gave up on terraform doing the teardown. So in our bash that calls terraform (we use concourse pipelines), we first use gcloud to delete the instance and then we remove all the state, then we go about regular TF activities.
FWIW, I'm taking the same approach as @kenmazaika, force removing the user from the state prior to destroying the Cloud SQL instance.
I was able to fix a similar issue where I was receiving
Error: Error, failed to deleteuser foo in instance foo-db:
based on comment https://github.com/terraform-providers/terraform-provider-google/issues/3820#issuecomment-573665424 part 1 by explicitly enforcing dependency.
It seems like this issue is now related to many different things. The original issue seems to be related to adding explicit dependencies and managing the root user in Terraform (given that Terraform should delete the root user upon instance creation unless on a replica instance). I'm going to close this issue and ask that @JlUgia and other folks open a new issue to discuss ways in which we can manage , which is difficult - as you may have figured out, there isn't a way Terraform can force destroy objects that Terraform doesn't know about and I'm hesistant to make terraform handle SQL commands or similar.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!
Community Note
Terraform Version
Terraform v0.11.13
Affected Resource(s)
Terraform Configuration Files
Panic Output
Expected Behavior
Terraform should destroy the user and database successfully on the first run.
Actual Behavior
Database and user aren't deleted on the first run. Re-run (with the same code) deletes successfully (sometimes require more than one re-run).
Steps to Reproduce
terraform destroy
0000