scaleway / terraform-provider-scaleway

Terraform Scaleway provider
https://www.terraform.io/docs/providers/scaleway/
Mozilla Public License 2.0

When account is flagged by trust and safety team, deleting resources fails silently #884

Closed jpetazzo closed 2 years ago

jpetazzo commented 3 years ago

Okay, buckle up, because this might be very hard to reproduce and test, but... Here we go!

I am currently using Scaleway (specifically, Kapsule) to deploy a large number of Kubernetes clusters as lab environments for students. I'm doing that with Terraform, and as part of my tests, I routinely deploy 50 clusters at a time.

Yesterday, my account was flagged by the trust and safety team. While my account is flagged, it looks like a lot of API calls behave differently.

I can still list clusters:

$ scw k8s cluster list
...
978f9cea-2062-47ae-a626-e504565aa9ef  tf-my3jc-138  ready
a8f40682-2b47-46ea-84a5-dd5ad9ada72a  tf-my3jc-123  ready

But other calls will fail:

$ scw k8s cluster get a8f40682-2b47-46ea-84a5-dd5ad9ada72a
Cannot find resource 'cluster' with ID 'a8f40682-2b47-46ea-84a5-dd5ad9ada72a'
$ scw k8s cluster delete a8f40682-2b47-46ea-84a5-dd5ad9ada72a
Cannot find resource 'cluster' with ID 'a8f40682-2b47-46ea-84a5-dd5ad9ada72a'

Now, here comes the problem: this cluster was created by Terraform, and when I ran terraform destroy, Terraform said that it removed it (it reported that the resource was destroyed successfully, and it removed it from my Terraform state), but the cluster was still there. At first, I didn't notice (I thought I had successfully destroyed my clusters) until I realized that I had... 50 too many clusters 😅

I don't know if that can be fixed exclusively in the Terraform provider or if there might be a way to figure out the account status with the API.
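Until the API reports the situation more clearly, one possible client-side safeguard is to cross-check after a destroy: since the list call still works on a flagged account, any ID that Terraform dropped from state but that still shows up in the list output is a leftover. The sketch below only does the set comparison; the function name is hypothetical, and how the two ID lists are collected (for example by parsing the `scw k8s cluster list` output) is left as an assumption.

```python
def find_leftover_clusters(destroyed_ids, listed_ids):
    """Return the IDs reported as destroyed that the list call still returns."""
    return sorted(set(destroyed_ids) & set(listed_ids))


# Assumption: these are the IDs Terraform removed from its state.
# The values are the two example IDs from the list output above.
destroyed = [
    "978f9cea-2062-47ae-a626-e504565aa9ef",
    "a8f40682-2b47-46ea-84a5-dd5ad9ada72a",
]

# Assumption: these are the IDs still returned by the list call after the destroy.
listed = [
    "978f9cea-2062-47ae-a626-e504565aa9ef",
    "a8f40682-2b47-46ea-84a5-dd5ad9ada72a",
]

for cluster_id in find_leftover_clusters(destroyed, listed):
    print(f"still listed after destroy: {cluster_id}")
```

An empty result does not prove that the account is in good standing; it only means that nothing Terraform claimed to destroy is still listed.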

We might also decide that it's a very rare corner case and that we don't care 🤷🏻 but I wanted to report this just in case.

Terraform Version

$ terraform -v
Terraform v1.0.8
on linux_amd64
+ provider registry.terraform.io/hashicorp/kubernetes v2.0.3
+ provider registry.terraform.io/hashicorp/local v2.1.0
+ provider registry.terraform.io/hashicorp/random v3.1.0
+ provider registry.terraform.io/scaleway/scaleway v2.1.0

Affected Resource(s)

Terraform Configuration Files

You can find the Terraform plan that I use here:

https://github.com/jpetazzo/container.training/tree/main/prepare-scw

By default it will deploy 1 cluster with 2 nodes but that can be changed with a couple of env vars, as indicated in the README.md.

Debug Output

N/A

Panic Output

N/A

Expected Behavior

When running terraform destroy, Terraform should tell me that the resources could not be destroyed.

Actual Behavior

Terraform said that the resources were destroyed, but in fact, they were not destroyed.

Steps to Reproduce

Sh4d1 commented 3 years ago

Don't let your account get locked? 😄 But yeah, it's not supposed to happen, I'll forward!

Mia-Cross commented 2 years ago

Thank you for reporting this, it was indeed something worth investigating. From what I gathered discussing with several teams at Scaleway, here was the problem:

When an account gets locked, the user loses permissions over their resources. For security reasons, we don't want to leak information about existing resources: if an unauthorized user tries to GET a resource with a random ID and the response is "403 forbidden", they learn that a resource with this ID exists. As of today, the permissions API answers "permission denied" whether you try to get a resource that is not yours or one that you created but no longer have rights on. So, to prevent discovery by unauthorized users, the k8s API answers "404 not found" in both the "resource not found" and "permission denied" cases.

When you ran terraform destroy, it tried to delete your clusters, but since the k8s API returned a 404, Terraform considered that the resources were already destroyed and that it was a success. The reason why you are still able to see your clusters with a list request is that we chose to display them even if the resources are locked so that you know they still exist and were not deleted by us, but any request on a particular ID will return 404.
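The delete behavior described above can be modeled in a few lines. This is a simplified Python sketch, not the provider's actual Go code; the function name and return shape are hypothetical, and the only rule taken from this thread is that a 404 on delete is treated as "already destroyed".

```python
def handle_delete_response(status_code):
    """Return (remove_from_state, error_message) for a DELETE response.

    Hypothetical model: a 2xx status and a 404 both count as "the resource is gone".
    """
    if 200 <= status_code < 300:
        return True, None
    if status_code == 404:
        # Interpreted as "already deleted", so the destroy is reported as a success.
        return True, None
    return False, f"delete failed with HTTP {status_code}"


# Locked account today: the API answers 404, so the resource leaves the state.
print(handle_delete_response(404))

# Any other error status is surfaced to the user and the state is kept.
print(handle_delete_response(403))
```

Under this model, a client cannot tell a locked resource from a deleted one as long as both produce a 404.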

After discussing with the teams, we decided that, for a better user experience with the Terraform provider, our API responses should be more accurate about what's going on, so we will soon deploy a new version of the API that returns 403 in case of "permission denied".
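To make the change concrete, here is a small Python sketch of the status mapping before and after. The enum and function names are hypothetical, and the 200 used for a successful call is an assumption; only the 404 and 403 values come from the explanation above.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    NOT_FOUND = "resource not found"
    PERMISSION_DENIED = "permission denied"


def status_before_change(outcome):
    """Old behavior: "permission denied" is hidden behind a 404."""
    if outcome is Outcome.OK:
        return 200  # assumed success status
    return 404


def status_after_change(outcome):
    """Planned behavior: "permission denied" gets its own 403."""
    mapping = {
        Outcome.OK: 200,  # assumed success status
        Outcome.NOT_FOUND: 404,
        Outcome.PERMISSION_DENIED: 403,
    }
    return mapping[outcome]


for outcome in Outcome:
    before = status_before_change(outcome)
    after = status_after_change(outcome)
    print(f"{outcome.value}: before={before}, after={after}")
```

With the planned mapping, a delete on a locked account no longer looks like a delete on a missing resource, so a client that treats 404 as "already gone" would report an error instead of a success.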

jpetazzo commented 2 years ago

Great! Thanks a lot for the follow-up (and detailed explanation!), that's much appreciated. And I think that's a great solution too :)

Cheers,