stackitcloud / terraform-provider-stackit

The official Terraform provider for STACKIT
https://registry.terraform.io/providers/stackitcloud/stackit
Apache License 2.0

Data Loss: Deleting one SKE Project resource deletes all SKE Clusters in the STACKIT Project #294

Closed: jebreuer closed this issue 5 months ago

jebreuer commented 6 months ago

Hi all! Me again. :)

I experienced a data loss issue.

Please see the example tf file attached for a minimal example reproducing the issue: ske-bug.tf.txt

I define an SKE Project resource for each SKE Cluster. After terraform apply, the result is as follows:

$ stackit ske cluster list -p $redacted

 NAME  │ STATE         │ VERSION │ POOLS │ MONITORING
───────┼───────────────┼─────────┼───────┼────────────
 ske-1 │ STATE_HEALTHY │ 1.27.11 │     1 │ Disabled
 ske-2 │ STATE_HEALTHY │ 1.27.11 │     1 │ Disabled

Deleting the resources stackit_ske_cluster.ske-2 and stackit_ske_project.ske-project-2 causes an error after a couple of minutes:

[...]
stackit_ske_cluster.ske-2: Destruction complete after 8m12s
stackit_ske_project.ske-project-2: Destroying... [id=e1c540fe-97de-4fa0-a244-c640c98e6021]
╷
│ Error: Error deleting credential
│
│ Calling API: 409 Conflict, status code 409, Body: {"code":"AlreadyExists","message":"already exists: projects","details":""}
│
╵

And the cluster ske-1 gets deleted implicitly, without my consent:

$ stackit ske cluster list -p $redacted

 NAME  │ STATE          │ VERSION │ POOLS │ MONITORING
───────┼────────────────┼─────────┼───────┼────────────
 ske-1 │ STATE_DELETING │ 1.27.11 │     1 │ Disabled

Not only do I experience data loss, the Terraform state is also corrupt and needs manual cleanup (terraform state rm ...) for the remaining SKE-related resources.
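For the state cleanup mentioned above, a declarative alternative to terraform state rm may be a removed block, which is available in Terraform 1.7 and later. The following is a minimal sketch, not something taken from the attached file. It assumes the stuck resource is the stackit_ske_project.ske-project-2 from the log above and that its resource block has already been deleted from the configuration.

```hcl
# Sketch only: drop the resource from Terraform state without calling
# the provider's delete operation. Requires Terraform >= 1.7.
removed {
  from = stackit_ske_project.ske-project-2

  lifecycle {
    # false = forget the resource in state, leave the remote object alone
    destroy = false
  }
}
```

With this block in place, terraform plan should report that the resource will be dropped from state rather than destroyed. The block can be deleted again after the next successful apply.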

Observation: The resource stackit_ske_project seems to be unique for every STACKIT project. Subsequent instances (here: stackit_ske_project.ske-project-2) seem to be referencing the exact same thing.
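To make the observation concrete, the configuration shape described in this report can be sketched roughly as follows. This is not the attached ske-bug.tf.txt: the label ske-project-1, the local value, and the placeholder UUID are hypothetical, and the cluster resources are left out. It assumes that stackit_ske_project takes the STACKIT project ID as project_id.

```hcl
locals {
  # Placeholder, not a real project ID.
  stackit_project_id = "00000000-0000-0000-0000-000000000000"
}

# Both blocks point at the same STACKIT project, so per the observation
# they would manage the same SKE enablement under two Terraform addresses.
resource "stackit_ske_project" "ske-project-1" {
  project_id = local.stackit_project_id
}

resource "stackit_ske_project" "ske-project-2" {
  project_id = local.stackit_project_id
}
```

If the observation holds, destroying either address would disable SKE for the whole STACKIT project, which would be consistent with cluster ske-1 entering STATE_DELETING.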

Expectation 1: If the observation is true, then creating a second instance of stackit_ske_project should fail.

Expectation 2: Deleting a stackit_ske_project that still contains clusters should fail. The current behavior is the equivalent of rm -Rf *, which can cause implicit and harmful data loss.

Alternatively: Get rid of the stackit_ske_project resource altogether, as it does not add any value at all (besides frustration), IMHO.

PeterStolz commented 6 months ago

Yeah, the same thing happened to me this morning. From what I understand, the best solution is probably to move the service enablement that stackit_ske_project provides into stackit_ske_cluster, and the project should never be deleted (the end user is not billed if it exists, so I don't see a problem with that). This would at least be a good temporary fix for users, unless I am missing something.
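Until such a change lands, one possible guard on the user side is Terraform's prevent_destroy lifecycle argument. A minimal sketch, assuming stackit_ske_project takes a project_id; the label example and the variable name are hypothetical:

```hcl
variable "stackit_project_id" {
  type        = string
  description = "ID of the STACKIT project (hypothetical variable name)"
}

resource "stackit_ske_project" "example" {
  project_id = var.stackit_project_id

  lifecycle {
    # Terraform rejects any plan that would destroy this resource.
    prevent_destroy = true
  }
}
```

Note that prevent_destroy only applies while the resource block is still in the configuration. Removing the block altogether lets Terraform plan the destroy again, so this reduces the risk rather than eliminating it.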

joaopalet commented 6 months ago

Hello @jebreuer,

First of all, sorry for the inconvenience!

As discussed in #273, the stackit_ske_project resource is only used to enable SKE for a STACKIT project. As you also pointed out in another issue, we should and will move the enablement logic directly into the stackit_ske_cluster resource (and deprecate stackit_ske_project in a future release).

jebreuer commented 6 months ago

If you can't provide a fix in a timely manner, please update the documentation accordingly. It is likely that people will shred their prod systems, so they deserve at least a proper heads-up.

GokceGK commented 5 months ago

The stackit_ske_project resource is deprecated as of the new Terraform provider release v0.15.0.

Besides that, service enablement logic has been moved into the stackit_ske_cluster resource.