kislerdm / terraform-provider-neon

Terraform provider to manage Neon SaaS resources
https://registry.terraform.io/providers/kislerdm/neon/latest/docs
Mozilla Public License 2.0
34 stars 13 forks source link

Second "terraform apply" destroys (force replace) the original project #83

Closed kbumsik closed 7 months ago

kbumsik commented 7 months ago

Hi there,

I made the following resources, with Neon Project, Role (name="app"), and databases (name="app")

The resources are created well in the first terraform apply But in the second terraform apply, I got the whole project is destroyed and recreated.

Terraform Version

terraform -v
Terraform v1.4.6
on darwin_arm64
+ provider registry.terraform.io/gavinbunney/kubectl v1.14.0
+ provider registry.terraform.io/hashicorp/aws v5.37.0
+ provider registry.terraform.io/hashicorp/cloudinit v2.3.3
+ provider registry.terraform.io/hashicorp/helm v2.10.1
+ provider registry.terraform.io/hashicorp/kubernetes v2.26.0
+ provider registry.terraform.io/hashicorp/null v3.2.2
+ provider registry.terraform.io/hashicorp/random v3.6.0
+ provider registry.terraform.io/hashicorp/time v0.10.0
+ provider registry.terraform.io/hashicorp/tls v4.0.5
+ provider registry.terraform.io/kislerdm/neon v0.4.1

Provider Version

v0.4.1

Affected Resource(s)

Please list the resources as a list, for example:

Terraform Configuration Files

resource "neon_project" "this" {
  name = "project-name"
  region_id           = "aws-us-west-2"
  pg_version          = "16"
  compute_provisioner = "k8s-neonvm"

  enable_logical_replication = false

  history_retention_seconds = 604800 # 7 days

  branch {
    name = "main"
    database_name = "do_not_use_neon_default"
    role_name = "main"
  }

  default_endpoint_settings {
    autoscaling_limit_min_cu = 0.25
    autoscaling_limit_max_cu = 0.25
    suspend_timeout_seconds = 300 # 5 min
  }
}

resource "neon_role" "this" {
  project_id = neon_project.this.id
  branch_id  = neon_project.this.default_branch_id

  name       = "app"
}

resource "neon_database" "this" {
  name       = "app"

  project_id = neon_project.this.id
  branch_id  = neon_project.this.default_branch_id
  owner_name = neon_role.this.name
}

Debug Output

Note: Objects have changed outside of Terraform

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this
plan:

  # module.neon_project_platform.neon_project.this has changed
  ~ resource "neon_project" "this" {
      ~ connection_uri                  = (sensitive value)
      ~ database_name                   = "do_not_use_neon_default" -> "app"
      ~ database_password               = (sensitive value)
      ~ database_user                   = "main" -> "app"
        id                              = "sparkling-glitter-51973216"
        name                            = "project-name"
        # (10 unchanged attributes hidden)

      ~ branch {
          ~ database_name = "do_not_use_neon_default" -> "app"
            id            = "br-odd-breeze-a6y64xd0"
            name          = "main"
          ~ role_name     = "main" -> "app"
        }

        # (1 unchanged block hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the
following plan may include actions to undo or respond to these changes.

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following
symbols:
-/+ destroy and then create replacement

Terraform will perform the following actions:

  # module.neon_project_platform.neon_database.this must be replaced
-/+ resource "neon_database" "this" {
      ~ branch_id  = "br-odd-breeze-a6y64xd0" # forces replacement -> (known after apply) # forces replacement
      ~ id         = "sparkling-glitter-51973216/br-odd-breeze-a6y64xd0/app" -> (known after apply)
        name       = "app"
      ~ project_id = "sparkling-glitter-51973216" # forces replacement -> (known after apply) # forces replacement
        # (1 unchanged attribute hidden)
    }

  # module.neon_project_platform.neon_project.this must be replaced
-/+ resource "neon_project" "this" {
      - allowed_ips                     = [] -> null
      ~ connection_uri                  = (sensitive value)
      ~ database_host                   = "ep-sweet-dew-a6gxwisp.us-west-2.aws.neon.tech" -> (known after apply)
      ~ database_name                   = "app" -> (known after apply)
      ~ database_password               = (sensitive value)
      ~ database_user                   = "app" -> (known after apply)
      ~ default_branch_id               = "br-odd-breeze-a6y64xd0" -> (known after apply)
      ~ id                              = "sparkling-glitter-51973216" -> (known after apply)
        name                            = "project-name"
        # (7 unchanged attributes hidden)

      ~ branch {
          ~ database_name = "app" -> "do_not_use_neon_default" # forces replacement
          ~ id            = "br-odd-breeze-a6y64xd0" -> (known after apply)
            name          = "main"
          ~ role_name     = "app" -> "main" # forces replacement
        }

        # (1 unchanged block hidden)
    }

  # module.neon_project_platform.neon_role.this must be replaced
-/+ resource "neon_role" "this" {
      ~ branch_id  = "br-odd-breeze-a6y64xd0" # forces replacement -> (known after apply) # forces replacement
      ~ id         = "sparkling-glitter-51973216/br-odd-breeze-a6y64xd0/app" -> (known after apply)
        name       = "app"
      ~ password   = (sensitive value)
      ~ project_id = "sparkling-glitter-51973216" # forces replacement -> (known after apply) # forces replacement
      ~ protected  = false -> (known after apply)
    }

Plan: 3 to add, 0 to change, 3 to destroy.

Expected Behavior

The second terraform apply must not destroy the project.

Actual Behavior

The second terraform apply forces replaces the project.

Steps to Reproduce

  1. terraform apply
  2. and terraform apply again

Important Factoids

Maybe the default neon_project.branch.role_name and neon_project.branch.database_name are tracked as the first role/database in the API respone, but the API sorts them in the name order, so the name "app" comes first before "main"?

kbumsik commented 7 months ago

It feels like Neon API is not friendly to Terraform unfortunately :(

kislerdm commented 7 months ago

@kbumsik Thank you for reporting the issue! I'll try to reproduce and come back to you. Thanks!

kislerdm commented 7 months ago

The issue is confirmed.

The root cause

The coupling between the project's default branch and the project's default database and role.

The context

terraform-provider-neon-issue-83 (1)

The relationship between the Neon entities can be represented by the above graph. It illustrates the function's resourceProjectRead side effect.

The terraform state is updated using the result of the function newDbConnectionInfo which relies on the order of the project's databases array. This array is the response of the Neon API called by the SDK method ListProjectBranchDatabases to list the databases which belong to the primary branch. The array's order is not guaranteed.

Proposed solution

The following changes will be implemented to the terraform provider:

cc: @kbumsik

kbumsik commented 7 months ago

Thanks for looking into the problem so quickly!

I personally would suggest a breaking change: Remove default branch/database/endpoint.

The coupling between the project's default branch and the project's default database and role.

+1 Also the default compute endpoint as well. I think this is really unfortunate decision made by Neon. This doesn't make much sense for Terraform users' point of view:

  1. This is an implicit resource creation that is hard to manage correctly. This causes weird TF experience:
    • Users can manage branch/database/endpoint as a TF resource, but users can't manage default branch/database/endpoint as a separate resource when they are exactly the same...?
    • Users cannot delete the default branch/database/endpoint on TF (unless deleting the project), but it is actually deletable resources in Neon console.
  2. They are not actually coupled to the project.
  3. This is the root of a lot of bugs:
    • Mutating the default_endpoint_settings doesn't work.
    • ...and this bug

In conclusion, the default branch/database/endpoint are not actually "default" resources that are not coupled to a project resource - they are "auto-generated" resources. Any efforts to sync them correctly would not have strong guarantee.

The beauty of Terraform is that we can manage resources explicitly. Implicit resources are bad. Let's find a way to treat them as a separated, normal branch/database/endpoint resource block.

Proposed solution

Currently, this is my workaround.

Now I won't get any weird states. I can manage my own resources without implicitly created resource.

So how about doing the same thing in project resource? This is my proposal when we create a project resource with terraform apply:

  1. Call API to create a project.
  2. Call API to delete auto-created database, role, and compute endpoint
  3. After TF checks there are no databases, role, and compute endpoint, complete project resource creation.
  4. Only sync project attributes like id and other settings and the "main" root branch., the default branch at least has a trackable id.

This surely a breaking change. But I think this allow us to manage resources more explicit, and have a better TF experience.

kislerdm commented 7 months ago

@kbumsik Thank you for your proposal! I'm glad to hear about and support your vision towards simplicity and explicit resources definition!

BLUF

We shall think of the end users and how this provider may enable them in the most effective way. Would explicit definition of three additional resources required to start using the project be valuable for the users enough for us to justify worsening the ease of use for the sake of simplicity?

Please describe you proposal as RFC to collect community feedback. I'll help advertising it to boost the process.

The rationale: I'm personally in favour of your proposal. However, the provider is used by dozens of projects already, and I'm reluctant to introduce such a breaking change without asking the community for a feedback.

Context

I considered going down the path your suggested, i.e. delete all default resources associated with the project as part of neon_project provisioning, before the initial provider's release last year. However, I decided against that strategy, because the provider is intended to be a thin terraform adapter to the Neon API. In other words, the resource neon_project is expected to provision all Neon resources created as a result of the create project operation. Moreover, as you rightfully mentioned, newly created Neon project cannot be isolated from the "default" child resources: the root branch cannot be deleted.

I think this is really unfortunate decision made by Neon

I think that the Neon team's vision was exactly on the money: they were after solo-developers, and small startups to gain traction on the market. I reckon they manage over 130k databases to date (the source), and probably over 50k active users. It makes me think that their decision was, in fact, quite fortunate because apparently they successfully solved key problems for their user. 😉

Users cannot delete the default branch/database/endpoint on TF (unless deleting the project), but it is actually deletable resources in Neon console.

Any efforts to sync them correctly would not have strong guarantee.

I reckon this provider can deliver on such guarantees. For example, the default resources could be identified by their Neon IDs and the creation timestamp. The logic will be shipped as part of the next release after this issue is resolved.

Now I won't get any weird states. I can manage my own resources without implicitly created resource.

You are right, the root branch can be uniquely identified by the flag primary.

Neon API ListBranches raw responce ```json { "branches": [ { "id": "br-divine-flower-a61yfcdi", "project_id": "spring-pine-47090588", "name": "main", "current_state": "ready", "logical_size": 30801920, "creation_source": "console", "primary": true, "cpu_used_sec": 81, "compute_time_seconds": 81, "active_time_seconds": 300, "written_data_bytes": 32985488, "data_transfer_bytes": 0, "created_at": "2024-03-08T16:00:46Z", "updated_at": "2024-03-08T19:06:09Z" } ] } ```

Alternative

I can propose an additional resource neon_project_v2 which will follow the behaviour proposed by you. WDYT?

Thank you!