This key could probably be added here: https://github.com/lxc/terraform-provider-incus/blob/main/internal/storage/resource_storage_pool.go#L337-L372
Isn't that the list of computed pool keys as opposed to inherited keys on the volumes and buckets?
> Isn't that the list of computed pool keys as opposed to inherited keys on the volumes and buckets?
In other words, you first created the pool without Terraform/OpenTofu and then imported it with Terraform/OpenTofu?
How can I reproduce your setup to get the same issue?
incus storage pool create foo zfs volume.zfs.remove_snapshots=true
And then have Terraform create a new storage volume using that foo storage pool.
Does this same behavior happen if the pool is created with a TF resource?
Good question, though not really relevant, as the case I was dealing with was one where Incus is acting as a cloud, so the TF user doesn't have any control over the storage pools; they get an empty project and can only create project-tied resources.
stgraber@castiana:~/demo/test$ tofu apply
OpenTofu used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
+ create
OpenTofu will perform the following actions:
# incus_storage_pool.test will be created
+ resource "incus_storage_pool" "test" {
+ config = {
+ "volume.zfs.remove_snapshots" = "true"
}
+ driver = "zfs"
+ name = "test"
}
# incus_volume.test will be created
+ resource "incus_volume" "test" {
+ config = {
+ "size" = "20GiB"
}
+ content_type = "block"
+ location = (known after apply)
+ name = "test"
+ pool = "test"
+ target = (known after apply)
+ type = "custom"
}
Plan: 2 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
OpenTofu will perform the actions described above.
Only 'yes' will be accepted to approve.
Enter a value: yes
incus_storage_pool.test: Creating...
incus_storage_pool.test: Creation complete after 1s [name=test]
incus_volume.test: Creating...
╷
│ Error: Provider produced inconsistent result after apply
│
│ When applying changes to incus_volume.test, provider "provider[\"registry.opentofu.org/lxc/incus\"]" produced an unexpected new value: .config: new element "zfs.remove_snapshots" has appeared.
│
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
stgraber@castiana:~/demo/test$ cat main.tf
terraform {
required_providers {
incus = {
source = "lxc/incus"
version = "0.1.1"
}
}
}
resource "incus_storage_pool" "test" {
name = "test"
driver = "zfs"
config = {
"volume.zfs.remove_snapshots" = "true"
}
}
resource "incus_volume" "test" {
name = "test"
pool = incus_storage_pool.test.name
content_type = "block"
config = {
"size" = "20GiB"
}
}
stgraber@castiana:~/demo/test$
So yeah, even if created by TF, it still happens.
Yeah, I suspected that was the case either way. Inherited config keys should be ignored completely since they technically come from another resource. Can they be excluded or identified in the API?
> where Incus is acting as a cloud, so the TF user doesn't have any control over the storage pools; they get an empty project and can only create project-tied resources.
If we ever need to represent unmanaged resources, I think the idiomatic solution is to model Data Sources that can still be represented on the TF side.
The provider can do GetStoragePool, then look at the Config map for anything that's volume.XYZ; for any such key, XYZ should be treated as inherited in the volume or bucket if its value matches that of the pool's volume.XYZ.
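A minimal sketch of that check, assuming the pool and volume config maps have already been fetched through the Incus client (e.g. via GetStoragePool for the pool side); the helper name and its placement are hypothetical, not the provider's actual code:

```go
package storage

import "strings"

// inheritedVolumeKeys returns the volume/bucket config keys whose values look
// inherited from the parent pool: for every pool key "volume.XYZ", the key
// "XYZ" is treated as inherited when the volume carries the same value.
func inheritedVolumeKeys(poolConfig map[string]string, volumeConfig map[string]string) map[string]bool {
	inherited := make(map[string]bool)

	for poolKey, poolValue := range poolConfig {
		// Only "volume."-prefixed pool keys propagate to volumes and buckets.
		volKey, found := strings.CutPrefix(poolKey, "volume.")
		if !found {
			continue
		}

		// Inherited only if the value matches the pool's default; a different
		// value means it was set explicitly on the volume itself.
		if value, ok := volumeConfig[volKey]; ok && value == poolValue {
			inherited[volKey] = true
		}
	}

	return inherited
}
```

Keys in that set could then be dropped from the diff, which would avoid maintaining a hard-coded per-driver list.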
@stgraber Regarding the storage pool volume, this PR fixes the problem that the volume takes into account the inherited configuration value: https://github.com/lxc/terraform-provider-incus/pull/66
Regarding the creation of storage pools, however, I am not sure whether a correction is really necessary here:
Test Case
Create the storage pool without Terraform, e.g.:
$ incus storage create test zfs volume.zfs.remove_snapshots=true
Define the resource in Terraform and import it:
resource "incus_storage_pool" "test" {
name = "test"
project = "default"
driver = "zfs"
}
$ terraform import incus_storage_pool.test default/test
incus_storage_pool.test: Importing from ID "default/test"...
incus_storage_pool.test: Import prepared!
Prepared incus_storage_pool for import
incus_storage_pool.test: Refreshing state... [name=test]
Import successful!
The resources that were imported are shown above. These resources are now in
your Terraform state and will henceforth be managed by Terraform.
$ terraform plan
incus_storage_pool.test: Refreshing state... [name=test]
Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the
following symbols:
~ update in-place
Terraform will perform the following actions:
# incus_storage_pool.test will be updated in-place
~ resource "incus_storage_pool" "test" {
~ config = {
- "volume.zfs.remove_snapshots" = "true" -> null
}
name = "test"
# (3 unchanged attributes hidden)
}
Plan: 0 to add, 1 to change, 0 to destroy.
So after creating the pool and importing it, the configuration was imported correctly, as we can see in the state file:
$ cat terraform.tfstate
{
"version": 4,
"terraform_version": "1.8.3",
"serial": 62,
"lineage": "863f8d8e-b4f8-b4cc-3bbe-5a247ae7e02e",
"outputs": {},
"resources": [
{
"mode": "managed",
"type": "incus_storage_pool",
"name": "test",
"provider": "provider[\"registry.terraform.io/lxc/incus\"]",
"instances": [
{
"schema_version": 0,
"attributes": {
"config": {
"volume.zfs.remove_snapshots": "true"
},
"description": "",
"driver": "zfs",
"name": "test",
"project": "default",
"remote": null,
"target": null
},
"sensitive_attributes": []
}
]
}
],
"check_results": null
}
But when we run terraform plan again, the local configuration is different from the remote state. This means that the resource definition is missing the corresponding configuration entry, and Terraform therefore wants to force a state change. To rectify this, you must add the corresponding configuration entry to the resource definition:
resource "incus_storage_pool" "test" {
name = "test"
project = "default"
driver = "zfs"
config = {
"volume.zfs.remove_snapshots" = "true"
}
}
$ terraform plan
incus_storage_pool.test: Refreshing state... [name=test]
No changes. Your infrastructure matches the configuration.
Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are
needed.
So we could argue that this behavior is correct if you want to use the incus_storage_pool resource. If you want to ignore this completely, then we should rather introduce a data source for incus_storage_pool, like we did for incus_profile, that allows you to read the state but not change anything. How do you see this?
The behavior for the storage pool is fine; the problem is really with storage pool keys propagating to volumes. I don't think just piling up "computed" keys is correct here.
One can create a volume and specifically set, say, block.filesystem=ext4 at creation time.
But someone can similarly create a volume without specifying any block.filesystem, and because the pool the volume is created in has volume.block.filesystem set to xfs, the resulting volume will now have Incus set block.filesystem=xfs unless another value is specified.
This is true of ALL volume keys, possibly including arbitrary user ones (not tested), so a storage pool having volume.user.foo=abc should result in a volume being created on that pool getting user.foo=abc.
So if going with marking keys computed, every single storage volume config key would need to be treated as computed, which doesn't seem right.
Your example with block.filesystem is currently handled as a computed key: https://github.com/lxc/terraform-provider-incus/blob/main/internal/storage/resource_storage_volume.go#L382, which we inherited when we forked the provider. Or am I wrong?
Bad example, but we have 20+ other keys which aren't covered by the hardcoded list; as mentioned, it's effectively every single volume config key, which is why treating them all as computed kinda feels wrong.
Just looking at one storage driver (ZFS):
I see, and funnily enough we have already implemented many of those example keys in the computed keys for the storage pool, based on the driver: https://github.com/lxc/terraform-provider-incus/blob/d570e743a8285b22bbe5779844960a7254c89de1/internal/storage/resource_storage_pool.go#L337
So if I understand you correctly, you want to avoid hard-coding these keys and instead do a lookup against Incus first to determine which keys are already set, and ignore them? Don't you see an issue that we might end up in a situation where the storage volume resource's state can no longer be trusted, because we accept whatever we get from Incus instead of "forcing" the user to be very explicit about their intentions?
I don't know how flexible the terraform plugin logic is for that, but ideally, we'd make it ignore zfs.use_refquota being set to true automatically if volume.zfs.use_refquota is set to true on the storage pool, as that happens internally and automatically and so isn't a change that should get reverted by Terraform.
An example would be:
1) Pool has volume.zfs.use_refquota=true set, user asks to create a new volume and only specifies size=50GiB. That should create the volume, which will then have Incus automatically set zfs.use_refquota=true. Running Terraform again should have it consider everything to be clean.
2) Pool doesn't have volume.zfs.use_refquota set at all, user asks to create a new volume and only specifies size=50GiB. That should still all be clean. Then the user manually sets zfs.use_refquota=true on the volume. Terraform should pick that up as a manual change that needs to be reverted, as the pool doesn't have volume.zfs.use_refquota=true set on it and the user didn't ask for the volume to have zfs.use_refquota set.
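A minimal, self-contained sketch of that classification, assuming the provider compares the remote volume config against both the user's planned config and the pool's volume.* defaults during refresh; the classify helper and all names here are hypothetical, not the provider's or PR #68's actual code:

```go
package main

import "fmt"

// classify decides, for every key in the remote (Incus-reported) volume config,
// whether it should be ignored as inherited from the pool or flagged as drift.
// A key is inherited when the user did not set it and its value matches the
// pool's "volume."-prefixed default; anything else the user didn't ask for is drift.
func classify(poolConfig, planned, remote map[string]string) (ignored, drift map[string]string) {
	ignored = map[string]string{}
	drift = map[string]string{}

	for key, value := range remote {
		if _, set := planned[key]; set {
			continue // explicitly managed by the user, handled by the normal diff
		}
		if poolConfig["volume."+key] == value {
			ignored[key] = value // inherited pool default, not a change to revert
		} else {
			drift[key] = value // manual change on the volume, should be reported
		}
	}
	return ignored, drift
}

func main() {
	planned := map[string]string{"size": "50GiB"}

	// Scenario 1: pool sets volume.zfs.use_refquota=true, Incus propagates it.
	pool1 := map[string]string{"volume.zfs.use_refquota": "true"}
	remote1 := map[string]string{"size": "50GiB", "zfs.use_refquota": "true"}
	ignored, drift := classify(pool1, planned, remote1)
	fmt.Println("scenario 1: ignored =", ignored, "drift =", drift) // everything clean

	// Scenario 2: pool sets nothing, the key was set manually on the volume.
	pool2 := map[string]string{}
	remote2 := map[string]string{"size": "50GiB", "zfs.use_refquota": "true"}
	ignored, drift = classify(pool2, planned, remote2)
	fmt.Println("scenario 2: ignored =", ignored, "drift =", drift) // drift to revert
}
```

Whether the terraform plugin logic gives the provider enough flexibility to apply something like this during refresh is exactly the open question above; the sketch only illustrates the intended outcome for the two scenarios.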
@stgraber Please have a look at https://github.com/lxc/terraform-provider-incus/pull/68 and I would be grateful if you could test whether it works as you have described.
I've noticed OpenTofu getting pretty confused about a volume.zfs.remove_snapshots=true config key appearing on a volume that's defined in Terraform. That's because the config key was set on the parent storage pool and so propagated to the new volume. That's normal and we definitely don't want Terraform to try to rectify this by deleting and re-creating the volume as it attempted here :)