juju / terraform-provider-juju

A Terraform provider for Juju

Provider produced inconsistent result when using "placement" #376

Open tllano11 opened 8 months ago

tllano11 commented 8 months ago

Description

When applying changes to juju_application.ceph_mon for a Ceph deployment using Terraform, the provider registry.terraform.io/juju/juju produced an unexpected new value in the .placement attribute: it was cty.StringVal("1,0,2") before the apply, but cty.StringVal("0") afterwards:

╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to juju_application.ceph_mon, provider "provider[\"registry.terraform.io/juju/juju\"]" produced an
│ unexpected new value: .placement: was cty.StringVal("1,0,2"), but now cty.StringVal("0").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵

The particular use case here is deploying Ceph on virsh VMs managed through MAAS and Juju. First, MAAS is configured manually on the VM host, then a MAAS cloud is configured in Juju using a virsh VM as the controller. Afterwards, Terraform is used to:

  1. Link the VM host to MAAS so that VMs can be composed through MAAS.
  2. Compose VMs in MAAS to use as Ceph OSDs and monitors (assuming each VM runs an OSD and a monitor).
  3. Create a model in Juju for the Ceph deployment and add the VMs to that model.
  4. Deploy the Ceph charms and integrate them as appropriate.

This problem occurs in step 4, when coordinating the deployment of ceph-mon units on the same machines as the ceph-osd units. The pattern is:

resource "juju_machine" "ceph_osd" {
  // ... (omitted for brevity)
  count = local.num_osds
}
resource "juju_application" "ceph_osd" {
  // ... (omitted for brevity)
  units = local.num_osds
}
resource "juju_application" "ceph_mon" {
  // ... (omitted for brevity)
  placement = join(",", juju_machine.ceph_osd[*].machine_id)
}

Urgency

Casually reporting

Terraform Juju Provider version

0.10.0

Terraform version

v1.6.6

Terraform Configuration(s)

terraform {
  required_providers {
    maas = {
      source = "registry.terraform.io/maas/maas"
      version = "~>1.0"
    }
    juju = {
      version = "~> 0.10.0"
      source  = "registry.terraform.io/juju/juju"
    }
  }
}

variable "maas_endpoint" { type = string }
variable "maas_api_key" { type = string }
variable "maas_kvm_host" { type = string }
variable "juju_maas_cloud_name" { type = string }

provider "maas" {
  api_version = "2.0"
  api_key = var.maas_api_key
  api_url = var.maas_endpoint
}
provider "juju" {}

locals { 
    num_osds = 3
    ceph_osd_tag = "osd"
}

resource "maas_vm_host_machine" "ceph_osd" {
  count = local.num_osds
  hostname = "ceph-osd${count.index}"
  vm_host = var.maas_kvm_host
  cores = 2
  memory = 2048
  storage_disks {
    size_gigabytes = 15
  }
  storage_disks {
    size_gigabytes = 15
  }
}

resource "maas_tag" "osd" {
  name = local.ceph_osd_tag
  machines = maas_vm_host_machine.ceph_osd[*].id
}

resource "juju_model" "ceph" {
  name = "ceph"
  cloud {
    name   = var.juju_maas_cloud_name
  }
  config = {
    logging-config              = "<root>=INFO"
    development                 = false
    no-proxy                    = "127.0.0.1,localhost,::1"
    update-status-hook-interval = "5m"
  }
}

resource "juju_machine" "ceph_osd" {
  count       = local.num_osds
  model       = juju_model.ceph.name
  base        = "ubuntu@22.04"
  name        = "ceph-osd${count.index}"
  constraints = "tags=${maas_tag.osd.name}"
}

resource "juju_application" "ceph_osd" {
  name = "ceph-osd"
  model = juju_model.ceph.name
  charm {
    name     = "ceph-osd"
    channel  = "quincy/stable"
  }
  config = {
    osd-devices = "/dev/vdb"
  }
  units = local.num_osds
  placement = join(",",juju_machine.ceph_osd[*].machine_id)
}
resource "juju_application" "ceph_mon" {
  name = "ceph-mon"
  model = juju_model.ceph.name
  charm {
    name     = "ceph-mon"
    channel  = "quincy/stable"
  }
  units = local.num_osds
  placement = join(",",juju_machine.ceph_osd[*].machine_id)
}
resource "juju_integration" "ceph" {
  model = juju_model.ceph.name
  application {
    name     = juju_application.ceph_osd.name
    endpoint = "mon"
  }
  application {
    name     = juju_application.ceph_mon.name
    endpoint = "osd"
  }
}

Reproduce / Test

  1. Install and configure libvirt, then create a virsh network for the VMs:
    cat <<EOF > maas-net.xml
    <network>
     <name>maas-ceph</name>
     <forward mode='nat'>
       <nat>
         <port start='1024' end='65535'/>
       </nat>
     </forward>
     <dns enable="no" />
     <bridge name='virbr1' stp='off' delay='0'/>
     <domain name='testnet'/>
     <ip address='192.110.99.1' netmask='255.255.255.0'>
     </ip>
    </network>
    EOF
    virsh net-define maas-net.xml
    virsh net-start maas-ceph
  2. Install and manually configure MAAS on the VM host.
  3. Set up a KVM host and create a VM to serve as the Juju controller (don't deploy it).
  4. Configure a MAAS cloud in Juju using the virsh VM as the controller (a minimal command sketch follows this list).
  5. (optional) Lock the Juju controller in MAAS.
  6. Run Terraform: terraform init && terraform apply
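
For step 4, the following is a minimal sketch of registering the MAAS installation as a Juju cloud and bootstrapping the controller onto the pre-created virsh VM. The endpoint, API key, and controller hostname are placeholders, and the cloud name should match var.juju_maas_cloud_name (here assumed to be maas_cloud, as in the juju status output below):

# Register MAAS as a Juju cloud (endpoint and API key are placeholders).
cat <<EOF > maas-cloud.yaml
clouds:
  maas_cloud:
    type: maas
    auth-types: [oauth1]
    endpoint: http://<maas-ip>:5240/MAAS
EOF
juju add-cloud --client maas_cloud maas-cloud.yaml

cat <<EOF > maas-creds.yaml
credentials:
  maas_cloud:
    maas-admin:
      auth-type: oauth1
      maas-oauth: <maas-api-key>
EOF
juju add-credential --client maas_cloud -f maas-creds.yaml

# Bootstrap onto the VM that MAAS knows by the controller hostname.
juju bootstrap maas_cloud --to <controller-vm-hostname>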

Debug/Panic Output

maas_vm_host_machine.ceph_osd[1]: Creating...
maas_vm_host_machine.ceph_osd[2]: Creating...
maas_vm_host_machine.ceph_osd[0]: Creating...
juju_model.ceph: Creating...
juju_model.ceph: Creation complete after 0s [id=2af167dd-374f-45f6-88e4-9f051a0725d9]
maas_vm_host_machine.ceph_osd[1]: Still creating... [10s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [10s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [10s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [20s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [20s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [20s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [30s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [30s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [30s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [40s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [40s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [40s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [50s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [50s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [50s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m0s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m0s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m0s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m10s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m10s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m10s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m20s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m20s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m20s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m30s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m30s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m30s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m40s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m40s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m40s elapsed]
maas_vm_host_machine.ceph_osd[2]: Still creating... [1m50s elapsed]
maas_vm_host_machine.ceph_osd[1]: Still creating... [1m50s elapsed]
maas_vm_host_machine.ceph_osd[0]: Still creating... [1m50s elapsed]
maas_vm_host_machine.ceph_osd[0]: Creation complete after 1m57s [id=c3dp8r]
maas_vm_host_machine.ceph_osd[2]: Creation complete after 1m59s [id=gf6ypf]
maas_vm_host_machine.ceph_osd[1]: Still creating... [2m0s elapsed]
maas_vm_host_machine.ceph_osd[1]: Creation complete after 2m5s [id=tmwf63]
maas_tag.osd: Creating...
maas_tag.osd: Creation complete after 1s [id=osd]
juju_machine.ceph_osd[0]: Creating...
juju_machine.ceph_osd[1]: Creating...
juju_machine.ceph_osd[2]: Creating...
juju_machine.ceph_osd[1]: Creation complete after 0s [id=ceph:0:ceph-osd1]
juju_machine.ceph_osd[2]: Creation complete after 0s [id=ceph:1:ceph-osd2]
juju_machine.ceph_osd[0]: Creation complete after 0s [id=ceph:2:ceph-osd0]
juju_application.ceph_osd: Creating...
juju_application.ceph_mon: Creating...
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to juju_application.ceph_mon, provider "provider[\"registry.terraform.io/juju/juju\"]" produced an
│ unexpected new value: .placement: was cty.StringVal("2,0,1"), but now cty.StringVal("0").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to juju_application.ceph_osd, provider "provider[\"registry.terraform.io/juju/juju\"]" produced an
│ unexpected new value: .placement: was cty.StringVal("2,0,1"), but now cty.StringVal("").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.

Notes & References

The OSDs and monitors are deployed successfully, but the integration is skipped due to the errors reported by the juju_application resources. The output of juju switch ceph && juju status is as follows:

Model  Controller          Cloud/Region        Version  SLA          Timestamp
ceph   maas_cloud-default  maas_cloud/default  3.1.7    unsupported  13:21:58-05:00

App       Version  Status   Scale  Charm     Channel        Rev  Exposed  Message
ceph-mon  17.2.6   blocked      3  ceph-mon  quincy/stable  195  no       Missing relation: OSD
ceph-osd  17.2.6   blocked      3  ceph-osd  quincy/stable  576  no       Missing relation: monitor

Unit         Workload  Agent  Machine  Public address  Ports  Message
ceph-mon/0*  blocked   idle   0        192.110.99.3           Missing relation: OSD
ceph-mon/1   blocked   idle   1        192.110.99.2           Missing relation: OSD
ceph-mon/2   blocked   idle   2        192.110.99.4           Missing relation: OSD
ceph-osd/0*  blocked   idle   0        192.110.99.3           Missing relation: monitor
ceph-osd/1   blocked   idle   1        192.110.99.2           Missing relation: monitor
ceph-osd/2   blocked   idle   2        192.110.99.4           Missing relation: monitor

Machine  State    Address       Inst id    Base          AZ       Message
0        started  192.110.99.3  ceph-osd0  ubuntu@22.04  default  Deployed
1        started  192.110.99.2  ceph-osd1  ubuntu@22.04  default  Deployed
2        started  192.110.99.4  ceph-osd2  ubuntu@22.04  default  Deployed
cderici commented 8 months ago

@tllano11 Thanks for opening this. Could you please reduce the plan to something smaller and easier to reproduce, to make things go faster? Thanks!

tllano11 commented 8 months ago

Thanks for the quick reply, @cderici. I just updated the Terraform plan in the bug description and included some debug output as well. I reduced the plan as much as I could. Let me know if there is anything else you need me to add or modify.

hmlanigan commented 8 months ago

This issue does not appear to be a duplicate of #182. In this case, there are 3 placement values, but only 1 value is put back in after deploy.

tllano11 commented 8 months ago

I also noticed that sometimes no value is put back at all. In the Debug/Panic Output section of the issue description, the last error shows this.

marcoppenheimer commented 7 months ago

Adding to this, I'm also getting this error. Any idea when a fix might be released for the provider?

Terraform: https://github.com/canonical/kafka-bundle/tree/main/terraform/dev
Command that deploys it: https://github.com/canonical/kafka-bundle/blob/38ab7c3d5df4465c3c4ff4b71994ab709aa11962/tests/integration/terraform/test_terraform.py#L30-L64
Failing messages: https://pastebin.com/1kUgVSv8

phvalguima commented 6 months ago

I can reproduce this same issue without using "placement", e.g.:

Using Juju provider v0.10.1 and Terraform v1.7.4:

resource "juju_application" "opensearch" {
    name = "opensearch"
    model = var.model_name
    charm {
        name = "opensearch"
        channel = var.opensearch_channel
        base = var.opensearch_base
    }

    constraints = join(" ", [
      for k,v in {
          "instance-type" = var.opensearch_constraints.instance_type
          "root-disk" = var.opensearch_constraints.root-disk
          "spaces" = var.opensearch_constraints.spaces
      } : "${k}=${v}"
    ])

    units = var.opensearch_count
}

Results in:

│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to module.opensearch.juju_application.opensearch, provider "provider[\"registry.terraform.io/juju/juju\"].aws-juju" produced an
│ unexpected new value: .constraints: was cty.StringVal("instance-type=c6a.xlarge root-disk=100G spaces=internal-space"), but now cty.StringVal("arch=amd64
│ instance-type=c6a.xlarge root-disk=102400M spaces=internal-space").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
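
The new value in that error is Juju's normalized form of the requested constraints (arch=amd64 added, 100G rewritten as 102400M). A possible mitigation, under the untested assumption that the mismatch comes only from this normalization being read back, is to state the constraints in the already-normalized form reported by the error:

    // Untested sketch: write the constraints exactly as Juju normalizes and
    // reports them back (explicit arch, root-disk in megabytes).
    constraints = "arch=amd64 instance-type=c6a.xlarge root-disk=102400M spaces=internal-space"
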
hmlanigan commented 5 months ago

An update for placements has been added in 0.11.0: all machines should now be written when deploying. There is still an ordering problem to fix. I have some work in progress here.
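
If the remaining problem is only the order of the ids, a config-side sketch, under the unverified assumption that the provider writes the placement back with the machine ids in ascending order, is to sort the ids before joining them:

resource "juju_application" "ceph_mon" {
  // ... (omitted for brevity)
  // Untested sketch: keep the config-side string order-stable so it matches a
  // sorted read-back; sort() is lexical, which is fine for single-digit ids.
  placement = join(",", sort(juju_machine.ceph_osd[*].machine_id))
}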

chrmel commented 1 month ago

Unfortunately, this issue persists in version 0.13.0:

juju_application.test: Creating...
╷
│ Error: Provider produced inconsistent result after apply
│ 
│ When applying changes to juju_application.test, provider "provider[\"registry.terraform.io/juju/juju\"]" produced an unexpected new value: .placement: was cty.StringVal("1,2"), but now
│ cty.StringVal("1").
│ 
│ This is a bug in the provider, which should be reported in the provider's own issue tracker.
╵