robgjertsen1 opened 1 year ago
Not sure how I missed this issue. I suggest using Markdown code formatting when pasting console logs.
Coming back to the root cause: the main line that shows the reason for recreating the NFS disk (`module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]`) is:

```
~ volume_type = "v7kamp.rch.stglabs.ibm.com base template" -> "63272fa4-2a99-4a94-ab1e-2a12fb64b1f8" # forces replacement
```
It seems the Terraform provider for OpenStack is returning the storage template ID when querying the service, which is why Terraform detects a change in the template, as shown above. We have not used this feature recently, but something appears to have changed such that only the ID will work.
As a workaround, please set the variable `volume_storage_template` to the value `63272fa4-2a99-4a94-ab1e-2a12fb64b1f8` and run `terraform apply`. Terraform should then no longer detect a change that forces replacement.
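For reference, the workaround above would look roughly like this in the tfvars file passed to `terraform apply` (the variable name and ID come from this thread; the file layout around it is just illustrative):

```hcl
# var.tfvars (fragment)
# Use the storage template's ID instead of its display name, so the
# provider's refresh no longer reports a volume_type change that
# forces the NFS volume to be replaced.
volume_storage_template = "63272fa4-2a99-4a94-ab1e-2a12fb64b1f8"
```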
I've tried this 3 times and it fails each time. Initially I installed a cluster, ran workload on it, and it was OK. I only saw issues once I tried to remove the bootstrap node, at which point I had problems accessing PVCs. The bootstrap removal got hung up (the node was removed from PowerVC, but the run got stuck later on). It was an odd problem with NFS where I/Os were hung, yet there was no obvious issue with the physical storage. Then I tried to remove the bootstrap node immediately after recreating the cluster. This also resulted in issues where the NFS filesystem wasn't mounted, and yet another case where the terraform execution was again stuck after removing the bootstrap node from PowerVC (although the NFS mount was OK that time).
Here are some details from the last attempt. We are stuck in the "Gathering Facts" task of the ansible ocp4-helpernode playbook.
Output from terraform:
```
$ terraform apply -var-file var.tfvars
module.workernodes.data.ignition_file.w_hostname[0]: Reading...
module.bootstrapnode.data.ignition_file.b_hostname: Reading...
module.masternodes.data.ignition_file.m_hostname[0]: Reading...
module.bootstrapnode.data.ignition_file.b_hostname: Read complete after 0s [id=1ec8928da9e89f9b35deb26dd484665fda91d99d73e31330dce71edf3a4e19cc]
module.masternodes.data.ignition_file.m_hostname[0]: Read complete after 0s [id=7551bfa9e87523c711bf18607b8af5ccfee1657ea6c4817bbc3dd2186602f590]
module.workernodes.data.ignition_file.w_hostname[0]: Read complete after 0s [id=28b9dcc333049039879c9c1e94f95816f0341945047e8ae59674e1233f72be83]
module.workernodes.data.openstack_compute_flavor_v2.worker: Reading...
module.masternodes.data.openstack_compute_flavor_v2.master: Reading...
module.bootstrapnode.data.openstack_compute_flavor_v2.bootstrap: Reading...
module.bastion.openstack_compute_keypair_v2.key-pair[0]: Refreshing state... [id=merlin2-keypair]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Refreshing state... [id=317b2360-639d-4d8a-8b34-a58f1bb19ee9]
module.bastion.data.openstack_compute_flavor_v2.bastion: Reading...
module.network.data.openstack_networking_network_v2.network: Reading...
module.network.data.openstack_networking_network_v2.network: Read complete after 2s [id=f5e55ae3-c790-4a29-91e1-ce04a1acfc69]
module.network.data.openstack_networking_subnet_v2.subnet: Reading...
module.workernodes.data.openstack_compute_flavor_v2.worker: Read complete after 2s [id=1e5b0eed-6681-4305-8bc9-e20afb9f7cca]
module.bootstrapnode.data.openstack_compute_flavor_v2.bootstrap: Read complete after 2s [id=874b188b-074a-4042-b0c8-3a22f04f8302]
module.masternodes.data.openstack_compute_flavor_v2.master: Read complete after 2s [id=d364331a-9f24-4784-bced-3765e0c097ed]
module.bastion.data.openstack_compute_flavor_v2.bastion: Read complete after 2s [id=874b188b-074a-4042-b0c8-3a22f04f8302]
module.network.data.openstack_networking_subnet_v2.subnet: Read complete after 0s [id=63011d28-987a-4ae1-a094-595f2e513a23]
module.network.openstack_networking_port_v2.bastion_port[0]: Refreshing state... [id=91a5c711-f109-4ec0-91e7-86cd821233cc]
module.network.openstack_networking_port_v2.bootstrap_port[0]: Refreshing state... [id=7072aba5-ac95-4b36-994a-1855f2624b55]
module.bastion.openstack_compute_instance_v2.bastion[0]: Refreshing state... [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a]
module.network.openstack_networking_port_v2.master_port[0]: Refreshing state... [id=6d0e1c9a-aa11-48c3-80cd-e22c2cbe8abe]
module.network.openstack_networking_port_v2.worker_port[0]: Refreshing state... [id=3a683058-4670-4f2a-a701-fc21e56142de]
module.bastion.null_resource.bastion_init[0]: Refreshing state... [id=5535521664652524244]
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Refreshing state... [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f1bb19ee9]
module.bastion.null_resource.bastion_register[0]: Refreshing state... [id=3907655651701511596]
module.bastion.null_resource.enable_repos[0]: Refreshing state... [id=8446503032307780766]
module.bastion.null_resource.bastion_packages[0]: Refreshing state... [id=7563035921889930989]
module.bastion.null_resource.setup_nfs_disk[0]: Refreshing state... [id=5783700801307001475]
module.workernodes.data.ignition_config.worker[0]: Reading...
module.bootstrapnode.data.ignition_config.bootstrap: Reading...
module.workernodes.data.ignition_config.worker[0]: Read complete after 0s [id=85d98bf1d766507417ab5b578be1abe6f3e6c0a80e57a931862b80f5ff8b4153]
module.masternodes.data.ignition_config.master[0]: Reading...
module.helpernode.null_resource.config: Refreshing state... [id=3876494058890088587]
module.masternodes.data.ignition_config.master[0]: Read complete after 0s [id=7a035ac3f88d415956417f73f6ecd986a9d339cdbbea088f5332e0cd8a46de94]
module.bootstrapnode.data.ignition_config.bootstrap: Read complete after 0s [id=87f77fe2ea79f17615628f4222c5676d8c8062883faa6236a4ef9d6087f86729]
module.installconfig.null_resource.pre_install[0]: Refreshing state... [id=1417400108243665749]
module.installconfig.null_resource.install_config: Refreshing state... [id=4683285385320449241]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Refreshing state... [id=2c289dad-9552-4037-8105-f798406ff623]
module.bootstrapconfig.null_resource.bootstrap_config: Refreshing state... [id=6822641950134297211]
module.masternodes.openstack_compute_instance_v2.master[0]: Refreshing state... [id=0b820558-2077-40ee-81d9-811aa7dbc6d0]
module.bootstrapcomplete.null_resource.bootstrap_complete: Refreshing state... [id=285966427274519477]
module.workernodes.openstack_compute_instance_v2.worker[0]: Refreshing state... [id=78da4c4f-d882-49c0-9e6b-94100492be63]
module.workernodes.null_resource.remove_worker[0]: Refreshing state... [id=1732102747293394534]
module.install.null_resource.install: Refreshing state... [id=2875536266590346928]
module.install.null_resource.upgrade[0]: Refreshing state... [id=8719004556488409504]
```
```
Terraform used the selected providers to generate the following execution plan.
Resource actions are indicated with the following symbols:

Terraform will perform the following actions:

  # module.bastion.openstack_blockstorage_volume_v3.storage_volume[0] must be replaced
-/+ resource "openstack_blockstorage_volume_v3" "storage_volume" {
      ~ attachment = [
            "volume_wwn" = "60050768028105F5D0000000000002D4"
        } -> (known after apply)
        name = "merlin2-nfs-storage-vol"
        # (1 unchanged attribute hidden)
    }

  # module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0] must be replaced
-/+ resource "openstack_compute_volume_attach_v2" "storage_v_attach" {
      ~ device = "/dev/sdb" -> (known after apply)
      ~ id     = "f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f1bb19ee9" -> (known after apply)
        # (1 unchanged attribute hidden)
    }

  # module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0] will be destroyed
  # (because index [0] is out of range for count)
        "original_host" = "837542A_10C5EDW" } -> null
        all_tags            = [] -> null
        availability_zone   = "Default Group" -> null
        created             = "2023-05-01 21:09:11 +0000 UTC" -> null
        flavor_id           = "874b188b-074a-4042-b0c8-3a22f04f8302" -> null
        flavor_name         = "bastion_bootstrap" -> null
        force_delete        = false -> null
        id                  = "2c289dad-9552-4037-8105-f798406ff623" -> null
        image_id            = "a518c74e-cd80-4c67-8724-15b2720b2108" -> null
        image_name          = "rhcos-new" -> null
        name                = "merlin2-bootstrap" -> null
        power_state         = "active" -> null
        security_groups     = [] -> null
        stop_before_destroy = false -> null
        updated             = "2023-05-01 22:04:38 +0000 UTC" -> null
        user_data           = "eb7b092f153c6094e6202339c2b0ef36dbc518fd" -> null

        network {
            uuid = "f5e55ae3-c790-4a29-91e1-ce04a1acfc69" -> null
        }
    }

  # module.helpernode.null_resource.config must be replaced
-/+ resource "null_resource" "config" {
      ~ id       = "3876494058890088587" -> (known after apply)
      ~ triggers = { # forces replacement
          ~ "bootstrap_count" = "1" -> "0"
            # (2 unchanged elements hidden)
        }
    }

  # module.network.openstack_networking_port_v2.bootstrap_port[0] will be destroyed
  # (because index [0] is out of range for count)
            "9.5.36.167",
        ] -> null
        all_security_group_ids = [] -> null
        all_tags               = [] -> null
        device_id              = "2c289dad-9552-4037-8105-f798406ff623" -> null
        device_owner           = "compute:Default Group" -> null
        dns_assignment         = [] -> null
        id                     = "7072aba5-ac95-4b36-994a-1855f2624b55" -> null
        mac_address            = "fa:16:3e:5c:1d:b7" -> null
        name                   = "merlin2-bootstrap-port" -> null
        network_id             = "f5e55ae3-c790-4a29-91e1-ce04a1acfc69" -> null
        port_security_enabled  = false -> null
        tags                   = [] -> null
        tenant_id              = "e4af56f8139e4418abcb29c723bf15a9" -> null

        binding {
            vnic_type = "normal" -> null
        }

Plan: 3 to add, 0 to change, 5 to destroy.

Changes to Outputs:
  ~ bootstrap_ip = "9.5.36.167" -> ""

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes
```
```
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Destroying... [id=2c289dad-9552-4037-8105-f798406ff623]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 10s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 20s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Still destroying... [id=2c289dad-9552-4037-8105-f798406ff623, 30s elapsed]
module.bootstrapnode.openstack_compute_instance_v2.bootstrap[0]: Destruction complete after 34s
module.helpernode.null_resource.config: Destroying... [id=3876494058890088587]
module.helpernode.null_resource.config: Destruction complete after 0s
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Destroying... [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/317b2360-639d-4d8a-8b34-a58f1bb19ee9]
module.network.openstack_networking_port_v2.bootstrap_port[0]: Destroying... [id=7072aba5-ac95-4b36-994a-1855f2624b55]
module.network.openstack_networking_port_v2.bootstrap_port[0]: Destruction complete after 7s
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Destruction complete after 9s
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Destroying... [id=317b2360-639d-4d8a-8b34-a58f1bb19ee9]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Still destroying... [id=317b2360-639d-4d8a-8b34-a58f1bb19ee9, 10s elapsed]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Destruction complete after 11s
module.helpernode.null_resource.config: Creating...
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Creating...
module.helpernode.null_resource.config: Provisioning with 'remote-exec'...
module.helpernode.null_resource.config (remote-exec): Connecting to remote host via SSH...
module.helpernode.null_resource.config (remote-exec):   Host: 9.5.36.166
module.helpernode.null_resource.config (remote-exec):   User: root
module.helpernode.null_resource.config (remote-exec):   Password: false
module.helpernode.null_resource.config (remote-exec):   Private key: true
module.helpernode.null_resource.config (remote-exec):   Certificate: false
module.helpernode.null_resource.config (remote-exec):   SSH Agent: false
module.helpernode.null_resource.config (remote-exec):   Checking Host Key: false
module.helpernode.null_resource.config (remote-exec):   Target Platform: unix
module.helpernode.null_resource.config (remote-exec): Connected!
module.helpernode.null_resource.config (remote-exec): Cloning into ocp4-helpernode...
module.helpernode.null_resource.config (remote-exec): Note: switching to 'adb1102f64b2f25a8a1b44a96c414f293d72d3fc'.

module.helpernode.null_resource.config (remote-exec): You are in 'detached HEAD' state. You can look around, make experimental
module.helpernode.null_resource.config (remote-exec): changes and commit them, and you can discard any commits you make in this
module.helpernode.null_resource.config (remote-exec): state without impacting any branches by switching back to a branch.

module.helpernode.null_resource.config (remote-exec): If you want to create a new branch to retain commits you create, you may
module.helpernode.null_resource.config (remote-exec): do so (now or later) by using -c with the switch command. Example:

module.helpernode.null_resource.config (remote-exec):   git switch -c <new-branch-name>

module.helpernode.null_resource.config (remote-exec): Or undo this operation with:

module.helpernode.null_resource.config (remote-exec):   git switch -

module.helpernode.null_resource.config (remote-exec): Turn off this advice by setting config variable advice.detachedHead to false

module.helpernode.null_resource.config (remote-exec): HEAD is now at adb1102 Merge pull request #305 from redhat-cop/devel
module.helpernode.null_resource.config: Provisioning with 'file'...
module.helpernode.null_resource.config: Still creating... [10s elapsed]
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Still creating... [10s elapsed]
module.helpernode.null_resource.config: Provisioning with 'file'...
module.bastion.openstack_blockstorage_volume_v3.storage_volume[0]: Creation complete after 12s [id=35ba1876-52b6-4769-9950-eaf3be077eaa]
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Creating...
module.helpernode.null_resource.config: Provisioning with 'file'...
module.helpernode.null_resource.config: Provisioning with 'remote-exec'...
module.helpernode.null_resource.config (remote-exec): Connecting to remote host via SSH...
module.helpernode.null_resource.config (remote-exec):   Host: 9.5.36.166
module.helpernode.null_resource.config (remote-exec):   User: root
module.helpernode.null_resource.config (remote-exec):   Password: false
module.helpernode.null_resource.config (remote-exec):   Private key: true
module.helpernode.null_resource.config (remote-exec):   Certificate: false
module.helpernode.null_resource.config (remote-exec):   SSH Agent: false
module.helpernode.null_resource.config (remote-exec):   Checking Host Key: false
module.helpernode.null_resource.config (remote-exec):   Target Platform: unix
module.helpernode.null_resource.config (remote-exec): Connected!
module.bastion.openstack_compute_volume_attach_v2.storage_v_attach[0]: Creation complete after 7s [id=f874bcaf-e8d5-46f6-8088-652ee3b9930a/35ba1876-52b6-4769-9950-eaf3be077eaa]
module.helpernode.null_resource.config (remote-exec): Running ocp4-helpernode playbook...
module.helpernode.null_resource.config: Still creating... [20s elapsed]
module.helpernode.null_resource.config (remote-exec): Using /root/ocp4-helpernode/ansible.cfg as config file

module.helpernode.null_resource.config (remote-exec): PLAY [all] ***

module.helpernode.null_resource.config (remote-exec): TASK [Gathering Facts] ***

module.helpernode.null_resource.config: Still creating... [30s elapsed]
...
module.helpernode.null_resource.config: Still creating... [17h6m25s elapsed]
```
State on the initiating node (where `terraform apply` is running):
```
$ ps -ef | grep terraform
gjertsen 3410758   12015  0 May01 pts/1 00:08:32 terraform apply -var-file var.tfvars
gjertsen 3411154 3410758  0 May01 pts/1 00:00:03 .terraform/providers/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5
```
Bastion node state:

```
$ ps -ef | grep ansible
root 67764 67738 7 May01 pts/1 01:21:07 /usr/libexec/platform-python /usr/bin/ansible-playbook -i inventory -e @helpernode_vars.yaml tasks/main.yml -v --become
root 67771 67764 0 May01 pts/1 00:00:00 /usr/libexec/platform-python /usr/bin/ansible-playbook -i inventory -e @helpernode_vars.yaml tasks/main.yml -v --become
root 67782     1 0 May01 ?     00:00:00 ssh: /root/.ansible/cp/08610c3669 [mux]
root 67890 67771 0 May01 pts/1 00:00:00 ssh -C -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User="root" -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/08610c3669 -tt 9.5.36.166 /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/AnsiballZ_setup.py && sleep 0'
root 67891 67783 0 May01 pts/3 00:00:00 /bin/sh -c /usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/AnsiballZ_setup.py && sleep 0
root 67912 67891 0 May01 pts/3 00:00:04 /usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1682979218.3504968-67771-94255852006671/AnsiballZ_setup.py
```
The NFS mount looks OK:
```
$ exportfs
/export

$ ls -al /export
total 0
drwxrwxrwx.  3 nobody nobody  92 May  1 17:41 .
dr-xr-xr-x. 19 root   root   259 May  1 17:06 ..
drwxrwxrwx.  2 nobody nobody   6 May  1 17:41 openshift-image-registry-registry-pvc-pvc-5b20c6ca-b184-41eb-b145-c5253c26015a
```
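One generic diagnostic worth running here (not from the original thread, just a sketch): Ansible's fact gathering stats every mounted filesystem, so a single unresponsive NFS mount can leave the "Gathering Facts" task hung forever even when the export itself looks healthy. A probe like the following, run on the stuck host, flags any mount that cannot answer a `stat` within a few seconds:

```shell
# Probe every mounted filesystem; a mount that cannot be stat'ed within
# 5 seconds is a likely culprit for a hung 'Gathering Facts' task.
awk '{print $2}' /proc/mounts | while read -r mnt; do
    if timeout 5 stat -t "$mnt" >/dev/null 2>&1; then
        echo "ok    $mnt"
    else
        echo "HUNG  $mnt"
    fi
done
```

Any mount reported as `HUNG` can then be force-unmounted (`umount -f -l`) before retrying the playbook.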