Open jingvar opened 3 years ago
Hi @jingvar. Can you provide more information about the error? Which task fails?
I'm not sure about my first batch, but now I get:
TASK [Wait for the ironic node to become active] ***
FAILED - RETRYING: Wait for the ironic node to become active (60 retries left).
fatal: [controller0 -> {{ hostvars[seed_host].ansible_host | default(seed_host) }}]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["docker", "exec", "bifrost_deploy", "bash", "-c", " export OS_CLOUD=bifrost && export OS_BAREMETAL_API_VERSION=1.34 && export BIFROST_INVENTORY_SOURCE=ironic && ansible baremetal --connection local --inventory /etc/bifrost/inventory/ -e @/etc/bifrost/bifrost.yml -e @/etc/bifrost/dib.yml --limit controller0 -m command -a \"baremetal node show {{ inventory_hostname }} -f value -c provision_state\""], "delta": "0:00:12.900611", "end": "2021-05-31 17:23:39.951354", "rc": 0, "start": "2021-05-31 17:23:27.050743", "stderr": "", "stderr_lines": [], "stdout": "controller0 | CHANGED | rc=0 >>\nwait call-back", "stdout_lines": ["controller0 | CHANGED | rc=0 >>", "wait call-back"]}
fatal: [compute0 -> {{ hostvars[seed_host].ansible_host | default(seed_host) }}]: FAILED! => {"attempts": 60, "changed": false, "cmd": ["docker", "exec", "bifrost_deploy", "bash", "-c", " export OS_CLOUD=bifrost && export OS_BAREMETAL_API_VERSION=1.34 && export BIFROST_INVENTORY_SOURCE=ironic && ansible baremetal --connection local --inventory /etc/bifrost/inventory/ -e @/etc/bifrost/bifrost.yml -e @/etc/bifrost/dib.yml --limit compute0 -m command -a \"baremetal node show {{ inventory_hostname }} -f value -c provision_state\""], "delta": "0:00:13.568618", "end": "2021-05-31 17:23:41.981203", "rc": 0, "start": "2021-05-31 17:23:28.412585", "stderr": "", "stderr_lines": [], "stdout": "compute0 | CHANGED | rc=0 >>\nwait call-back", "stdout_lines": ["compute0 | CHANGED | rc=0 >>", "wait call-back"]}
(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 6ced105b-8119-4910-907b-126faca79bf1 | compute0 | None | power on | wait call-back | False |
| 52b05c6c-10e6-4255-ae6b-63e74ca944c7 | controller0 | None | power on | wait call-back | False |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
{"commands": [{"id": "ab0ec511-a31b-4a4b-8ac2-873072a9a757", "command_name": "get_deploy_steps", "command_params": {"node": {"id": 4, "uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7", "name": "controller0", "chassis_id": null, "instance_uuid": null, "driver": "ipmi", "driver_info": {"ipmi_address": "192.168.33.4", "ipmi_port": "6230", "ipmi_username": "username", "ipmi_password": "******", "deploy_kernel": "http://192.168.33.5:8080/ipa.kernel", "deploy_ramdisk": "http://192.168.33.5:8080/ipa.initramfs"}, "driver_internal_info": {"deploy_boot_mode": "bios", "last_power_state_change": "2021-05-31T16:53:26.807751", "agent_secret_token": "******", "agent_url": "https://192.168.33.169:9999", "agent_version": "6.4.4.dev17", "agent_last_heartbeat": "2021-05-31T17:00:02.167829", "agent_verify_ca": "/var/lib/ironic/certificates/52b05c6c-10e6-4255-ae6b-63e74ca944c7.crt", "is_whole_disk_image": true, "deploy_steps": [{"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, {"step": "prepare_instance_boot", "priority": 60, "argsinfo": null, "interface": "deploy"}, {"step": "tear_down_agent", "priority": 40, "argsinfo": null, "interface": "deploy"}, {"step": "switch_to_tenant_network", "priority": 30, "argsinfo": null, "interface": "deploy"}, {"step": "boot_instance", "priority": 20, "argsinfo": null, "interface": "deploy"}], "deploy_step_index": 0}, "clean_step": {}, "deploy_step": {"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, "raid_config": {}, "target_raid_config": {}, "instance_info": {"image_checksum": "96fe772c5df8d0422c3dac67d58749ae", "image_disk_format": "qcow2", "image_source": "http://192.168.33.5:8080/deployment_image.qcow2", "configdrive": "******", "image_url": "http://192.168.33.5:8080/deployment_image.qcow2", "image_type": "whole-disk-image"}, "properties": {"cpu_arch": "x86_64", "cpus": "4", "memory_mb": "8192", "local_gb": 22, 
"capabilities": "cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,boot_option:local", "root_device": {}, "vendor": "unknown"}, "reservation": "seed", "conductor_affinity": 1, "conductor_group": "", "power_state": "power on", "target_power_state": null, "provision_state": "deploying", "provision_updated_at": "2021-05-31T17:02:11.000000", "target_provision_state": "active", "maintenance": false, "maintenance_reason": null, "fault": null, "console_enabled": false, "last_error": null, "resource_class": "test-rc", "inspection_finished_at": null, "inspection_started_at": "2021-05-31T16:53:10.000000", "extra": {"pxe_interface_mac": "52:54:00:ff:90:2c", "system_vendor": {"manufacturer": "Red Hat", "product_name": "KVM"}}, "automated_clean": null, "protected": false, "protected_reason": null, "allocation_id": null, "bios_interface": "no-bios", "boot_interface": "ipxe", "console_interface": "no-console", "deploy_interface": "direct", "inspect_interface": "inspector", "management_interface": "ipmitool", "network_interface": "noop", "power_interface": "ipmitool", "raid_interface": "no-raid", "rescue_interface": "no-rescue", "storage_interface": "noop", "vendor_interface": "ipmitool", "traits": {"objects": []}, "owner": null, "lessee": null, "description": null, "retired": false, "retired_reason": null, "network_data": {}, "created_at": "2021-05-31T16:50:22.000000", "updated_at": "2021-05-31T17:02:12.541477"}, "ports": [{"id": 4, "uuid": "1bea11a9-4857-4d9d-b19f-61cc4c90e15a", "node_id": 4, "address": "52:54:00:ff:90:2c", "extra": {}, "local_link_connection": {"switch_id": "7a:15:b0:04:74:db", "switch_info": "brtenks0", "port_id": "p-contr0-0-br"}, "portgroup_id": null, "pxe_enabled": true, "internal_info": {}, "physical_network": "physnet1", "is_smartnic": false, "created_at": "2021-05-31T16:50:23.000000", "updated_at": "2021-05-31T16:50:34.000000"}]}, "command_status": "SUCCEEDED", "command_error": null, "command_result": {"deploy_steps": {"GenericHardwareManager": 
[{"step": "erase_devices_metadata", "priority": 0, "interface": "deploy", "reboot_requested": false}, {"step": "apply_configuration", "priority": 0, "interface": "raid", "reboot_requested": false, "argsinfo": {"raid_config": {"description": "The RAID configuration to apply.", "required": true}, "delete_existing": {"description": "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. Default value is 'True'.", "required": false}}}, {"step": "write_image", "priority": 0, "interface": "deploy", "reboot_requested": false}]}, "hardware_manager_version": {"generic_hardware_manager": "1.1"}}}, {"id": "aaadf52b-1550-43ac-b781-6934f00a7b2b", "command_name": "execute_deploy_step", "command_params": {"step": {"interface": "deploy", "step": "write_image", "args": {"image_info": {"id": "deployment_image.qcow2", "urls": ["http://192.168.33.5:8080/deployment_image.qcow2"], "disk_format": "qcow2", "container_format": null, "stream_raw_images": true, "checksum": "96fe772c5df8d0422c3dac67d58749ae", "node_uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7"}, "configdrive": "http://192.168.33.5:8080/configdrive-52b05c6c-10e6-4255-ae6b-63e74ca944c7.iso.gz"}}, "node": {"id": 4, "uuid": "52b05c6c-10e6-4255-ae6b-63e74ca944c7", "name": "controller0", "chassis_id": null, "instance_uuid": null, "driver": "ipmi", "driver_info": {"ipmi_address": "192.168.33.4", "ipmi_port": "6230", "ipmi_username": "username", "ipmi_password": "******", "deploy_kernel": "http://192.168.33.5:8080/ipa.kernel", "deploy_ramdisk": "http://192.168.33.5:8080/ipa.initramfs"}, "driver_internal_info": {"deploy_boot_mode": "bios", "last_power_state_change": "2021-05-31T16:53:26.807751", "agent_secret_token": "******", "agent_url": "https://192.168.33.169:9999", "agent_version": "6.4.4.dev17", "agent_last_heartbeat": "2021-05-31T17:00:02.167829", "agent_verify_ca": "/var/lib/ironic/certificates/52b05c6c-10e6-4255-ae6b-63e74ca944c7.crt", "is_whole_disk_image": true, 
"deploy_steps": [{"step": "deploy", "priority": 100, "argsinfo": null, "interface": "deploy"}, {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, {"step": "prepare_instance_boot", "priority": 60, "argsinfo": null, "interface": "deploy"}, {"step": "tear_down_agent", "priority": 40, "argsinfo": null, "interface": "deploy"}, {"step": "switch_to_tenant_network", "priority": 30, "argsinfo": null, "interface": "deploy"}, {"step": "boot_instance", "priority": 20, "argsinfo": null, "interface": "deploy"}], "deploy_step_index": 1, "hardware_manager_version": {"generic_hardware_manager": "1.1"}, "agent_cached_deploy_steps": {"deploy": [{"step": "erase_devices_metadata", "priority": 0, "interface": "deploy", "reboot_requested": false}, {"step": "write_image", "priority": 0, "interface": "deploy", "reboot_requested": false}], "raid": [{"step": "apply_configuration", "priority": 0, "interface": "raid", "reboot_requested": false, "argsinfo": {"raid_config": {"description": "The RAID configuration to apply.", "required": true}, "delete_existing": {"description": "Setting this to 'True' indicates to delete existing RAID configuration prior to creating the new configuration. 
Default value is 'True'.", "required": false}}}]}, "agent_cached_deploy_steps_refreshed": "2021-05-31 17:02:17.355828"}, "clean_step": {}, "deploy_step": {"step": "write_image", "priority": 80, "argsinfo": null, "interface": "deploy"}, "raid_config": {}, "target_raid_config": {}, "instance_info": {"image_checksum": "96fe772c5df8d0422c3dac67d58749ae", "image_disk_format": "qcow2", "image_source": "http://192.168.33.5:8080/deployment_image.qcow2", "configdrive": "******", "image_url": "http://192.168.33.5:8080/deployment_image.qcow2", "image_type": "whole-disk-image"}, "properties": {"cpu_arch": "x86_64", "cpus": "4", "memory_mb": "8192", "local_gb": 22, "capabilities": "cpu_aes:true,cpu_hugepages:true,cpu_hugepages_1g:true,boot_option:local", "root_device": {}, "vendor": "unknown"}, "reservation": "seed", "conductor_affinity": 1, "conductor_group": "", "power_state": "power on", "target_power_state": null, "provision_state": "deploying", "provision_updated_at": "2021-05-31T17:02:11.000000", "target_provision_state": "active", "maintenance": false, "maintenance_reason": null, "fault": null, "console_enabled": false, "last_error": null, "resource_class": "test-rc", "inspection_finished_at": null, "inspection_started_at": "2021-05-31T16:53:10.000000", "extra": {"pxe_interface_mac": "52:54:00:ff:90:2c", "system_vendor": {"manufacturer": "Red Hat", "product_name": "KVM"}}, "automated_clean": null, "protected": false, "protected_reason": null, "allocation_id": null, "bios_interface": "no-bios", "boot_interface": "ipxe", "console_interface": "no-console", "deploy_interface": "direct", "inspect_interface": "inspector", "management_interface": "ipmitool", "network_interface": "noop", "power_interface": "ipmitool", "raid_interface": "no-raid", "rescue_interface": "no-rescue", "storage_interface": "noop", "vendor_interface": "ipmitool", "traits": {"objects": []}, "owner": null, "lessee": null, "description": null, "retired": false, "retired_reason": null, "network_data": {}, 
"created_at": "2021-05-31T16:50:22.000000", "updated_at": "2021-05-31T17:02:17.520232"}, "ports": [{"id": 4, "uuid": "1bea11a9-4857-4d9d-b19f-61cc4c90e15a", "node_id": 4, "address": "52:54:00:ff:90:2c", "extra": {}, "local_link_connection": {"switch_id": "7a:15:b0:04:74:db", "switch_info": "brtenks0", "port_id": "p-contr0-0-br"}, "portgroup_id": null, "pxe_enabled": true, "internal_info": {}, "physical_network": "physnet1", "is_smartnic": false, "created_at": "2021-05-31T16:50:23.000000", "updated_at": "2021-05-31T16:50:34.000000"}], "deploy_version": {"generic_hardware_manager": "1.1"}}, "command_status": "RUNNING", "command_error": null, "command_result": null}]}
After roughly one hour:
(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 6ced105b-8119-4910-907b-126faca79bf1 | compute0 | None | power on | active | False |
| 52b05c6c-10e6-4255-ae6b-63e74ca944c7 | controller0 | None | power on | active | False |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
`ssh centos@192.168.33.6
Activate the web console with: systemctl enable --now cockpit.socket
[centos@compute0 ~]$`
My environment:
CPU: 64 cores, 2.4 GHz Intel(R) Xeon(R) CPU E5-4640
Memory: 512 GiB
Storage: rotational 7200 rpm SATA disk (ST2000DM008-2FR1)
I moved the QCOW images to RAM (tmpfs), with the same result.
The provisioning timeout (in seconds) may be set via `wait_active_timeout` in any of the .yml files in `config/src/kayobe-config/etc/kayobe`.
Could you try increasing it and let us know if it works? We could update the default if so.
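To make the relationship between the timeout and the "60 retries" in the log above concrete, here is a minimal Python sketch of a poll-until-active loop. It is not Kayobe's implementation: the function names, the 10-second polling interval, and the 600-second timeout are assumptions chosen only because they would be consistent with 60 attempts.

```python
import time
from typing import Callable


def retries_for(timeout_s: int, interval_s: int) -> int:
    """Number of polls that fit into the timeout (integer division)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return timeout_s // interval_s


def wait_for_active(
    get_state: Callable[[], str],
    timeout_s: int,
    interval_s: int,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns 'active' or the retries run out.

    get_state is a hypothetical callable standing in for something like
    `baremetal node show <node> -f value -c provision_state`.
    """
    for _ in range(retries_for(timeout_s, interval_s)):
        if get_state() == "active":
            return True
        sleep(interval_s)
    return False


if __name__ == "__main__":
    # Assumed values: a 600 s timeout polled every 10 s gives 60 attempts.
    print(retries_for(600, 10))
    # A node that needs about an hour would need a timeout of at least 3600 s.
    print(retries_for(3600, 10))
```

Under these assumptions, a deployment that takes about an hour would need a timeout of at least 3600 seconds, for example a line such as `wait_active_timeout: 3600` in one of the kayobe-config files (the value is illustrative).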
There is another issue - Ironic seems broken.
How is it broken? It looks like it successfully provisioned your nodes (eventually).
I redeployed the environment. The controller0 and compute0 nodes are stuck at:
boot.ipxe : 404 bytes [script]
pxelinux.cfg/52-54-00-d4-90-8f... ok
http://192.168.33.5:8080//b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc/deploy_kernel... ok
http://192.168.33.5:8080//b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc/deploy_ramdisk... ok
(bifrost-deploy)[root@seed bifrost-9.0.2.dev21]# baremetal node list
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
| 85093588-1e83-4dbc-b777-03502569b12c | compute0 | None | power on | wait call-back | False |
| b7ac5f3e-4016-47e2-a39b-49fc6d66e4cc | controller0 | None | power on | wait call-back | False |
+--------------------------------------+-------------+---------------+-------------+--------------------+-------------+
kayobe overcloud provision failed. I then restarted the VM:
virsh destroy controller0
virsh start controller0
http://192.168.33.5:8080/ipa.initramfs... ok
Linux version 5.10.3-tinycore64 (root@box) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.35.1) #2021 SMP Mon Dec 28 16:17:51 UTC 2020
Command line: ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 systemd.journald.forward_to_console=yes BOOTIF=52:54:00:d4:90:8f nofb nomodeset vga=normal console=ttyS0 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-insecure=1 initrd=ipa.initramfs
x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format.
BIOS-provided physical RAM map:
...
IPA starts normally.
I removed /httpboot/pxelinux.cfg/52-54-00-07-57-0f and 52-54-00-c9-fe-16, and the VMs were then successfully inspected and provisioned.
less 52-54-00-07-57-0f
#!ipxe
set attempts:int32 10
set i:int32 0
goto deploy
:deploy
imgfree
kernel http://192.168.33.5:8080//7c5bfb54-8283-4739-b173-5d9f7b4889bc/deploy_kernel selinux=0 troubleshoot=0 text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a BOOTIF=${mac} initrd=deploy_ramdisk || goto retry
initrd http://192.168.33.5:8080//7c5bfb54-8283-4739-b173-5d9f7b4889bc/deploy_ramdisk || goto retry
boot
:retry
iseq ${i} ${attempts} && goto fail ||
inc i
echo No response, retrying in {i} seconds.
sleep ${i}
goto deploy
:fail
echo Failed to get a response after ${attempts} attempts
echo Powering off in 30 seconds.
sleep 30
poweroff
:boot_partition
imgfree
kernel no_kernel root={{ ROOT }} ro text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a initrd=ramdisk || goto boot_partition
initrd no_ramdisk || goto boot_partition
boot
:boot_ramdisk
imgfree
kernel no_kernel root=/dev/ram0 text systemd.journald.forward_to_console=yes ipa-insecure=1 ipa-insecure=1 ipa-collect-lldp=1 ipa-inspection-collectors=default,logs,pci-devices ipa-inspection-benchmarks= ipa-inspection-callback-url=http://192.168.33.5:5050/v1/continue ipa-api-url=http://192.168.33.5:6385 ipa-global-request-id=req-5062d291-4673-40be-9c3f-0925cf13c56a initrd=ramdisk || goto boot_ramdisk
initrd no_ramdisk || goto boot_ramdisk
boot
:boot_whole_disk
sanboot --no-describe
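The workaround described above, removing leftover per-MAC files from /httpboot/pxelinux.cfg, can be made less error-prone by listing candidates first. The sketch below is a hypothetical helper, not part of Bifrost or Ironic: it assumes the per-MAC files are named after the MAC address with dashes instead of colons (as in the console output, where BOOTIF=52:54:00:d4:90:8f maps to pxelinux.cfg/52-54-00-d4-90-8f), and that the list of currently registered MACs is supplied by the caller, for instance copied from `baremetal port list`. It only reports files and deletes nothing.

```python
import re
from pathlib import Path
from typing import Iterable, List

# File names that look like a dash-separated MAC address, e.g. 52-54-00-07-57-0f.
MAC_FILE_RE = re.compile(r"^[0-9a-f]{2}(-[0-9a-f]{2}){5}$")


def mac_to_cfg_name(mac: str) -> str:
    """Convert '52:54:00:D4:90:8F' to '52-54-00-d4-90-8f'."""
    return mac.strip().lower().replace(":", "-")


def stale_pxe_configs(cfg_dir: str, known_macs: Iterable[str]) -> List[Path]:
    """Return per-MAC files in cfg_dir that match no known MAC address."""
    expected = {mac_to_cfg_name(m) for m in known_macs}
    stale = []
    for path in Path(cfg_dir).iterdir():
        # Entries may be regular files or symlinks; skip directories.
        if not (path.is_file() or path.is_symlink()):
            continue
        if MAC_FILE_RE.match(path.name) and path.name not in expected:
            stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Directory and MAC taken from the thread; adjust for your environment.
    cfg_dir = "/httpboot/pxelinux.cfg"
    if Path(cfg_dir).is_dir():
        for path in stale_pxe_configs(cfg_dir, ["52:54:00:d4:90:8f"]):
            print(path)
```

Reviewing the printed list before removing anything keeps the config files of nodes that are still enrolled.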
I have run into a problem: the provisioning timeout is too short for my environment. I would be glad to see the configuration of the environment that the current timeouts are based on.