yunionio / cloudpods

A cloud-native, open-source, unified multi-cloud and hybrid-cloud platform.
https://www.cloudpods.org
Apache License 2.0
2.61k stars 536 forks

[Help] v3.10.11: host logs an error when creating a VM from a new image #19382

Closed chenjacken closed 8 months ago

chenjacken commented 10 months ago

1. Version: v3.10.11, high-availability deployment

The network configuration in host.conf is:

image

The network settings in the Ceph ConfigMap rook-config-override are:

image

2. Problems when uploading an image and creating a VM from it

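1) Uploading the image (roughly 30 GB) feels extremely slow.
2) While the image is uploading, the web console is noticeably slow and pages stay on "loading" when fetching data.
3) Creating a VM from that image shows "deploy failed"; after syncing the status, the VM shows "running".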

The "update status failed" log shown in the web console is:

deploying=>deploy_fail: {"__reason__":"Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_osd_op_timeout=1200:client_mount_timeout=120:rados_mon_op_timeout=5\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"yudao2\\\",\\\"uuid\\\":\\\"289287d0-4d9c-4605-89f7-69dd878143a9\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:a3:f2:b5:5d\\\",\\\"ip\\\":\\\"172.16.1.92\\\",\\\"net\\\":\\\"static\\\",\\\"net_id\\\":\\\"ce676c71-febf-4ca1-8ecf-6add3aa5215e\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"static-92\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"058e45ea-71d9-4338-8629-12b21389f028\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":51200,\\\"template_id\\\":\\\"27ccd685-aab0-4498-8042-368c3d6f8d7b\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"yudao2\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    host = \\\\\\\"node6-172-16-1-219\\\\\\\"\\\\n    vm_id = \\\\\\\"289287d0-4d9c-4605-89f7-69dd878143a9\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    host_id = \\\\\\\"3605ec3c-d819-4bd4-8c7f-3f1188e808ac\\\\\\\"\\\\n    vm_name = \\\\\\\"yudao2\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    vm_ip = \\\\\\\"172.16.1.92\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240130 09:13:11 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240130 
09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240130 
09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[info 240130 09:13:11 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-30 09:13:11 options.parseOptions(options.go:331)] Use configuration 
file: /opt/yunion/common.conf\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[info 2024-01-30 09:13:11 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-30 09:13:11 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc00023caf0?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc000845928 sp=0xc0008458f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc000845948 sp=0xc000845928 pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc000348278, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc000845970 sp=0xc000845948 
pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc0007081d1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0008459d0 sp=0xc000845970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0008459f0 sp=0xc0008459d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc00007ac00)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc000845b88 sp=0xc0008459f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc0005fbb20?, {0x12c14a0?, 0xc000010280?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc000845ba0 sp=0xc000845b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc000845bc8 sp=0xc000845ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc00060d470}, 0xc00059d040)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc000845c90 sp=0xc000845bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc00060d470?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc000845cb8 sp=0xc000845c90 pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc0006b8000?, 
0xc00059d040)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc000845d88 sp=0xc000845cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffea2ed6cba?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc000845de8 sp=0xc000845d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc00003ba00?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc000845ed0 sp=0xc000845de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000ebb8)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc000845f50 sp=0xc000845ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc000845f80 sp=0xc000845f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc000845fe0 sp=0xc000845f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000845fe8 sp=0xc000845fe0 pc=0x46aa61\n\ngoroutine 24 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 8 [syscall]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 +0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 9 [chan 
receive]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n","__stage__":"OnDeployGuestComplete","__status__":"error"}

The "deploy failed" log shown in the web console is:

{
    "__reason__": "Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_osd_op_timeout=1200:client_mount_timeout=120:rados_mon_op_timeout=5\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"yudao2\\\",\\\"uuid\\\":\\\"289287d0-4d9c-4605-89f7-69dd878143a9\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:a3:f2:b5:5d\\\",\\\"ip\\\":\\\"172.16.1.92\\\",\\\"net\\\":\\\"static\\\",\\\"net_id\\\":\\\"ce676c71-febf-4ca1-8ecf-6add3aa5215e\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"static-92\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"058e45ea-71d9-4338-8629-12b21389f028\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":51200,\\\"template_id\\\":\\\"27ccd685-aab0-4498-8042-368c3d6f8d7b\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"yudao2\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    host = \\\\\\\"node6-172-16-1-219\\\\\\\"\\\\n    vm_id = \\\\\\\"289287d0-4d9c-4605-89f7-69dd878143a9\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    host_id = \\\\\\\"3605ec3c-d819-4bd4-8c7f-3f1188e808ac\\\\\\\"\\\\n    vm_name = \\\\\\\"yudao2\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    vm_ip = \\\\\\\"172.16.1.92\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240130 09:13:11 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240130 
09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240130 
09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240130 09:13:11 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240130 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[info 240130 09:13:11 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-30 09:13:11 options.parseOptions(options.go:331)] Use configuration 
file: /opt/yunion/common.conf\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-01-30 09:13:11 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[info 2024-01-30 09:13:11 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-30 09:13:11 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc00023caf0?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc000845928 sp=0xc0008458f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc000845948 sp=0xc000845928 pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc000348278, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc000845970 sp=0xc000845948 
pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc0007081d1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0008459d0 sp=0xc000845970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0008459f0 sp=0xc0008459d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc00007ac00)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc000845b88 sp=0xc0008459f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc0005fbb20?, {0x12c14a0?, 0xc000010280?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc000845ba0 sp=0xc000845b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc000845bc8 sp=0xc000845ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc00060d470}, 0xc00059d040)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc000845c90 sp=0xc000845bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc00060d470?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc000845cb8 sp=0xc000845c90 pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc0006b8000?, 
0xc00059d040)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc000845d88 sp=0xc000845cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffea2ed6cba?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc000845de8 sp=0xc000845d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc00003ba00?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc000845ed0 sp=0xc000845de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000ebb8)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc000845f50 sp=0xc000845ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc000845f80 sp=0xc000845f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc000845fe0 sp=0xc000845f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000845fe8 sp=0xc000845fe0 pc=0x46aa61\n\ngoroutine 24 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 8 [syscall]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 +0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 9 [chan 
receive]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n",
    "__stage__": "OnDeployGuestComplete",
    "__status__": "error"
}
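The deploy log above ends with `fatal error: sync: unlock of unlocked mutex`, raised under `xfsutils.UnlockXfsPartition` while `Umount` was running. In Go, this fatal error is thrown when `sync.Mutex.Unlock` is called on a mutex that is not currently locked, and it cannot be caught with `recover`. As a minimal sketch only, the following shows an unlock path that tolerates an unlock without a prior lock. The `partitionLocks` type and its methods are hypothetical names for illustration; this is not the Cloudpods `xfsutils` code, and it is not a confirmed fix for this issue.

```go
package main

import (
	"fmt"
	"sync"
)

// partitionLocks is a hypothetical per-key lock table used for illustration.
// It is NOT the Cloudpods xfsutils implementation.
type partitionLocks struct {
	mu   sync.Mutex
	cond *sync.Cond
	held map[string]bool
}

func newPartitionLocks() *partitionLocks {
	p := &partitionLocks{held: make(map[string]bool)}
	p.cond = sync.NewCond(&p.mu)
	return p
}

// Lock blocks until the key is free, then marks it as held.
func (p *partitionLocks) Lock(key string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for p.held[key] {
		p.cond.Wait()
	}
	p.held[key] = true
}

// Unlock releases the key and reports whether it was actually held.
// Releasing a key that was never locked returns false instead of crashing.
func (p *partitionLocks) Unlock(key string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.held[key] {
		return false
	}
	delete(p.held, key)
	p.cond.Broadcast()
	return true
}

func main() {
	locks := newPartitionLocks()
	// Unlock without a matching Lock, e.g. when a mount step failed early.
	fmt.Println(locks.Unlock("uuid-a"))
	locks.Lock("uuid-a")
	fmt.Println(locks.Unlock("uuid-a"))
}
```

Tracking the held state explicitly means a cleanup path that runs after a failed or timed-out mount can call `Unlock` unconditionally.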
chenjacken commented 10 months ago

The host pod log is:

[info 2024-01-30 10:12:43 remotefile.(*SRemoteFile).downloadInternal.func1(remotefile.go:263)] written file /opt/cloud/workspace/disks/image_cache/118f76a6-cfb2-49a8-892c-eee6136b234c.tmp rate: 10.34 MiB p/s percent: 99.78%
[warning 2024-01-30 10:12:43 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 49 cycles...
[info 2024-01-30 10:12:44 remotefile.(*SRemoteFile).downloadInternal.func1(remotefile.go:263)] written file /opt/cloud/workspace/disks/image_cache/118f76a6-cfb2-49a8-892c-eee6136b234c.tmp rate: 10.47 MiB p/s percent: 99.85%
[info 2024-01-30 10:12:45 remotefile.(*SRemoteFile).downloadInternal.func1(remotefile.go:263)] written file /opt/cloud/workspace/disks/image_cache/118f76a6-cfb2-49a8-892c-eee6136b234c.tmp rate: 10.52 MiB p/s percent: 99.92%
[info 2024-01-30 10:12:46 remotefile.(*SRemoteFile).downloadInternal.func1(remotefile.go:263)] written file /opt/cloud/workspace/disks/image_cache/118f76a6-cfb2-49a8-892c-eee6136b234c.tmp rate: 10.01 MiB p/s percent: 99.99%
[warning 2024-01-30 10:13:13 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 50 cycles...
[info 2024-01-30 10:13:25 storageman.(*SRbdImageCache).Acquire(imagecache_rbd.go:86)] convert local image 118f76a6-cfb2-49a8-892c-eee6136b234c to rbd pool nvmepool
[warning 2024-01-30 10:13:43 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 51 cycles...
[warning 2024-01-30 10:14:13 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 52 cycles...
[warning 2024-01-30 10:14:43 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 53 cycles...
[warning 2024-01-30 10:15:13 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 54 cycles...
[warning 2024-01-30 10:15:43 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 55 cycles...
[warning 2024-01-30 10:16:13 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 56 cycles...
[info 2024-01-30 10:16:35 workmanager.(*workerTask).Run(manager.go:95)] DelayTask complete: {"image_id":"118f76a6-cfb2-49a8-892c-eee6136b234c","name":"yudao1-2","path":"rbd:nvmepool/image_cache_118f76a6-cfb2-49a8-892c-eee6136b234c:mon_host=172.16.1.216\\;172.16.1.218\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\=\\=:rados_mon_op_timeout=5:rados_osd_op_timeout=1200:client_mount_timeout=120","size":204800}
[info 2024-01-30 10:16:35 modules.TaskComplete(task.go:34)] Sync task 2a91051f-78c6-4eab-867a-16017fa34e73 complete succ
[info 2024-01-30 12:52:41 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 77d344-5b692f-1c4972 GET /servers/289287d0-4d9c-4605-89f7-69dd878143a9/status (172.16.1.211:1842:compute_v2) 59.03ms
[info 2024-01-30 12:52:41 modules.TaskComplete(task.go:34)] Sync task 27bae067-18ca-4259-88bb-25e35dcb4674 complete succ
[info 2024-01-30 12:52:44 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 ed08cd-591031-7631dd GET /servers/289287d0-4d9c-4605-89f7-69dd878143a9/status (172.16.1.211:60917:compute_v2) 0.20ms
[info 2024-01-30 12:52:44 modules.TaskComplete(task.go:34)] Sync task cc842b60-bac9-4147-8059-335a69fb9509 complete succ
[info 2024-01-30 12:52:50 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 498edf-efab3f-3424b3 GET /servers/289287d0-4d9c-4605-89f7-69dd878143a9/status (172.16.1.211:18294:compute_v2) 0.11ms
[info 2024-01-30 12:52:50 modules.TaskComplete(task.go:34)] Sync task 23eec0f6-991d-4e63-8aa3-f2ee77d6b8ef complete succ
[info 2024-01-30 12:52:51 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 3ded0b-7a0546-3475df GET /servers/289287d0-4d9c-4605-89f7-69dd878143a9/status (172.16.1.211:55389:compute_v2) 0.21ms
[info 2024-01-30 12:52:51 modules.TaskComplete(task.go:34)] Sync task aeaeb207-bdcc-400f-8746-3e9d11e91e27 complete succ
[info 2024-01-30 12:52:56 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 a054b6-acf548-35de33 GET /servers/289287d0-4d9c-4605-89f7-69dd878143a9/status (172.16.1.211:48940:compute_v2) 0.22ms
[info 2024-01-30 12:52:56 modules.TaskComplete(task.go:34)] Sync task 1d6c8b78-17e6-4ef8-8636-f9fb048296b9 complete succ
[info 2024-01-30 12:52:59 appsrv.(*Application).ServeHTTP(appsrv.go:288)] S-ox38NQdlc88DjrcA5pl0ksZmU= 200 c0a834-436505-59c89f POST /servers/289287d0-4d9c-4605-89f7-69dd878143a9/start (172.16.1.211:20318:compute_v2) 3.46ms
[info 2024-01-30 12:52:59 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:580)] Use vnc port 1
[error 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:824)] Guest 289287d0-4d9c-4605-89f7-69dd878143a9 start monitor failed, can't get qmp monitor port or monitor path
[error 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:824)] Guest 289287d0-4d9c-4605-89f7-69dd878143a9 start monitor failed, can't get qmp monitor port or monitor path
[info 2024-01-30 12:53:01 monitor.(*SBaseMonitor).connect(monitor.go:298)] Connect tcp 127.0.0.1:56101 success
[info 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:605)] VM started yudao2(289287d0-4d9c-4605-89f7-69dd878143a9) ...
[info 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).asyncScriptStart(qemu-kvm.go:611)] Async start server yudao2(289287d0-4d9c-4605-89f7-69dd878143a9) success!
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"QMP": {"version": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}, "capabilities": ["oob"]}}
[info 2024-01-30 12:53:03 guestman.(*SKVMGuestInstance).onMonitorConnected(qemu-kvm.go:1016)] Monitor connected ...
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"qmp_capabilities"}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"query-version"}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": {}}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": {"qemu": {"micro": 0, "minor": 2, "major": 4}, "package": "2022-12-15_14:23:05@buildkitsandbox@e2220a9"}}
[info 2024-01-30 12:53:03 guestman.(*SKVMGuestInstance).onGetQemuVersion(qemu-kvm.go:1086)] Guest(289287d0-4d9c-4605-89f7-69dd878143a9) qemu version 4.2.0
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": "VM status: paused (prelaunch)\r\n"}
[info 2024-01-30 12:53:03 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1536)]289287d0-4d9c-4605-89f7-69dd878143a9: onConfirmRunning status paused (prelaunch)
[error 2024-01-30 12:53:03 cgrouputils.(*CGroupTask).createTask(cgrouputils.go:236)] mkdir /sys/fs/cgroup/memory/cloudpods.hostagent/server_289287d0-4d9c-4605-89f7-69dd878143a9_20338: no such file or directory
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"cont"}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"human-monitor-command","arguments":{"command-line":"block_set_io_throttle drive_0 0 0 0 0 0 0"}}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"timestamp": {"seconds": 1706619183, "microseconds": 247259}, "event": "RESUME"}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).watchEvent(qmp.go:252)] QMP event yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): QMP Event result: &monitor.Event{Event:"\"RESUME\"", Data:map[string]interface {}{}, Timestamp:(*monitor.Timestamp)(0xc001abbbe0)}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": {}}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"human-monitor-command","arguments":{"command-line":"info status"}}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": ""}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": "VM status: running\r\n"}
[info 2024-01-30 12:53:03 guestman.(*SGuestResumeTask).onConfirmRunning(guesttasks.go:1536)]289287d0-4d9c-4605-89f7-69dd878143a9: onConfirmRunning status running
[info 2024-01-30 12:53:03 modules.TaskComplete(task.go:34)] Sync task dd3140a1-4e5c-4447-8001-b58a3dca8824 complete succ
[info 2024-01-30 12:53:03 guestman.(*SKVMGuestInstance).detachStartupTask(qemu-kvm.go:1511)]289287d0-4d9c-4605-89f7-69dd878143a9: detachStartupTask
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"query-block-jobs"}
[info 2024-01-30 12:53:03 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": []}
[info 2024-01-30 12:53:06 monitor.(*QmpMonitor).write(qmp.go:260)] QMP Write yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"execute":"set_password","arguments":{"password":"tSTd2z8b","protocol":"vnc"}}
[info 2024-01-30 12:53:06 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"return": {}}
[info 2024-01-30 12:53:32 monitor.(*QmpMonitor).read(qmp.go:182)] QMP Read yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): {"timestamp": {"seconds": 1706619212, "microseconds": 599381}, "event": "NIC_RX_FILTER_CHANGED", "data": {"name": "netdev-static-92", "path": "/machine/peripheral/netdev-static-92/virtio-backend"}}
[info 2024-01-30 12:53:32 monitor.(*QmpMonitor).watchEvent(qmp.go:252)] QMP event yudao2(289287d0-4d9c-4605-89f7-69dd878143a9): QMP Event result: &monitor.Event{Event:"\"NIC_RX_FILTER_CHANGED\"", Data:map[string]interface {}{"name":"netdev-static-92", "path":"/machine/peripheral/netdev-static-92/virtio-backend"}, Timestamp:(*monitor.Timestamp)(0xc0020b0040)}
[info 2024-01-30 12:53:32 hostdhcp.(*SGuestDHCPServer).serveDHCPInternal(dhcpserver.go:278)] Make DHCP Reply 172.16.1.92 TO 00:22:a3:f2:b5:5d
[root@master1 ~]# kubectl logs default-host-ctsxr -c host -n onecloud |grep error
[error 2024-01-29 13:34:45 fileutils2.GetAllBlkdevsIoSchedulers(fileutils.go:170)] no block device avaiable
[error 2024-01-29 13:34:54 guestman.(*SGuestManager).OnVerifyExistingGuestsSucc(guestman.go:295)] Server CDN-Node-GZ02(b00d5297-0b52-4bcb-8d68-d722ab1ec713) not found on this host
[error 2024-01-29 13:34:54 guestman.(*SGuestManager).OnVerifyExistingGuestsSucc(guestman.go:295)] Server kubernetes-node-mrgt-3(b654eb65-7fa2-4eba-8d72-2bb56a72a3d3) not found on this host
[error 2024-01-29 13:34:54 guestman.(*SGuestManager).OnVerifyExistingGuestsSucc(guestman.go:295)] Server makers-2(13a296d5-4629-4ff1-8114-0f72bfe683f2) not found on this host
[error 2024-01-29 13:34:54 hostinfo.(*SHostInfo).PutHostOnline(hostinfo.go:1552)] Host sys error: map[isolated_devices:[{isolated_devices    GPU 03:00.0 use kernel driver ast, skip it 2024-01-29 13:34:49.750245895 +0000 UTC m=+8.038013814}]]
[error 2024-01-29 13:34:54 httperrors.HTTPError(httperrors.go:110)] Send error Guest 13a296d5-4629-4ff1-8114-0f72bfe683f2 not found
[error 2024-01-29 13:34:54 httperrors.HTTPError(httperrors.go:110)] Send error Guest b654eb65-7fa2-4eba-8d72-2bb56a72a3d3 not found
[error 2024-01-29 13:34:54 httperrors.HTTPError(httperrors.go:110)] Send error Guest b00d5297-0b52-4bcb-8d68-d722ab1ec713 not found
[error 2024-01-30 07:23:21 httperrors.HTTPError(httperrors.go:110)] Send error Guest 20ad2895-45e6-41b1-8e87-3842445990e8 not found
[error 2024-01-30 07:23:21 httperrors.HTTPError(httperrors.go:110)] Send error Guest 20ad2895-45e6-41b1-8e87-3842445990e8 not found
[error 2024-01-30 07:23:30 httperrors.HTTPError(httperrors.go:110)] Send error Guest 20ad2895-45e6-41b1-8e87-3842445990e8 not found
[error 2024-01-30 07:23:30 httperrors.HTTPError(httperrors.go:110)] Send error Guest 20ad2895-45e6-41b1-8e87-3842445990e8 not found
[error 2024-01-30 07:23:39 httperrors.HTTPError(httperrors.go:110)] Send error Not found
[info 2024-01-30 09:13:20 workmanager.(*workerTask).Run(manager.go:92)] DelayTask failed: Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: "/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\"disk_info\":{\"path\":\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028:mon_host=172.16.1.216\\\\;172.16.1.218\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\=\\\\=:rados_osd_op_timeout=1200:client_mount_timeout=120:rados_mon_op_timeout=5\"},\"guest_desc\":{\"name\":\"yudao2\",\"uuid\":\"289287d0-4d9c-4605-89f7-69dd878143a9\",\"domain\":\"cloud.onecloud.io\",\"nics\":[{\"mac\":\"00:22:a3:f2:b5:5d\",\"ip\":\"172.16.1.92\",\"net\":\"static\",\"net_id\":\"ce676c71-febf-4ca1-8ecf-6add3aa5215e\",\"gateway\":\"172.16.1.1\",\"dns\":\"172.16.1.200\",\"domain\":\"cloud.onecloud.io\",\"ifname\":\"static-92\",\"masklen\":24,\"driver\":\"virtio\",\"bridge\":\"br1\",\"wire_id\":\"2a4e5367-e4c5-4410-81dd-217698d99ff2\",\"vlan\":1,\"interface\":\"bond0\",\"bw\":1000,\"mtu\":1500}],\"disks\":[{\"disk_id\":\"058e45ea-71d9-4338-8629-12b21389f028\",\"driver\":\"scsi\",\"cache_mode\":\"none\",\"aio_mode\":\"native\",\"size\":51200,\"template_id\":\"27ccd685-aab0-4498-8042-368c3d6f8d7b\",\"storage_id\":\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\",\"path\":\"rbd:nvmepool/058e45ea-71d9-4338-8629-12b21389f028\",\"format\":\"raw\"}],\"Hypervisor\":\"kvm\",\"hostname\":\"yudao2\"},\"deploy_info\":{\"public_key\":{\"admin_public_key\":\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\n\",\"project_public_key\":\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\n\"},\"is_init\":true,\"default_root_user\":true,\"windows_default_admin_user\":true,\"telegraf\":{\"telegraf_conf\":\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\n\\n[global_tags]\\n\\n    host = \\\"node6-172-16-1-219\\\"\\n    vm_id = \\\"289287d0-4d9c-4605-89f7-69dd878143a9\\\"\\n    zone = \\\"华南-广州\\\"\\n    tenant_id = \\\"2e152fe0619046a38081d7e487028358\\\"\\n    scaling_group_id = \\\"\\\"\\n    host_id = \\\"3605ec3c-d819-4bd4-8c7f-3f1188e808ac\\\"\\n    vm_name = \\\"yudao2\\\"\\n    zone_ext_id = \\\"\\\"\\n    tenant = \\\"system\\\"\\n    project_domain = \\\"Default\\\"\\n    os_type = \\\"Linux\\\"\\n    status = \\\"start_deploy\\\"\\n    cloudregion = \\\"Default\\\"\\n    cloudregion_id = \\\"default\\\"\\n    region_ext_id = \\\"\\\"\\n    vm_ip = \\\"172.16.1.92\\\"\\n    zone_id = \\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\"\\n    brand = \\\"OneCloud\\\"\\n    domain_id = \\\"default\\\"\\n\\n# Configuration for telegraf agent\\n[agent]\\n    interval = \\\"60s\\\"\\n    debug = false\\n    hostname = \\\"\\\"\\n    round_interval = true\\n    flush_interval = \\\"60s\\\"\\n    flush_jitter = \\\"0s\\\"\\n    collection_jitter = \\\"0s\\\"\\n    metric_batch_size = 
1000\\n    metric_buffer_limit = 10000\\n    quiet = false\\n    logfile = \\\"/var/log/telegraf.log\\\"\\n    logfile_rotation_max_size = \\\"10MB\\\"\\n    logfile_rotation_max_archives = 1\\n    omit_hostname = true\\n\\n###############################################################################\\n#                                  OUTPUTS                                    #\\n###############################################################################\\n\\n[[outputs.influxdb]]\\n    urls = [\\\"http://169.254.169.254/monitor\\\"]\\n    database = \\\"telegraf\\\"\\n    insecure_skip_verify = true\\n\\n###############################################################################\\n#                                  INPUTS                                     #\\n###############################################################################\\n[[inputs.cpu]]\\n    name_prefix = \\\"agent_\\\"\\n    percpu = true\\n    totalcpu = true\\n    collect_cpu_time = false\\n    report_active = true\\n[[inputs.disk]]\\n    name_prefix = \\\"agent_\\\"\\n    ignore_fs = [\\\"tmpfs\\\", \\\"devtmpfs\\\", \\\"overlay\\\", \\\"squashfs\\\", \\\"iso9660\\\"]\\n[[inputs.diskio]]\\n    name_prefix = \\\"agent_\\\"\\n    skip_serial_number = false\\n[[inputs.kernel]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.kernel_vmstat]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.mem]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.processes]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.swap]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.system]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.net]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.netstat]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.nstat]]\\n    name_prefix = \\\"agent_\\\"\\n[[inputs.internal]]\\n    name_prefix = \\\"agent_\\\"\\n    collect_memstats = false\\n\"}}}'" error: Process exited with status 2, cmd error: [info 240130 09:13:11 options.parseOptions(options.go:331)] Use configuration file: 
/opt/yunion/host.conf
fatal error: sync: unlock of unlocked mutex
[error 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:824)] Guest 289287d0-4d9c-4605-89f7-69dd878143a9 start monitor failed, can't get qmp monitor port or monitor path
[error 2024-01-30 12:53:01 guestman.(*SKVMGuestInstance).StartMonitor(qemu-kvm.go:824)] Guest 289287d0-4d9c-4605-89f7-69dd878143a9 start monitor failed, can't get qmp monitor port or monitor path
[error 2024-01-30 12:53:03 cgrouputils.(*CGroupTask).createTask(cgrouputils.go:236)] mkdir /sys/fs/cgroup/memory/cloudpods.hostagent/server_289287d0-4d9c-4605-89f7-69dd878143a9_20338: no such file or directory
wanyaoqi commented 10 months ago

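@chenjacken Thanks for the report. We'll look into the deploy failure. Uploading an image to the image management service runs one qemu-img convert pass; that step can be very slow, and the conversion is I/O-heavy while it runs.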
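To see how much of the upload time goes to conversion, one option is to time a conversion by hand on the same storage. This is only a rough sketch, assuming qemu-img is on PATH. The raw to qcow2 format pair and the helper name convertArgs are assumptions for illustration, not what the image service actually runs.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// convertArgs builds the argument list for a qemu-img convert run.
// -p prints progress, -f names the source format, -O the target format.
func convertArgs(srcFmt, dstFmt, src, dst string) []string {
	return []string{"convert", "-p", "-f", srcFmt, "-O", dstFmt, src, dst}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: convert-timer <source-image> <target-image>")
		return
	}
	// raw -> qcow2 is only an example pair; use the formats of your own image.
	args := convertArgs("raw", "qcow2", os.Args[1], os.Args[2])

	start := time.Now()
	cmd := exec.Command("qemu-img", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("qemu-img convert failed:", err)
		return
	}
	fmt.Printf("converted in %s\n", time.Since(start).Round(time.Second))
}
```

Comparing the elapsed time with the upload time in the web UI gives a rough idea of whether conversion or network transfer dominates.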

chenjacken commented 10 months ago

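@chenjacken Thanks for the report. We'll look into the deploy failure. Uploading an image to the image management service runs one qemu-img convert pass; that step can be very slow, and the conversion is I/O-heavy while it runs.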

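OK, thanks!! Also, the two slow stages are caching the image and allocating the disk: 1. Does caching the image copy the image file from MinIO to Ceph? If so, how can it be sped up? 2. Allocating the disk: the image is already cached in Ceph, so allocation should be fast, yet this stage also takes a long time.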

chenjacken commented 10 months ago

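@chenjacken Thanks for the report. We'll look into the deploy failure. Uploading an image to the image management service runs one qemu-img convert pass; that step can be very slow, and the conversion is I/O-heavy while it runs.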

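OK, thanks!! Also, the two slow stages are caching the image and allocating the disk: 1. Does caching the image copy the image file from MinIO to Ceph? If so, how can it be sped up? 2. Allocating the disk: the image is already cached in Ceph, so allocation should be fast, yet this stage also takes a long time.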

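The image file is 50G. Caching the image took 1 hour and allocating the disk took 1 hour 30 minutes, after which the result showed a deploy failure.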

After I synced the status, the VM showed as shut down; once I started it, it ran normally.

The log shown in the web UI is:

deploying=>deploy_fail: {"__reason__":"Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/2c87c588-0f36-477b-8ee7-4818c8d585f9:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_mon_op_timeout=5:rados_osd_op_timeout=1200:client_mount_timeout=120\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"yudao4\\\",\\\"uuid\\\":\\\"584e3a8d-6780-4578-831e-44dcbcd99ca6\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:fe:af:dc:53\\\",\\\"ip\\\":\\\"172.16.1.94\\\",\\\"net\\\":\\\"static\\\",\\\"net_id\\\":\\\"ce676c71-febf-4ca1-8ecf-6add3aa5215e\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"static-94\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"2c87c588-0f36-477b-8ee7-4818c8d585f9\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":51200,\\\"template_id\\\":\\\"5fd11cb5-ad0a-419e-8ba0-a77d009d60d6\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/2c87c588-0f36-477b-8ee7-4818c8d585f9\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"yudao4\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n    vm_name = \\\\\\\"yudao4\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    host = \\\\\\\"node5-172-16-1-218\\\\\\\"\\\\n    host_id = \\\\\\\"a97714c5-543d-40ce-8098-414f4fbb9e25\\\\\\\"\\\\n    vm_ip = \\\\\\\"172.16.1.94\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    vm_id = \\\\\\\"584e3a8d-6780-4578-831e-44dcbcd99ca6\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240131 03:58:36 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[info 240131 03:58:36 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-31 03:58:36 
options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/common.conf\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[info 2024-01-31 03:58:36 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-31 03:58:36 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies\n[info 2024-01-31 03:58:36 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc0001d8380?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc0008e5928 sp=0xc0008e58f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc0008e5948 sp=0xc0008e5928 
pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc0003b3a70, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc0008e5970 sp=0xc0008e5948 pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc00073e2b1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0008e59d0 sp=0xc0008e5970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0008e59f0 sp=0xc0008e59d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc00007ac60)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc0008e5b88 sp=0xc0008e59f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc00023f180?, {0x12c14a0?, 0xc000010120?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc0008e5ba0 sp=0xc0008e5b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc0008e5bc8 sp=0xc0008e5ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc0006ce840}, 0xc0007000f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc0008e5c90 sp=0xc0008e5bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc0006ce840?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc0008e5cb8 sp=0xc0008e5c90 
pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc000718000?, 0xc0007000f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc0008e5d88 sp=0xc0008e5cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffc852bbcba?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc0008e5de8 sp=0xc0008e5d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc00017d000?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc0008e5ed0 sp=0xc0008e5de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc0000af458)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc0008e5f50 sp=0xc0008e5ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc0008e5f80 sp=0xc0008e5f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc0008e5fe0 sp=0xc0008e5f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0008e5fe8 sp=0xc0008e5fe0 pc=0x46aa61\n\ngoroutine 6 [chan receive, 1 minutes]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n\ngoroutine 23 [syscall, 1 minutes]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 
+0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 15 [chan send, 1 minutes]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n","__stage__":"OnDeployGuestComplete","__status__":"error"}
{
    "__reason__": "Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/2c87c588-0f36-477b-8ee7-4818c8d585f9:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_mon_op_timeout=5:rados_osd_op_timeout=1200:client_mount_timeout=120\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"yudao4\\\",\\\"uuid\\\":\\\"584e3a8d-6780-4578-831e-44dcbcd99ca6\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:fe:af:dc:53\\\",\\\"ip\\\":\\\"172.16.1.94\\\",\\\"net\\\":\\\"static\\\",\\\"net_id\\\":\\\"ce676c71-febf-4ca1-8ecf-6add3aa5215e\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"static-94\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"2c87c588-0f36-477b-8ee7-4818c8d585f9\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":51200,\\\"template_id\\\":\\\"5fd11cb5-ad0a-419e-8ba0-a77d009d60d6\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/2c87c588-0f36-477b-8ee7-4818c8d585f9\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"yudao4\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n    vm_name = \\\\\\\"yudao4\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    host = \\\\\\\"node5-172-16-1-218\\\\\\\"\\\\n    host_id = \\\\\\\"a97714c5-543d-40ce-8098-414f4fbb9e25\\\\\\\"\\\\n    vm_ip = \\\\\\\"172.16.1.94\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    vm_id = \\\\\\\"584e3a8d-6780-4578-831e-44dcbcd99ca6\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240131 03:58:36 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240131 03:58:36 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240131 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[info 240131 03:58:36 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-31 03:58:36 
options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/common.conf\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-01-31 03:58:36 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[info 2024-01-31 03:58:36 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-01-31 03:58:36 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies\n[info 2024-01-31 03:58:36 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc0001d8380?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc0008e5928 sp=0xc0008e58f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc0008e5948 sp=0xc0008e5928 
pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc0003b3a70, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc0008e5970 sp=0xc0008e5948 pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc00073e2b1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0008e59d0 sp=0xc0008e5970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0008e59f0 sp=0xc0008e59d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc00007ac60)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc0008e5b88 sp=0xc0008e59f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc00023f180?, {0x12c14a0?, 0xc000010120?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc0008e5ba0 sp=0xc0008e5b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc0008e5bc8 sp=0xc0008e5ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc0006ce840}, 0xc0007000f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc0008e5c90 sp=0xc0008e5bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc0006ce840?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc0008e5cb8 sp=0xc0008e5c90 
pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc000718000?, 0xc0007000f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc0008e5d88 sp=0xc0008e5cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffc852bbcba?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc0008e5de8 sp=0xc0008e5d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc00017d000?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc0008e5ed0 sp=0xc0008e5de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc0000af458)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc0008e5f50 sp=0xc0008e5ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc0008e5f80 sp=0xc0008e5f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc0008e5fe0 sp=0xc0008e5f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0008e5fe8 sp=0xc0008e5fe0 pc=0x46aa61\n\ngoroutine 6 [chan receive, 1 minutes]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n\ngoroutine 23 [syscall, 1 minutes]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 
+0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 15 [chan send, 1 minutes]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n",
    "__stage__": "OnDeployGuestComplete",
    "__status__": "error"
}
wanyaoqi commented 10 months ago

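@chenjacken Thanks for the feedback. We will look into the deployment failure. Uploading an image to the image management service runs one qemu-img convert pass; this step can be slow, and the conversion is I/O-heavy while it runs.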

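OK, thanks! Also, the two slowest stages are "caching image" and "allocating disk": 1. Does caching the image transfer the image file from MinIO to Ceph? How can this be sped up? 2. Allocating the disk: the image is already cached in Ceph, so allocation should be fast, yet this stage also takes a long time.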

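@chenjacken Transferring the image to Ceph does take a fairly long time, depending on the I/O speed on both sides. It is also a one-time operation, so later creations should not hit this. Once the image is cached in Ceph, creation should be fast, so the delay is most likely the deployment failing. This appears to be a problem that occurs with XFS; I will try to reproduce it in our environment.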
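For context, the trace above ends in "fatal error: sync: unlock of unlocked mutex", raised from xfsutils.UnlockXfsPartition (lock.go:48) through a closure in Umount. In Go, calling Unlock on a sync.Mutex that is not locked is a fatal runtime error that recover() cannot catch, so the whole host-deployer process exits. The sketch below is only an illustration of that failure mode and of one way to tolerate a double release; the partitionLocks type and its methods are hypothetical and are not the Cloudpods xfsutils code or its actual fix.

```go
package main

import (
	"fmt"
	"sync"
)

// partitionLocks is a hypothetical per-UUID lock table used for illustration.
// It is NOT the Cloudpods xfsutils implementation.
type partitionLocks struct {
	mu   sync.Mutex
	cond *sync.Cond
	held map[string]bool
}

func newPartitionLocks() *partitionLocks {
	l := &partitionLocks{held: make(map[string]bool)}
	l.cond = sync.NewCond(&l.mu)
	return l
}

// Lock blocks until no other caller holds the lock for uuid.
func (l *partitionLocks) Lock(uuid string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	for l.held[uuid] {
		l.cond.Wait()
	}
	l.held[uuid] = true
}

// Unlock releases uuid and reports whether it was actually held.
// Unlike sync.Mutex.Unlock, releasing a key that is not held is a
// reported no-op instead of a fatal error.
func (l *partitionLocks) Unlock(uuid string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.held[uuid] {
		return false
	}
	delete(l.held, uuid)
	l.cond.Broadcast()
	return true
}

func main() {
	locks := newPartitionLocks()
	uuid := "hypothetical-xfs-uuid" // placeholder, not a value from the log

	locks.Lock(uuid)
	fmt.Println(locks.Unlock(uuid)) // first release succeeds: true
	fmt.Println(locks.Unlock(uuid)) // second release is a no-op: false
}
```

Returning a bool from Unlock lets an unmount path log an unexpected double release rather than crash, which would help when mount fails partway and the cleanup path still runs.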

chenjacken commented 10 months ago

Thank you for your hard work! 🌹

chenjacken commented 10 months ago

Automatic migration after a host goes down also fails:

start_migrate=>migrate_failed: "{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Post \\\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\\\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\",\"request\":{\"body\":\"{\\\"desc\\\":{\\\"bios\\\":\\\"BIOS\\\",\\\"boot_order\\\":\\\"cdn\\\",\\\"cpu\\\":4,\\\"disks\\\":[{\\\"aio_mode\\\":\\\"native\\\",\\\"boot_index\\\":-1,\\\"bps...src_memory_snapshots\\\":[]}\",\"headers\":{\"Content-Length\":\"6678\",\"Content-Type\":\"application/json\",\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Task-Id\":\"e1defafc-849a-44f2-886d-9cfc16870d4d\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/e1defafc-849a-44f2-886d-9cfc16870d4d\"},\"method\":\"POST\",\"url\":\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\"}}}"
"{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Post \\\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\\\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\",\"request\":{\"body\":\"{\\\"desc\\\":{\\\"bios\\\":\\\"BIOS\\\",\\\"boot_order\\\":\\\"cdn\\\",\\\"cpu\\\":4,\\\"disks\\\":[{\\\"aio_mode\\\":\\\"native\\\",\\\"boot_index\\\":-1,\\\"bps...src_memory_snapshots\\\":[]}\",\"headers\":{\"Content-Length\":\"6678\",\"Content-Type\":\"application/json\",\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Task-Id\":\"e1defafc-849a-44f2-886d-9cfc16870d4d\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/e1defafc-849a-44f2-886d-9cfc16870d4d\"},\"method\":\"POST\",\"url\":\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\"}}}"

Did some command fail or time out during execution (Client.Timeout exceeded while awaiting headers)?

wanyaoqi commented 10 months ago

@chenjacken This error indicates that the connection to the migration destination host failed. Is the destination host healthy?

chenjacken commented 10 months ago

宿主机宕机自动迁移时候,也会出错:

start_migrate=>migrate_failed: "{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Post \\\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\\\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\",\"request\":{\"body\":\"{\\\"desc\\\":{\\\"bios\\\":\\\"BIOS\\\",\\\"boot_order\\\":\\\"cdn\\\",\\\"cpu\\\":4,\\\"disks\\\":[{\\\"aio_mode\\\":\\\"native\\\",\\\"boot_index\\\":-1,\\\"bps...src_memory_snapshots\\\":[]}\",\"headers\":{\"Content-Length\":\"6678\",\"Content-Type\":\"application/json\",\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Task-Id\":\"e1defafc-849a-44f2-886d-9cfc16870d4d\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/e1defafc-849a-44f2-886d-9cfc16870d4d\"},\"method\":\"POST\",\"url\":\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\"}}}"
"{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Post \\\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\\\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\",\"request\":{\"body\":\"{\\\"desc\\\":{\\\"bios\\\":\\\"BIOS\\\",\\\"boot_order\\\":\\\"cdn\\\",\\\"cpu\\\":4,\\\"disks\\\":[{\\\"aio_mode\\\":\\\"native\\\",\\\"boot_index\\\":-1,\\\"bps...src_memory_snapshots\\\":[]}\",\"headers\":{\"Content-Length\":\"6678\",\"Content-Type\":\"application/json\",\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Task-Id\":\"e1defafc-849a-44f2-886d-9cfc16870d4d\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/e1defafc-849a-44f2-886d-9cfc16870d4d\"},\"method\":\"POST\",\"url\":\"https://172.16.1.218:8885/servers/b6e37fdf-7301-495f-8eb2-268bbdaa5b79/dest-prepare-migrate\"}}}"

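Is this an error or a timeout while executing some command (Client.Timeout exceeded while awaiting headers)?

@chenjacken Judging from this error, the connection to the migration destination machine failed. Is the destination host healthy?

I'll look into this issue further. Thanks!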

chenjacken commented 10 months ago

Creating a VM from a new image hits the same problem:

{
    "__reason__": "Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/9c380d1a-dcf8-443c-8646-9bf67a592158:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_mon_op_timeout=5:rados_osd_op_timeout=1200:client_mount_timeout=120\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"HWSaaS\\\",\\\"uuid\\\":\\\"14333d0c-6241-4f78-8912-7834ac74d4d7\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:31:ad:35:2d\\\",\\\"ip\\\":\\\"172.16.1.198\\\",\\\"net\\\":\\\"vm-static-net\\\",\\\"net_id\\\":\\\"e532366c-2ba4-4fed-895b-efd402812149\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"dhcp1-dsf\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"9c380d1a-dcf8-443c-8646-9bf67a592158\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":102400,\\\"template_id\\\":\\\"d4d0b10b-89d3-49c4-88c8-d3528312d5c1\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/9c380d1a-dcf8-443c-8646-9bf67a592158\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"HWSaaS\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    vm_ip = \\\\\\\"172.16.1.198\\\\\\\"\\\\n    vm_name = \\\\\\\"HWSaaS\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    host = \\\\\\\"node9-172-16-1-233\\\\\\\"\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    vm_id = \\\\\\\"14333d0c-6241-4f78-8912-7834ac74d4d7\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    host_id = \\\\\\\"33184fbe-77c0-4aad-8460-f3b27f8648fc\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240202 03:34:22 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240202 
03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[info 240202 03:34:22 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-02-02 03:34:22 
options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/common.conf\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[info 2024-02-02 03:34:22 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-02-02 03:34:22 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies\n[info 2024-02-02 03:34:22 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc000320380?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc000687928 sp=0xc0006878f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc000687948 sp=0xc000687928 
pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc00034fa70, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc000687970 sp=0xc000687948 pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc0007382b1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0006879d0 sp=0xc000687970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0006879f0 sp=0xc0006879d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc0003b4960)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc000687b88 sp=0xc0006879f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc0000bd380?, {0x12c14a0?, 0xc0000103b0?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc000687ba0 sp=0xc000687b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc000687bc8 sp=0xc000687ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc000346a50}, 0xc0003580f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc000687c90 sp=0xc000687bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc000346a50?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc000687cb8 sp=0xc000687c90 
pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc00070e000?, 0xc0003580f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc000687d88 sp=0xc000687cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffd24a7fcb0?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc000687de8 sp=0xc000687d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc0002fb2a0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc000687ed0 sp=0xc000687de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e6f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc000687f50 sp=0xc000687ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc000687f80 sp=0xc000687f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc000687fe0 sp=0xc000687f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000687fe8 sp=0xc000687fe0 pc=0x46aa61\n\ngoroutine 9 [syscall]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 +0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 10 [chan receive]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by 
yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n\ngoroutine 11 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 28 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:193 +0x238\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 30 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:193 +0x238\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n",
    "__stage__": "OnDeployGuestComplete",
    "__status__": "error"
}
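
The stack trace in the log above ends in `fatal error: sync: unlock of unlocked mutex`, raised from `xfsutils.UnlockXfsPartition` while `Umount` was running. In Go this is a fatal runtime error that `recover` cannot catch, so the whole `host-deployer` process exits. The real `xfsutils` code is not shown in this thread. The sketch below is only a hypothetical per-key lock table (all names are made up) that reports an unmatched unlock instead of crashing:

```go
package main

import (
	"fmt"
	"sync"
)

// keyedLocks is a hypothetical per-key lock table, not the actual
// Cloudpods xfsutils implementation.
type keyedLocks struct {
	mu   sync.Mutex
	held map[string]chan struct{} // key -> channel closed on release
}

func newKeyedLocks() *keyedLocks {
	return &keyedLocks{held: make(map[string]chan struct{})}
}

// Lock blocks until key is free, then marks it as held.
func (k *keyedLocks) Lock(key string) {
	for {
		k.mu.Lock()
		ch, busy := k.held[key]
		if !busy {
			k.held[key] = make(chan struct{})
			k.mu.Unlock()
			return
		}
		k.mu.Unlock()
		<-ch // wait until the current holder releases the key
	}
}

// Unlock releases key. It returns false when the key was not held,
// instead of triggering a fatal "unlock of unlocked mutex".
func (k *keyedLocks) Unlock(key string) bool {
	k.mu.Lock()
	defer k.mu.Unlock()
	ch, ok := k.held[key]
	if !ok {
		return false
	}
	delete(k.held, key)
	close(ch)
	return true
}

func main() {
	locks := newKeyedLocks()
	fmt.Println(locks.Unlock("fs-uuid")) // unlock without a prior lock
	locks.Lock("fs-uuid")
	fmt.Println(locks.Unlock("fs-uuid")) // matched unlock
}
```

Whether the unmatched unlock here comes from a mount that never took the lock is a guess; the trace only shows that the unlock happened on the unmount path.
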
wanyaoqi commented 10 months ago

Creating a VM from a new image hits the same problem:

{
    "__reason__": "Deploy guest fs: request deploy guest fs: rpc error: code = Unknown desc = run deploy_guest_fs failed []: \"/opt/yunion/bin/host-deployer --common-config-file /opt/yunion/common.conf --config /opt/yunion/host.conf --deploy-action deploy_guest_fs --deploy-params '{\\\"disk_info\\\":{\\\"path\\\":\\\"rbd:nvmepool/9c380d1a-dcf8-443c-8646-9bf67a592158:mon_host=172.16.1.216\\\\\\\\;172.16.1.218\\\\\\\\;172.16.1.217:key=AQBz5pZlFX41OBAAJqPwV73/Zxc0nKEjdGb0uw\\\\\\\\=\\\\\\\\=:rados_mon_op_timeout=5:rados_osd_op_timeout=1200:client_mount_timeout=120\\\"},\\\"guest_desc\\\":{\\\"name\\\":\\\"HWSaaS\\\",\\\"uuid\\\":\\\"14333d0c-6241-4f78-8912-7834ac74d4d7\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"nics\\\":[{\\\"mac\\\":\\\"00:22:31:ad:35:2d\\\",\\\"ip\\\":\\\"172.16.1.198\\\",\\\"net\\\":\\\"vm-static-net\\\",\\\"net_id\\\":\\\"e532366c-2ba4-4fed-895b-efd402812149\\\",\\\"gateway\\\":\\\"172.16.1.1\\\",\\\"dns\\\":\\\"172.16.1.200\\\",\\\"domain\\\":\\\"cloud.onecloud.io\\\",\\\"ifname\\\":\\\"dhcp1-dsf\\\",\\\"masklen\\\":24,\\\"driver\\\":\\\"virtio\\\",\\\"bridge\\\":\\\"br1\\\",\\\"wire_id\\\":\\\"2a4e5367-e4c5-4410-81dd-217698d99ff2\\\",\\\"vlan\\\":1,\\\"interface\\\":\\\"bond0\\\",\\\"bw\\\":1000,\\\"mtu\\\":1500}],\\\"disks\\\":[{\\\"disk_id\\\":\\\"9c380d1a-dcf8-443c-8646-9bf67a592158\\\",\\\"driver\\\":\\\"scsi\\\",\\\"cache_mode\\\":\\\"none\\\",\\\"aio_mode\\\":\\\"native\\\",\\\"size\\\":102400,\\\"template_id\\\":\\\"d4d0b10b-89d3-49c4-88c8-d3528312d5c1\\\",\\\"storage_id\\\":\\\"1b298235-a82f-4579-8b7a-e6dd2d9916d3\\\",\\\"path\\\":\\\"rbd:nvmepool/9c380d1a-dcf8-443c-8646-9bf67a592158\\\",\\\"format\\\":\\\"raw\\\"}],\\\"Hypervisor\\\":\\\"kvm\\\",\\\"hostname\\\":\\\"HWSaaS\\\"},\\\"deploy_info\\\":{\\\"public_key\\\":{\\\"admin_public_key\\\":\\\"ssh-rsa 
AAAAB3NzaC1yc2EAAAADAQABAAABAQDG7U+zsDlTXjbDWg4/C0NElAGPJ2CXrs8dh89ftJFjPbB5W9ghrVoen4UTBBm6GqXc4hl5zGVM2zL2H31n85HfYgBo47uKFEKu9c4DpSdiTBf15zBEvhNZziOJ0FEhwglZ1WRvSKDd2+3AH23WMp++btcz/ruhbib2mdUW9nwfQj783Sl+WfJ9Ss6p3RthRtolDxrpSXAIP5KH41jwYvCLPMLBndh5sz3fHuB6AfpbjYgG++pBrhf0rtemj5f1ZtgbvQ5IlYs5L1QUcctA6BbzwlRPbaNvSaM6+hjiU3g7Fm68qmT+4uNBRVKqip0hBkMBJSW8A8ZUSLIvP4G4DDXF\\\\n\\\",\\\"project_public_key\\\":\\\"ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDQxBHbbAyqBKf71sa4+xLV/9gTkZe7kIJgSyU+9ViGqfzN9B0TjBqL4pnZujHUl4Gch4EK9TGg3FtQNWTBHETRMaB4JVrjSpu4uXEYRj3EVVqJKCwwWNOoy4hj7eHmEaAFkw8CVNvBlJAPFXVXUIcPZplQQQI/Da5gUfZ8beGIlrhBWtz2Julw/5sxPiaENm2PPItiw6iZnPZ88/bZCvSHy0Cx2odZE3TJrN3H5Zob/3O09n8wCqPUrvMz9ibKb9z5iT0ANLnKtSCQW1xxIml5JlSFLEPPKFEyCdrE2mTsfPp7Gc+BUD9/KZy+8hih6gfS+dL1kK6OPOVfJLxDcjNJ\\\\n\\\"},\\\"is_init\\\":true,\\\"default_root_user\\\":true,\\\"windows_default_admin_user\\\":true,\\\"telegraf\\\":{\\\"telegraf_conf\\\":\\\"### MANAGED BY ansible-telegraf ANSIBLE ROLE ###\\\\n\\\\n[global_tags]\\\\n\\\\n    vm_ip = \\\\\\\"172.16.1.198\\\\\\\"\\\\n    vm_name = \\\\\\\"HWSaaS\\\\\\\"\\\\n    status = \\\\\\\"start_deploy\\\\\\\"\\\\n    tenant = \\\\\\\"system\\\\\\\"\\\\n    brand = \\\\\\\"OneCloud\\\\\\\"\\\\n    scaling_group_id = \\\\\\\"\\\\\\\"\\\\n    project_domain = \\\\\\\"Default\\\\\\\"\\\\n    host = \\\\\\\"node9-172-16-1-233\\\\\\\"\\\\n    os_type = \\\\\\\"Linux\\\\\\\"\\\\n    cloudregion = \\\\\\\"Default\\\\\\\"\\\\n    region_ext_id = \\\\\\\"\\\\\\\"\\\\n    vm_id = \\\\\\\"14333d0c-6241-4f78-8912-7834ac74d4d7\\\\\\\"\\\\n    zone = \\\\\\\"华南-广州\\\\\\\"\\\\n    zone_id = \\\\\\\"7b6ae896-1b3d-40e5-879f-cfd00799200b\\\\\\\"\\\\n    cloudregion_id = \\\\\\\"default\\\\\\\"\\\\n    tenant_id = \\\\\\\"2e152fe0619046a38081d7e487028358\\\\\\\"\\\\n    host_id = \\\\\\\"33184fbe-77c0-4aad-8460-f3b27f8648fc\\\\\\\"\\\\n    zone_ext_id = \\\\\\\"\\\\\\\"\\\\n    domain_id = \\\\\\\"default\\\\\\\"\\\\n\\\\n# Configuration for telegraf 
agent\\\\n[agent]\\\\n    interval = \\\\\\\"60s\\\\\\\"\\\\n    debug = false\\\\n    hostname = \\\\\\\"\\\\\\\"\\\\n    round_interval = true\\\\n    flush_interval = \\\\\\\"60s\\\\\\\"\\\\n    flush_jitter = \\\\\\\"0s\\\\\\\"\\\\n    collection_jitter = \\\\\\\"0s\\\\\\\"\\\\n    metric_batch_size = 1000\\\\n    metric_buffer_limit = 10000\\\\n    quiet = false\\\\n    logfile = \\\\\\\"/var/log/telegraf.log\\\\\\\"\\\\n    logfile_rotation_max_size = \\\\\\\"10MB\\\\\\\"\\\\n    logfile_rotation_max_archives = 1\\\\n    omit_hostname = true\\\\n\\\\n###############################################################################\\\\n#                                  OUTPUTS                                    #\\\\n###############################################################################\\\\n\\\\n[[outputs.influxdb]]\\\\n    urls = [\\\\\\\"http://169.254.169.254/monitor\\\\\\\"]\\\\n    database = \\\\\\\"telegraf\\\\\\\"\\\\n    insecure_skip_verify = true\\\\n\\\\n###############################################################################\\\\n#                                  INPUTS                                     #\\\\n###############################################################################\\\\n[[inputs.cpu]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    percpu = true\\\\n    totalcpu = true\\\\n    collect_cpu_time = false\\\\n    report_active = true\\\\n[[inputs.disk]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    ignore_fs = [\\\\\\\"tmpfs\\\\\\\", \\\\\\\"devtmpfs\\\\\\\", \\\\\\\"overlay\\\\\\\", \\\\\\\"squashfs\\\\\\\", \\\\\\\"iso9660\\\\\\\"]\\\\n[[inputs.diskio]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    skip_serial_number = false\\\\n[[inputs.kernel]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.kernel_vmstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.mem]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.processes]]\\\\n    name_prefix = 
\\\\\\\"agent_\\\\\\\"\\\\n[[inputs.swap]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.system]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.net]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.netstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.nstat]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n[[inputs.internal]]\\\\n    name_prefix = \\\\\\\"agent_\\\\\\\"\\\\n    collect_memstats = false\\\\n\\\"}}}'\" error: Process exited with status 2, cmd error: [info 240202 03:34:22 options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/host.conf\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tc-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-reserved-memory\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-live-migrate-downtime\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument windows-default-admin-user\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument use-boot-vga\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument servers-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument slots\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-delay-seconds\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bandwidth-limit\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-iops-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-kvm\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-cpu-binding\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-config-file\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tunnel-padding-bytes\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-router-vms\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-iops-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-block-size\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument migrate-expect-rate\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument check-system-services\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-server-port\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument allow-switch-vms\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-qemu-debug-log\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument fetcherfs-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-south-database\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-guest-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-probe-kubelet\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument report-interval\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-socket-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument binary-memclean-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument min-migrate-timeout-seconds\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-dir-suffix\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument snapshot-recycle-day\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-ksm\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-storage-path\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-monitor\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-allow-conntrack-invalid\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bridge-driver\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-request-worker-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-eip-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument pcie-root-port-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-read-bps-per-cpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-openflow-controller\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-pid-file\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-backup-temp-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-type\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-integration-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-fallocate-disk\n[warning 240202 
03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ethtool-enable-gso\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-eip-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument auto-merge-backing-template\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-lease-timeout\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-lease-time\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument recycle-diskfile-keep-days\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument tap-bridge-name\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument always-recycle-diskfile\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-custom-device\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument block-io-scheduler\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovmf-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sync-storage-info-duration-second\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument dhcp-renewal-time\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-vm-uuid\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-usb\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-telegraf\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-image-save-format\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-gpu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument local-image-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-mapped-bridge\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-underlay-mtu\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-set-cgroup\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-health-timeout\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument image-cache-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ovn-encap-ip\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument agent-temp-limit\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument rack\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument sdn-enable-tap-man\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-virtio-rng-device\n[warning 240202 03:34:22 
structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument max-hotplug-vcpu-count\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument restrict-qemu-img-convert-worker\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument linux-default-root-user\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument zero-clean-disk-data\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument ping-region-interval\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-skip-tls-verify\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument set-vnc-password\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument enable-template-backing\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument etcd-use-tls\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument kubelet-run-directory\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument memory-snapshots-path\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument bw-download-bandwidth\n[warning 240202 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-write-bps-per-cpu\n[info 240202 03:34:22 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-02-02 03:34:22 
options.parseOptions(options.go:331)] Use configuration file: /opt/yunion/common.conf\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument default-qemu-version\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument floppy-count\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument cdrom-count\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-local-vpc\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument manage-ntp-configuration\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument no-hpet\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument disable-security-group\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument host-cpu-passthrough\n[warning 2024-02-02 03:34:22 structarg.(*ArgumentParser).parseJSONKeyValue(structarg.go:1215)] Cannot find argument live-migrate-cpu-throttle-max\n[info 2024-02-02 03:34:22 options.parseOptions(options.go:354)] Set log level to \"info\"\n[info 2024-02-02 03:34:22 procutils.WaitZombieLoop(zombie_others.go:36)] My pid is not 1 and no need to wait zombies\n[info 2024-02-02 03:34:22 deployserver.(*SDeployService).InitService(deployserver.go:454)] exec socket path: /var/run/onecloud/exec.sock\nfatal error: sync: unlock of unlocked mutex\n\ngoroutine 1 [running]:\nruntime.throw({0x111a4d0?, 0xc000320380?})\n\t/opt/go/src/runtime/panic.go:992 +0x71 fp=0xc000687928 sp=0xc0006878f8 pc=0x4379d1\nsync.throw({0x111a4d0?, 0xf19900?})\n\t/opt/go/src/runtime/panic.go:978 +0x1e fp=0xc000687948 sp=0xc000687928 
pc=0x4656de\nsync.(*Mutex).unlockSlow(0xc00034fa70, 0xffffffff)\n\t/opt/go/src/sync/mutex.go:220 +0x3c fp=0xc000687970 sp=0xc000687948 pc=0x474c1c\nsync.(*Mutex).Unlock(...)\n\t/opt/go/src/sync/mutex.go:214\nyunion.io/x/onecloud/pkg/util/xfsutils.UnlockXfsPartition({0xc0007382b1, 0x24})\n\t/root/go/src/yunion.io/x/onecloud/pkg/util/xfsutils/lock.go:48 +0xf4 fp=0xc0006879d0 sp=0xc000687970 pc=0xd1a8d4\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:283 +0x45 fp=0xc0006879f0 sp=0xc0006879d0 pc=0xd27945\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).Umount(0xc0003b4960)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:303 +0x699 fp=0xc000687b88 sp=0xc0006879f0 pc=0xd277d9\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).UmountRootfs(0xc0000bd380?, {0x12c14a0?, 0xc0000103b0?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:117 +0x3b fp=0xc000687ba0 sp=0xc000687b88 pc=0xe7a7fb\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs.func1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:475 +0x36 fp=0xc000687bc8 sp=0xc000687ba0 pc=0xd20a16\nyunion.io/x/onecloud/pkg/hostman/diskutils/fsutils.DeployGuestfs({0x12bdd48, 0xc000346a50}, 0xc0003580f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/fsutils/fsutils.go:485 +0x366 fp=0xc000687c90 sp=0xc000687bc8 pc=0xd20906\nyunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm.(*LocalDiskDriver).DeployGuestfs(0xc000346a50?, 0x0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/qemu_kvm/local_driver.go:123 +0x26 fp=0xc000687cb8 sp=0xc000687c90 
pc=0xe7a866\nyunion.io/x/onecloud/pkg/hostman/diskutils.(*SKVMGuestDisk).DeployGuestfs(...)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/diskutils/kvm.go:144\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*LocalDeploy).DeployGuestFs(0xc00070e000?, 0xc0003580f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:46 +0x184 fp=0xc000687d88 sp=0xc000687cb8 pc=0xe85c84\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.StartLocalDeploy({0x7ffd24a7fcb0?, 0x4?})\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/localdeploy.go:130 +0x2a8 fp=0xc000687de8 sp=0xc000687d88 pc=0xe86dc8\nyunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver.(*SDeployService).RunService(0xc0002fb2a0?)\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/hostdeployer/deployserver/deployserver.go:266 +0x5b fp=0xc000687ed0 sp=0xc000687de8 pc=0xe83cfb\nyunion.io/x/onecloud/pkg/cloudcommon/service.(*SServiceBase).StartService(0xc00000e6f0)\n\t/root/go/src/yunion.io/x/onecloud/pkg/cloudcommon/service/services.go:58 +0xfa fp=0xc000687f50 sp=0xc000687ed0 pc=0xb70e5a\nmain.main()\n\t/root/go/src/yunion.io/x/onecloud/cmd/host-deployer/main.go:28 +0xe5 fp=0xc000687f80 sp=0xc000687f50 pc=0xe87625\nruntime.main()\n\t/opt/go/src/runtime/proc.go:250 +0x212 fp=0xc000687fe0 sp=0xc000687f80 pc=0x43a0f2\nruntime.goexit()\n\t/opt/go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000687fe8 sp=0xc000687fe0 pc=0x46aa61\n\ngoroutine 9 [syscall]:\nos/signal.signal_recv()\n\t/opt/go/src/runtime/sigqueue.go:151 +0x2f\nos/signal.loop()\n\t/opt/go/src/os/signal/signal_unix.go:23 +0x19\ncreated by os/signal.Notify.func1.1\n\t/opt/go/src/os/signal/signal.go:151 +0x2a\n\ngoroutine 10 [chan receive]:\nyunion.io/x/pkg/util/signalutils.StartTrap.func1()\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:72 +0xa7\ncreated by 
yunion.io/x/pkg/util/signalutils.StartTrap\n\t/root/go/src/yunion.io/x/onecloud/vendor/yunion.io/x/pkg/util/signalutils/signalutils.go:62 +0xd4\n\ngoroutine 11 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:189 +0x24b\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 28 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:193 +0x238\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n\ngoroutine 30 [chan send]:\nyunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2.1()\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:193 +0x238\ncreated by yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart.(*SKVMGuestDiskPartition).mount.func2\n\t/root/go/src/yunion.io/x/onecloud/pkg/hostman/guestfs/kvmpart/kvmpart.go:186 +0xe5\n",
    "__stage__": "OnDeployGuestComplete",
    "__status__": "error"
}

@chenjacken We have indeed found a problem with deploying XFS images. We will post an update here once it is fixed.
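
For context on the trace above: the Go runtime raises `fatal error: sync: unlock of unlocked mutex` when `Unlock` is called on a `sync.Mutex` that is not currently locked, and this kind of fatal error cannot be caught with `recover`. Below is a minimal sketch of a per-partition lock table that returns an ordinary error in that situation instead. The names `partitionLocks`, `TryLock`, `Unlock`, and `errNotLocked` are hypothetical and only for illustration; this is not the actual `xfsutils` code and not necessarily how the fix was done.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errNotLocked is returned when a caller releases a partition it never held.
var errNotLocked = errors.New("partition is not locked")

// partitionLocks is a hypothetical lock table keyed by partition UUID.
// A single mutex guards the map, so no per-key sync.Mutex is ever
// unlocked by a caller that did not lock it.
type partitionLocks struct {
	mu   sync.Mutex
	held map[string]bool
}

func newPartitionLocks() *partitionLocks {
	return &partitionLocks{held: make(map[string]bool)}
}

// TryLock marks the partition as held and reports whether it succeeded.
func (p *partitionLocks) TryLock(uuid string) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.held[uuid] {
		return false
	}
	p.held[uuid] = true
	return true
}

// Unlock releases the partition. Releasing a partition that is not held
// yields errNotLocked rather than crashing the process.
func (p *partitionLocks) Unlock(uuid string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	if !p.held[uuid] {
		return errNotLocked
	}
	delete(p.held, uuid)
	return nil
}

func main() {
	locks := newPartitionLocks()
	// Simulates an unmount path that runs although mount never took the lock.
	fmt.Println("unlock before lock:", locks.Unlock("example-uuid"))
	fmt.Println("lock acquired:", locks.TryLock("example-uuid"))
	fmt.Println("unlock after lock:", locks.Unlock("example-uuid"))
}
```

The point of the design is that a cleanup path such as an unmount can run unconditionally and log the error, instead of taking down the whole deployer process.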

chenjacken commented 10 months ago

OK, thanks. Does v3.10.12 have this problem as well?

wanyaoqi commented 10 months ago

OK, thanks. Does v3.10.12 have this problem as well?

@chenjacken Yes, this problem was only just discovered.

chenjacken commented 10 months ago

I found that the corresponding host pod reports the following errors:

[warning 2024-02-02 06:41:49 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 249 cycles...
[warning 2024-02-02 06:42:19 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 250 cycles...
[warning 2024-02-02 06:42:49 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 251 cycles...
[warning 2024-02-02 06:43:19 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 252 cycles...
[warning 2024-02-02 06:43:49 appsrv.do_worker_watchdog(workers_watchdog.go:64)] WorkerManager RequestWorker has been busy for 253 cycles...
[error 2024-02-02 06:44:06 storageman.(*SRbdStorage).SaveToGlance(storage_rbd.go:485)] Save to glance failed: {"error":{"class":"TimeoutError","code":504,"details":"request process timeout","request":{"headers":{"Content-Length":"18977849344","Content-Type":"application/octet-stream","User-Agent":"yunioncloud-go/201708","X-Auth-Token":"*","X-Image-Meta-Image_id":"66d45f68-eb72-4e4e-89a2-938b2904a203","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-glance:30292","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"e368273d"},"method":"PUT","url":"https://default-glance:30292/v1/images/66d45f68-eb72-4e4e-89a2-938b2904a203"}}}
[info 2024-02-02 06:44:07 hostdhcp.(*SGuestDHCPServer).serveDHCPInternal(dhcpserver.go:278)] Make DHCP Reply 172.16.1.195 TO 00:22:f0:d0:47:c8
[error 2024-02-02 06:44:08 storageman.(*SRbdStorage).SaveToGlance(storage_rbd.go:492)] Fail to remote cache image: {"error":{"class":"UnclassifiedError","code":500,"details":"sql: no rows in result set","request":{"body":"{\"storagecachedimage\":{\"path\":\"rbd:nvmepool/image_cache_66d45f68-eb72-4e4e-89a2-938b2904a203:mon_hos...=120\",\"status\":\"active\"}}","headers":{"Content-Length":"288","Content-Type":"application/json","User-Agent":"yunioncloud-go/201708","X-Auth-Token":"*","X-Yunion-Parent-Id":"","X-Yunion-Peer-Service-Name":"host","X-Yunion-Remote-Addr":"default-region:30888","X-Yunion-Span-Id":"0","X-Yunion-Span-Name":"","X-Yunion-Strace-Debug":"true","X-Yunion-Strace-Id":"721570cd"},"method":"PUT","url":"https://default-region:30888/storagecaches/ef9df16d-2111-44c2-8989-72a35b8fa0d6/cachedimages/66d45f68-eb72-4e4e-89a2-938b2904a203?auto_create=true"}}}
[info 2024-02-02 06:44:08 workmanager.(*workerTask).Run(manager.go:95)] DelayTask complete: <nil>
[info 2024-02-02 06:44:08 modules.TaskComplete(task.go:34)] Sync task 6c52864a-8bb7-4b75-88dc-086bef358237 complete succ
[error 2024-02-02 06:46:40 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
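
A note on the last line: `signal: killed` is how Go's `os/exec` reports a child process that was terminated by SIGKILL. One common cause is a command started with `exec.CommandContext` whose context deadline expires; the kernel OOM killer is another. The thread does not say which one applies to `GetCapacity`, so the following is only a sketch of the timeout case. `runWithTimeout` is a hypothetical helper, `sleep 5` stands in for a slow storage query, and a Unix-like system is assumed.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runWithTimeout is a hypothetical helper: it runs a command and lets
// os/exec kill it if it has not finished within timeoutMs milliseconds.
func runWithTimeout(timeoutMs int, name string, args ...string) error {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()
	// When ctx expires, os/exec kills the child process (SIGKILL on Unix).
	return exec.CommandContext(ctx, name, args...).Run()
}

func main() {
	// "sleep 5" stands in for a storage query that takes too long.
	err := runWithTimeout(200, "sleep", "5")
	fmt.Printf("slow command: %v\n", err)

	// A command that finishes within the limit returns no error.
	err = runWithTimeout(2000, "true")
	fmt.Printf("fast command: %v\n", err)
}
```

If a timeout is indeed the cause here, the repeated errors would suggest that the Ceph capacity query regularly takes longer than the host service is willing to wait, which would fit the slow uploads and slow web UI described at the top of the issue.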

chenjacken commented 10 months ago

One more question: when installing Cloudbase-Init, should the metadata address be configured as follows?

metadata_services=cloudbaseinit.metadata.services.configdrive.ConfigDriveService,cloudbaseinit.metadata.services.ec2service.EC2Service

metadata_base_url=http://169.254.169.254/
ec2_metadata_base_url=http://169.254.169.254/

Cloudbase-Init log:


2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base   File "C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\requests\sessions.py", line 701, in send
2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base     r = adapter.send(request, **kwargs)
2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base   File "C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\requests\adapters.py", line 553, in send
2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base     raise ConnectTimeout(e, request=request)
2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/local-hostname (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000018A857E06D0>, 'Connection to 169.254.169.254 timed out. (connect timeout=None)'))
2024-02-02 11:00:50.959 4664 ERROR cloudbaseinit.metadata.services.base 
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service [-] HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/local-hostname (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000018A857E06D0>, 'Connection to 169.254.169.254 timed out. (connect timeout=None)')): requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/local-hostname (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000018A857E06D0>, 'Connection to 169.254.169.254 timed out. (connect timeout=None)'))
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service Traceback (most recent call last):
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service   File "C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service     conn = connection.create_connection(
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service   File "C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service     raise err
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service   File "C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service     sock.connect(sa)
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service TimeoutError: [WinError 10060] 由于连接方在一段时间后没有正确答复或连接的主机没有反应,连接尝试失败。
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service 
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service During handling of the above exception, another exception occurred:

...

2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service requests.exceptions.ConnectTimeout: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /2009-04-04/meta-data/local-hostname (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x0000018A857E06D0>, 'Connection to 169.254.169.254 timed out. (connect timeout=None)'))
2024-02-02 11:00:50.963 4664 ERROR cloudbaseinit.metadata.services.ec2service 
2024-02-02 11:00:50.966 4664 DEBUG cloudbaseinit.metadata.services.ec2service [-] Metadata not found at URL 'http://169.254.169.254/' load C:\Program Files\Cloudbase Solutions\Cloudbase-Init\Python\lib\site-packages\cloudbaseinit\metadata\services\ec2service.py:46
2024-02-02 11:00:50.967 4664 ERROR cloudbaseinit.init [-] No metadata service found: cloudbaseinit.exception.MetadataNotFoundException: No available service found

wanyaoqi commented 10 months ago

@chenjacken Yes, that is the right address. Is the VM's network reachable? Is it on a VPC network or a classic network?
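
One way to answer the reachability question from inside the guest is to request the same path that Cloudbase-Init tried, but with an explicit timeout (the log above shows `connect timeout=None`). This is a rough sketch, not part of Cloudbase-Init or Cloudpods: `probeURL` and `probe` are hypothetical helpers, the 3-second timeout is an arbitrary choice, and the path is copied from the log.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

// metadataProbePath is the path Cloudbase-Init requested in the log above.
const metadataProbePath = "/2009-04-04/meta-data/local-hostname"

// probeURL joins a metadata base URL and the probe path without
// doubling the slash between them.
func probeURL(base string) string {
	return strings.TrimRight(base, "/") + metadataProbePath
}

// probe issues one GET with an explicit timeout and returns an error if
// the service cannot be reached or does not answer with 200 OK.
func probe(base string, timeout time.Duration) error {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(probeURL(base))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	return nil
}

func main() {
	// Base URL as configured in metadata_base_url above.
	base := "http://169.254.169.254/"
	if err := probe(base, 3*time.Second); err != nil {
		fmt.Println("metadata service unreachable:", err)
		return
	}
	fmt.Println("metadata service reachable at", probeURL(base))
}
```

A timeout here, as opposed to a refused connection or an HTTP error, would point at routing to 169.254.169.254 from the guest rather than at the Cloudbase-Init settings.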

chenjacken commented 10 months ago

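The VM status is unknown, and the status of its disk is unknown as well.

I also noticed that my VM is on node7, but the error message for the unknown disk status points to the address of node9 (IP 172.16.1.233).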

image
{
    "__reason__": "{\"error\":{\"class\":\"ClientError\",\"code\":499,\"details\":\"Get \\\"https://172.16.1.233:8885/disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status\\\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)\",\"request\":{\"headers\":{\"User-Agent\":\"yunioncloud-go/201708\",\"X-Auth-Token\":\"*\",\"X-Region-Version\":\"v2\",\"X-Request-Id\":\"180688-8c5614\",\"X-Task-Id\":\"38f00d85-6aa1-494e-8d4d-bc5b3c41f451\",\"X-Task-Notify-Url\":\"https://default-region:30888/tasks/38f00d85-6aa1-494e-8d4d-bc5b3c41f451\",\"X-Yunion-Parent-Id\":\"0.0\",\"X-Yunion-Peer-Service-Name\":\"compute_v2\",\"X-Yunion-Remote-Addr\":\"172.16.1.233:8885\",\"X-Yunion-Span-Id\":\"0.0.0\",\"X-Yunion-Span-Name\":\"\",\"X-Yunion-Strace-Debug\":\"true\",\"X-Yunion-Strace-Id\":\"d5e74afa\"},\"method\":\"GET\",\"url\":\"https://172.16.1.233:8885/disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status\"}}}",
    "__stage__": "OnDiskSyncStatusComplete",
    "__status__": "ERROR"
}
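
The `Client.Timeout exceeded while awaiting headers` text above is what Go's `net/http` client reports when its own `Timeout` elapses before the server sends response headers. In other words, the caller gave up first, and the host may still have answered later. A small sketch that reproduces this class of error against a local test server; `requestTimesOut` and the millisecond values are made up for illustration, and the real timeout configured in the region service is not stated in this thread.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// requestTimesOut starts a local test server whose handler needs handlerMs
// milliseconds to answer, then queries it with a client limited to clientMs
// milliseconds. It reports whether the request failed with a timeout.
func requestTimesOut(handlerMs, clientMs int) (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, r *http.Request) {
			select {
			case <-time.After(time.Duration(handlerMs) * time.Millisecond):
				fmt.Fprintln(w, "ready")
			case <-r.Context().Done():
				// The client gave up; stop working on the request.
			}
		}))
	defer srv.Close()

	client := &http.Client{Timeout: time.Duration(clientMs) * time.Millisecond}
	resp, err := client.Get(srv.URL)
	if err != nil {
		var netErr net.Error
		return errors.As(err, &netErr) && netErr.Timeout(), err
	}
	resp.Body.Close()
	return false, nil
}

func main() {
	// Handler slower than the client timeout: the client reports a timeout.
	timedOut, err := requestTimesOut(500, 50)
	fmt.Println("slow handler timed out:", timedOut, "error:", err)

	// Handler faster than the client timeout: the request succeeds.
	timedOut, err = requestTimesOut(10, 2000)
	fmt.Println("fast handler timed out:", timedOut, "error:", err)
}
```

This matches the pattern in the host-side log further down, where the same disk status requests are eventually answered with 200 but only after 15 seconds or more.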

Below are the pod logs from node9.

The default-host log looks like this:

[error 2024-02-04 01:43:39 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:43:53 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 bab40c-68a47f-d9bc6b GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status (172.16.1.213:54833:compute_v2) 15081.15ms
[error 2024-02-04 01:43:54 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:45:09 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:45:24 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:46:39 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:46:43 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 70ba9b-5394a3-abe9bb GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status (172.16.1.213:5294:compute_v2) 15006.18ms
[error 2024-02-04 01:46:54 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:48:09 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:48:24 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:49:39 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:49:54 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:50:33 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 f2f6ec-ad3d15-1fdca6 GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/37cb2e83-eba2-4f52-86cb-722849296034/status (172.16.1.213:15363:compute_v2) 15005.53ms
[info 2024-02-04 01:50:48 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 401181-fe7f38-3f1291 GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/7a4be1cf-3a81-47b4-80e2-02378388d2ba/status (172.16.1.213:8115:compute_v2) 27147.93ms
[error 2024-02-04 01:51:09 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:51:24 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:52:39 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:52:54 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:53:29 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 f70c7f-fe8d60-3594ca GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/37cb2e83-eba2-4f52-86cb-722849296034/status (172.16.1.213:21241:compute_v2) 15005.69ms
[info 2024-02-04 01:53:44 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 3d7dd8-ddf644-a1639f GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/7a4be1cf-3a81-47b4-80e2-02378388d2ba/status (172.16.1.213:59919:compute_v2) 25330.46ms
[error 2024-02-04 01:54:10 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:54:25 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:54:59 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 5fc5c5-8f01cf-2db3c9 GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/37cb2e83-eba2-4f52-86cb-722849296034/status (172.16.1.213:23987:compute_v2) 15005.94ms
[error 2024-02-04 01:55:40 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:55:55 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:57:10 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:57:20 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 180688-8c5614-b45cf9 GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status (172.16.1.213:37811:compute_v2) 15006.26ms
[error 2024-02-04 01:57:25 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:58:40 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:58:55 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 02:00:10 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 02:00:25 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
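
The host log above repeats two patterns: `GetCapacity` calls that end with `signal: killed`, and disk status requests that take about 15 seconds or more. A minimal Python sketch for tallying both from a saved log follows. The helper name `summarize` and the 1000 ms threshold are illustrative assumptions, not part of Cloudpods; the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Three sample lines copied verbatim from the host log above.
SAMPLE_LOG = """\
[error 2024-02-04 01:43:54 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-HDD size failed: GetCapacity: output: stderr "": signal: killed
[error 2024-02-04 01:45:09 storageman.GatherHostStorageStats(core.go:454)] sync storage Ceph-SSD size failed: GetCapacity: output: stderr "": signal: killed
[info 2024-02-04 01:46:43 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 200 70ba9b-5394a3-abe9bb GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status (172.16.1.213:5294:compute_v2) 15006.18ms
"""

FAILED_SYNC = re.compile(r"sync storage (\S+) size failed")
REQUEST_MS = re.compile(r"\) (\d+(?:\.\d+)?)ms$")


def summarize(log_text, slow_ms=1000.0):
    """Count failed capacity syncs per storage and collect slow request times.

    slow_ms is an arbitrary threshold chosen for this sketch.
    """
    failures = Counter()
    slow = []
    for line in log_text.splitlines():
        match = FAILED_SYNC.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = REQUEST_MS.search(line)
        if match and float(match.group(1)) >= slow_ms:
            slow.append(float(match.group(1)))
    return failures, slow


failures, slow = summarize(SAMPLE_LOG)
print(dict(failures), slow)
```

If both storages fail at a similar rate, as they do in the log above, the cause is more likely to be on the Ceph access path shared by both than in a single pool.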

Logs from default-host-deployer:

[info 2024-02-03 09:02:30 fsutils.MountRootfs(fsutils.go:437)] detect partition /dev/sda
[error 2024-02-03 09:02:30 kvmpart.(*SKVMGuestDiskPartition).Mount(kvmpart.go:115)] Mount fs failed: unsupport fs  on /dev/sda
[info 2024-02-03 09:02:30 fsutils.MountRootfs(fsutils.go:437)] detect partition /dev/sda1
[info 2024-02-03 09:02:30 xfsutils.LockXfsPartition(lock.go:24)] xfs lock f956f023-a2a0-4fd5-8e89-0b44b0848ab3
[info 2024-02-03 09:02:30 guestfs.IsPartitionReadonly(core.go:219)] File system /tmp/_dev_sda1 is not readonly
[info 2024-02-03 09:02:30 kvmpart.(*SKVMGuestDiskPartition).Mount(kvmpart.go:139)] mount fs xfs on /dev/sda1 successfully
[info 2024-02-03 09:02:30 fsutils.MountRootfs(fsutils.go:445)] Use rootfs CentosRootFs, partition /dev/sda1
[info 2024-02-03 09:02:30 kvmpart.(*SKVMGuestDiskPartition).Umount(kvmpart.go:296)] umount /dev/sda1: /tmp/_dev_sda1
[info 2024-02-03 09:02:30 kvmpart.(*SKVMGuestDiskPartition).Umount(kvmpart.go:302)] umount /dev/sda1 successfully
[info 2024-02-03 09:02:30 xfsutils.UnlockXfsPartition(lock.go:43)] xfs unlock f956f023-a2a0-4fd5-8e89-0b44b0848ab3
[info 2024-02-03 09:02:30 fsutils.ResizeDiskFs(fsutils.go:142)] Parts: [[1 true 2048 16777215 16775168 primary xfs /dev/sda1]] label: msdos
[info 2024-02-03 09:02:30 fsutils.ResizeDiskFs(fsutils.go:207)] resize disk partition: [parted -a none -s /dev/sda -- resizepart 1 104857599s]
[error 2024-02-03 09:02:31 fsutils.FsckXfsFs(fsutils.go:327)] xfs_check failed: exec: "xfs_check": executable file not found in $PATH, , try xfs_repair -n <dev> instead
[info 2024-02-03 09:02:32 xfsutils.LockXfsPartition(lock.go:24)] xfs lock f956f023-a2a0-4fd5-8e89-0b44b0848ab3
[info 2024-02-03 09:02:38 xfsutils.UnlockXfsPartition(lock.go:43)] xfs unlock f956f023-a2a0-4fd5-8e89-0b44b0848ab3

[info 2024-02-03 09:02:38 monitor.(*HmpMonitor).write(hmp.go:125)] HMP Write : quit

[info 2024-02-03 09:02:38 qemu_kvm.(*QemuBaseDriver).CleanGuest(driver.go:557)] kill  process kill: cannot find process "12552
"
 exit status 1
[info 2024-02-03 09:02:38 qemu_kvm.(*QemuDeployManager).Release(driver.go:121)] release QemuDeployManager
[info 2024-02-03 09:02:38 monitor.(*HmpMonitor).read(hmp.go:79)] HMP Read : quit

[info 2024-02-03 09:02:38 monitor.(*HmpMonitor).read(hmp.go:91)] Scan over  ...
[error 2024-02-03 09:02:38 qemu_kvm.(*QemuX86Driver).StartGuest.func2(driver.go:655)] monitor disconnect %!s(<nil>)

Logs from default-host-health:

[error 2024-02-02 09:03:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:03:37.447Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:03:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:03:47.447Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:03:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:03:57.448Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:03:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:07.449Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:07 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:17.450Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:17 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:27.450Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:37.451Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:47.452Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:04:57.453Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:04:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:07.454Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:07 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:17.454Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:17 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:27.454Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:37.455Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:47.455Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:05:57.456Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:05:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:07.456Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:07 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:17.457Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:17 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:27.458Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:37.460Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:47.460Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:06:57.462Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:06:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:07.464Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:07 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:17.465Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:17 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:27.465Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:37.466Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:47.467Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:07:57.468Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:07:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:07.469Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:07 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:17.469Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:17 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:27.470Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:27 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:37.470Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:37 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:47.471Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:47 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:57.472Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[error 2024-02-02 09:08:57 host_health.(*SHostHealthManager).Reconnect(health_manager.go:203)] restart session failed context deadline exceeded
{"level":"warn","ts":"2024-02-02T09:08:59.729Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc0003dfdc0/#initially=[http://default-etcd-client.onecloud.svc:2379]","attempt":0,"error":"rpc error: code = Unavailable desc = error reading from server: read tcp 10.106.26.5:37338->10.106.26.5:2379: read: connection timed out"}
wanyaoqi commented 10 months ago

@chenjacken It looks like host-agent failed to access Ceph. Please check whether the Ceph cluster is in a healthy state.

chenjacken commented 10 months ago

@chenjacken It looks like host-agent failed to access Ceph. Please check whether the Ceph cluster is in a healthy state.

[root@master1 ~]# ceph -s
  cluster:
    id:     e4a15469-543d-4dd4-8367-569d27b1b58f
    health: HEALTH_WARN
            15 daemons have recently crashed

  services:
    mon: 3 daemons, quorum i,j,l (age 4h)
    mgr: b(active, since 2d), standbys: a
    osd: 17 osds: 14 up (since 4h), 14 in (since 16h)

  data:
    pools:   3 pools, 1153 pgs
    objects: 231.53k objects, 894 GiB
    usage:   2.6 TiB used, 47 TiB / 50 TiB avail
    pgs:     1153 active+clean

  io:
    client:   2.3 KiB/s rd, 2.6 MiB/s wr, 0 op/s rd, 346 op/s wr
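
One detail in the status output above is easy to miss: it reports 17 OSDs but only 14 `up` and 14 `in`, even though the health section lists only the recent crashes. A small Python sketch that extracts these counts from `ceph -s` text is shown here; the function name `osd_counts` is a hypothetical helper, and the sample text is copied from the output above.

```python
import re

# The services section copied from the `ceph -s` output above.
CEPH_STATUS = """\
  services:
    mon: 3 daemons, quorum i,j,l (age 4h)
    mgr: b(active, since 2d), standbys: a
    osd: 17 osds: 14 up (since 4h), 14 in (since 16h)
"""

# Matches e.g. "osd: 17 osds: 14 up (since 4h), 14 in".
OSD_LINE = re.compile(
    r"osd:\s+(\d+) osds:\s+(\d+) up(?: \([^)]*\))?,\s+(\d+) in"
)


def osd_counts(status_text):
    """Return total/up/in OSD counts plus how many are not up or not in."""
    match = OSD_LINE.search(status_text)
    if match is None:
        raise ValueError("no osd summary line found")
    total, up, in_count = (int(value) for value in match.groups())
    return {
        "total": total,
        "up": up,
        "in": in_count,
        "not_up": total - up,
        "not_in": total - in_count,
    }


print(osd_counts(CEPH_STATUS))
```

For the output above this reports three OSDs that are neither up nor in, which is worth checking alongside the crash reports.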
chenjacken commented 10 months ago

After restarting the pod and syncing the disk status again, the host log shows:

[error 2024-02-04 02:20:39 httperrors.HTTPError(httperrors.go:110)] Send error Storage 1b298235-a82f-4579-8b7a-e6dd2d9916d3 not found
[info 2024-02-04 02:20:39 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 404 0b47ee-d64597-11c67a GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/6f4b9da3-977c-45a1-8a78-e605d87b8adf/status (172.16.1.213:37705:compute_v2) 363.94ms
[error 2024-02-04 02:21:01 httperrors.HTTPError(httperrors.go:110)] Send error Storage 1b298235-a82f-4579-8b7a-e6dd2d9916d3 not found
[info 2024-02-04 02:21:01 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 404 b39dff-d2b699-09bdd8 GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/37cb2e83-eba2-4f52-86cb-722849296034/status (172.16.1.213:22744:compute_v2) 0.38ms
[error 2024-02-04 02:21:07 httperrors.HTTPError(httperrors.go:110)] Send error Storage 1b298235-a82f-4579-8b7a-e6dd2d9916d3 not found
[info 2024-02-04 02:21:07 appsrv.(*Application).ServeHTTP(appsrv.go:288)] 1Dy_ML5U1CU3_77NQE_4CAIeFiA= 404 0a4523-7b95ef-0b042a GET /disks/1b298235-a82f-4579-8b7a-e6dd2d9916d3/7a4be1cf-3a81-47b4-80e2-02378388d2ba/status (172.16.1.213:40057:compute_v2) 0.19ms
wanyaoqi commented 10 months ago

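@chenjacken You first need to fix this node's access to Ceph; otherwise the host cannot register the corresponding Ceph storage. rook-ceph looks unstable, with pods crashing frequently, so you need to check the logs to find the cause. For example, could the reserved memory be insufficient? Or is it a network problem?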

chenjacken commented 10 months ago

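@chenjacken You first need to fix this node's access to Ceph; otherwise the host cannot register the corresponding Ceph storage. rook-ceph looks unstable, with pods crashing frequently, so you need to check the logs to find the cause. For example, could the reserved memory be insufficient? Or is it a network problem?

How do I troubleshoot rook-ceph, and how do I view the relevant logs?

The resource reservation configuration for the rook-ceph pods is: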

image
[root@master1 ~]# kubectl -n rook-ceph get ConfigMap rook-config-override -o yaml
apiVersion: v1
data:
  config: |
    [global]
    public network =  172.16.1.0/24
    cluster network = 10.0.1.0/24
    public addr = ""
    cluster addr = ""
    osd pool default size = 3
    mon_allow_pool_delete = true
    osd_pool_default_pg_num = 32
    mon_max_pg_per_osd = 250
    mon_osd_full_ratio = 0.95
    mon_osd_nearfull_ratio = 0.85

    [osd]
    osd_recovery_op_priority = 1
    osd_recovery_max_active = 1
    osd_max_backfills = 1
    osd_recovery_max_chunk = 1048576
    osd_scrub_begin_hour = 1
    osd_scrub_end_hour = 6
kind: ConfigMap
metadata:
  annotations:
[root@master1 ~]# ceph crash ls-new
ID                                                                ENTITY  NEW  
2024-02-01T10:43:35.187292Z_33951c20-fb62-4bc5-a510-18931342c728  mon.e    *   
2024-02-01T10:43:35.271997Z_e8a468b2-9007-4ff0-ad1e-ad7fd300b04f  mon.e    *   
2024-02-02T01:07:13.150854Z_0b1a92c0-e340-4897-982a-10d5e2f12d53  osd.4    *   
2024-02-02T01:07:47.914936Z_58d47c59-0e7e-49e2-bac4-b20a56941dec  osd.4    *   
2024-02-02T14:07:30.602672Z_2461b980-4d81-4a7d-adc8-a9e283c51a8c  osd.5    *   
2024-02-02T14:08:42.740582Z_c38b4ea3-3de1-4d07-bb67-fffceadec19a  osd.5    *   
2024-02-03T02:07:15.512683Z_6421a25a-307a-42c0-aa53-0f5502ed38a8  osd.10   *   
2024-02-03T02:07:19.785384Z_96a3dd36-b3ed-465c-8ea0-b66bd5c605aa  osd.0    *   
2024-02-03T02:08:35.567413Z_b01851dd-a776-437f-8340-e4538d3d0a10  osd.0    *   
2024-02-03T02:09:30.284489Z_165b520e-ff86-4a2a-bdc0-2245ab861a37  osd.10   *   
2024-02-03T07:29:09.737937Z_2e98f19d-783b-4441-9ba8-df4356f65e26  osd.4    *   
2024-02-03T07:29:55.762694Z_1cce7885-7564-4460-94a6-a3b9b649ad43  osd.4    *   
2024-02-03T07:29:55.762704Z_d6754c56-eb17-42b6-9d9b-e493f8c790a8  osd.9    *   
2024-02-03T15:22:16.450396Z_f812bed4-b9c4-4cca-8622-668269b8e155  osd.4    *   
2024-02-03T15:22:51.012045Z_e2656828-8c38-4661-99f8-a80e51b2ac22  osd.4    *   
[root@master1 ~]# 
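
To see which daemons crash most often, the `ceph crash ls-new` listing above can be grouped by entity. The sketch below is a minimal Python example; `crashes_per_entity` is a hypothetical helper name, and the listing is copied from the output above.

```python
from collections import Counter

# Listing copied from the `ceph crash ls-new` output above.
CRASH_LS = """\
ID                                                                ENTITY  NEW
2024-02-01T10:43:35.187292Z_33951c20-fb62-4bc5-a510-18931342c728  mon.e    *
2024-02-01T10:43:35.271997Z_e8a468b2-9007-4ff0-ad1e-ad7fd300b04f  mon.e    *
2024-02-02T01:07:13.150854Z_0b1a92c0-e340-4897-982a-10d5e2f12d53  osd.4    *
2024-02-02T01:07:47.914936Z_58d47c59-0e7e-49e2-bac4-b20a56941dec  osd.4    *
2024-02-02T14:07:30.602672Z_2461b980-4d81-4a7d-adc8-a9e283c51a8c  osd.5    *
2024-02-02T14:08:42.740582Z_c38b4ea3-3de1-4d07-bb67-fffceadec19a  osd.5    *
2024-02-03T02:07:15.512683Z_6421a25a-307a-42c0-aa53-0f5502ed38a8  osd.10   *
2024-02-03T02:07:19.785384Z_96a3dd36-b3ed-465c-8ea0-b66bd5c605aa  osd.0    *
2024-02-03T02:08:35.567413Z_b01851dd-a776-437f-8340-e4538d3d0a10  osd.0    *
2024-02-03T02:09:30.284489Z_165b520e-ff86-4a2a-bdc0-2245ab861a37  osd.10   *
2024-02-03T07:29:09.737937Z_2e98f19d-783b-4441-9ba8-df4356f65e26  osd.4    *
2024-02-03T07:29:55.762694Z_1cce7885-7564-4460-94a6-a3b9b649ad43  osd.4    *
2024-02-03T07:29:55.762704Z_d6754c56-eb17-42b6-9d9b-e493f8c790a8  osd.9    *
2024-02-03T15:22:16.450396Z_f812bed4-b9c4-4cca-8622-668269b8e155  osd.4    *
2024-02-03T15:22:51.012045Z_e2656828-8c38-4661-99f8-a80e51b2ac22  osd.4    *
"""


def crashes_per_entity(listing):
    """Count crash reports per daemon, skipping the header and blank lines."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] == "ID":
            continue
        counts[fields[1]] += 1
    return counts


counts = crashes_per_entity(CRASH_LS)
for entity, number in counts.most_common():
    print(entity, number)
```

In this listing osd.4 accounts for 6 of the 15 reports, so the node hosting osd.4 is a reasonable place to start looking.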
[root@master1 ~]# ceph crash info 2024-02-03T15:22:51.012045Z_e2656828-8c38-4661-99f8-a80e51b2ac22
{
    "assert_condition": "abort",
    "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/common/HeartbeatMap.cc",
    "assert_func": "bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, ceph::coarse_mono_time)",
    "assert_line": 85,
    "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, ceph::coarse_mono_time)' thread 7f04a53ab700 time 2024-02-03T15:21:25.817843+0000\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.14/rpm/el8/BUILD/ceph-16.2.14/src/common/HeartbeatMap.cc: 85: ceph_abort_msg(\"hit suicide timeout\")\n",
    "assert_thread_name": "tp_osd_tp",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f04c5ec0cf0]",
        "abort()",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b6) [0x55ac60cc64cb]",
        "(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point<ceph::coarse_mono_clock, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >)+0x4c9) [0x55ac6144dfe9]",
        "(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >)+0x23e) [0x55ac6144e39e]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b0) [0x55ac61472ee0]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55ac61475dd4]",
        "/lib64/libpthread.so.0(+0x81ca) [0x7f04c5eb61ca]",
        "clone()"
    ],
    "ceph_version": "16.2.14",
    "crash_id": "2024-02-03T15:22:51.012045Z_e2656828-8c38-4661-99f8-a80e51b2ac22",
    "entity_name": "osd.4",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-osd",
    "stack_sig": "f5c83f1671dffd9e88e869ef7e5a5ba16e742b0333f64f711e0254759c3df114",
    "timestamp": "2024-02-03T15:22:51.012045Z",
    "utsname_hostname": "node4",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.130-1.yn20230805.el7.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Wed Oct 11 03:26:01 UTC 2023"
}
[root@master1 ~]# ceph crash info 2024-02-01T10:43:35.187292Z_33951c20-fb62-4bc5-a510-18931342c728
{
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12cf0) [0x7f7cc23cdcf0]",
        "gsignal()",
        "abort()",
        "ceph-mon(+0x775316) [0x55a68f376316]",
        "ceph-mon(+0x775432) [0x55a68f376432]",
        "(rocksdb::InstrumentedMutex::Lock()+0x9c) [0x55a68f2c8c0c]",
        "ceph-mon(+0x59e420) [0x55a68f19f420]",
        "(rocksdb::Cleanable::~Cleanable()+0x1c) [0x55a68f32bebc]",
        "(rocksdb::DBIter::~DBIter()+0x4da) [0x55a68f20f7ba]",
        "(rocksdb::ArenaWrappedDBIter::~ArenaWrappedDBIter()+0x23) [0x55a68f3893c3]",
        "(std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release()+0x47) [0x55a68eea1eb7]",
        "(std::_Sp_counted_ptr<MonitorDBStore::WholeStoreIteratorImpl*, (__gnu_cxx::_Lock_policy)2>::_M_dispose()+0x5e) [0x55a68eefc77e]",
        "(std::_Rb_tree<unsigned long, std::pair<unsigned long const, Monitor::SyncProvider>, std::_Select1st<std::pair<unsigned long const, Monitor::SyncProvider> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, Monitor::SyncProvider> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, Monitor::SyncProvider> >*)+0xc8) [0x55a68ef02b68]",
        "(Monitor::~Monitor()+0x39c) [0x55a68eee814c]",
        "(Monitor::~Monitor()+0xd) [0x55a68eee8c4d]",
        "main()",
        "__libc_start_main()",
        "_start()"
    ],
    "ceph_version": "16.2.14",
    "crash_id": "2024-02-01T10:43:35.187292Z_33951c20-fb62-4bc5-a510-18931342c728",
    "entity_name": "mon.e",
    "os_id": "centos",
    "os_name": "CentOS Stream",
    "os_version": "8",
    "os_version_id": "8",
    "process_name": "ceph-mon",
    "stack_sig": "98fe9d7efe083ca88907d5eb1ab86eee77f3abf650476738f28138c3a0b97a5c",
    "timestamp": "2024-02-01T10:43:35.187292Z",
    "utsname_hostname": "node3",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.130-1.yn20230805.el7.x86_64",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP Wed Oct 11 03:26:01 UTC 2023"
}
[root@master1 ~]# 
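
The two crash reports above differ in kind. The osd.4 report carries an abort message, `hit suicide timeout`, raised from the `tp_osd_tp` worker thread, which indicates a thread that stalled past its heartbeat deadline; the mon.e report has no assert fields and only a backtrace through the `Monitor` destructor. A hedged Python sketch for pulling the key fields out of `ceph crash info` JSON follows. `summarize_crash` is a hypothetical helper, and the sample JSON keeps only a few fields, with `assert_msg` cut down to its final line.

```python
import json
import re

# Trimmed from the osd.4 report above; assert_msg keeps only its last line.
OSD_REPORT = r'''
{
    "assert_msg": "HeartbeatMap.cc: 85: ceph_abort_msg(\"hit suicide timeout\")\n",
    "assert_thread_name": "tp_osd_tp",
    "entity_name": "osd.4",
    "utsname_hostname": "node4"
}
'''

# Trimmed from the mon.e report above, which has no assert_* fields.
MON_REPORT = r'''
{
    "entity_name": "mon.e",
    "utsname_hostname": "node3"
}
'''

ABORT_MSG = re.compile(r'ceph_abort_msg\("([^"]*)"\)')


def summarize_crash(report_json):
    """Extract entity, host, thread and abort reason from one crash report."""
    report = json.loads(report_json)
    match = ABORT_MSG.search(report.get("assert_msg", ""))
    return {
        "entity": report["entity_name"],
        "host": report.get("utsname_hostname"),
        "thread": report.get("assert_thread_name"),
        "reason": match.group(1) if match else None,
    }


print(summarize_crash(OSD_REPORT))
print(summarize_crash(MON_REPORT))
```

Running this over every ID from `ceph crash ls-new` would show whether the OSD crashes share the same reason and cluster on particular hosts.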
wanyaoqi commented 10 months ago

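@chenjacken Generally, check the status and logs of the pods in the rook-ceph namespace; Ceph problems need to be investigated case by case on your side. One thing to confirm: are hugepages enabled on the host? Hugepages allocate memory up front, so check whether the memory reserved with hugepages enabled is sufficient.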

chenjacken commented 10 months ago

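@chenjacken Generally, check the status and logs of the pods in the rook-ceph namespace; Ceph problems need to be investigated case by case on your side. One thing to confirm: are hugepages enabled on the host? Hugepages allocate memory up front, so check whether the memory reserved with hugepages enabled is sufficient.

OK, thanks. Hugepages are enabled. I did not configure any reservation specifically; everything is at the defaults. How much should typically be reserved? Is there a way to calculate it?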

[root@node4 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:            62G         57G        887M        3.2G        4.7G        1.2G
Swap:            0B          0B          0B
[root@node4 ~]# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
50
[root@node4 ~]# cat /sys/kernel/mm/hugepages/hugepages-1048576kB/free_hugepages
22
[root@node4 ~]# 
wanyaoqi commented 10 months ago

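@chenjacken Control nodes normally do not have hugepages enabled by default. Is Ceph running on the control nodes? Compute nodes reserve 20% of memory by default; if no other special services run on them, no special configuration should be needed.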
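
The figures posted from node4 can be put side by side with the 20% reservation mentioned here. The following is a minimal sketch, not anything Cloudpods itself runs: the function name `hugepage_summary` is hypothetical, and the 62 GiB total is the rounded value printed by `free -h`, so the result is approximate.

```python
def hugepage_summary(total_gib, page_size_kib, nr_hugepages, free_hugepages):
    """Summarise how much memory hugepages take away from the rest of the host."""
    page_gib = page_size_kib / (1024 * 1024)        # 1048576 kB pages are 1 GiB each
    reserved = nr_hugepages * page_gib              # preallocated, whether used or not
    in_use = (nr_hugepages - free_hugepages) * page_gib
    return {
        "hugepage_gib": reserved,
        "hugepage_in_use_gib": in_use,
        "left_for_host_gib": total_gib - reserved,  # what pods and the OS share
        "hugepage_share": reserved / total_gib,
    }


# Values taken from the node4 output in this thread (total is rounded by free -h).
summary = hugepage_summary(total_gib=62, page_size_kib=1048576,
                           nr_hugepages=50, free_hugepages=22)
for key, value in summary.items():
    print(f"{key}: {value:.2f}")
```

With these inputs roughly 50 GiB is preallocated to hugepages and about 12 GiB (a little under 20% of the total) remains for the OS and any pods on the node, which appears consistent with the default reservation described above. Any Ceph daemons scheduled onto such a node would have to fit into that remainder.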

chenjacken commented 10 months ago

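> @chenjacken Control nodes normally do not have hugepages enabled by default. Is Ceph running on the control nodes? Compute nodes reserve 20% of memory by default; if no other special services run on them, no special configuration should be needed.

OK, thanks. The rook-ceph mon pods are not pinned to the control nodes, so the mon pods move from node to node.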

chenjacken commented 10 months ago

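> @chenjacken Yes, that is the address. Is the VM's network reachable? Is it a VPC network or a classic network?

The VM's network is reachable, and it is a classic network.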

wanyaoqi commented 10 months ago

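@chenjacken You can try accessing this address from inside the VM. In theory, if the VM's network works, this address should be reachable as well. It is also possible that host-agent was not running at the time; the metadata server is started inside the host-agent service.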

chenjacken commented 10 months ago

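> @chenjacken You can try accessing this address from inside the VM. In theory, if the VM's network works, this address should be reachable as well. It is also possible that host-agent was not running at the time; the metadata server is started inside the host-agent service.

image

Is host-agent a pod?

node7 is the host on which the VM runs.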

[root@master1 ~]# kubectl get pods -n onecloud|grep agent
default-esxi-agent-57f79cb476-stsv6                  1/1     Running       0          13h
default-lbagent-t847s                                2/2     Running       5          8h
default-vpcagent-8557ff4466-cbnmv                    1/1     Running       0          13h
[root@master1 ~]# kubectl get pods -n onecloud -owide |grep node7
default-host-deployer-lhrpz                          1/1     Running       0          13h     172.16.1.231    node7      <none>           <none>
default-host-health-9nz6b                            1/1     Running       0          13h     172.16.1.231    node7      <none>           <none>
default-host-image-4dbhr                             1/1     Running       0          12h     172.16.1.231    node7      <none>           <none>
default-host-ngvsl                                   3/3     Running       0          13h     172.16.1.231    node7      <none>           <none>
default-telegraf-5ctkn                               1/1     Running       0          14h     172.16.1.231    node7      <none>           <none>
[root@master1 ~]# 
wanyaoqi commented 10 months ago

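@chenjacken It is the default-host pod. If HTTP does not get through, something does look wrong; ping will not work against this address in any case. You can check whether the host log contains the corresponding requests. You can also run ovs-ofctl dump-flows br0 | grep 169.254 to see whether the matching flow entries exist. If they do not, check the sdnagent log to investigate.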

chenjacken commented 10 months ago

Where do you think the problem is?

[root@node9 ~]# ovs-ofctl dump-flows br0  | grep 169.254
 cookie=0x0, duration=104795.600s, table=0, n_packets=0, n_bytes=0, idle_age=65534, hard_age=65534, priority=29310,tcp,in_port=LOCAL,nw_dst=169.254.169.254,tp_dst=80 actions=NORMAL
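
A minimal sketch, assuming the single-line text format shown in the output above, of how such a flow entry can be split into fields so that the counters can be compared between two checks. The helper name `parse_flow` is hypothetical and is not part of Open vSwitch or Cloudpods.

```python
import re


def parse_flow(line):
    """Split one `ovs-ofctl dump-flows` line into counters, match fields and actions."""
    head, _, actions = line.strip().partition(" actions=")
    fields = {}
    for token in re.split(r",\s*|\s+", head):
        if not token:
            continue
        key, sep, value = token.partition("=")
        # Bare keywords such as "tcp" carry no value; record them as flags.
        fields[key] = value if sep else True
    fields["actions"] = actions
    return fields


line = (" cookie=0x0, duration=104795.600s, table=0, n_packets=0, n_bytes=0, "
        "idle_age=65534, hard_age=65534, priority=29310,tcp,in_port=LOCAL,"
        "nw_dst=169.254.169.254,tp_dst=80 actions=NORMAL")
flow = parse_flow(line)
print(flow["in_port"], flow["nw_dst"], flow["tp_dst"], "n_packets =", flow["n_packets"])
```

For the entry above, `n_packets` is 0, meaning no packet has matched this particular rule since it was installed. Note that the rule only matches traffic entering from the `LOCAL` port; whether guest traffic is expected to hit it is not settled by this output alone.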

The sdnagent log from default-host:

[info 2024-02-05 15:31:00 server.(*FlowMan).doCheck(flowman.go:92)] flowman brmapped: check done
[info 2024-02-05 15:31:00 server.(*FlowMan).doCheck(flowman.go:130)] flowman br0: check done
[info 2024-02-05 15:31:00 server.(*FlowMan).Start(flowman.go:232)] flowman br1: do idle check
[info 2024-02-05 15:31:00 server.(*FlowMan).doCheck(flowman.go:130)] flowman br1: check done
[info 2024-02-05 15:31:02 server.(*ovnMdMan).cleanup(ovn-md.go:1006)] ovnMd: clean done
[info 2024-02-05 15:31:05 server.(*FlowMan).Start(flowman.go:232)] flowman brtap: do idle check
[info 2024-02-05 15:31:05 server.(*FlowMan).doCheck(flowman.go:130)] flowman brtap: check done
[info 2024-02-05 15:31:11 options.parseOptions(options.go:331)] Use configuration file: /etc/yunion/host.conf
[info 2024-02-05 15:31:11 options.parseOptions(options.go:354)] Set log level to "info"
[info 2024-02-05 15:31:11 options.parseOptions(options.go:331)] Use configuration file: /etc/yunion/common/common.conf
[info 2024-02-05 15:31:11 options.parseOptions(options.go:354)] Set log level to "info"
[info 2024-02-05 15:31:13 server.(*TcMan).doIdleCheck(tcman.go:155)] tcman: doing idle check
[info 2024-02-05 15:31:13 server.(*TcMan).doCheckPage(tcman.go:177)] skip dhcp1-196: it uses mq
[info 2024-02-05 15:31:13 server.(*TcMan).doCheckPage(tcman.go:177)] skip dhcp1-190: it uses mq
[info 2024-02-05 15:31:13 server.(*TcMan).doIdleCheck(tcman.go:160)] tcman: done idle check
[info 2024-02-05 15:31:13 server.(*FlowMan).Start(flowman.go:232)] flowman brmapped: do idle check
[error 2024-02-05 15:31:13 server.(*FlowMan).doDumpFlows(flowman.go:72)] flowman brmapped: dump-flows failed: exit status 1: ovs-ofctl: brmapped is not a bridge or a socket
[error 2024-02-05 15:31:13 server.(*FlowMan).doCheck(flowman.go:91)] FlowMan doCheck doDumpFlows fail DumpFlows: exit status 1: ovs-ofctl: brmapped is not a bridge or a socket
[info 2024-02-05 15:31:13 server.(*FlowMan).doCheck(flowman.go:92)] flowman brmapped: check done
chenjacken commented 10 months ago

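Also, image handling seems prone to failures: an image can stay stuck in the "waiting to save" state indefinitely. Apart from the fs problem above (waiting for the fix 🌹), could the MinIO in my environment be at fault?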

API: BackgroundHeal()
Time: 14:54:51 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Heal attempt failed for .minio.sys/buckets/.usage.json: Storage resources are insufficient for the read operation .minio.sys/buckets/.usage.json (*fmt.wrapError)
       2: cmd/admin-heal-ops.go:807:cmd.(*healSequence).healItemsFromSourceCh()
       1: cmd/admin-heal-ops.go:818:cmd.(*healSequence).healFromSourceCh()

Detailed logs:

[root@master1 ~]# kubectl get pods -n onecloud-minio -owide
NAME      READY   STATUS    RESTARTS   AGE   IP              NODE      NOMINATED NODE   READINESS GATES
minio-0   1/1     Running   0          41m   10.40.136.61    master3   <none>           <none>
minio-1   1/1     Running   0          12h   10.40.137.100   master1   <none>           <none>
minio-2   1/1     Running   0          27h   10.40.180.26    master2   <none>           <none>
minio-3   1/1     Running   0          12h   10.40.137.69    master1   <none>           <none>

[root@master1 ~]# kubectl -n onecloud-minio logs minio-0

API: SYSTEM()
Time: 14:52:38 UTC 02/05/2024
Error: lookup minio-1.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.136.61:40916->10.96.0.10:53: i/o timeout (*net.DNSError)
       host=minio-1.minio-svc.onecloud-minio.svc.cluster.local, elapsedTime=50 seconds elapsed
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()

API: SYSTEM()
Time: 14:53:29 UTC 02/05/2024
Error: lookup minio-2.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.136.61:52327->10.96.0.10:53: i/o timeout (*net.DNSError)
       elapsedTime=1 minute elapsed, host=minio-2.minio-svc.onecloud-minio.svc.cluster.local
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()

API: SYSTEM()
Time: 14:54:19 UTC 02/05/2024
Error: lookup minio-3.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.136.61:33450->10.96.0.10:53: i/o timeout (*net.DNSError)
       host=minio-3.minio-svc.onecloud-minio.svc.cluster.local, elapsedTime=2 minutes elapsed
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
         Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD

 You are running an older version of MinIO released 2 years ago 
 Update: Run `mc admin update` 

Waiting for all MinIO sub-systems to be initialized.. lock acquired
Verifying if 1 bucket is consistent across drives...
All MinIO sub-systems initialized successfully
Waiting for all MinIO IAM sub-system to be initialized.. lock acquired
IAM initialization complete
Status:         4 Online, 0 Offline. 
Endpoint: http://10.40.136.61:9000  http://127.0.0.1:9000 

Browser Access:
   http://10.40.136.61:9000  http://127.0.0.1:9000

Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide

API: BackgroundHeal()
Time: 14:54:51 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Heal attempt failed for .minio.sys/buckets/.usage.json: Storage resources are insufficient for the read operation .minio.sys/buckets/.usage.json (*fmt.wrapError)
       2: cmd/admin-heal-ops.go:807:cmd.(*healSequence).healItemsFromSourceCh()
       1: cmd/admin-heal-ops.go:818:cmd.(*healSequence).healFromSourceCh()
[root@master1 ~]# 

[root@master1 ~]# kubectl -n onecloud-minio logs minio-1

API: SYSTEM()
Time: 03:18:23 UTC 02/05/2024
Error: lookup minio-0.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.137.100:59775->10.96.0.10:53: i/o timeout (*net.DNSError)
       host=minio-0.minio-svc.onecloud-minio.svc.cluster.local, elapsedTime=50 seconds elapsed
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()

API: SYSTEM()
Time: 03:19:13 UTC 02/05/2024
Error: lookup minio-2.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.137.100:54355->10.96.0.10:53: i/o timeout (*net.DNSError)
       host=minio-2.minio-svc.onecloud-minio.svc.cluster.local, elapsedTime=1 minute elapsed
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()

API: SYSTEM()
Time: 03:20:03 UTC 02/05/2024
Error: lookup minio-3.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: read udp 10.40.137.100:40491->10.96.0.10:53: i/o timeout (*net.DNSError)
       host=minio-3.minio-svc.onecloud-minio.svc.cluster.local, elapsedTime=2 minutes elapsed
       5: cmd/endpoint.go:483:cmd.Endpoints.UpdateIsLocal()
       4: cmd/endpoint.go:667:cmd.CreateEndpoints()
       3: cmd/endpoint-ellipses.go:373:cmd.createServerEndpoints()
       2: cmd/server-main.go:145:cmd.serverHandleCmdArgs()
       1: cmd/server-main.go:440:cmd.serverMain()
WARNING: MINIO_ACCESS_KEY and MINIO_SECRET_KEY are deprecated.
         Please use MINIO_ROOT_USER and MINIO_ROOT_PASSWORD

 You are running an older version of MinIO released 2 years ago 
 Update: Run `mc admin update` 

Waiting for all MinIO sub-systems to be initialized.. lock acquired
Verifying if 1 bucket is consistent across drives...
All MinIO sub-systems initialized successfully
Waiting for all MinIO IAM sub-system to be initialized.. lock acquired
IAM initialization complete
Status:         4 Online, 0 Offline. 
Endpoint: http://10.40.137.100:9000  http://127.0.0.1:9000 

Browser Access:
   http://10.40.137.100:9000  http://127.0.0.1:9000

Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide

API: SYSTEM()
Time: 14:45:59 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Marking http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/lock/v6 temporary offline; caused by Post "http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/lock/v6/lock?owner=minio-1.minio-svc.onecloud-minio.svc.cluster.local%3A9000&quorum=3&source=%5Bdata-scanner.go%3A103%3ArunDataScanner%28%29%5D&uid=e3efc0db-4493-46b1-b867-6bfb566336d0": dial tcp 10.40.136.11:9000: connect: connection refused (*fmt.wrapError)
       5: internal/rest/client.go:147:rest.(*Client).Call()
       4: cmd/lock-rest-client.go:66:cmd.(*lockRESTClient).callWithContext()
       3: cmd/lock-rest-client.go:102:cmd.(*lockRESTClient).restCall()
       2: cmd/lock-rest-client.go:121:cmd.(*lockRESTClient).Lock()
       1: internal/dsync/drwmutex.go:393:dsync.lock.func1()

API: SYSTEM()
Time: 14:46:12 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Marking http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/storage/export/v37 temporary offline; caused by Post "http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/storage/export/v37/listvols?disk-id=b742c870-5f02-40ce-9ea4-6dd270baa0da": dial tcp 10.40.136.11:9000: i/o timeout (*fmt.wrapError)
       5: internal/rest/client.go:147:rest.(*Client).Call()
       4: cmd/storage-rest-client.go:151:cmd.(*storageRESTClient).call()
       3: cmd/storage-rest-client.go:325:cmd.(*storageRESTClient).ListVols()
       2: cmd/erasure-healing.go:182:cmd.listAllBuckets.func1()
       1: internal/sync/errgroup/errgroup.go:123:errgroup.(*Group).Go.func1()

API: SYSTEM()
Time: 14:46:22 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: lookup minio-0.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout (*net.DNSError)
       4: internal/logger/logonce.go:54:logger.(*logOnceType).logOnceIf()
       3: internal/logger/logonce.go:94:logger.LogOnceIf()
       2: internal/http/dial_dnscache.go:188:http.(*DNSCache).Refresh()
       1: internal/http/dial_dnscache.go:139:http.NewDNSCache.func1()

API: SYSTEM()
Time: 14:46:26 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Disk: http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/export returned disk not found (*fmt.wrapError)
       endpoint=http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/export
       2: cmd/prepare-storage.go:50:cmd.glob..func7.1()
       1: cmd/erasure-sets.go:228:cmd.(*erasureSets).connectDisks.func2()

API: SYSTEM()
Time: 14:46:27 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: lookup minio-2.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: dial udp 10.96.0.10:53: i/o timeout (*net.DNSError)
       4: internal/logger/logonce.go:54:logger.(*logOnceType).logOnceIf()
       3: internal/logger/logonce.go:94:logger.LogOnceIf()
       2: internal/http/dial_dnscache.go:188:http.(*DNSCache).Refresh()
       1: internal/http/dial_dnscache.go:139:http.NewDNSCache.func1()

API: SYSTEM()
Time: 14:51:17 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: lookup minio-0.minio-svc.onecloud-minio.svc.cluster.local on 10.96.0.10:53: no such host (*net.DNSError)
       4: internal/logger/logonce.go:54:logger.(*logOnceType).logOnceIf()
       3: internal/logger/logonce.go:94:logger.LogOnceIf()
       2: internal/http/dial_dnscache.go:188:http.(*DNSCache).Refresh()
       1: internal/http/dial_dnscache.go:139:http.NewDNSCache.func1()
Client http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/storage/export/v37 online
Client http://minio-0.minio-svc.onecloud-minio.svc.cluster.local:9000/minio/lock/v6 online
[root@master1 ~]# 
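
Most of the startup errors in the two logs above are DNS lookups against 10.96.0.10:53 that time out or return "no such host". A minimal sketch for re-checking those lookups follows; it assumes it is run from a pod inside the same cluster (the names do not resolve elsewhere), and the helper `check_dns` with its injectable `resolver` parameter is hypothetical.

```python
import socket


def check_dns(hosts, port=9000, resolver=socket.getaddrinfo):
    """Resolve each host and report either its addresses or the lookup error."""
    results = {}
    for host in hosts:
        try:
            infos = resolver(host, port, proto=socket.IPPROTO_TCP)
            # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
            results[host] = sorted({info[4][0] for info in infos})
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            results[host] = f"lookup failed: {exc}"
    return results


# Peer names as they appear in the MinIO logs in this thread.
PEERS = [f"minio-{i}.minio-svc.onecloud-minio.svc.cluster.local" for i in range(4)]

# Usage, from inside the cluster:
#   for host, outcome in check_dns(PEERS).items():
#       print(host, "->", outcome)
```

If the names resolve reliably when checked this way, the errors were probably transient (for example while a pod was restarting and its IP changed); repeated failures would point at the cluster DNS service rather than at MinIO itself.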
wanyaoqi commented 10 months ago

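> Also, image handling seems prone to failures: an image can stay stuck in the "waiting to save" state indefinitely. Apart from the fs problem above (waiting for the fix 🌹), could the MinIO in my environment be at fault?

The xfs problem has been fixed in https://github.com/yunionio/cloudpods/pull/19456. You can update the host-deployer image to registry.cn-hangzhou.aliyuncs.com/d3lx/host-deployer:20240205-210119

To tell whether saving the image failed, first check the glance service log.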

chenjacken commented 10 months ago

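> > Also, image handling seems prone to failures: an image can stay stuck in the "waiting to save" state indefinitely. Apart from the fs problem above (waiting for the fix 🌹), could the MinIO in my environment be at fault?
>
> The xfs problem has been fixed in #19456. You can update the host-deployer image to registry.cn-hangzhou.aliyuncs.com/d3lx/host-deployer:20240205-210119
>
> To tell whether saving the image failed, first check the glance service log.

OK, thanks, much appreciated 🌹

Judging by its log, minio-0 seems to have a problem. Could you help me work out how to troubleshoot it?
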
[root@master1 ~]# kubectl -n onecloud-minio logs minio-0
Browser Access:
   http://10.40.136.61:9000  http://127.0.0.1:9000

Object API (Amazon S3 compatible):
   Go:         https://docs.min.io/docs/golang-client-quickstart-guide
   Java:       https://docs.min.io/docs/java-client-quickstart-guide
   Python:     https://docs.min.io/docs/python-client-quickstart-guide
   JavaScript: https://docs.min.io/docs/javascript-client-quickstart-guide
   .NET:       https://docs.min.io/docs/dotnet-client-quickstart-guide

API: BackgroundHeal()
Time: 14:54:51 UTC 02/05/2024
DeploymentID: dfa47b2d-e946-4087-8f19-f4cdd587965e
Error: Heal attempt failed for .minio.sys/buckets/.usage.json: Storage resources are insufficient for the read operation .minio.sys/buckets/.usage.json (*fmt.wrapError)
       2: cmd/admin-heal-ops.go:807:cmd.(*healSequence).healItemsFromSourceCh()
       1: cmd/admin-heal-ops.go:818:cmd.(*healSequence).healFromSourceCh()
[root@master1 ~]# 
wanyaoqi commented 10 months ago

> Error: Heal attempt failed for .minio.sys/buckets/.usage.json: Storage resources are insufficient for the read operation .minio.sys/buckets/.usage.json (*fmt.wrapError)

@chenjacken Judging by this error, the disk I/O of the storage that MinIO uses is probably too slow, but it should not affect normal use. First check the glance log for errors.

chenjacken commented 10 months ago

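> > Error: Heal attempt failed for .minio.sys/buckets/.usage.json: Storage resources are insufficient for the read operation .minio.sys/buckets/.usage.json (*fmt.wrapError)
>
> @chenjacken Judging by this error, the disk I/O of the storage that MinIO uses is probably too slow, but it should not affect normal use. First check the glance log for errors.

OK, thanks. I have already deleted the problematic image (the one that stayed in the waiting state) and will keep an eye on this going forward. Many thanks! 🌹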

chenjacken commented 10 months ago

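> > Also, image handling seems prone to failures: an image can stay stuck in the "waiting to save" state indefinitely. Apart from the fs problem above (waiting for the fix 🌹), could the MinIO in my environment be at fault?
>
> The xfs problem has been fixed in #19456. You can update the host-deployer image to registry.cn-hangzhou.aliyuncs.com/d3lx/host-deployer:20240205-210119
>
> To tell whether saving the image failed, first check the glance service log.

I have upgraded. When creating a new VM (the image is Windows-2016), I now get "disk allocation failed". The log shown in the web UI is: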

{
    "__reason__": {
        "reason": {
            "__reason__": {
                "reason": "{\"__reason__\":{\"reason\":\"{\\\"__reason__\\\":{\\\"reason\\\":{\\\"image_id\\\":\\\"809470c0-0e38-49b0-8e3f-fb522f2b6598\\\",\\\"reason\\\":{\\\"__reason__\\\":\\\"AcquireImage: convert loca image 809470c0-0e38-49b0-8e3f-fb522f2b6598 to rbd pool hddpool at host : exit status 1\\\",\\\"__stage__\\\":\\\"OnImageCacheComplete\\\",\\\"__status__\\\":\\\"error\\\"}},\\\"stage\\\":\\\"OnImageCacheComplete\\\"},\\\"__stage__\\\":\\\"OnStorageCacheImageComplete\\\",\\\"__status__\\\":\\\"error\\\",\\\"__task_name__\\\":\\\"StorageCacheImageTask\\\"}\",\"stage\":\"OnStorageCacheImageComplete\"},\"__stage__\":\"on_kvm_disk_prepared\",\"__status__\":\"error\",\"__task_name__\":\"DiskCreateTask\"}",
                "stage": "on_kvm_disk_prepared"
            },
            "__stage__": "on_disk_prepared",
            "__status__": "error",
            "__task_name__": "KVMGuestCreateDiskTask"
        },
        "stage": "on_disk_prepared"
    },
    "__stage__": "OnDiskPrepared",
    "__status__": "error",
    "__task_name__": "GuestCreateDiskTask"
}

The error for another VM:

{
    "__reason__": {
        "reason": {
            "__reason__": {
                "reason": "{\"__reason__\":{\"reason\":{\"__reason__\":\"cloneImage(image_cache_809470c0-0e38-49b0-8e3f-fb522f2b6598): findOrCreateSnap: CreateSnapshot: snap create: rbd 2024-02-06T04:58:33.279+0000 7f98537fe700 -1 librbd::SnapshotCreateRequest: failed to allocate snapshot id: (110) Connection timed outrbd: failed to create snapshot: \\n(110) Connection timed out\\n: exit status 110\",\"__stage__\":\"OnDiskReady\",\"__status__\":\"error\"},\"stage\":\"OnDiskReady\"},\"__stage__\":\"on_kvm_disk_prepared\",\"__status__\":\"error\",\"__task_name__\":\"DiskCreateTask\"}",
                "stage": "on_kvm_disk_prepared"
            },
            "__stage__": "on_disk_prepared",
            "__status__": "error",
            "__task_name__": "KVMGuestCreateDiskTask"
        },
        "stage": "on_disk_prepared"
    },
    "__stage__": "OnDiskPrepared",
    "__status__": "error",
    "__task_name__": "GuestCreateDiskTask"
}
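
In both error payloads above, the actual cause is buried inside `reason` strings that are themselves escaped JSON, nested several levels deep. A minimal sketch for unwrapping them follows; the helper names `unwrap` and `innermost_reason` are hypothetical, and the key names it follows (`__reason__`, `reason`) are simply the ones visible in these two payloads, not a documented Cloudpods schema.

```python
import json


def unwrap(value):
    """Recursively decode strings that themselves contain JSON objects or arrays."""
    if isinstance(value, str):
        try:
            decoded = json.loads(value)
        except ValueError:
            return value  # ordinary text, not JSON
        # Only descend when the string held a container; bare numbers stay as text.
        return unwrap(decoded) if isinstance(decoded, (dict, list)) else value
    if isinstance(value, dict):
        return {key: unwrap(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap(item) for item in value]
    return value


def innermost_reason(payload):
    """Follow '__reason__' / 'reason' keys down to the deepest message."""
    node = unwrap(payload)
    while isinstance(node, dict):
        if "__reason__" in node:
            node = node["__reason__"]
        elif "reason" in node:
            node = node["reason"]
        else:
            break
    return node


# Usage: paste the payload from the web UI into a file and run
#   print(innermost_reason(json.load(open("task_error.json"))))
```

Applied to the two payloads above, this should surface the innermost messages directly: the `AcquireImage: convert ... to rbd pool hddpool ... exit status 1` failure for the first VM, and the `librbd::SnapshotCreateRequest: failed to allocate snapshot id: (110) Connection timed out` failure for the second. Both originate in operations against the Ceph RBD pools rather than in the task framework that wraps them.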