ossobv / proxmove

Migrate virtual machines between different Proxmox VE clusters with minimal downtime

ERROR: Failed to create VM (at target cluster) #42

Closed by megabert 2 years ago

megabert commented 2 years ago

Hi,

I'm having issues when migrating:

# ./proxmove --no-verify-ssl px04 pxc01 px03 localdisk kvm09-xxx

2022-02-02 14:24:20,067: INFO: Attempt moving px04<6ddebafe> => pxc01<6ddebafe> (node 'px03'): kvm09-xxx
2022-02-02 14:24:20,067: INFO: - source VM kvm09-xxx@px04<qemu/100/running>
2022-02-02 14:24:20,067: INFO: - storage 'ide2': None,media=cdrom (host=<unknown>, guest=<unknown>)
2022-02-02 14:24:20,067: INFO: - storage 'scsi0': localdisk:100/vm-100-disk-0.qcow2,cache=unsafe,discard=on,size=10G (host=10.0GiB, guest=10.0GiB)
2022-02-02 14:24:20,070: INFO: Creating new VM 'kvm09-xxx--CREATING' on 'pxc01', node 'px03'
2022-02-02 14:24:20,100: ERROR: Failed to create VM with parameters:

  # https://px03.xxx:8006/api2/json
  api.nodes("px03/qemu").create(**{'smbios1': 'uuid=9bdc528d-3a9a-424c-8cb7-5944832d741e', 'tags': 'prod', 'ostype': 'l26', 'boot': 'cd', 'sockets': 1, 'numa': 0, 'name': 'kvm09-xxx--CREATING', 'bootdisk': 'scsi0', 'net0': 'virtio=12:05:48:BF:7E:91,bridge=vmbr1,firewall=1', 'onboot': 1, 'meta': 'creation-qemu=6.1.0,ctime=1643742755', 'cores': 1, 'memory': 2048, 'agent': '1', 'vmid': 128})

Traceback (most recent call last):
  File "/root/proxmove/./proxmove", line 1983, in _start_moving_vm
    dst_vm = self.dst_pve.get_vm(
  File "/root/proxmove/./proxmove", line 572, in get_vm
    raise ProxmoxVm.DoesNotExist(
ProxmoxVm.DoesNotExist: VM named 'kvm09-xxx' not found in cluster 'pxc01'; do you have the PVEVMAdmin role?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/proxmove/./proxmove", line 477, in create_vm
    vmhash = getattr(api_node, 'qemu').create(**mutable_config)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 135, in create
    return self.post(*args, **data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 126, in post
    return self(args)._request("POST", data=data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 105, in _request
    raise ResourceException(
proxmoxer.core.ResourceException: 400 Bad Request: Parameter verification failed.
Traceback (most recent call last):
  File "/root/proxmove/./proxmove", line 1983, in _start_moving_vm
    dst_vm = self.dst_pve.get_vm(
  File "/root/proxmove/./proxmove", line 572, in get_vm
    raise ProxmoxVm.DoesNotExist(
__main__.DoesNotExist: VM named 'kvm09-xxx' not found in cluster 'pxc01'; do you have the PVEVMAdmin role?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/proxmove/./proxmove", line 2299, in <module>
    main()
  File "/root/proxmove/./proxmove", line 2295, in main
    vmmover.run(options.dry_run)
  File "/root/proxmove/./proxmove", line 1935, in run
    self.move_vm(vm, translator, dry_run)
  File "/root/proxmove/./proxmove", line 1975, in move_vm
    dst_vm = self._start_moving_vm(src_vm, translator)
  File "/root/proxmove/./proxmove", line 1989, in _start_moving_vm
    dst_vm = self.dst_pve.create_vm(
  File "/root/proxmove/./proxmove", line 477, in create_vm
    vmhash = getattr(api_node, 'qemu').create(**mutable_config)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 135, in create
    return self.post(*args, **data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 126, in post
    return self(args)._request("POST", data=data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 105, in _request
    raise ResourceException(
proxmoxer.core.ResourceException: 400 Bad Request: Parameter verification failed.
root@px04:~/proxmove# 

This is my .proxmoverc on both nodes:

# pxc01 is a cluster with 4 nodes
[pve:pxc01]
        api=https://adminrobot@pve:secret@px03.xxx:8006

        [storage:pxc01:localdisk@px01]
                ssh=root@px01.xxx
                path=/proxmox/images
                temp=/proxmox/temp

        [storage:pxc01:localdisk@px03]
                ssh=root@px03.xxx
                path=/proxmox/images
                temp=/proxmox/temp

# px04 is a single node
[pve:px04]
        api=https://adminrobot@pve:secret@px04.xxx:8006

        [storage:px04:localdisk@px04]
                ssh=root@px04.xxx
                path=/proxmox/images
                temp=/proxmox/temp

I logged into both clusters with the adminrobot account and the given password, and verified that it has the full "Administrator" role. Migrating from pxc01 to px04 works some of the time (when it fails, it fails with the same error shown here), but the way back never works and always fails with the error above.

All Proxmox nodes are up to date with the no-subscription repo:

pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-3-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-8
pve-kernel-5.13: 7.1-6
pve-kernel-5.4: 6.4-12
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.140-1-pve: 5.4.140-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.4-1
proxmox-backup-file-restore: 2.1.4-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1

wdoekes commented 2 years ago

I think there are two exceptions and you're only showing us the first one. Am I correct?

That first Traceback is not a problem, as that is caught on L1985: https://github.com/ossobv/proxmove/blob/be56486531d971b08d1e48f1458fa54bf65a53f5/proxmove#L1981-L1991
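
The pattern behind that first traceback can be sketched as follows. This is a minimal, self-contained illustration and not proxmove's actual code: `DoesNotExist`, `get_vm`, `create_vm` and `start_moving_vm` are hypothetical stand-ins. It shows why Python prints "During handling of the above exception, another exception occurred": the lookup failure is caught as expected, and the real error is raised from inside the `except` block.

```python
class DoesNotExist(Exception):
    """Stand-in for ProxmoxVm.DoesNotExist (hypothetical)."""


def get_vm(name):
    # Stand-in lookup: the VM is never found on the target cluster.
    raise DoesNotExist(f"VM named {name!r} not found")


def create_vm(name):
    # Stand-in for the API call that the server rejects.
    raise ValueError("400 Bad Request: Parameter verification failed.")


def start_moving_vm(name):
    try:
        return get_vm(name)
    except DoesNotExist:
        # Expected path: the VM does not exist yet, so create it.
        # An exception raised here is chained to the caught one.
        return create_vm(name)


def demo():
    try:
        start_moving_vm("kvm09-xxx")
    except ValueError as exc:
        # __context__ holds the exception that was being handled.
        return type(exc.__context__).__name__, str(exc)
```

Here the caught `DoesNotExist` is harmless context; the error that matters is the one raised while creating the VM.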

Ah, indeed, your edits add this one:

Traceback (most recent call last):
  File "/root/proxmove/./proxmove", line 2299, in <module>
    main()
  File "/root/proxmove/./proxmove", line 2295, in main
    vmmover.run(options.dry_run)
  File "/root/proxmove/./proxmove", line 1935, in run
    self.move_vm(vm, translator, dry_run)
  File "/root/proxmove/./proxmove", line 1975, in move_vm
    dst_vm = self._start_moving_vm(src_vm, translator)
  File "/root/proxmove/./proxmove", line 1989, in _start_moving_vm
    dst_vm = self.dst_pve.create_vm(
  File "/root/proxmove/./proxmove", line 477, in create_vm
    vmhash = getattr(api_node, 'qemu').create(**mutable_config)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 135, in create
    return self.post(*args, **data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 126, in post
    return self(args)._request("POST", data=data)
  File "/usr/local/lib/python3.9/dist-packages/proxmoxer/core.py", line 105, in _request
    raise ResourceException(
proxmoxer.core.ResourceException: 400 Bad Request: Parameter verification failed.

Apparently it dislikes one of these:

api.nodes("px03/qemu").create(**{
  'smbios1': 'uuid=9bdc528d-3a9a-424c-8cb7-5944832d741e',
  'tags': 'prod', 'ostype': 'l26', 'boot': 'cd', 'sockets': 1, 'numa': 0,
  'name': 'kvm09-xxx--CREATING', 'bootdisk': 'scsi0',
  'net0': 'virtio=12:05:48:BF:7E:91,bridge=vmbr1,firewall=1',
  'onboot': 1, 'meta': 'creation-qemu=6.1.0,ctime=1643742755',
  'cores': 1, 'memory': 2048, 'agent': '1', 'vmid': 128
})

What you can do is add a breakpoint before L477:

        try:
            import pdb; pdb.set_trace()  # <-- add this line
            vmhash = getattr(api_node, 'qemu').create(**mutable_config)
        except ResourceException:
            log.exception(

Then you can try running the code above manually, whilst leaving out one or more arguments.

E.g.

(pdb) mydict = {'smbios1': 'uuid=9bdc528d-3a9a-424c-8cb7-5944832d741e', 'tags': 'prod', 'ostype': 'l26', 'boot': 'cd', 'sockets': 1, 'numa': 0, 'name': 'kvm09-xxx--CREATING', 'bootdisk': 'scsi0', 'net0': 'virtio=12:05:48:BF:7E:91,bridge=vmbr1,firewall=1', 'onboot': 1, 'meta': 'creation-qemu=6.1.0,ctime=1643742755', 'cores': 1, 'memory': 2048, 'agent': '1', 'vmid': 128}
(pdb) tmpdict = mydict.copy()
(pdb) del tmpdict['tags']
(pdb) api.nodes("px03/qemu").create(**tmpdict)

If that works, then the target Proxmox cluster doesn't accept the tags parameter.
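
The same elimination can be automated. The sketch below is only an illustration under stated assumptions: `find_rejected_keys` and `make_fake_create` are hypothetical helpers, not part of proxmove or proxmoxer, and `create` is any callable that raises on a rejected parameter set. It is demonstrated against a fake because, on a real cluster, every successful call would actually create a VM and use up the vmid.

```python
def find_rejected_keys(create, params, required=("vmid",)):
    """Return the keys whose removal alone makes create() succeed."""
    culprits = []
    for key in params:
        if key in required:
            continue  # never drop parameters the call cannot do without
        trial = {k: v for k, v in params.items() if k != key}
        try:
            create(**trial)
        except Exception:
            continue  # still rejected, so this key is not the sole cause
        culprits.append(key)
    return culprits


def make_fake_create(rejected):
    """Build a fake create() that refuses any of the given keys."""
    def fake_create(**kwargs):
        bad = sorted(set(kwargs) & set(rejected))
        if bad:
            raise ValueError(f"400 Bad Request: rejected {bad}")
        return "UPID:fake"
    return fake_create


params = {"tags": "prod", "ostype": "l26", "cores": 1, "vmid": 128}
culprits = find_rejected_keys(make_fake_create({"tags"}), params)
```

This only finds a single offending key; if two parameters are rejected at once, no single removal succeeds and the list stays empty.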

wdoekes commented 2 years ago

Also, you'll likely want one of these fixes, so you get useful info from the proxmox 400 error:

https://github.com/ossobv/proxmove#debugging
https://github.com/proxmoxer/proxmoxer/commit/6c980a9f9c364f4f317924f0425f87ba16fbeb6f
https://github.com/swayf/proxmoxer/pull/83/files
https://github.com/swayf/proxmoxer/pull/55/commits/065ec5963189da3a7cf5e7c93f154033a1ba9fff

megabert commented 2 years ago

Thanks for the support. The patch you pointed to for additional debug info is already applied to my proxmoxer version.

I created a new test-VM.

The issue is caused by the "meta" key:

(Pdb) mydict = {'ostype': 'l26', 'smbios1': 'uuid=5142e2cd-6f73-4ddc-abce-eb75805c3cbb', 'boot': 'cd', 'net0': 'virtio=CA:50:BA:45:1E:88,bridge=vmbr0,firewall=1', 'name': 'testvm--CREATING', 'numa': 0, 'scsihw': 'virtio-scsi-pci', 'sockets': 1, 'cores': 1, 'meta': 'creation-qemu=6.1.0,ctime=1643811510', 'memory': 2048, 'vmid': 128}
(Pdb) tmpdict = mydict.copy()
(Pdb) self.api.nodes("px03/qemu").create(**tmpdict)
*** proxmoxer.core.ResourceException: 400 Bad Request: Parameter verification failed.
(Pdb) del tmpdict['meta']
(Pdb) self.api.nodes("px03/qemu").create(**tmpdict)
'UPID:px03:000B75E2:097D2D73:61FA9776:qmcreate:128:adminrobot@pve:'

A "meta" key is not listed in the Proxmox VE API viewer:

https://pve.proxmox.com/pve-docs/api-viewer/index.html#/nodes/{node}/qemu

This may serve as a first workaround:

+++ proxmove    2022-02-02 15:53:26.243581105 +0100
@@ -466,6 +466,7 @@
         # Guess new VMID, set id and name.
         vmid = self.get_free_vmid()
         mutable_config['vmid'] = vmid
+        del mutable_config['meta']
         mutable_config['name'] = name_with_suffix
         assert 'hostname' not in mutable_config, mutable_config  # lxc??
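
One caveat with a bare `del`: it raises `KeyError` if a source VM's config has no 'meta' entry. A slightly more defensive variant could look like the sketch below. `drop_unsupported` is a hypothetical helper, not the fix that ended up in proxmove, and the assumption that only 'meta' needs removing comes from the test above.

```python
def drop_unsupported(config, unsupported=("meta",)):
    """Return a copy of config without keys the target API rejects."""
    # Filtering into a new dict tolerates configs that lack the key,
    # unlike `del config['meta']`, which raises KeyError in that case.
    return {k: v for k, v in config.items() if k not in unsupported}


with_meta = {"meta": "creation-qemu=6.1.0,ctime=1643811510", "vmid": 128}
without_meta = {"vmid": 128}
```

In place, the equivalent one-liner would be `mutable_config.pop('meta', None)`.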

wdoekes commented 2 years ago

Nice going. This commit should fix it for you.