m-witt opened this issue 4 years ago: When I try to migrate a VM which has a linked-clone disk, I get the following error.
I have no idea what linked clones are. But I would say that your storage config is wrong.
./proxmove overlords cluster2 pve data abtest.xxxxx.de --no-verify-ssl
proxmove tries to connect to your old cluster "overlords" to find the VM "abtest.xxxxx.de" (vm 123?).
Through the API, it has found (I presume) base-9000-disk-0/vm-123-disk-0 on a storage you have defined in your .proxmoverc.
Proxmove concatenates your base path=zfs:data with that base-9000-disk-0/vm-123-disk-0 and comes up with the ZFS fileset name data/base-9000-disk-0/vm-123-disk-0 (or perhaps the base-9000-disk-0 is part of the path=; I cannot tell from here).
In either case: proxmove calls ssh to the "overlords" storage at 192.168.202.132. There it tries to create a temporary snapshot, so the data can be transferred. This snapshotting fails because the data/base-9000-disk-0/vm-123-disk-0 fileset does not exist.
Something in your config is likely wrong. If you find the ZFS zvol where the source vm-123-disk-0 actually lives, you should be able to fix things.
For reference:
ssh -A root@192.168.202.132 \
zfs snapshot data/base-9000-disk-0/vm-123-disk-0@temp-0.6604574787343084 &&
zfs send -Rnv data/base-9000-disk-0/vm-123-disk-0@temp-0.6604574787343084 &&
zfs destroy data/base-9000-disk-0/vm-123-disk-0@temp-0.6604574787343084
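A quick way to check whether the computed fileset name actually exists before proxmove tries to snapshot it (a sketch using the host and dataset names from this thread; run it from wherever you normally ssh to the storage host):

```sh
# Does the name proxmove computed exist on the source pool?
ssh -A root@192.168.202.132 zfs list data/base-9000-disk-0/vm-123-disk-0
# "cannot open ...: dataset does not exist" means the storage path in
# .proxmoverc and the volume name do not add up to a real fileset.

# Find where the disk actually lives:
ssh -A root@192.168.202.132 zfs list -o name | grep vm-123
```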
I'm guessing he's talking about linked zfs clones.
i.e. create a VM, install an OS, shut down the VM, snapshot, then zfs clone data/virt/vm-100-disk-0@snap data/virt/vm-101-disk-0.
Repeat for however many identical VMs you need. The benefit is that they all share the same source, and a clone only takes up space where it diverges from the source.
It's handy for setting up a virtual windows workstation, sysprepping it, then making 50 identical copies for a hacky/not-so-expensive VDI.
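The workflow described above can be sketched with plain zfs commands (illustrative only; the data/virt dataset names are made up, not from either cluster in this thread):

```sh
# Illustrative only -- dataset and VM names are hypothetical.
# 1. Install the OS in vm-100, shut it down, then snapshot its disk:
zfs snapshot data/virt/vm-100-disk-0@template

# 2. Create linked clones from that snapshot; each starts out
#    consuming almost no space and only grows as it diverges:
zfs clone data/virt/vm-100-disk-0@template data/virt/vm-101-disk-0
zfs clone data/virt/vm-100-disk-0@template data/virt/vm-102-disk-0
```

Note that the clones depend on the origin snapshot: ZFS will refuse to destroy data/virt/vm-100-disk-0@template while clones of it exist.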
@darkpixel is right. About linked ZFS clones, I found the following explanation in the Proxmox docs: https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_copy_and_clone
From my perspective, I've done the configuration right:
[pve:overlords]
api=https://root@pam:MyPassword@192.168.202.132:8006
[storage:overlords:data@prx001] ; local disk on node1 only
ssh=root@192.168.202.132
path=zfs:data
temp=/srv/temp
[storage:overlords:local@prx001] ; other local disk on node2 only
ssh=root@192.168.202.132
path=/var/lib/vz
temp=/srv/temp
...
[pve:cluster2]
api=https://root@pam:MyPassword@192.168.202.122:8006
[storage:cluster2:data@pve]
ssh=root@192.168.202.122
path=zfs:data
temp=/srv/temp
ZFS List
root@prx001:~# zfs list
NAME                      USED  AVAIL  REFER  MOUNTPOINT
data                      308G  3.92T  28.7G  /data
data/base-9000-disk-0    4.76G  3.92T  1.17G  -
data/base-9005-disk-0    18.4G  3.93T  2.08G  -
data/vm-123-disk-0       99.0G  3.92T   100G  -
data/vm-130-disk-0       21.0M  3.92T  21.0M  -
data/vm-130-disk-1       23.7G  3.92T  23.7G  -
data/vm-200-cloudinit    8.40M  3.92T   121K  -
data/vm-200-disk-0       18.3G  3.93T  2.78G  -
data/vm-333-disk-0       19.2G  3.92T  16.0G  -
data/vm-333-disk-1        121K  3.92T   109K  -
data/vm-666-disk-0       48.3G  3.92T  48.3G  -
data/vm-666-disk-1        109K  3.92T   109K  -
data/vm-666-disk-2        109K  3.92T   109K  -
data/vm-9000-cloudinit   8.40M  3.92T  89.5K  -
data/vm-9005-cloudinit   8.40M  3.92T  89.5K  -
data/vm-999-disk-0       48.0G  3.92T  48.0G  -
data/vm-999-disk-1        109K  3.92T   109K  -
Further, I've already successfully migrated one VM with this script, but that one was a full clone instead of a linked clone. Could my config still be wrong?
Ok. I guess that looks legit then :)
In that case the base-9000-disk-0/vm-123-disk-0 disk location is unexpected. ZFS has a data/vm-123-disk-0, but proxmove apparently gets base-9000-disk-0/vm-123-disk-0.
def get_volumes(self):
    if 'volumes' not in self._cache:
        volumes = {}
        for key, value in self.get_config().items():
            if PROXMOX_VOLUME_TYPES_RE.match(key):
                location, properties = value.split(',', 1)
                if location == 'none':
                    volume = ProxmoxVolume(None, properties)
                else:
                    storage, location = location.split(':', 1)
                    storage = self.cluster.get_storage(self.node, storage)
                    volume = storage.get_volume(location, properties)
                volumes[key] = volume
        self._cache['volumes'] = volumes
    return self._cache['volumes']
You could try and see what happens if you change the location:
volume = storage.get_volume(location, properties)
replace with:
volume = storage.get_volume(location.split('/')[-1], properties)
That might work. (But then the destination will lose any notion of a linked clone.)
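To illustrate what that one-line change does (a standalone sketch, not proxmove itself): everything before the last slash is dropped, so the linked-clone location the API reports collapses to the plain volume name, while ordinary volumes pass through unchanged.

```python
# Standalone illustration of location.split('/')[-1]; the first
# location string is the one from the debug output in this thread.
linked = 'base-9000-disk-0/vm-123-disk-0'   # linked-clone location
plain = 'vm-123-disk-0'                     # ordinary volume location

print(linked.split('/')[-1])  # vm-123-disk-0  (base prefix stripped)
print(plain.split('/')[-1])   # vm-123-disk-0  (unchanged)
```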
Some more debug output from your side would also help. Run with --debug, and perhaps also add this:
for key, value in self.get_config().items():
    print('config', key, value)  # <-- add this one
    if PROXMOX_VOLUME_TYPES_RE.match(key):
Here is the result:
2020-05-07 15:14:44,484: DEBUG: Parsing config file: ~/.proxmoverc
2020-05-07 15:14:44,485: DEBUG: (api) Connecting to 192.168.202.132
2020-05-07 15:14:44,628: DEBUG: (api) Connecting to 192.168.202.122
2020-05-07 15:14:44,709: DEBUG: (api) 'cluster2' nodes: [{'maxdisk': 965291671552, 'cpu': 0.0193597502171085, 'type': 'node', 'uptime': 165142, 'disk': 1636040704, 'maxcpu': 24, 'status': 'online', 'maxmem': 67493126144, 'ssl_fingerprint': '6F:6D:2D:1B:46:BF:01:F3:AC:BE:96:DC:DC:74:9E:69:09:84:F7:31:B7:74:F1:7F:59:58:38:2C:95:B8:9C:B8', 'node': 'pve', 'id': 'node/pve', 'mem': 7939264512, 'level': ''}]
2020-05-07 15:14:44,709: DEBUG: Sanity checks and preparation
2020-05-07 15:14:44,709: DEBUG: Checking VMs existence on source and destination
2020-05-07 15:14:44,732: DEBUG: Checking for problematic config in 1 VMs to move
config boot c
config hotplug disk,network,usb,memory,cpu
config onboot 1
config serial0 socket
config searchdomain xxxxx.de
config vmgenid 3205710a-5dc3-4ff1-9248-35d5944c78ff
config numa 1
config digest 80bcf9e7e434b03c6e8e396e188e4c9cfa6f4750
config scsihw virtio-scsi-pci
config scsi0 data:base-9000-disk-0/vm-123-disk-0,size=300G
config sockets 1
config agent 1
config sshkeys ssh-ed25519%20AAAAC3NzaC1lZDI1NTE5AAAAIIghMxw%2B5CJPorrvsj5%2BZju84zsuqKAiojLaxwUXG%2BBC%20
config ipconfig0 ip=xx.xxx.xx.xxx/25,gw=xx.xxx.xx.xxx
config net0 virtio=CE:15:B0:0B:3D:36,bridge=vmbr0
config description Cloudron Test-Installation
config cores 4
config bootdisk scsi0
config memory 8192
config ostype l26
config balloon 2048
config name abtest.xxxx.de
config smbios1 uuid=57aab168-b7f8-4e6f-b333-a5792a9c0250
config nameserver xx.xxx.xx.xx
2020-05-07 15:14:44,738: DEBUG: (exec) ssh -A root@192.168.202.132 /bin/true
2020-05-07 15:14:44,867: DEBUG: (exec) ssh -A root@192.168.202.132 zfs snapshot data/vm-123-disk-0@temp-0.02936786425268223 '&&' zfs send -Rnv data/vm-123-disk-0@temp-0.02936786425268223 '&&' zfs destroy data/vm-123-disk-0@temp-0.02936786425268223
2020-05-07 15:14:45,462: DEBUG: (exec) ssh -A root@192.168.202.132 zfs list -H -o volsize data/vm-123-disk-0
2020-05-07 15:14:45,596: DEBUG: Found 2 relevant storages: data, data
2020-05-07 15:14:45,596: DEBUG: Checking storage prerequisites
2020-05-07 15:14:45,596: DEBUG: (exec) ssh -A root@192.168.202.132 which ssh zfs mbuffer
2020-05-07 15:14:45,727: DEBUG: (exec) ssh -A root@192.168.202.132 test -d /srv/temp
2020-05-07 15:14:45,850: DEBUG: (exec) ssh -A root@192.168.202.122 /bin/true
root@192.168.202.122's password:
2020-05-07 15:15:15,680: DEBUG: (exec) ssh -A root@192.168.202.122 which ssh zfs mbuffer
root@192.168.202.122's password:
2020-05-07 15:15:20,596: DEBUG: (exec) ssh -A root@192.168.202.122 test -d /srv/temp
root@192.168.202.122's password:
2020-05-07 15:15:28,250: INFO: Attempt moving overlords<806edfe1> => cluster2<806edfe1> (node 'pve'): abtest.xxxxx.de
2020-05-07 15:15:28,250: INFO: - source VM abtest.xxxx.de@prx001<qemu/123/running>
2020-05-07 15:15:28,250: INFO: - storage 'scsi0': data:vm-123-disk-0,size=300G (host=74.4GiB, guest=300.0GiB)
2020-05-07 15:15:28,250: INFO: Creating new VM 'abtest.xxxxx.de' on 'cluster2', node 'pve'
2020-05-07 15:15:28,421: INFO: - created new VM 'abtest.xxxxx.de--CREATING' as UPID:pve:0000B5B3:00FC100A:5EB409F0:qmcreate:102:root@pam:; waiting for it to show up
2020-05-07 15:15:30,472: INFO: - created new VM 'abtest.xxxxx.de--CREATING': abtest.xxxx.de--CREATING@pve<qemu/102/stopped>
2020-05-07 15:15:30,472: INFO: Stopping VM abtest.xxxxx.de@prx001<qemu/123/running>; will forcibly kill after 130 seconds
2020-05-07 15:17:21,632: INFO: - stopped VM abtest.xxxx.de@prx001<qemu/123/stopped>
config vmgenid 5ac4b46b-3882-4052-9633-064293229e80
config bootdisk scsi0
config hotplug disk,network,usb,memory,cpu
config sshkeys ssh-ed25519%20AAAAC3NzaC1lZDI1NTE5AAAAIIghMxw%2B5CJPorrvsj5%2BZju84zsuqKAiojLaxwUXG%2BBC%20
config searchdomain xxxx.de
config net0 virtio=CE:15:B0:0B:3D:36,bridge=vmbr0
config memory 8192
config ipconfig0 ip=xx.xxx.xx.xxx/25,gw=xx.xxx.xx.xxx
config scsihw virtio-scsi-pci
config balloon 2048
config smbios1 uuid=57aab168-b7f8-4e6f-b333-a5792a9c0250
config name abtest.xxxx.de--CREATING
config boot c
config onboot 1
config cores 4
config description Cloudron Test-Installation
config nameserver xx.xxx.xx.x
config numa 1
config digest ef2505999d1f167b86ddea2b12262778e9a6d66d
config sockets 1
config serial0 socket
config ostype l26
config agent 1
2020-05-07 15:17:21,653: INFO: Begin copy of 'scsi0' (data:vm-123-disk-0,size=300G) to data
2020-05-07 15:17:21,653: INFO: zfs(1) send/recv 74.4GiB data from 'data/vm-123-disk-0@proxmove-200507-151721' to 'data/vm-102-scsi0' (on data)
2020-05-07 15:17:21,653: DEBUG: (exec) ssh -A root@192.168.202.132 zfs snapshot data/vm-123-disk-0@proxmove-200507-151721
2020-05-07 15:17:21,946: DEBUG: (exec) ssh -A root@192.168.202.122 which pv
root@192.168.202.122's password:
2020-05-07 15:17:29,952: DEBUG: Missing commands: {'pv'}
./proxmove:975: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
'pv(1) command is not found on the destination storage; '
2020-05-07 15:17:29,953: WARNING: pv(1) command is not found on the destination storage; consider installing it to get a pretty progress bar
2020-05-07 15:17:29,953: DEBUG: (exec) ssh -A root@192.168.202.132 -t 'zfs send -R data/vm-123-disk-0@proxmove-200507-151721 | mbuffer -q -s 128k -m 1G | ssh -o StrictHostKeyChecking=no root@192.168.202.122 \'mbuffer -s 128k -m 1G | zfs recv data/vm-102-scsi0\''
root@192.168.202.122's password:
in @ 11.2 MiB/s, out @ 0.0 kiB/s, 1024 kiB total, buffer 5% fullcannot receive: local origin for clone data/vm-102-scsi0@proxmove-200507-151721 does not exist
mbuffer: error: outputThread: error writing to <stdout> at offset 0x100000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
summary: 1024 kiByte in 4.2sec - average of 243 kiB/s
mbuffer: error: outputThread: error writing to <stdout> at offset 0x3220000: Broken pipe
mbuffer: warning: error during output to <stdout>: Broken pipe
Connection to 192.168.202.132 closed.
Traceback (most recent call last):
File "./proxmove", line 1750, in <module>
main()
File "./proxmove", line 1746, in main
vmmover.run(options.dry_run)
File "./proxmove", line 1439, in run
self.move_vm(vm, translator, dry_run)
File "./proxmove", line 1526, in move_vm
dst_vm.create_volume(key, volume, storage=storage)
File "./proxmove", line 1110, in create_volume
new_volume = source_volume.clone(storage, self.id, key)
File "./proxmove", line 1288, in clone
new_storage, new_vmid, new_name)
File "./proxmove", line 587, in copy
image_size, src_location, dst_storage, dst_name)
File "./proxmove", line 996, in copy_direct
tty=True)
File "./proxmove", line 679, in ssh_command
hide_stderr=hide_stderr, tty=tty)
File "./proxmove", line 621, in run_command
output='Failure with status {}'.format(status))
__main__.CalledProcessError2: <exception str() failed>
So, "cannot receive: local origin for clone data/vm-102-scsi0@proxmove-200507-151721 does not exist" then.
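That error is the clone dependency showing up on the receiving side: a replication stream of a linked clone records which origin snapshot it was cloned from, and zfs recv refuses to create the clone unless that origin already exists on the destination pool. A hypothetical minimal reproduction (scratch pool and host names are made up):

```sh
# Hypothetical reproduction on a scratch pool named "tank":
zfs snapshot tank/base@orig                  # origin snapshot
zfs clone tank/base@orig tank/clone          # linked clone
zfs snapshot tank/clone@xfer
zfs send -R tank/clone@xfer | ssh otherhost zfs recv tank/clone
# Fails on otherhost with "local origin for clone ... does not exist"
# unless tank/base@orig was sent over there first.
```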
I suspect it might work if you copy that data/base-9000-disk-0 over first. It should already have a snapshot (the one you're cloning from).
ssh -A to root@192.168.202.132, and then:
# zfs list -r -t all data/base-9000-disk-0
Find the snapshot and then:
# zfs send data/base-9000-disk-0@SNAPSHOT |
ssh root@192.168.202.122 zfs recv data/base-9000-disk-0
Possibly?
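After seeding the base image, verifying that the origin snapshot arrived and retrying the move might look like this (the proxmove invocation is the one from the top of the thread; whether proxmove then handles the clone correctly is untested):

```sh
# Confirm the origin dataset and its snapshot now exist on the destination:
ssh root@192.168.202.122 zfs list -r -t all data/base-9000-disk-0

# Then retry the original invocation:
./proxmove overlords cluster2 pve data abtest.xxxxx.de --no-verify-ssl
```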
Ok. This is not something I'm willing to spend time on. Nobody I know uses linked clones. I'll leave it open with a wontfix label for now.