milkey-mouse / backup-vm

Back up a full image of a libvirt-based VM using Borg
MIT License

backup fails with internal error from libvirt: block name doesn't match #15

Open sabelka opened 6 years ago

sabelka commented 6 years ago

I've been using "backup-vm" for a couple of days on some qemu/libvirt VMs, so far mostly successfully. Today, a backup failed with the following error:

starting backup
libvirt: error code 1: internal error: qemu block name '/dev/vg_data01/mail2-system' doesn't match expected '/var/lib/libvirt/images/mail2-sda-tempsnap.qcow2'
Traceback (most recent call last):
  File "/usr/local/bin/backup-vm", line 11, in <module>
    load_entry_point('backup-vm==0.1.dev28+g442ce38', 'console_scripts', 'backup-vm')()
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/backup.py", line 54, in main
    borg_failed = multi.assimilate(args.archives)
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/snapshot.py", line 175, in __exit__
    self.blockcommit(disks_to_backup)
  File "/usr/local/lib/python3.6/site-packages/backup_vm-0.1.dev28+g442ce38-py3.6.egg/backup_vm/snapshot.py", line 81, in blockcommit
    | libvirt.VIR_DOMAIN_BLOCK_COMMIT_SHALLOW) < 0:
  File "/usr/local/lib64/python3.6/site-packages/libvirt.py", line 701, in blockCommit
    if ret == -1: raise libvirtError ('virDomainBlockCommit() failed', dom=self)
libvirt.libvirtError: internal error: qemu block name '/dev/vg_data01/mail2-system' doesn't match expected '/var/lib/libvirt/images/mail2-sda-tempsnap.qcow2'

The first backup of this VM, the day before, completed without errors. So either the first backup left the VM in a state that caused problems during the next run, or the second run hit some non-deterministic (e.g. timing-dependent) issue.

The VM (called mail2) has three disks (LVM logical volumes):

sda  /dev/vg_data01/mail2-system
sdb  /dev/vg_data01/mail2-swap
sdc  /dev/vg_data01/mail2-data

After the failed backup, the VM's disks were in the following state:

[root@sabavm1 ~]# virsh domblklist mail2
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/mail2-sda-tempsnap.qcow2
sdb        /var/lib/libvirt/images/mail2-sdb-tempsnap.qcow2
sdc        /dev/vg_data01/mail2-data

I then tried to remove the snapshots manually, but this only succeeded for sdb:

[root@sabavm1 ~]# virsh blockcommit mail2 sda --verbose --pivot
error: internal error: unable to find backing name for device drive-scsi0-0-0-0

[root@sabavm1 ~]# virsh blockcommit mail2 sdb --verbose --pivot
Block commit: [100 %]
Successfully pivoted

Next, I shut the VM down and started it again. After that, I was able to remove the remaining snapshot, and the disks were back to their normal state:

[root@sabavm1 ~]# virsh blockcommit mail2 sda --verbose --pivot
Block commit: [100 %]
Successfully pivoted
[root@sabavm1 ~]# virsh domblklist mail2
Target     Source
------------------------------------------------
sda        /dev/vg_data01/mail2-system
sdb        /dev/vg_data01/mail2-swap
sdc        /dev/vg_data01/mail2-data

I wonder if this is an issue with libvirt and/or qemu (I have libvirt version 4.0.0 and qemu 2.9.0) or with "backup-vm". What could I do to debug things further?
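The manual check above (looking at virsh domblklist to see which disks are still running on a temporary overlay) can be scripted. This is only a minimal sketch: the function name leftover_overlays is hypothetical, and the "-tempsnap.qcow2" suffix is an assumption taken from the file names in the output above, not from backup-vm's documentation.

```python
def leftover_overlays(domblklist_output, suffix="-tempsnap.qcow2"):
    """Return {target: source} for disks whose source still ends with `suffix`.

    `domblklist_output` is the text printed by `virsh domblklist DOMAIN`.
    The default suffix is an assumption based on the names seen in this report.
    """
    leftovers = {}
    for line in domblklist_output.splitlines():
        parts = line.split(None, 1)
        # Blank lines and the dashed separator have fewer than two fields;
        # the header row starts with "Target".
        if len(parts) != 2 or parts[0] == "Target":
            continue
        target, source = parts[0], parts[1].strip()
        if source.endswith(suffix):
            leftovers[target] = source
    return leftovers


# Sample input copied from the failed state shown above.
SAMPLE = """\
Target     Source
------------------------------------------------
sda        /var/lib/libvirt/images/mail2-sda-tempsnap.qcow2
sdb        /var/lib/libvirt/images/mail2-sdb-tempsnap.qcow2
sdc        /dev/vg_data01/mail2-data
"""

for target, source in leftover_overlays(SAMPLE).items():
    print(target, source)
```

On a live host the input would come from running virsh domblklist for the domain; any disk the function reports would still need a blockcommit with --pivot, as in the commands above.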

milkey-mouse commented 6 years ago

Hmm, after some cursory googling, this seems to be related to this bug. According to Bugzilla, it doesn't look like it was ever fixed.

Perhaps you could try to reproduce the error sans backup-vm as described in the bug report? Then we know it's libvirt's fault...

sabelka commented 6 years ago

I tried the script, but it did not work for me:

# virsh snapshot-create-as vfw1 20180304 20180304-backup --disk-only --atomic 
error: unsupported configuration: source for disk 'sda' is not a regular file; refusing to generate external snapshot name

Maybe this is because I use LVM logical volumes for the VM's disk images? It works when I add explicit image path names, though:

 --diskspec sda,file=/data/vm/20180305-backup-sda.qcow2 --diskspec sdb,file=/data/vm/20180305-backup-sdb.qcow2

With the script from the bug report modified in this way, it worked. I let it run for 1000 loop iterations but could not reproduce the error. I also added a command that copies data onto the VM's disks while the snapshot is active, to put some load on the block commit. I still did not get an error.
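For reference, one iteration of such a modified loop could be sketched roughly as follows. This is not the script from the bug report, which is not reproduced in this thread. The function names, the image directory and the file naming scheme are illustrative assumptions modelled on the commands quoted above; the virsh options are the ones already used in this thread.

```python
import subprocess


def snapshot_command(domain, name, disks, image_dir):
    """Build a `virsh snapshot-create-as` argv with one explicit --diskspec
    per disk, which is what block-device (LVM) backed disks needed here."""
    argv = ["virsh", "snapshot-create-as", domain, name, name + "-backup",
            "--disk-only", "--atomic"]
    for disk in disks:
        # Hypothetical naming scheme: <image_dir>/<name>-backup-<disk>.qcow2
        overlay = "{}/{}-backup-{}.qcow2".format(image_dir, name, disk)
        argv += ["--diskspec", "{},file={}".format(disk, overlay)]
    return argv


def blockcommit_command(domain, disk):
    """Build the `virsh blockcommit` argv used to merge and pivot one disk."""
    return ["virsh", "blockcommit", domain, disk, "--verbose", "--pivot"]


def run_once(domain, name, disks, image_dir):
    """Take the external snapshot, then commit and pivot every disk.

    check=True stops at the first failing virsh call so the state of the
    domain can be inspected right away.
    """
    subprocess.run(snapshot_command(domain, name, disks, image_dir), check=True)
    for disk in disks:
        subprocess.run(blockcommit_command(domain, disk), check=True)
```

Calling run_once("vfw1", "20180305", ["sda", "sdb"], "/data/vm") in a loop would approximate the test described above. The overlay files and snapshot metadata left behind after each pivot are not cleaned up by this sketch.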

milkey-mouse commented 6 years ago

Does the error show up every time with backup-vm? If you make another test VM with the same environment (i.e. LVM) and try to back it up twice, is it left in a similar inconsistent state?

I think this has something to do with either LVM, which I never explicitly tested backup-vm with (note to self, add this for #10), or some difference in how I'm snapshotting/pivoting disks from how virsh does it. The snapshot code is pretty much a direct port of the calls made in virsh, but I'll look again to see if virsh has made any changes that cause it to work when my version doesn't.