tasket / wyng-backup

Fast backups for logical volumes & disk images
GNU General Public License v3.0

KeyError on volume delete #183

Closed seven-beep closed 5 months ago

seven-beep commented 6 months ago

Hello,

On the main branch, I cannot send volumes to an archive: the volume is not recognized correctly, and the operation leaves the archive in an inconsistent state:

# wyng arch-init --dest=qubes://disp9061/home/user/qubes.backup
Wyng 0.8beta release 20240225
Enter new encryption passphrase:
Re-enter passphrase:

Encryption    : xchacha20-dgr (xchacha20-poly1305-msr)
Data Hashing  : hmac-sha256
Compression   : zlib:4
Done.
# wyng send vm-lorgnette-dvm-private --dest=qubes://disp9061/home/user/qubes.backup --local=qubes_dom0/vm-pool
Wyng 0.8beta release 20240225
Enter passphrase:
Encrypted archive 'qubes://disp9061/home/user/qubes.backup'
Last updated 2024-04-01 11:17:42.422740 (+02:00)

Preparing snapshots in '/dev/qubes_dom0/'...
  Skipping vm-lorgnette-dvm-private; snapshot is from a different archive.
No new data.

# wyng receive vm-lorgnette-dvm-private --dest=qubes://disp9061/home/user/qubes.backup --local=qubes_dom0/vm-pool
Wyng 0.8beta release 20240225
Enter passphrase:
Encrypted archive 'qubes://disp9061/home/user/qubes.backup'
Last updated 2024-04-01 11:33:44.867592 (+02:00)
No sessions available.
Error on volume(s): vm-lorgnette-dvm-private
Exception ignored in atexit callback: <function cleanup at 0x7331941447c0>
Traceback (most recent call last):
  File "/sbin/wyng", line 4457, in cleanup
    sys.exit(2)
SystemExit: 2
# wyng list --dest=qubes://disp9061/home/user/qubes.backup
Wyng 0.8beta release 20240225
Enter passphrase:
Encrypted archive 'qubes://disp9061/home/user/qubes.backup'
Last updated 2024-04-01 11:33:44.867592 (+02:00)

Volumes:
 vm-lorgnette-dvm-private
# wyng delete vm-lorgnette-dvm-private --dest=qubes://disp9061/home/user/qubes.backup
Wyng 0.8beta release 20240225
Enter passphrase:
Encrypted archive 'qubes://disp9061/home/user/qubes.backup'
Last updated 2024-04-01 11:33:44.867592 (+02:00)

Warning! Delete will remove ALL metadata AND archived data for volume vm-lorgnette-dvm-private
Are you sure? [y/N]: y

Deleting volume vm-lorgnette-dvm-private from archive.
Traceback (most recent call last):
  File "/sbin/wyng", line 4751, in <module>
    delete_volume(storage, aset, options.volumes[0])
  File "/sbin/wyng", line 4257, in delete_volume
    for lvol_name in (storage.lvols[dv].snap1, storage.lvols[dv].snap2):
                      ~~~~~~~~~~~~~^^^^
KeyError: 'vm-lorgnette-dvm-private'

tasket commented 6 months ago

@seven-beep The "different archive" issue would be because a backup was done for that volume to a different archive, leaving a snapshot behind. LVM cannot accommodate snapshots from different archives due to its volume naming restrictions (this is not an issue when backing up reflink sources like Btrfs).

In this case use --remap with send, which will remove the old snapshot and create a new one associated with the current archive.

I think if there is a bug here, it is in the way an empty volume entry is created before exiting due to the snapshot conflict. The exit should occur before the volume is created in the archive.
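
The ordering described above can be sketched as follows. This is only an illustration, not Wyng's code: `register_volume`, `SnapshotConflict`, `archive_volumes`, and `snapshot_owner` are hypothetical names, and the archive is modelled as a plain set. The point is that the conflict check runs before anything is written to the archive.

```python
class SnapshotConflict(Exception):
    """Raised when a local snapshot belongs to a different archive (hypothetical)."""


def register_volume(archive_volumes, snapshot_owner, archive_id, volume_name):
    """Add volume_name to the archive only if its snapshot does not conflict.

    archive_volumes: set of volume names already in the archive (stand-in).
    snapshot_owner:  dict mapping volume name -> id of the archive that
                     created the existing local snapshot (stand-in).
    """
    owner = snapshot_owner.get(volume_name)
    # Check for the conflict first, so a failure leaves the archive untouched.
    if owner is not None and owner != archive_id:
        raise SnapshotConflict(
            f"{volume_name}: snapshot is from a different archive"
        )
    # Only now is the volume entry created.
    archive_volumes.add(volume_name)
```

With this ordering, a rejected volume never appears in a later listing as an empty entry.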


The receive command error: A new volume can exist without any session (i.e. no data) and that is the case here. The atexit callback error is an unrelated exit cleanup issue.


The KeyError appears to be a bug in handling a data-less volume entry. I'll try to reproduce it.

Thanks for reporting this!

seven-beep commented 6 months ago

Yes, I confirm that using --remap fixes the situation.

Thank you for looking into it!

nijave commented 6 months ago

I'm seeing the KeyError consistently on delete with a slightly older version:

# python3 /mnt/exports/files/.local/bin/wyng --dest=file:/mnt/archives/disk_images/lv_backups/ delete vmubtkube01
Wyng 0.8beta release 20231002
Un-encrypted archive 'file:/mnt/archives/disk_images/lv_backups/'
Last updated 2024-04-02 00:07:21.006795 (+00:00)

Warning! Delete will remove ALL metadata AND archived data for volume vmubtkube01
Are you sure? [y/N]: y

Deleting volume vmubtkube01 from archive.
Traceback (most recent call last):
  File "/mnt/exports/files/.local/bin/wyng", line 4750, in <module>
    delete_volume(storage, aset, options.volumes[0])
  File "/mnt/exports/files/.local/bin/wyng", line 4256, in delete_volume
    for lvol_name in (storage.lvols[dv].snap1, storage.lvols[dv].snap2):
KeyError: 'vmubtkube01'

# python3 /mnt/exports/files/.local/bin/wyng --dest=file:/mnt/archives/disk_images/lv_backups/ delete vmubtkube01
Wyng 0.8beta release 20231002
Un-encrypted archive 'file:/mnt/archives/disk_images/lv_backups/'
Last updated 2024-04-02 00:10:56.392962 (+00:00)
Volume 'vmubtkube01' not configured; Skipping.
Volume not found.

It seems the volume was deleted successfully, though.

Edit: Maybe my case is different; I think it's because I manually deleted the LVs first.

tasket commented 6 months ago

Yes, the KeyError occurs on a local storage object, so it's not archive-related. The archive delete is done before the storage delete, so it's just an issue with trying to delete snapshots for a volume that doesn't exist.

Edit: Actually the storage state is offline when this occurs, since no --local is specified. I'll simply check for that.
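
A check of that kind might look like the sketch below. It is not the actual fix: the `online` attribute and the `snapshot_names` helper are assumptions made for illustration, and only `lvols`, `snap1`, and `snap2` come from the traceback. The storage objects are simple stand-ins built with `SimpleNamespace`.

```python
from types import SimpleNamespace


def snapshot_names(storage, volume_name):
    """Return the snapshot names to remove for volume_name, or [] if none apply."""
    # Hypothetical flag: False when no --local storage was specified.
    if not getattr(storage, "online", False):
        return []
    # .get() avoids a KeyError when the local volume is unknown or already removed.
    lvol = storage.lvols.get(volume_name)
    if lvol is None:
        return []
    return [lvol.snap1, lvol.snap2]


# Stand-in storage objects for demonstration only.
offline = SimpleNamespace(online=False, lvols={})
online = SimpleNamespace(
    online=True,
    lvols={"vol-a": SimpleNamespace(snap1="vol-a-snap1", snap2="vol-a-snap2")},
)

print(snapshot_names(offline, "vmubtkube01"))
print(snapshot_names(online, "vol-a"))
```

Returning an empty list lets the caller keep its existing loop over snapshot names without a special case for offline storage.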

tasket commented 5 months ago

This should be fixed now.