openSUSE / snapper

Manage filesystem snapshots and allow undo of system modifications
http://snapper.io/
GNU General Public License v2.0

LVM2 thin snapshots make booting extremely slow #216

Open patrakov opened 8 years ago

patrakov commented 8 years ago

I have read https://lizards.opensuse.org/2012/07/25/snapper-lvm/

As I don't trust btrfs, I decided to give snapper on LVM2 thin snapshots a try. I backed up all my data to external storage and started with a blank 512 GB SSD. I created a /boot partition, an LVM PV on another partition, a volume group, and a thin pool (initially sized at 320 GB, so that I would have space for a rollback if anything went wrong), then created thin volumes for / and /home. I created an initramfs, booted the system, installed snapper, and copied my data to /home from the backup.

I configured snapper to make snapshots (basically enabled the default configuration, which makes them every hour and cleans up when it wants).

I used this system for two weeks.

The problem is that booting gradually became slower and slower, until finally the initramfs gave up waiting for the root device to appear. Investigation showed that at this point the "thin_check" process was still running. Letting it finish allows the system to continue booting.

Using a rescue system, I tried to activate and deactivate the volume group manually. The issue is reproducible there: "thin_check" runs for more than 2 minutes when triggered by vgchange -ay or vgchange -an.

At this point, each volume had "snapshot94" as its highest-numbered snapshot, and the pool had 74% of its data volume and 22% of its metadata volume in use. "thin_dump" produces an XML file that is 145 MB in size (1877625 lines). "thin_check", when invoked manually, runs for a long time (spending most of it in the "examining mapping tree" phase) but eventually exits successfully.

I don't think this is a snapper bug per se, but maybe that blog post and other documentation could be updated to mention the issue and suggest workarounds (e.g. passing --skip-mappings to thin_check via lvm.conf, though I have not tried this).

# thin_check --version
0.4.1
# lvm version
  LVM version:     2.02.109(2) (2014-08-05)
  Library version: 1.02.88 (2014-08-05)
  Driver version:  4.31.0

oniko commented 8 years ago

How many snapshots are in the thin pool that thin_check examines during boot?

You can list all thin LVs (including any thin snapshots) in your thin pool with, e.g.: lvs -S pool_lv=your_thin_pool_lv_name
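
To turn that listing into a single number, one option is to count its non-blank output lines. This is only a sketch: count_thin_lvs is a hypothetical helper name, your_vg and your_thin_pool_lv_name are placeholders, and the count includes the origin thin volumes as well as their snapshots.

```shell
# Hypothetical helper: count non-blank lines read from stdin.
count_thin_lvs() {
  grep -c '[^[:space:]]'
}

# Assumed usage (needs root and an LVM version that supports -S selection):
#   lvs --noheadings -o lv_name -S pool_lv=your_thin_pool_lv_name your_vg | count_thin_lvs
```

Subtract the number of origin volumes from the result to get the snapshot count alone.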

patrakov commented 8 years ago

I am not at that PC right now; I will provide the exact number of snapshots when I return home. My guess is that it is between 100 and 200.

oniko commented 8 years ago

OK, please also include the kernel version you run and the thin chunk size. You can get the thin chunk size via: lvs -o +chunk_size your_vg/your_thin_pool_lv

patrakov commented 8 years ago

64k chunk size, 188 snapshots

This is also reproducible with thin_check 0.5.6, and the following LVM version:

  LVM version:     2.02.137(2) (2015-12-05)
  Library version: 1.02.113 (2015-12-05)
  Driver version:  4.33.0

The kernel version is now 4.3.3.

patrakov commented 8 years ago

Unfortunately, the thin pool failed on me. Currently, I have to run thin_repair. If it fails, I will have to restore from a backup. In either case, I have to remove the thin pool to avoid a repeat of the incident, and with it goes any ability to contribute further to this ticket.

patrakov commented 8 years ago

I raised this on the LVM mailing list, and they suggested adding --skip-mappings to thin_check_options. This does help. So I would be happy if this gets included in the documentation.

https://www.redhat.com/archives/linux-lvm/2016-January/msg00010.html
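
For reference, the workaround might look roughly like the lvm.conf fragment below. Treat it as a sketch, not a recommended default: the "-q" entry is an assumption about what the file already contains, so keep whatever options your distribution ships and only append "--skip-mappings". The option makes thin_check skip the block mappings, which is the slow "examining mapping tree" phase described above, so the boot-time check becomes less thorough in exchange.

```
# /etc/lvm/lvm.conf (fragment)
global {
    # "-q" is assumed to be the existing setting; keep your own options
    # and append "--skip-mappings" to them.
    thin_check_options = [ "-q", "--skip-mappings" ]
}
```

If the initramfs carries its own copy of lvm.conf, it will probably need to be regenerated before the change affects boot.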