patrakov opened this issue 8 years ago
How many snapshots do you have in the thin pool that the thin_check tool examines during boot?
You can print all thin LVs (including any thin snapshots) in your thin pool with, e.g.: `lvs -S pool_lv=your_thin_pool_lv_name`
I am not at that PC right now, will provide the exact number of snapshots when I return home. But I guess it is between 100 and 200.
Ok, also please include the kernel version you run and the thin chunk size. You can get the thin chunk size via: `lvs -o +chunk_size your_vg/your_thin_pool_lv`
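The two queries above can be combined into one short sketch. The VG and pool names here (`vg0`, `thinpool`) are placeholders; substitute your own:

```shell
# Placeholder names -- substitute your own VG and thin pool.
vg=vg0
pool=thinpool
if command -v lvs >/dev/null 2>&1; then
    lvs -S pool_lv="$pool"            # every thin LV (including snapshots) in the pool
    lvs -o +chunk_size "$vg/$pool"    # the pool's chunk size
else
    echo "lvm2 tools not installed" >&2
fi
```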
64k chunk size, 188 snapshots
This is also reproducible with thin_check 0.5.6, and the following LVM version:
LVM version:     2.02.137(2) (2015-12-05)
Library version: 1.02.113 (2015-12-05)
Driver version:  4.33.0
The kernel version is now 4.3.3.
Unfortunately, the thin pool failed on me. Currently, I have to run thin_repair. If it fails, I will have to restore from a backup. In any case, I have to remove the thin pool (and thus any ability to contribute to the ticket), in order to avoid any repeat of the incident.
I have complained to the LVM mailing list, and they suggest adding `--skip-mappings` to `thin_check_options`. This does help, so I would be happy if it gets included in the documentation.
https://www.redhat.com/archives/linux-lvm/2016-January/msg00010.html
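For reference, a sketch of what the suggested change might look like in lvm.conf. The first two options shown are the usual stock defaults; verify them against your distribution's file before editing:

```
# /etc/lvm/lvm.conf (excerpt) -- add --skip-mappings so that thin_check
# does not walk the whole mapping tree during activation.
global {
    thin_check_options = [ "-q", "--clear-needs-check-flag", "--skip-mappings" ]
}
```

Note that if thin_check runs from the initramfs (as in this report), the initramfs must be regenerated for the change to take effect at boot.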
I have read https://lizards.opensuse.org/2012/07/25/snapper-lvm/
As I don't trust btrfs, I decided to give snapper on lvm2 thin snapshots a try. I backed up all my data to external storage, started with a blank 512 GB SSD, and created the /boot partition, an LVM PV on another partition, a volume group, and a thin pool (initially sized at 320 GB, so that I would have space for rollback if anything went wrong). I created thin volumes for / and for /home, created the initramfs, let it boot, installed snapper, and copied the data back to /home from the backup.
Configured snapper to make snapshots (basically enabled the default configuration, which takes one every hour and cleans up old ones when it sees fit).
I used this system for two weeks.
The problem is that booting gradually became slower and slower, until finally the initramfs gave up waiting for the root device to appear. Investigation showed that at that point a "thin_check" process was still running; letting it finish allows the system to continue booting.
Using a rescue system, I tried to activate and deactivate the volume group manually. The "thin_check runs for more than 2 minutes from `vgchange -ay` or `vgchange -an`" issue exists there too.
At this point, each volume had "snapshot94" as its highest-numbered snapshot, and the pool was at 74% usage of its data volume and 22% of its metadata volume. "thin_dump" produces an XML file that is 145 MB in size (1,877,625 lines). "thin_check", when invoked manually, runs for a long time (spending most of it in its "examining mapping tree" phase) but eventually exits successfully.
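To get a feel for how much work that "examining mapping tree" phase represents, one can count the mapping entries in the thin_dump XML. The sample below is a synthetic, heavily abbreviated stand-in for a real dump (which in this case had ~1.9 million lines); the `single_mapping`/`range_mapping` element names are those emitted by thin_dump, and `data_block_size` is in 512-byte sectors, so 128 corresponds to the 64k chunk size mentioned above:

```shell
# Synthetic thin_dump-style XML, written to a temp file for illustration.
cat > /tmp/sample_dump.xml <<'EOF'
<superblock uuid="" time="1" transaction="2" data_block_size="128" nr_data_blocks="1000">
  <device dev_id="1" mapped_blocks="3" transaction="0" creation_time="0" snap_time="1">
    <single_mapping origin_block="0" data_block="10" time="0"/>
    <range_mapping origin_begin="1" data_begin="11" length="2" time="0"/>
  </device>
</superblock>
EOF
# Count mapping entries -- roughly what the mapping-tree walk has to visit.
grep -c -E '<(single|range)_mapping' /tmp/sample_dump.xml   # prints 2
```

With hourly snapshots of two volumes, each snapshot adds its own device subtree of mappings, which is why the dump (and the boot-time check) keeps growing.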
I don't think this is a snapper bug per se, but maybe that blog post and other documentation could be updated to reflect the issue and suggest workarounds (e.g. `--skip-mappings` in lvm.conf, though I have not tried this).