mpalmer / lvmsync

Synchronise LVM LVs across a network by sending only snapshotted changes
http://theshed.hezmatt.org/lvmsync
GNU General Public License v3.0

lvmsync not working as expected #23

Closed irvintim closed 9 years ago

irvintim commented 9 years ago

I followed the instructions in the example on moving a KVM image to a new server. To test that it was working, I started a job on the virtual server that appends the current date to a file, closing the file each time, every 2 seconds: while true; do date >> /testdate.txt; sleep 2; done

I then verified this is working by doing a tail -f /testdate.txt.

The VM's disk file is a LVM lv -- /dev/volgroup00/testvm on kvm01, and I want to move it to kvm02.

I did the following (the machine I ran each command on is indicated before the ":"):

kvm02: lvcreate -n testvm -L 20G volgroup00
kvm01: lvcreate --snapshot -L10G -n testvm_lvmsync volgroup00
kvm01: dd if=/dev/volgroup/testvm_lvmsync bs=1M | ssh kvm02 "pv -bpefr -s 21474836480 | dd of=/dev/volgroup00/testvm"
kvm01: virsh shutdown testvm (verified that testvm was shutdown and the kvm state was "shut off")
kvm01: lvmsync /dev/volgroup00/testvm_lvmsync kvm02:/dev/volgroup/testvm
  (This resulted in the following message:
  Transferred 1105920 bytes in 0.86 seconds
  You transferred your changes 19418.07x faster than a full dd!)
kvm01: virsh dumpxml testvm > /tmp/testvm.xml
kvm01: scp /tmp/testvm.xml kvm02:/tmp
kvm02: virsh define /tmp/testvm.xml
kvm02: virsh start testvm

Once the VM started up, I logged in and looked at the /testdate.txt file -- it only had updates up to the point of the snapshot being created on kvm01, it didn't have any of the dates added to the file afterwards. So, it appears that the dd completed correctly, but the lvmsync didn't do anything -- even though it claimed that it worked.

irvintim commented 9 years ago

Further info:

I should have said what my environment is:

CentOS 6.5 64-bit
Kernel: 2.6.32-431.23.3.el6.x86_64
lvm> version
  LVM version: 2.02.100(2)-RHEL6 (2013-10-23)
  Library version: 1.02.79-RHEL6 (2013-10-23)
  Driver version: 4.24.6
ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux]
SElinux = permissive mode

For the next test, I replicated the same test environment. KVM-based VM running with LVM-backed storage.

  1. created the snapshot on the LVM device on kvm01
  2. Performed a md5sum on that snapshot device: 0ea9a1c104d1bc01aebea0ac498d20fc /dev/vg_kvm01_data/test081300disk_lvmsync
  3. Created the lv on kvm02 and dd'd from the snapshot on kvm01 to the new lv on kvm02.
  4. Performed a md5sum on the new lv on kvm02: 0ea9a1c104d1bc01aebea0ac498d20fc /dev/vg_kvm02_data/test081300disk
  5. Halted the vm on kvm01 and waited for it to power down.
  6. lvmsync from the snapshot device on kvm01 to the lv on kvm02:
     Transferred 2056192 bytes in 1.14 seconds
     You transferred your changes 10443.98x faster than a full dd!
  7. Performed an md5sum on the regular lv on kvm01 and the new lv on kvm02:
     a165852e0f48624039aa55d3efe17156 /dev/vg_kvm01_data/test081300disk
     c33cdb9699f7f731540cfc15feaadd35 /dev/vg_kvm02_data/test081300disk

So clearly the lvmsync isn't getting the new lv in sync with the old lv.
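The checksum comparison in steps 2, 4 and 7 can be wrapped in a small helper. This is only a sketch: `checksum_of` and `devices_match` are hypothetical names, not part of lvmsync, and the helper assumes both paths are readable from the same host. For the two-host setup above, each checksum would have to be taken on its own host and the hashes compared by hand.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lvmsync): compare two block devices or
# regular files by MD5 checksum, both readable from the local host.

checksum_of() {
    # md5sum prints "<hash>  <path>"; fail if the path cannot be read.
    sum=$(md5sum "$1") || return 1
    # Strip everything from the first space onward, leaving only the hash.
    printf '%s\n' "${sum%% *}"
}

devices_match() {
    # Exit status: 0 = identical, 1 = different, 2 = a path could not be read.
    src_sum=$(checksum_of "$1") || return 2
    dst_sum=$(checksum_of "$2") || return 2
    [ "$src_sum" = "$dst_sum" ]
}
```

Calling `devices_match` with the source LV and the destination LV as arguments after the lvmsync run should return 0 if the sync brought the destination fully up to date. The VM must be shut down first, otherwise the source keeps changing while it is being read.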

Any other details that you need to help with this?

Tim

mpalmer commented 9 years ago

I've had this problem reported before, and managed to replicate it once locally, but not consistently. I'm suspecting that perhaps /proc/sys/vm/drop_caches isn't as effective as one might hope, but I haven't been able to confirm that yet.

I can't commit to a date when I'll be able to fix this, but in the meantime, you might try testing a version of lvmsync from before I started supporting thin snapshots (9e30954 is the last commit before the thin snapshot support was added). It may be coincidence, but I only started getting these reports after that point.
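For readers unfamiliar with the /proc/sys/vm/drop_caches interface mentioned above, the following is a minimal sketch of how the Linux page cache is commonly flushed. It is not lvmsync's code, and the thread does not confirm exactly how lvmsync uses the interface. The function name and the optional path argument (useful only for dry runs) are assumptions made for illustration. Writing to the real path requires root.

```shell
#!/bin/sh
# Illustrative sketch only, not taken from lvmsync.
# drop_page_cache [PATH]: PATH defaults to the kernel's drop_caches control
# file; passing another path is an assumption added here for dry runs.

drop_page_cache() {
    target="${1:-/proc/sys/vm/drop_caches}"
    # Dirty pages cannot be dropped, so write them back to disk first.
    sync || return 1
    # 3 = free the page cache plus reclaimable dentries and inodes.
    echo 3 > "$target"
}
```

The `sync` step matters because the kernel only discards clean cache entries. Data that has not yet been written back stays in memory regardless of the value written.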

localguru commented 9 years ago

I can confirm this:

root@test01:~# lvcreate --snapshot -L10GB -n source-lvmsync /dev/test01-vg/source
  Logical volume "source-lvmsync" created
root@test01:~# dd if=/dev/test01-vg/source-lvmsync of=/dev/test01-vg/dest
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 5.92605 s, 17.7 MB/s
root@test01:~# md5sum /dev/test01-vg/source-lvmsync
5769819ecb544229132cf94d52a8d187 /dev/test01-vg/source-lvmsync
root@test01:~# md5sum /dev/test01-vg/dest
5769819ecb544229132cf94d52a8d187 /dev/test01-vg/dest
root@test01:~# mount /dev/test01-vg/source /mnt/
root@test01:~# echo test > /mnt/test.txt
root@test01:~# umount /mnt/
root@test01:~# lvmsync /dev/test01-vg/source-lvmsync /dev/test01-vg/dest
Transferred 36864 bytes in 0.26 seconds
You transferred your changes 2844.44x faster than a full dd!
root@test01:~# mount /dev/test01-vg/dest /mnt/
root@test01:~# ls -al /mnt/test.txt
ls: cannot access /mnt/test.txt: No such file or directory

root@test01:~# lvm
lvm> version
  LVM version: 2.02.66(2) (2010-05-20)
  Library version: 1.02.48 (2010-05-20)
  Driver version: 4.25.0
lvm>
root@test01:~# uname -a
Linux test01 3.11.0-26-generic #45~precise1-Ubuntu SMP Tue Jul 15 04:02:35 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
root@test01:~# dpkg -l | grep ruby
ii libruby1.9.1 1.9.3.0-1ubuntu2.8
ii libruby1.9.1-dbg 1.9.3.0-1ubuntu2.8
ii libtcltk-ruby1.9.1 1.9.3.0-1ubuntu2.8
ii ruby1.9.1 1.9.3.0-1ubuntu2.8
ii ruby1.9.1-dev 1.9.3.0-1ubuntu2.8
ii ruby1.9.1-examples 1.9.3.0-1ubuntu2.8
ii ruby1.9.1-full 1.9.3.0-1ubuntu2.8
ii ruby1.9.3 1.9.3.0-1ubuntu2.8

irvintim commented 9 years ago

Matt:

I can confirm that 9e30954 works perfectly. I'll use that version for now.

Thanks,

Tim

mpalmer commented 9 years ago

Thanks for confirming that. It gives me another data point for hunting down what's gone wrong.

dipohl commented 9 years ago

I can also confirm that https://github.com/mpalmer/lvmsync/commit/9e30954ed09aac8e075604417333ec5cbcd630d3 worked for me (for system info, see #30).

mpalmer commented 9 years ago

OK, all fixed. Turns out the problem was an incomplete protocol change I made in the thin snapshot code, which was fixed by @AnchorCat in 719f19b. I couldn't track it down myself because the test suite wasn't running the local lvmsync, but the system-installed one. I've fixed up the test suite so that doesn't happen, and all is well. I've pushed my changes, and will release a new version Real Soon Now.