Hi Jesus,
This is definitely high priority on our roadmap. I am not entirely sure about
the proposed approach, though: considering that Ganeti treats instances as
black boxes, wouldn't "opening" them violate that requirement? Sure, we could
use the OS's help for this, but that definitely carries the risk of introducing
more bugs, unfortunately.
We definitely need to resolve the locking of the watcher during the migration;
that much is agreed.
Thoughts?
Thanks,
Guido
Original comment by ultrot...@google.com
on 24 May 2013 at 6:57
Does the process as it stands now (using dd) use any kind of compression? Since
the core of what Jesus is suggesting is to avoid copying empty blocks, using
compression should at least minimize the data representing those empty blocks,
perhaps dramatically, and it would let us keep a filesystem-agnostic approach.
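For illustration, a compressed pipeline could look roughly like this (a minimal
sketch only; the device path and the ssh transport are placeholders, not what
Ganeti actually uses for moves):
```
# Hypothetical sketch: compress the dd stream before it leaves the node.
# /dev/xenvg/instance-disk and dest-node are placeholders.
dd if=/dev/xenvg/instance-disk bs=1M | gzip --fast \
  | ssh dest-node 'gunzip | dd of=/dev/xenvg/instance-disk bs=1M'
```
Runs of zeros compress extremely well, so empty blocks would cost almost
nothing on the wire.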
Original comment by jpwoodbu@google.com
on 6 Jun 2013 at 10:11
On second thought, while compression would certainly help, with sufficient
churn on the disk most of the free blocks won't be filled with zeros.
Original comment by jpwoodbu@google.com
on 6 Jun 2013 at 10:17
Original comment by ultrot...@google.com
on 9 Jul 2013 at 2:49
What about synchronizing only the changed parts of the block devices? Something
like http://bdsync.rolf-fokkens.nl/
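Going by the examples in bdsync's documentation, usage is roughly the following
(a sketch only; device paths and hosts are placeholders):
```
# Generate a patch containing only the blocks that differ between the
# local and the remote device, compressing it on the way out...
bdsync "ssh root@dest-node bdsync --server" /dev/LOCDEV /dev/REMDEV | gzip > DEV.bdsync.gz
# ...then apply the patch to the destination device.
gzip -d < DEV.bdsync.gz | bdsync --patch=/dev/DSTDEV
```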
Original comment by tilo....@googlemail.com
on 27 Jul 2013 at 10:00
A bit off topic, I guess, but as for changed-block synchronization tools,
lvmsync (https://github.com/mpalmer/lvmsync) also seems interesting and may be
more efficient for LVM-backed disks.
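If I remember its README correctly, the invocation is something like this
(untested sketch; volume and host names are placeholders):
```
# lvmsync reads the copy-on-write table of an LVM snapshot and transfers
# only the blocks that changed since the snapshot was taken.
lvmsync /dev/vg0/instance-disk-snapshot dest-node:/dev/vg0/instance-disk
```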
Original comment by informat...@gmail.com
on 27 Jul 2013 at 12:10
Everything which involves mounting instance filesystems on the host should be
avoided. See [0], [1] or [2] for why that's insecure.
libguestfs ([3]) works around this issue by starting a tiny appliance in a VM
and performing the mounting inside that appliance. It understands quite a few
filesystems and can also help to sparsify disks (e.g. by using zerofree [4]).
I see two options for speeding up disk moves (a rough sketch follows the list):
1)
* Use libguestfs to mount all filesystems in a disk and zero the free blocks in them
* Perform the move using dd | (gzip,bzip2)
2)
* Use libguestfs to export a disk to a compressed sparse qcow2 image (this zeros free blocks as well)
* Send the resulting image over using dd
* Advantage: The resulting image should be smaller than what dd | (gzip,bzip2) produces (thanks to qcow2 understanding sparseness)
* Drawback: Exporting the image takes time (we can't start sending data while it's exporting) and requires up to twice the disk's size in temporary storage
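As a rough sketch, option 2 could boil down to something like the following
(paths and the transport are placeholders; virt-sparsify is the libguestfs
frontend that wraps the mount-and-zero steps):
```
# Zero free blocks and write a compressed, sparse qcow2 image in one step
# (needs temporary space on the source node)...
virt-sparsify --compress --convert qcow2 /dev/xenvg/instance-disk /tmp/instance-disk.qcow2
# ...then stream the (much smaller) image to the destination node.
dd if=/tmp/instance-disk.qcow2 bs=1M | ssh dest-node 'dd of=/tmp/instance-disk.qcow2 bs=1M'
```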
If libguestfs is not available, we can still use dd | (gzip,bzip2) instead of
plain dd for sending the data. This will obviously still send a lot of useless
data in the process.
BTW, libguestfs is packaged for a couple of Linux distributions; in Debian it
is available as of squeeze-backports.
What do you think?
[0]: http://libguestfs.org/guestfs.3.html#security-of-mounting-filesystems
[1]: http://lwn.net/Articles/538898/
[2]: https://www.berrange.com/posts/2013/02/20/a-reminder-why-you-should-never-mount-guest-disk-images-on-the-host-os/
[3]: http://libguestfs.org/
[4]: http://manpages.ubuntu.com/manpages/lucid/man8/zerofree.8.html
Original comment by thoma...@google.com
on 30 Sep 2013 at 12:57
Original comment by thoma...@google.com
on 30 Sep 2013 at 1:09
I conducted benchmarks of various instance move strategies. I used one 300GB
disk with just a minimal Debian OS installed on it. Data was not actually sent
over the network; I only exported the disk to an image file (on the same
physical disk). So the timing values are not really meaningful, but the
resulting image sizes are.
The strategies I tested are:
- dd: simple `dd if=<block dev> of=<img file> bs=1M`
- dd | gzip: same as dd, but piped through gzip
- dd | gzip --fast: same as dd | gzip, but using the --fast option of gzip
- dd | bzip2: same as dd, but piped through bzip2
- virt-sparsify: use virt-sparsify to create a compressed sparse qcow2 image.
  Steps performed by virt-sparsify:
  * Launch a small VM using KVM (no hardware acceleration on dom0's...)
  * Create an overlay QEMU image, so no write access actually goes to the original disk (requires a lot of temporary space)
  * Mount all filesystems of the disk
  * Fill free blocks with zeros (something like `dd if=/dev/zero of=/tmp/zeros; rm /tmp/zeros`) (writes ~295GB to the overlay disk)
  * Unmount the filesystems
  * Call something like `qemu-img convert -c -O qcow2 <overlay image> <destination image>`
- fill zeros in libguestfs, qemu-img: use libguestfs to fill the free space in the image with zeros, then qemu-img to create a sparse compressed image. This essentially performs the same steps as virt-sparsify, but does not create an overlay image to "protect" the instance disk.
- fill zeros in VM, qemu-img: free blocks are zeroed from within the VM
- zerofree in VM, qemu-img: call zerofree in the VM (requires an RO mount of the filesystems)
- fill zeros on host, qemu-img: mount the filesystems on the host (insecure) and fill the free blocks with zeros there
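To make the qemu-img-based strategies concrete, the "fill zeros in VM,
qemu-img" variant is essentially the following (a sketch; paths are
placeholders):
```
# Inside the instance: zero all free blocks, then remove the file.
# dd stops with "No space left on device" once the free space is exhausted.
dd if=/dev/zero of=/tmp/zeros bs=1M || true
rm /tmp/zeros
# On the host afterwards: pack the zero-filled disk into a compressed qcow2 image.
qemu-img convert -c -O qcow2 /dev/xenvg/instance-disk /tmp/instance-disk.qcow2
```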
Some of these strategies were tested in three different configurations (see
the attached diagrams):
- zerod_disk.png: The disk was zeroed before handing it to the OS installation scripts, and only the OS was installed on it.
- random_disk.png: The free blocks of the disk were filled with random data; the actual payload was only the OS. Note that the dd | bzip2 benchmark didn't run to completion, as it would have taken too long, so its numbers are extrapolations.
- kvm_random_disk.png: As libguestfs uses KVM to start its appliance, performance was also tested on a machine with hardware virtualization support (unlike the dom0 domains used in the other tests). Timings are not comparable to the other two configurations, as the machine was a different one.
The bars in the diagrams show:
- time: The total time it took to export the disk to a disk image (on the same physical disk, so this time is not very meaningful)
- size: The size of the resulting image
- est: The estimated time of an instance move with a throughput of 30 MB/s (encryption + network speed). For the dd-based strategies, that's size/30. The qemu-img-based strategies require storing the image on the host first (qcow2 requires random access while writing), so there it's time + size/30.
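To make the est formula concrete with made-up numbers: if a strategy produced a
9GB image in 1200s, a dd-based strategy would be estimated at
9216MB / 30MB/s ≈ 307s, while a qemu-img-based strategy with the same output
would be estimated at 1200s + 307s ≈ 1507s.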
One quick note about throughput: Make sure to use '--enable-socat-compress'
during `./configure` and a socat version which supports it (see the socat note
in INSTALL), otherwise the throughput during instance moves will suffer quite a
bit.
Performance-wise, I'm leaning towards zeroing free blocks in a VM, but running
it under the hypervisor that is available on the host (so no fully emulated KVM
on dom0's). It would be preferable to build on libguestfs, as they have put a
lot of effort into auto-detecting OSes, filesystems and so on, but we're not
sure whether that's doable with Xen, for example.
Any thoughts? Comments? Strategies I forgot to benchmark?
Original comment by thoma...@google.com
on 11 Oct 2013 at 9:09
Attachments: zerod_disk.png, random_disk.png, kvm_random_disk.png
Assigning to Riba, he's working on this.
Original comment by thoma...@google.com
on 2 Apr 2014 at 7:23
Original issue reported on code.google.com by
clim...@google.com
on 23 May 2013 at 7:00