xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

new blocks slow to write in virtual disks #557

Closed koehlerb closed 2 years ago

koehlerb commented 2 years ago

I have noticed that writing new blocks to virtual disks is slow. When the same blocks are later rewritten or overwritten, the writes are much faster, at the speed I would expect.

Steps to reproduce:

  1. Create a new VM with a minimal Debian installation
  2. Install fio in the VM.
  3. Create a new SR (can be ext (local) or LVM (local) -- the results are the same for either type of SR).
  4. Create a new disk in the new SR.
  5. Attach the new disk to the VM.
  6. Make a new filesystem on the new disk (e.g. mkfs.ext4 /dev/xvdX).
  7. Mount the new disk (e.g. mount /dev/xvdX /mnt).
  8. Perform a write test (e.g. sync; fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/mnt/test --bs=4M --iodepth=64 --size=4G --readwrite=write --ramp_time=4)
  9. Repeat the command in the previous step (the full sequence is sketched below).
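
For convenience, here are steps 6 to 9 as one shell sequence (a sketch only; /dev/xvdb is just an example device name, so adjust it to whatever the new disk appears as in your VM):

```sh
# Format and mount the newly attached virtual disk (example device name: /dev/xvdb)
mkfs.ext4 /dev/xvdb
mount /dev/xvdb /mnt

# First run: writes go to blocks the virtual disk has never allocated before
sync; fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=/mnt/test --bs=4M --iodepth=64 --size=4G \
    --readwrite=write --ramp_time=4

# Second run: the same file is overwritten, so the blocks are already allocated
sync; fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
    --name=test --filename=/mnt/test --bs=4M --iodepth=64 --size=4G \
    --readwrite=write --ramp_time=4
```

Compare the bandwidth figures fio reports for the two runs.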

In my experience, the first write test has very poor performance (high latency and low throughput), while the second write test (which overwrites the first file) performs much better, with roughly 10x higher throughput in my case, in line with what I would expect from the underlying hardware.

If you repeat the test with new filenames each time, the performance is also poor (presumably because the underlying virtual disk has to allocate new blocks).

If you delete the file and repeat the test with the same filename, the performance is still poor (presumably because the underlying virtual disk is still allocating new blocks).

Eventually there are no new blocks left to allocate, the virtual disk has to reuse existing blocks, and at that point all write operations perform well.

This is not a problem when the underlying physical disk is used directly in dom0.

It is also not a problem with hard disk passthrough.

I'm pretty sure this has something to do with the way virtual disks work.

I would like to be able to use virtual disks and still get good initial performance, without having to "condition" them by writing all the blocks first.
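
If pre-allocating the virtual disk up front is acceptable, one way to do this "conditioning" is simply to write the whole block device once before putting data on it (a sketch only; /dev/xvdb is an example device name, the disk must not hold any data you care about, and on a thin SR this will grow the VDI to its full virtual size):

```sh
# Touch every block of the new virtual disk once so it gets allocated up front.
# WARNING: this destroys any existing data on /dev/xvdb.
dd if=/dev/zero of=/dev/xvdb bs=4M oflag=direct status=progress
sync
```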

HeMaN-NL commented 2 years ago

I expect this is the behaviour you get when using thin provisioning. When the virtual disk is created, it only consumes the space actually occupied by data; as more space is used, the disk has to allocate that space and "expand". With thick provisioning, the total disk space is claimed immediately when the virtual disk is created.
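
As a quick sanity check, the SR types in play can be listed from dom0 with the xe CLI (exact fields and output formatting may vary between versions):

```sh
# List storage repositories with their driver type (e.g. ext is thin file-based, lvm is thick block-based)
xe sr-list params=uuid,name-label,type
```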

koehlerb commented 2 years ago

That was my initial reaction as well; however, I get the same result with both thick and thin provisioning, i.e. with ext (local) and with LVM (local). Also, the throughput difference is severe, around a factor of 10x. I would expect some overhead with virtual disks, as you describe, but not this much. I really think something is going on at the virtual disk layer.

With the same underlying physical disks there is no problem in dom0 when using the disks directly, and there is also no problem with disk passthrough. This really seems to point to an issue in the virtual disk layer.

olivierlambert commented 2 years ago

This is normal behavior for SMAPIv1. In "regular" operation it is almost invisible. Your bottleneck is the raw IPC of your dom0 CPU, because tapdisk is single-threaded.

A better CPU will yield better results, as will using a different data path than tapdisk with VHD.
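
One way to confirm this is to watch the tapdisk process in dom0 while the fio test runs in the guest; if it sits near 100% of a single dom0 vCPU, the test is CPU-bound rather than disk-bound (a sketch; process names can vary):

```sh
# Run in dom0 while the write test is running inside the VM
top -b -n 1 | grep -i tapdisk          # one-shot snapshot of per-process CPU usage
ps -eo pid,pcpu,comm | grep -i tapdisk # alternative view of tapdisk CPU usage
```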

DSJ2 commented 2 years ago

Are you rebooting the VM between tests?

-David

koehlerb commented 2 years ago

Doesn't seem to make a difference.

I've decided to go with "RAW" disks, which have better initial performance.

Thanks everyone for the suggestions and replies.

Kalloritis commented 2 years ago

Could you also offer some context as to why you were chasing this initial performance? To Olivier's point, this is not normally felt when interacting with a VM's storage through the backends. I could maybe see it mattering for things like databases with a large volume of inserts, or for making sure time-series databases don't suffer from I/O latency, but not much beyond that.

koehlerb commented 2 years ago

The reason is to speed up a migration to XCP-ng. I have 4 TB of data to move over and worked out that it would take about 70 hours using a VHD disk but only 11 hours using a RAW disk. I don't know the internals, but it seems odd that extending a VHD slows things down this much, whereas overwriting existing blocks in a VHD is fine.
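
(As a rough check: 4 TB in 70 hours works out to about 16 MB/s through the VHD path, versus about 100 MB/s for 4 TB in 11 hours with RAW, which matches the ~10x gap seen in the fio tests above.)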

I think for my application, I don't really need the additional features of the VHD disk, so I'll just use the RAW format.
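
For anyone taking the same route: if I understand the tooling correctly, a RAW VDI can be requested at creation time from dom0 with something like the command below (a sketch only; the SR UUID, name and size are placeholders, and the sm-config syntax is worth double-checking against the XCP-ng documentation for your version):

```sh
# Create a RAW (non-VHD) virtual disk on a given SR; values in angle brackets are placeholders
xe vdi-create sr-uuid=<SR-UUID> name-label="data-raw" \
    virtual-size=100GiB type=user sm-config:type=raw
```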