virtio-win / kvm-guest-drivers-windows

Windows paravirtualized drivers for QEMU\KVM
https://www.linux-kvm.org/page/WindowsGuestDrivers
BSD 3-Clause "New" or "Revised" License

Windows gets low disk performance #67

Closed absinthetized closed 6 years ago

absinthetized commented 8 years ago

Please, have a look at this.

In short: it's a post where I report on some tests run on KVM and VirtualBox to better understand the poor disk performance shown by Windows VMs under KVM on CentOS 7.2.

The qemu-kvm in use comes from the CentOS Virt SIG, virtio is 0.1.117, and the host kernel is the latest stock one. The test is a basic iozone random mix on small files.

I've seen Windows guests under KVM perform poorly in the iozone random mix with 1 thread, so I started running some comparisons to understand where the flaw is.

In summary: I've tested both a virtio-scsi disk and an iSCSI target (the latter is mounted from within the VM as additional storage and is NOT passed as libvirt storage). I've tested KVM VMs, both Linux and Windows, plus a remote Windows workstation (for the iSCSI target comparisons), along with a number of VirtualBox instances running on that workstation.

While Windows performs well even on a remote iSCSI target, either as a host or as a VirtualBox guest, under KVM only Linux provides acceptable throughput.

Windows in KVM "fails" on both vscsi and (local) iSCSI targets.

Maybe I can use the post as a starting point to provide you with more info and get help fixing the issue. The Server Fault community simply suggests waiting for the upcoming CentOS 7.3. While this may be good advice, IMHO it is a bit simplistic to classify the issue as "kvm+virtio sucks on centos 7.2".

thank you!

vrozenfe commented 8 years ago

Thanks a lot for reporting the issue. Is it some sort of regression in 0.1.117, or is it a permanent problem? How bad is vioscsi compared with viostor and IDE? Can you post the results you obtained along with the Windows and Linux VM command lines?

Best regards, Vadim.

absinthetized commented 8 years ago

OK, I'll do some tests and use this post as an ongoing effort, populated on the fly with results as they come in. No Hyper-V features are enabled, but those have shown only marginal improvement, not 2x or 3x.

The test command line is: iozone -i 0 -i 8 -t 1 -s 4m

iozone is run from inside the tested drive.
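For readers unfamiliar with the flags, here is a minimal Python sketch that assembles the same command line. The helper name `build_iozone_cmd` and its parameters are illustrative assumptions, not part of iozone or of the reporter's setup; the flag meanings in the comments match the iozone report headers shown in this thread (write/rewrite, mixed workload, 1 process, 4096 KB file).

```python
def build_iozone_cmd(tests=(0, 8), threads=1, file_size="4m", binary="iozone"):
    """Return the argv list for the benchmark used in this thread.

    Hypothetical helper: names and defaults are chosen for illustration only.
    """
    cmd = [binary]
    for test in tests:
        # -i selects a test: 0 = write/rewrite, 8 = random mix (mixed workload)
        cmd += ["-i", str(test)]
    # -t runs throughput mode with the given number of processes
    cmd += ["-t", str(threads)]
    # -s sets the size of the file each process works on
    cmd += ["-s", file_size]
    return cmd


# Print the command as it would be typed in a shell.
print(" ".join(build_iozone_cmd()))
```

The list form can be handed to `subprocess.run` unchanged if the runs need to be scripted across several guests.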

This is the KVM package:

[root@dl190g9 ~]# yum info qemu-kvm
Loaded plugins: fastestmirror
Determining fastest mirrors
 * base: mirrors.prometeus.net
 * extras: mirrors.prometeus.net
 * updates: mirrors.prometeus.net
Available Packages
Name         : qemu-kvm
Arch         : x86_64
Epoch        : 10
Version      : 1.5.3
Release      : 105.el7_2.4
Size         : 1.8 M
Repo         : updates/7/x86_64
Summary      : QEMU is a FAST! processor emulator
URL          : http://www.qemu.org/
License      : GPLv2+ and LGPLv2+ and BSD
Description  : qemu-kvm is an open source virtualizer that provides hardware emulation for
             : the KVM hypervisor. qemu-kvm acts as a virtual machine monitor together with
             : the KVM kernel modules, and emulates the hardware for a full system such as
             : a PC and its assocated peripherals.
             : 
             : As qemu-kvm requires no host kernel patches to run, it is safe and easy to use.

Let's do some benchmarks with raw LVM partitions. These LVM volumes are passed to the VM and formatted as NTFS with default parameters. Cache is none and IO is native. Windows virtio is 0.1.102.

Ubuntu 14.04 LTS VirtIO drive - used as reference system

/usr/libexec/qemu-kvm -name postgres -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -cpu Haswell-noTSX -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 6cf8bccd-db34-44cc-9c56-8bca0f806aa6 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-postgres/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x5.0x7 -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x5 -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x5.0x1 -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x5.0x2 -drive file=/dev/virt/postgres,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive if=none,id=drive-ide0-0-1,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=23,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:57:48:b7,bus=pci.0,addr=0x3 -netdev tap,fd=27,id=hostnet1 -device rtl8139,netdev=hostnet1,id=net1,mac=52:54:00:e8:26:df,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 0.0.0.0:2 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
   <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/virt/postgres'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
File size set to 4096 KB
    Command line used: iozone -i 0 -i 8 -t 1 -s 4m
    Output is in Kbytes/sec
    Time Resolution = 0.000001 seconds.
    Processor cache size set to 1024 Kbytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
    Throughput test with 1 process
    Each process writes a 4096 Kbyte file in 4 Kbyte records

    Children see throughput for  1 initial writers  =  814953.12 KB/sec
    Parent sees throughput for  1 initial writers   =   32151.22 KB/sec
    Min throughput per process          =  814953.12 KB/sec 
    Max throughput per process          =  814953.12 KB/sec
    Avg throughput per process          =  814953.12 KB/sec
    Min xfer                    =    4096.00 KB

    Children see throughput for  1 rewriters    = 1122148.12 KB/sec
    Parent sees throughput for  1 rewriters     =   16293.54 KB/sec
    Min throughput per process          = 1122148.12 KB/sec 
    Max throughput per process          = 1122148.12 KB/sec
    Avg throughput per process          = 1122148.12 KB/sec
    Min xfer                    =    4096.00 KB

    Children see throughput for 1 mixed workload    = 2760348.00 KB/sec
    Parent sees throughput for 1 mixed workload     =   65072.60 KB/sec
    Min throughput per process          = 2760348.00 KB/sec 
    Max throughput per process          = 2760348.00 KB/sec
    Avg throughput per process          = 2760348.00 KB/sec
    Min xfer                    =    4096.00 KB
$> lspci
...
00:06.0 SCSI storage controller: Red Hat, Inc Virtio block device
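To compare runs without reading every report by eye, the headline figures can be pulled out with a small parser. The following is a minimal sketch, assuming the report layout shown above; the names `LINE_RE` and `parse_iozone` are made up for illustration, and the sample text is a subset of the Ubuntu report.

```python
import re

# Matches lines such as:
#   "Children see throughput for  1 initial writers  =  814953.12 KB/sec"
LINE_RE = re.compile(
    r"(Children see|Parent sees) throughput for\s+\d+\s+(.+?)\s*=\s*([\d.]+) KB/sec"
)


def parse_iozone(report):
    """Map (viewpoint, phase) -> throughput in KB/sec."""
    results = {}
    for match in LINE_RE.finditer(report):
        who, phase, value = match.groups()
        viewpoint = "children" if who.startswith("Children") else "parent"
        results[(viewpoint, phase)] = float(value)
    return results


sample = """\
    Children see throughput for  1 initial writers  =  814953.12 KB/sec
    Parent sees throughput for  1 initial writers   =   32151.22 KB/sec
    Children see throughput for 1 mixed workload    = 2760348.00 KB/sec
    Parent sees throughput for 1 mixed workload     =   65072.60 KB/sec
"""
parsed = parse_iozone(sample)
for key, kb_per_sec in sorted(parsed.items()):
    print(key, kb_per_sec)
```

Keeping the "children" and "parent" figures separate matters here, because the two differ by more than an order of magnitude in most of the runs posted in this thread.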

Windows 8.1 attached 8GB IDE drive

/usr/libexec/qemu-kvm -name win8_scsi_test -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 02cb8b1d-57c7-a6b4-23d9-c959517b0c9d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-win8_scsi_test/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -drive file=/dev/virt/iscsi_test,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/dev/virt/virtio_test,if=none,id=drive-ide0-0-1,format=raw,cache=none,aio=native -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -drive file=/var/lib/libvirt/images/virtio-win.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=33,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:36:7c:56,bus=pci.0,addr=0x3 -netdev tap,fd=34,id=hostnet1,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:de:12,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:4 -k it -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/virt/virtio_test'/>
      <target dev='hdb' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>

Windows device manager screenshot

        Run began: Mon May 30 08:34:39 2016

        File size set to 4096 KB
        Command line used: /Iozone3.405/iozone -i 0 -i 8 -t 1 -s 4m
        Output is in Kbytes/sec
        Time Resolution = 0.000006 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 4096 Kbyte file in 4 Kbyte records

        Children see throughput for  1 initial writers  =  502857.50 KB/sec
        Parent sees throughput for  1 initial writers   =    6232.40 KB/sec
        Min throughput per process                      =  502857.50 KB/sec
        Max throughput per process                      =  502857.50 KB/sec
        Avg throughput per process                      =  502857.50 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for  1 rewriters        =  676751.25 KB/sec
        Parent sees throughput for  1 rewriters         =    5796.95 KB/sec
        Min throughput per process                      =  676751.25 KB/sec
        Max throughput per process                      =  676751.25 KB/sec
        Avg throughput per process                      =  676751.25 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for 1 mixed workload    =  508243.78 KB/sec
        Parent sees throughput for 1 mixed workload     =  425152.71 KB/sec
        Min throughput per process                      =  508243.78 KB/sec
        Max throughput per process                      =  508243.78 KB/sec
        Avg throughput per process                      =  508243.78 KB/sec
        Min xfer                                        =    4096.00 KB

Windows 8.1 attached 8GB scsi drive (virtio-scsi)

   <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/virt/virtio_test'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>

Windows device manager screenshot

   Run began: Mon May 30 09:24:35 2016

   File size set to 4096 KB
   Command line used: /Iozone3.405/iozone -i 0 -i 8 -t 1 -s 4m
   Output is in Kbytes/sec
   Time Resolution = 0.000005 seconds.
   Processor cache size set to 1024 Kbytes.
   Processor cache line size set to 32 bytes.
   File stride size set to 17 * record size.
   Throughput test with 1 process
   Each process writes a 4096 Kbyte file in 4 Kbyte records

   Children see throughput for  1 initial writers  =  422092.84 KB/sec
   Parent sees throughput for  1 initial writers   =    9668.59 KB/sec
   Min throughput per process                      =  422092.84 KB/sec
   Max throughput per process                      =  422092.84 KB/sec
   Avg throughput per process                      =  422092.84 KB/sec
   Min xfer                                        =    4096.00 KB

   Children see throughput for  1 rewriters        =  650162.56 KB/sec
   Parent sees throughput for  1 rewriters         =    8174.03 KB/sec
   Min throughput per process                      =  650162.56 KB/sec
   Max throughput per process                      =  650162.56 KB/sec
   Avg throughput per process                      =  650162.56 KB/sec
   Min xfer                                        =    4096.00 KB

   Children see throughput for 1 mixed workload    =  447388.50 KB/sec
   Parent sees throughput for 1 mixed workload     =  380579.27 KB/sec
   Min throughput per process                      =  447388.50 KB/sec
   Max throughput per process                      =  447388.50 KB/sec
   Avg throughput per process                      =  447388.50 KB/sec
   Min xfer                                        =    4096.00 KB

Windows 8.1 attached 8GB drive (virtio-blk)

/usr/libexec/qemu-kvm -name win8_scsi_test -S -machine pc-i440fx-rhel7.2.0,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 02cb8b1d-57c7-a6b4-23d9-c959517b0c9d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-win8_scsi_test/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot menu=off,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x5 -drive file=/dev/virt/iscsi_test,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 -drive file=/var/lib/libvirt/images/virtio-win.iso,if=none,id=drive-scsi0-0-0-1,readonly=on,format=raw -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -drive if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/dev/virt/virtio_test,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=33,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:36:7c:56,bus=pci.0,addr=0x3 -netdev tap,fd=34,id=hostnet1,vhost=on,vhostfd=35 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:01:de:12,bus=pci.0,addr=0x4 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:4 -k it -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -msg timestamp=on
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/virt/iscsi_test'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

Windows device manager screenshot

        Run began: Mon May 30 09:46:01 2016

        File size set to 4096 KB
        Command line used: /Iozone3.405/iozone -i 0 -i 8 -t 1 -s 4m
        Output is in Kbytes/sec
        Time Resolution = 0.000006 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 4096 Kbyte file in 4 Kbyte records

        Children see throughput for  1 initial writers  =  489183.78 KB/sec
        Parent sees throughput for  1 initial writers   =   10275.42 KB/sec
        Min throughput per process                      =  489183.78 KB/sec
        Max throughput per process                      =  489183.78 KB/sec
        Avg throughput per process                      =  489183.78 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for  1 rewriters        =  592971.62 KB/sec
        Parent sees throughput for  1 rewriters         =    9243.52 KB/sec
        Min throughput per process                      =  592971.62 KB/sec
        Max throughput per process                      =  592971.62 KB/sec
        Avg throughput per process                      =  592971.62 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for 1 mixed workload    =  464378.28 KB/sec
        Parent sees throughput for 1 mixed workload     =  403773.31 KB/sec
        Min throughput per process                      =  464378.28 KB/sec
        Max throughput per process                      =  464378.28 KB/sec
        Avg throughput per process                      =  464378.28 KB/sec
        Min xfer                                        =    4096.00 KB

Comment

These tests confirm my opinion: I'm doing it wrong! How is it possible that a stable virtIO driver suite gets almost exactly the same performance whether I use IDE, SCSI or VIRTIO?!
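The observation can be quantified from the numbers already posted. The sketch below is only a rough illustration: the dictionary keys and function names are labels invented here, and the values are the "Children see" throughput figures copied from the four reports above. It computes how far apart the three Windows 8.1 buses are, and how far ahead the Ubuntu reference is.

```python
# "Children see" throughput in KB/sec, copied from the reports above.
# The run labels and phase names are illustrative, not iozone terminology.
RUNS = {
    "ubuntu virtio-blk": {"write": 814953.12, "rewrite": 1122148.12, "mixed": 2760348.00},
    "win8.1 ide": {"write": 502857.50, "rewrite": 676751.25, "mixed": 508243.78},
    "win8.1 virtio-scsi": {"write": 422092.84, "rewrite": 650162.56, "mixed": 447388.50},
    "win8.1 virtio-blk": {"write": 489183.78, "rewrite": 592971.62, "mixed": 464378.28},
}


def windows_spread(phase):
    """Relative gap between the fastest and slowest Windows 8.1 bus."""
    values = [run[phase] for name, run in RUNS.items() if name.startswith("win8.1")]
    return (max(values) - min(values)) / max(values)


def linux_advantage(phase):
    """Ubuntu throughput divided by the best Windows 8.1 throughput."""
    best = max(run[phase] for name, run in RUNS.items() if name.startswith("win8.1"))
    return RUNS["ubuntu virtio-blk"][phase] / best


for phase in ("write", "rewrite", "mixed"):
    print(
        f"{phase:8s} spread across Windows buses: {windows_spread(phase):.0%}, "
        f"Linux vs best Windows: {linux_advantage(phase):.1f}x"
    )
```

With these inputs the three Windows buses stay within roughly 12 to 16 percent of each other in every phase, while the gap to the Linux guest is largest in the mixed workload.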

vrozenfe commented 8 years ago

Thank you for your prompt reply.

First of all, please turn the hv_time vCPU flag on, and make sure that the guest doesn't use HPET and that "useplatformclock" is not enabled. In addition, try enabling hv_vapic as well (if your host CPU comes without APICv).

Vadim.

absinthetized commented 8 years ago

OK,

New bench with virtIO and the following snippet added to the XML:

<clock>
  <timer name='hypervclock' present='yes'/>
</clock>
<features>
  <hyperv>
    <vapic state='on'/>
  </hyperv>
</features>

There is no trace of HPET or useplatformclock in the original XML. Results don't change...
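A quick way to double-check which of the suggested settings a domain definition actually carries is to inspect the XML programmatically. This is a minimal sketch using only Python's standard library; the function name `check_domain` and the sample `<domain>` wrapper are assumptions made for illustration, not the reporter's full XML. Note that it treats a missing HPET timer element as "not explicitly disabled", on the assumption that the hypervisor default then applies.

```python
import xml.etree.ElementTree as ET


def check_domain(xml_text):
    """Report the timer and Hyper-V settings discussed in this thread."""
    root = ET.fromstring(xml_text)
    hvclock = root.find("./clock/timer[@name='hypervclock']")
    hpet = root.find("./clock/timer[@name='hpet']")
    vapic = root.find("./features/hyperv/vapic")
    return {
        "hypervclock_on": hvclock is not None and hvclock.get("present") == "yes",
        # No <timer name='hpet'> element is assumed to leave the choice to the
        # hypervisor default, so only an explicit present='no' counts as off.
        "hpet_explicitly_off": hpet is not None and hpet.get("present") == "no",
        "vapic_on": vapic is not None and vapic.get("state") == "on",
    }


# Illustrative domain fragment; the real definition has many more elements.
domain = """
<domain type='kvm'>
  <clock offset='localtime'>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <features>
    <hyperv>
      <vapic state='on'/>
    </hyperv>
  </features>
</domain>
"""
print(check_domain(domain))
```

The text fed to the function could come from `virsh dumpxml`; the `useplatformclock` setting lives inside the Windows guest's boot configuration and cannot be seen from the domain XML.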

   Run began: Mon May 30 10:37:31 2016

    File size set to 4096 KB
    Command line used: /Iozone3.405/iozone -i 0 -i 8 -t 1 -s 4m
    Output is in Kbytes/sec
    Time Resolution = 0.000005 seconds.
    Processor cache size set to 1024 Kbytes.
    Processor cache line size set to 32 bytes.
    File stride size set to 17 * record size.
    Throughput test with 1 process
    Each process writes a 4096 Kbyte file in 4 Kbyte records

    Children see throughput for  1 initial writers  =  277843.78 KB/sec
    Parent sees throughput for  1 initial writers   =   10234.10 KB/sec
    Min throughput per process                      =  277843.78 KB/sec
    Max throughput per process                      =  277843.78 KB/sec
    Avg throughput per process                      =  277843.78 KB/sec
    Min xfer                                        =    4096.00 KB

    Children see throughput for  1 rewriters        =  649932.06 KB/sec
    Parent sees throughput for  1 rewriters         =   13404.88 KB/sec
    Min throughput per process                      =  649932.06 KB/sec
    Max throughput per process                      =  649932.06 KB/sec
    Avg throughput per process                      =  649932.06 KB/sec
    Min xfer                                        =    4096.00 KB

    Children see throughput for 1 mixed workload    =  535083.25 KB/sec
    Parent sees throughput for 1 mixed workload     =  449170.28 KB/sec
    Min throughput per process                      =  535083.25 KB/sec
    Max throughput per process                      =  535083.25 KB/sec
    Avg throughput per process                      =  535083.25 KB/sec
    Min xfer                                        =    4096.00 KB

I still suspect an error on my side... The full XML is attached.

As an additional reference, here is a Win7 64-bit guest (virtIO + Hyper-V): write and rewrite improve by 3x and something like 1.5x respectively, while random write is still low. In Win7, write and rewrite are something like 2k KB/sec slower (maybe the virtualization overhead is higher than on Linux), but random still lags too much IMHO.

        Run began: Mon May 30 12:47:50 2016

        File size set to 4096 KB
        Command line used: /Iozone3.405/iozone -t 1 -i 0 -i 8 -s 4m
        Output is in Kbytes/sec
        Time Resolution = 0.000011 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Throughput test with 1 process
        Each process writes a 4096 Kbyte file in 4 Kbyte records

        Children see throughput for  1 initial writers  =  672869.00 KB/sec
        Parent sees throughput for  1 initial writers   =   12067.84 KB/sec
        Min throughput per process                      =  672869.00 KB/sec
        Max throughput per process                      =  672869.00 KB/sec
        Avg throughput per process                      =  672869.00 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for  1 rewriters        =  863882.69 KB/sec
        Parent sees throughput for  1 rewriters         =   16252.46 KB/sec
        Min throughput per process                      =  863882.69 KB/sec
        Max throughput per process                      =  863882.69 KB/sec
        Avg throughput per process                      =  863882.69 KB/sec
        Min xfer                                        =    4096.00 KB

        Children see throughput for 1 mixed workload    =  562315.56 KB/sec
        Parent sees throughput for 1 mixed workload     =  161441.74 KB/sec
        Min throughput per process                      =  562315.56 KB/sec
        Max throughput per process                      =  562315.56 KB/sec
        Avg throughput per process                      =  562315.56 KB/sec
        Min xfer                                        =    4096.00 KB
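
The improvement factors can be recomputed from the reports in this thread. The sketch below is a rough check under stated assumptions: `speedup` and the dictionary names are invented for illustration, and the inputs are the "Children see" figures from the Windows 8.1 run with the Hyper-V settings and from the Win7 run above.

```python
def speedup(new, old):
    """Throughput ratio of one run over another."""
    return new / old


# "Children see" throughput in KB/sec, copied from the two reports.
WIN81_HYPERV = {"write": 277843.78, "rewrite": 649932.06, "mixed": 535083.25}
WIN7_HYPERV = {"write": 672869.00, "rewrite": 863882.69, "mixed": 562315.56}

for phase in ("write", "rewrite", "mixed"):
    ratio = speedup(WIN7_HYPERV[phase], WIN81_HYPERV[phase])
    print(f"{phase:8s} Win7 vs Win8.1: {ratio:.2f}x")
```

On these figures the ratios come out at about 2.4x for write and 1.3x for rewrite, with the mixed workload close to unchanged, so the rounded estimates in the comment are on the generous side.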
vrozenfe commented 6 years ago

We completed a performance testing cycle some time ago (https://bugzilla.redhat.com/show_bug.cgi?id=1023894#c11). No issues have been found so far.

Closing this issue for now.

If anyone experiences similar problems when running the recent virtio block or scsi drivers on the latest stable upstream qemu, please feel free to reopen this issue or open a new one.

Vadim.