quickemu-project / quickemu

Quickly create and run optimised Windows, macOS and Linux virtual machines
MIT License

Windows 10 VM won't launch after upgrade from 4.3 to 4.4 #572

Closed matthewadie closed 1 year ago

matthewadie commented 2 years ago

Expected behaviour

In quickemu 4.4, when launching my Windows 10 VM I expect it to launch as it did in quickemu 4.3

Actual behaviour

It shows the TianoCore BIOS screen, tries to load for a couple of seconds, then dies. If I launch it a second time, it brings me to the Windows repair screen, but the installation cannot be repaired.

When I revert to quickemu 4.3 (by restoring a btrfs snapshot from before my update) everything works normally.

This occurs on my desktop (the machine whose output I'm posting here), and on my newer AMD Ryzen 5 HP Envy x360.

Steps to reproduce the behaviour

Upgrade to 4.4 and launch a Windows 10 VM

Quickemu output

quickemu -vm windows-10.conf --display spice
Quickemu 4.4 using /usr/bin/qemu-system-x86_64 v7.1.0

Linux Distribution & Kernel

LSB Version: n/a
Distributor ID: Arch
Description: Arch Linux
Release: rolling
Codename: n/a

Linux DoctorDisco-ARCH 6.0.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 15 Oct 2022 14:00:49 +0000 x86_64 GNU/Linux

github-actions[bot] commented 2 years ago

Hello there 👋 Thanks for submitting your first issue to the Quickemu project 🐛 We'll try and take a look at your issue soon ⏲

In the meantime you might want to join the Wimpys World Discord 🗣 where we have a large community of Linux 🐧 enthusiasts and passionate open source developers 🧑‍💻

You might also be interested in following the Wimpys World Twitch 📡 channel, where Wimpy streams let's-code videos, including this project, several times a week. A back catalog of past live streams and other Linux-related content is available on the Wimpys World YouTube 📺 channel.

Tremeschin commented 2 years ago

This is also happening to me on Windows 11. Manually downgrading to quickemu 4.3 makes the VM boot properly. I'm using pretty much standard settings.

lucyllewy commented 1 year ago

This is possibly due to a change I added for improved hard disk compatibility... (fb8deb10e8dab766696c173fd10a58d0749841ab)

I did test with Windows 10 and 11 and didn't find any issue with switching, but maybe your installation is an older one without the full suite of paravirtual drivers installed.

Try:

matthewadie commented 1 year ago

I'm afraid that it didn't work... It still fails to boot, and when I try to boot again it gives me the windows repair screen.

I also tried updating to the latest stable version: virtio-win-0.1.225

FYI my installation is about 5 months old...

Matthew

matthewadie commented 1 year ago

It does look like the disk compatibility change is what did it -- it's the only difference in the qemu options between 4.3 and 4.4:

4.3:
-device virtio-blk-pci,drive=SystemDisk
-drive id=SystemDisk,if=none,format=qcow2,file=windows-10/disk.qcow2

4.4:
-device virtio-scsi-pci,id=scsi0
-device scsi-hd,drive=SystemDisk,bus=scsi0.0,lun=0,rotation_rate=1
-drive id=SystemDisk,if=none,format=qcow2,discard=unmap,file=windows-10/disk.qcow2

Matthew
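
One way to confirm a claim like "it's the only difference" is to list each version's arguments one per line and print what is new. The sketch below is only an illustration: the two lists are copied from the comment above, and the helper names (`args_43`, `args_44`, `new_in_44`) are made up for this example, not part of quickemu.

```shell
#!/bin/sh
# Disk-related arguments as quoted in this thread, one argument per line.
args_43() {
  printf '%s\n' \
    '-device' 'virtio-blk-pci,drive=SystemDisk' \
    '-drive' 'id=SystemDisk,if=none,format=qcow2,file=windows-10/disk.qcow2'
}

args_44() {
  printf '%s\n' \
    '-device' 'virtio-scsi-pci,id=scsi0' \
    '-device' 'scsi-hd,drive=SystemDisk,bus=scsi0.0,lun=0,rotation_rate=1' \
    '-drive' 'id=SystemDisk,if=none,format=qcow2,discard=unmap,file=windows-10/disk.qcow2'
}

# Print the arguments that appear in the 4.4 list but not in the 4.3 list.
# Hypothetical helper; it uses temp files so it runs under plain POSIX sh.
new_in_44() {
  tmpdir=$(mktemp -d) || return 1
  args_43 | sort -u > "$tmpdir/old"
  args_44 | sort -u > "$tmpdir/new"
  comm -13 "$tmpdir/old" "$tmpdir/new"   # lines unique to the second file
  rm -rf "$tmpdir"
}

new_in_44
```

The output lists the two virtio-scsi device arguments and the changed -drive string, which is where `discard=unmap` was added.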

matthewadie commented 1 year ago

I found a solution:

I created a test qcow2 image to pass to the VM using virtio-scsi-pci to see if it could read it (with the SystemDisk still using virtio-blk-pci):

-device virtio-scsi-pci,id=scsi0
-device scsi-hd,drive=test,bus=scsi0.0,lun=0,rotation_rate=1
-drive id=test,if=none,format=qcow2,file=windows-10/test.qcow2

It read it fine, and it did something to the Windows install because now I can boot it fine using 4.4 and virtio-scsi-pci.

Matthew
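
The workaround above can be sketched as a one-off "transition" boot. The thread does not confirm why it works; a plausible reading is that Windows only sets up the virtio SCSI driver for boot after it has seen a device on that controller. The function name, the scratch image size, and the `qemu-img` line in the comment are assumptions for illustration.

```shell
#!/bin/sh
# Emit the disk arguments for a one-off "transition" boot: the system disk
# stays on virtio-blk-pci while a scratch image sits on virtio-scsi-pci.
# The function name and vm_dir parameter are illustrative, not quickemu's.
transition_disk_args() {
  vm_dir=$1
  printf '%s\n' \
    '-device' 'virtio-blk-pci,drive=SystemDisk' \
    '-drive' "id=SystemDisk,if=none,format=qcow2,file=${vm_dir}/disk.qcow2" \
    '-device' 'virtio-scsi-pci,id=scsi0' \
    '-device' 'scsi-hd,drive=test,bus=scsi0.0,lun=0,rotation_rate=1' \
    '-drive' "id=test,if=none,format=qcow2,file=${vm_dir}/test.qcow2"
}

# The scratch image would be created once beforehand, for example:
#   qemu-img create -f qcow2 windows-10/test.qcow2 1G   # size is arbitrary
transition_disk_args windows-10
```

After Windows has booted once with both controllers present, the scratch disk can be dropped and the system disk moved to the SCSI controller, as described in the comment.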

matthewadie commented 1 year ago

Sorry, maybe I shouldn't have closed the issue because this is more of a workaround ...

TuxVinyards commented 1 year ago

I have done a lot of reading on this problem over the last week or so. As usual, there's a lot of both out-of-date & conflicting information to contend with. And I can't say that the docs at qemu.org are that clear either ...

This solution works and throws no errors:

Open /usr/bin/quickemu with root privileges and add the following to ver. 4.4 at line 1104, just before the else:

elif [ "${guest_os}" == "windows" ]; then
  # shellcheck disable=SC2054,SC2206
  args+=(-device virtio-blk-pci,drive=SystemDisk
         -drive id=SystemDisk,if=none,format=qcow2,discard=unmap,file="${disk_img}" ${STATUS_QUO})

But, before I suggest this goes to pull request, I think we need some informed and experienced discussion.

First up, and contrary to what is stated in https://github.com/quickemu-project/quickemu/pull/569, which caused this problem to start with, I am reading that virtio-blk-pci does support TRIM and unmap. It may just be me reading it all wrong, though. I don't have a lot of experience with the Qemu API (yet) 🤣 Go easy on me on this one.

https://chrisirwin.ca/posts/discard-with-kvm-2020/

https://github.com/virtio-win/kvm-guest-drivers-windows/issues/392

https://github.com/stefanha/qemu/commit/caa1ee43131c060347b32893abd41fe4865eaa2e

And here, where it also says virtio-scsi is slower than virtio-blk:

https://www.qemu.org/2021/01/19/virtio-blk-scsi-configuration/

Then we get to where Qemu say the 'modern way' is not even to use '-drive' but to use '-blockdev' 🤯

https://www.qemu.org/docs/master/system/invocation.html?highlight=discard%20unmap#hxtool-1

I have to say that I spent a lot of time playing around with trying to set 'blockdev' and didn't get anywhere. IMHO wishful thinking on behalf of some of the people at qemu.org as 'drive' seems much easier to work with.

So, discussion:

Does TRIM work with virtio-blk-pci, or are all the references talking about another type of 'blk' altogether? Could be the case ...

Or should the whole of https://github.com/quickemu-project/quickemu/pull/569 be reverted and the elif [ "${guest_os}" == "windows" ] not be there at all?

Or should the whole of quickemu be converted to using -blockdev ....

TuxVinyards commented 1 year ago

While playing around with the args & such, as listed above, I wrote a few lines to show the args array in a more readable way. I thought this would make a pull request.

I also improved/fixed the --extra_args command.

This should help anyone else wanting to experiment, both with my discussion points & in general for any future work.

I decided to add the Windows fix. This can be a 'for now' solution. Or permanent ... But a fix is needed.

Given that I have added in an extra command to quickemu and it fixes the 'bump to 4.4' I thought this should be a bump to 4.5.

Until/if @flexiondotorg accepts this, the code lines are fairly easy to copy and add, if needed.

See pull request #588

flexiondotorg commented 1 year ago

Closed via https://github.com/quickemu-project/quickemu/commit/332f5b59f902e6bc866cdf5bd0b01cb348322183

matthewadie commented 1 year ago

I hate to say it but this fix has broken my ability to boot Windows 10 again... I had solved my 4.4 booting issues after the change to scsi but after upgrading quickemu to 4.5 it won't boot... If I roll back to 4.4 it still boots fine.

matthewadie commented 1 year ago

Now that the disk has reverted to "virtio-blk-pci":

-device virtio-blk-pci,drive=SystemDisk
-drive id=SystemDisk,if=none,format=qcow2,file=windows-10/disk.qcow2

It doesn't recognize the "SystemDisk" that worked with "virtio-scsi-pci", and Windows 10 will no longer boot.

antonc42 commented 1 year ago

I believe I'm also affected by this issue. My Win 10 VM stopped booting after the upgrade to 4.5. Reverting to 4.4 makes it boot again.

matthewadie commented 1 year ago

Unfortunately, I also can't seem to find any workaround to make Windows go back to working with "virtio-blk-pci". Should a new issue be created?

Matthew

stragu commented 1 year ago

Same here, update to 4.5 using the flexiondotorg ppa on Ubuntu 20.04 stopped my Win10 VM from working with the following conf:

#!/usr/bin/quickemu --vm
guest_os="windows"
disk_img="windows-10/disk.qcow2"
iso="windows-10/Win10_22H2_EnglishInternational_x64.iso"
fixed_iso="windows-10/virtio-win.iso"

Rolling back to 4.4 with the deb from the PPA resolved the issue: https://launchpad.net/~flexiondotorg/+archive/ubuntu/quickemu/+files/quickemu_4.4-1~focal1.0_all.deb

stragu commented 1 year ago

I tried the upgrade to 4.6, same issue: always going to Windows' blue screen recovery tool.

Unfortunately, the PPA doesn't hold version 4.4 anymore, so not sure how to roll back.

TuxVinyards commented 1 year ago

Behind the scenes, I have been quietly working on a modified version of quickemu.

https://github.com/TuxVinyards/quickemu-mod

I think this could help you and others with these Windows problems. Obviously, I would like my work to be successful. I think it is a move forward for the quickemu codebase & at this point I am inviting a few people to try the beta.

There have been a lot of changes to quickemu's hypervisor instructions between versions 4.2 and 4.6. According to the qemu docs, if you build a machine using one set, it may deploy badly when given another. It works a bit like a physical HDD/SSD swap on a physical Windows machine. Or doesn't work, to put it more aptly.

I have built a HyperVisor recipe selector into the settings section that you may want to experiment with. Please make sure to use the snapshot functions ...

Use the [d] boot option when doing upgrades. Make sure to shut down and snapshot after the update has installed. Then, and only then, load the machine up again and press 'restart now'.

[Image: Screenshot at 2023-02-22 12-59-04-1920]

When it does do the restart, keep pressing 'Esc' until the TianoCore BIOS shows its menu. Then choose 'Boot manager' > 'misc-device' or 'qemu-harddisk'. Avoid 'Windows Boot manager'; that's when you get the blue screens.

Update 22 feb: Improved the UI and fixed a couple of bugs that somehow crept in

Let me know how you get on.

chasecovello commented 1 year ago

This issue came to my attention from my PR #696. I'm curious if anyone has had issues with "discard=unmap" on a working Windows 10/11 VM with a virtio-blk-pci disk. I haven't had any issues with that, but Windows will fail to boot if you change the disk type to something else, such as virtio-scsi-pci, as discussed here.

I can confirm that discard works with virtio-blk-pci, at least with qemu 6.2 and later. I have that enabled on several Windows, Linux, and Mac VMs without problem, and a "du" after issuing a trim from the OS does cause the disk image to shrink (although it's a sparse file, so the size doesn't appear to change from ls -l). I find trim support very helpful, especially after installing a huge Windows or Mac update which can sometimes expand the disk image by 20G or more.
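
The du versus ls -l point can be reproduced without a VM. The sketch below uses a sparse file as a stand-in for a disk image; the 64 MiB size is arbitrary, and it assumes a host filesystem that supports sparse files and a `truncate` utility (GNU coreutils or similar).

```shell
#!/bin/sh
# Create a sparse file to stand in for a disk image, then compare its
# apparent size with the space actually allocated on the host.
img=$(mktemp) || exit 1
truncate -s 64M "$img"                    # 64 MiB apparent size, no data written

apparent_bytes=$(wc -c < "$img")          # the size ls -l would report
allocated_kib=$(du -k "$img" | cut -f1)   # KiB actually allocated on disk

echo "apparent:  ${apparent_bytes} bytes"
echo "allocated: ${allocated_kib} KiB"
rm -f "$img"
```

The same comparison on a real disk.qcow2 before and after a guest-side trim is what shows whether discard reached the host.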

I'm happy to help troubleshoot the specific "discard=unmap" option, but I also had a more general idea: we don't want a change in quickemu defaults to switch out the type of virtual disk device on a working VM. Windows is the most finicky about this, but going from virtio to, say, ahci changes the /dev node names on the BSDs and Linux as well. Linux usually boots just fine since most distros mount the root fs by UUID, but Free/NetBSD don't seem to have something like a UUID. And even with Linux, you can get dumped to the EFI shell if the disk controller changes and it's not in the EFI boot list.

So I'd like to propose a config file option to specify the type of virtual disk. Something like:

disk_controller="virtio-blk-pci" | "ahci" | "ide" (or whatever other controllers are needed to support the various OSes)

quickget can write the current default to the config file when creating a new VM. That way, changing the defaults doesn't break existing VMs and quickemu can be more flexible with changing controllers in the future.
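
A minimal sketch of how such a setting might be translated into QEMU arguments follows. It is an assumption-laden illustration, not quickemu code: `disk_controller_args` is a made-up function, only three controller values are handled, and the ahci branch pairs QEMU's `ahci` and `ide-hd` devices as one possible mapping.

```shell
#!/bin/sh
# Hypothetical: translate a per-VM disk_controller setting into QEMU disk
# arguments, one argument per line. Unknown values are rejected rather than
# silently replaced by a default, so an existing VM never changes controller.
disk_controller_args() {
  controller=$1
  disk_img=$2
  case "$controller" in
    virtio-blk-pci)
      printf '%s\n' '-device' 'virtio-blk-pci,drive=SystemDisk' ;;
    virtio-scsi-pci)
      printf '%s\n' '-device' 'virtio-scsi-pci,id=scsi0' \
        '-device' 'scsi-hd,drive=SystemDisk,bus=scsi0.0,lun=0,rotation_rate=1' ;;
    ahci)
      printf '%s\n' '-device' 'ahci,id=ahci' \
        '-device' 'ide-hd,drive=SystemDisk,bus=ahci.0' ;;
    *)
      echo "unsupported disk_controller: ${controller}" >&2
      return 1 ;;
  esac
  printf '%s\n' '-drive' "id=SystemDisk,if=none,format=qcow2,file=${disk_img}"
}

disk_controller_args virtio-blk-pci windows-10/disk.qcow2
```

Failing on an unknown value is a deliberate choice in this sketch: a typo in the config then stops the launch instead of booting the guest on a different controller.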

TuxVinyards commented 1 year ago

@chasecovello Pleased to see your comments.

I would probably want to go one step further and include the HyperVisor CPU args too. I would suggest a file in the main folder, such as 'qe-sys.conf' or similar. Also adding a few comment notes about the risks of changing the virtual system and of that upsetting Windows ...

I notice that Martin (@flexiondotorg) is yet to be convinced by your PR https://github.com/quickemu-project/quickemu/pull/696 . Possibly understandable as so many recent quickemu releases have broken things. For what it is worth, I am using "discard=unmap" without any problems in my 'quickemu-mod' code. Both Windows & MacOS load and run.

At the time that I wrote that code, I decided to use quickemu 4.4, the latest version, which itself has unmap as a general standard, and just revert the Windows section back a notch. I thought this would keep things more up-to-date.

In other words, ver4.4 but without PR https://github.com/quickemu-project/quickemu/pull/569, which is exactly what Martin did with ver 4.5. But I also added some other tweaks too:

The other big change that happened around that time was to the hypervisor section, which gets the Kernel VM to make Windows VMs think they are running inside a Microsoft environment. This happened when 'hv_passthrough' was added to quickemu 4.3.

From the qemu docs (hyperv): "... hv-passthrough overrides all other ‘hv-’ settings on the command line. Also, enabling this flag effectively prevents migration as the list of enabled enlightenments may differ between target and destination hosts."

https://www.qemu.org/docs/master/system/i386/hyperv.html

I also added the ability to revert to the 4.2 hypervisor settings, as a lot of people who built VMs with 4.2 found that they couldn't boot after that update either.

Concluding, I think we need the ability to use the latest version of quickemu but also the ability to fix the virtual hardware clock so Windows doesn't think that we have switched machines.

I also think that this is generally why Windows updates only seem to work in SDL mode too. And often only by reverting the TianoCore as well. Even if the changes are virtual, Windows is seeing hardware changes when we reboot.
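
The idea of choosing between hypervisor "recipes" can be sketched as a small selector for the -cpu argument. Everything here is hypothetical: the function name, the recipe names, and especially the "pinned" list, which simply names a few Hyper-V enlightenment flags from the QEMU docs and is not the set any quickemu release used. The base string is taken from the -cpu line quoted later in this thread.

```shell
#!/bin/sh
# Hypothetical selector for the Hyper-V part of the -cpu argument.
# "passthrough" mirrors the hv_passthrough flag discussed above; "pinned"
# lists a few enlightenments explicitly instead of taking whatever the host
# offers. The pinned list is illustrative only.
cpu_arg() {
  base='host,kvm=on,+hypervisor,+invtsc,l3-cache=on,migratable=no'
  case "$1" in
    passthrough) echo "${base},hv_passthrough" ;;
    pinned)      echo "${base},hv_relaxed,hv_vapic,hv_time,hv_spinlocks=0x1fff" ;;
    *)           echo "unknown hypervisor recipe: $1" >&2; return 1 ;;
  esac
}

printf '%s\n' '-cpu' "$(cpu_arg pinned)"
```

Storing the chosen recipe alongside the VM would let a newer quickemu keep launching an older guest with the flags it was installed under.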

TuxVinyards commented 1 year ago

I have made a couple of changes to my qmod code to reflect the comments from @chasecovello

  ## DRIVE TRIM & HARDWARE
  # As noted in my comment https://github.com/quickemu-project/quickemu/issues/572#issuecomment-1313723715
  # Qemu itself seems to automatically add TRIM to the virtual drive, i.e. "discard=unmap"
  # Ver 4.4 of quickemu added this as an instruction to the 'else' section, which was picked up on by Windows
  # BUT it also changed the drive from blk to scsi. This is the change that caused the problems, not the unmap.
  # @ 2023-05 following comment https://github.com/quickemu-project/quickemu/issues/572#issuecomment-1530824872
  # it seemed right, for completeness, to add some more options to this section.

  IgnoreTRIM=      #  Set =1  to not send the unmap command to Qemu  (although it's probably done internally anyway)

  ## All non-specified OS's  But NOT Windows. In qmod this is set as 'virtio-blk' to avoid the 4.4 problems
  Default_SCSI=    #  If not specified, will use 'virtio-blk' as per ver 4.3 & 4.5 etc   Set =1 to use SCSI as 4.4.

chasecovello commented 1 year ago

@TuxVinyards thanks for that link to the qemu docs. I had always wondered about all those hv flags and couldn't find the documentation. Something like a "cpu_flags" directive in the VM conf file might be the way to go as well, to keep everything in a single config file per VM.

How does this relate to the different machine types (e.g., "pc-q35-6.2") you can pass to qemu? That seems to exist precisely to keep the hardware config stable as you migrate VMs, but I can't find any documentation of exactly what features the different hardware versions enable/disable.

TuxVinyards commented 1 year ago

The thing to note is that by setting 'hv_passthrough' in Qemu's -cpu args you are enabling all possible VT / VT-x etc. features of the actual physical CPU on which your VM is running. This means that when/if you move your VM to a different physical machine, a different virtual CPU may show up.

See https://en.wikipedia.org/wiki/X86_virtualization#Intel_virtualization_(VT-x)

If you couple that with changes to the graphics and RAM you will find Windows falling over. This is kind of what happened with the quickemu 4.2 to 4.4 saga. If the initial build is with a restricted hypervisor you theoretically may have a more transportable VM. In practice you might find that Windows can cope with the idea of a CPU upgrade if you haven't upgraded all the other virtual components at the same time. Your mileage may vary ... as always.

It occurred to me this morning that both the qmod & qwrap codes already output the Virtual Hardware profile. This is the file "QemuArgsList.txt" that you can find in the VM folder:


  Present Working Directory:  /media/xxx/Files/VMQs

  Qemu:      /usr/bin/qemu-system-x86_64  7.2.0

  QuickEmu:  4.7

  Q wrap:    2023.05.19

  Date:      Fri 19 May 2023 18:15:32 CEST 

  -name windows-11,process=windows-11
  -pidfile windows-11/windows-11.pid
  -enable-kvm
  -machine q35,smm=on,vmport=off
  -no-hpet
  -global kvm-pit.lost_tick_policy=discard
  -global ICH9-LPC.disable_s3=1
  -cpu host,kvm=on,+hypervisor,+invtsc,l3-cache=on,migratable=no,hv_passthrough
  -smp cores=6,threads=2,sockets=1
  -m 10G

  -device virtio-balloon
  -vga none

  -device virtio-vga-gl
  -display sdl,gl=on
  -audiodev pa,id=audio0

  -device intel-hda

  -device hda-duplex,audiodev=audio0
  -rtc base=localtime,clock=host,driftfix=slew

  -device virtio-rng-pci,rng=rng0
  -object rng-random,id=rng0,filename=/dev/urandom

  -device qemu-xhci,id=spicepass
  -chardev spicevmc,id=usbredirchardev1,name=usbredir

  -device usb-redir,chardev=usbredirchardev1,id=usbredirdev1
  -chardev spicevmc,id=usbredirchardev2,name=usbredir

  -device usb-redir,chardev=usbredirchardev2,id=usbredirdev2
  -chardev spicevmc,id=usbredirchardev3,name=usbredir

  -device usb-redir,chardev=usbredirchardev3,id=usbredirdev3

  -device pci-ohci,id=smartpass

  -device usb-ccid
  -chardev spicevmc,id=ccid,name=smartcard

  -device ccid-card-passthru,chardev=ccid

  -device usb-ehci,id=input

  -device usb-kbd,bus=input.0
  -k en-us

  -device usb-tablet,bus=input.0

  -device virtio-net,netdev=nic
  -netdev user,hostname=windows-11,hostfwd=tcp::22220-:22,smb=/home/xxx/Public,id=nic
  -global driver=cfi.pflash01,property=secure,value=on
  -drive if=pflash,format=raw,unit=0,file=/usr/share/OVMF/OVMF_CODE_4M.secboot.fd,readonly=on
  -drive if=pflash,format=raw,unit=1,file=windows-11/OVMF_VARS.fd
  -drive media=cdrom,index=1,file=windows-11/virtio-win.iso

  -device virtio-blk-pci,drive=SystemDisk
  -drive id=SystemDisk,if=none,format=qcow2,file=windows-11/disk.qcow2
  -chardev socket,id=chrtpm,path=windows-11/windows-11.swtpm-sock
  -tpmdev emulator,id=tpm0,chardev=chrtpm

  -device tpm-tis,tpmdev=tpm0
  -monitor unix:windows-11/windows-11-monitor.socket,server,nowait
  -serial unix:windows-11/windows-11-serial.socket,server,nowait

  Secure Boot:  /usr/bin/swtpm 

  socket --ctrl type=unixio,path=windows-11/windows-11.swtpm-sock --terminate --tpmstate dir=windows-11 --tpm2 

I have now added a line to copy this file to a fixed record on first launch. I have also added the function to record any change snapshots into qwrap's utility section.

If you would like to give this a try, perhaps you can let me know if you think this solves things.

https://github.com/TuxVinyards/quickemu-mod
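
The "fixed record on first launch" idea can be sketched roughly as below. This is not the qmod/qwrap code: `check_hw_profile`, `args_only`, and the baseline file name are assumptions. Because the listing shown above carries a Date header that changes on every run, the sketch compares only the lines that start with a dash.

```shell
#!/bin/sh
# Hypothetical helpers: keep the first QemuArgsList.txt as a baseline and
# report whether the current argument lines still match it.
args_only() {
  # Keep only the argument lines, ignoring headers such as the Date line.
  grep '^[[:space:]]*-' "$1"
}

check_hw_profile() {
  vm_dir=$1
  current="${vm_dir}/QemuArgsList.txt"
  baseline="${vm_dir}/QemuArgsList.baseline.txt"   # assumed file name

  if [ ! -f "$baseline" ]; then
    cp "$current" "$baseline"
    echo "baseline recorded"
  elif [ "$(args_only "$current")" = "$(args_only "$baseline")" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}

# Demo in a throwaway directory.
demo=$(mktemp -d) || exit 1
printf '%s\n' '-device virtio-blk-pci,drive=SystemDisk' > "$demo/QemuArgsList.txt"
check_hw_profile "$demo"
check_hw_profile "$demo"
printf '%s\n' '-device virtio-scsi-pci,id=scsi0' > "$demo/QemuArgsList.txt"
check_hw_profile "$demo"
rm -rf "$demo"
```

A launcher could run a check like this before starting a Windows guest and warn when the virtual hardware differs from what the guest was installed on.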

kolAflash commented 3 weeks ago

I use -drive discard=unmap,detect-zeroes=unmap,media=disk,if=virtio,format=qcow2,... a lot without problems on Qemu+KVM VMs. Windows and Linux guests seem to work very well with it.

So I'd like to see an option to enable discard=unmap,detect-zeroes=unmap for quickemu VMs.
(for now I manually patched my quickemu script)

And a configurable option is probably the best, because sometimes you might like to disable this behavior, because it may cost a little performance.

 

P.S.
Same goes for -drive cache=unsafe,... which increases performance at the cost of possible data loss in case of a crash. (a good choice for VMs which hold no important data)
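
A configurable option along these lines could look like the sketch below. The setting names `disk_discard` and `disk_cache` and the function `drive_opts` are hypothetical, since quickemu does not define them; the `discard`, `detect-zeroes` and `cache` keys themselves are standard QEMU -drive options.

```shell
#!/bin/sh
# Hypothetical per-VM settings; quickemu does not define these names.
#   disk_discard: "on" adds discard=unmap,detect-zeroes=unmap
#   disk_cache:   passed through as cache=<value> when non-empty
drive_opts() {
  disk_img=$1
  disk_discard=$2
  disk_cache=$3
  opts="id=SystemDisk,if=none,format=qcow2"
  if [ "$disk_discard" = "on" ]; then
    opts="${opts},discard=unmap,detect-zeroes=unmap"
  fi
  if [ -n "$disk_cache" ]; then
    opts="${opts},cache=${disk_cache}"
  fi
  echo "${opts},file=${disk_img}"
}

drive_opts windows-10/disk.qcow2 on ""
drive_opts scratch/disk.qcow2 off unsafe
```

Keeping both settings off by default would leave existing VMs untouched, and each one can then be enabled per VM where the trade-off is acceptable.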