inga-lovinde / RadeonResetBugFix

Radeon Reset Bug fix service
https://oomza.cutegay.software/inga-lovinde/RadeonResetBugFix
Apache License 2.0

AMD Radeon 780M #6

Open forceclosed opened 2 months ago

forceclosed commented 2 months ago

Hello inga_lovinde,

Issue: I followed the instructions and startup works great. The problem begins after a reboot, shutdown, or sleep of the VM: the VM cannot start again until the host machine is rebooted.

Device: Beelink SER7 with AMD Radeon 780M Graphics, Windows 11 Pro

Startup and Diagnose Logs: radeonfix_20240930_215717.log radeonfix_20240930_215905.log radeonfix_20240930_220617.log

Any advice is greatly appreciated, Thanks!

oznakn commented 1 month ago

Hello,

I have the very same problem with my system. Have you found any solution for this?

Thanks in advance.

inga-lovinde commented 1 month ago

Sorry for taking so long. It seems that recent AMD GPUs (or recent drivers) introduced many more subdevices that should be disabled on shutdown but aren't (because this tool is not aware that they are related to the AMD GPU), so the GPU itself is not shut down gracefully. I'll release an updated version, with improved AMD device detection logic, soon (as soon as I have access to a PC with VS.NET).
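As a rough illustration of what vendor-based detection could look like (this is not the tool's actual logic, which is not shown in this thread), the sketch below filters Windows-style PCI hardware ID strings by vendor ID. It assumes the conventional `PCI\VEN_xxxx&DEV_xxxx` format and that 1002 is the vendor ID used by AMD/ATI graphics devices; the sample IDs and the name `is_amd_gpu_related` are hypothetical.

```python
import re

# Assumption: hardware IDs follow the usual "PCI\VEN_xxxx&DEV_xxxx..." layout.
HWID_PATTERN = re.compile(r"^PCI\\VEN_([0-9A-Fa-f]{4})&DEV_([0-9A-Fa-f]{4})")

# Assumption: 1002 is the PCI vendor ID used by AMD/ATI graphics devices.
AMD_GPU_VENDOR_IDS = {"1002"}


def parse_hardware_id(hardware_id):
    """Return (vendor, device) as uppercase hex strings, or None if unparsable."""
    match = HWID_PATTERN.match(hardware_id)
    if match is None:
        return None
    return match.group(1).upper(), match.group(2).upper()


def is_amd_gpu_related(hardware_id):
    """Hypothetical check: any PCI device with an AMD GPU vendor ID counts as related."""
    parsed = parse_hardware_id(hardware_id)
    return parsed is not None and parsed[0] in AMD_GPU_VENDOR_IDS


# Made-up sample IDs, for illustration only.
samples = [
    r"PCI\VEN_1002&DEV_AAAA&SUBSYS_00000000",
    r"PCI\VEN_8086&DEV_BBBB&SUBSYS_00000000",
    r"USB\VID_1234&PID_5678",
]
related = [s for s in samples if is_amd_gpu_related(s)]
```

A vendor-only filter like this is deliberately crude: subdevices exposed under a different vendor ID would still be missed.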

oznakn commented 1 month ago

Hello,

I semi-solved the problem by adding a third device, specifically 0000:c5:00.6, to the PCI passthrough config. Now I can reboot the host machine without any problem (this was not possible before). Rebooting inside the Windows VM still does not work, so the issue is only semi-solved.

inga-lovinde commented 1 month ago

@oznakn that's interesting, what was the device called? I vaguely remember that on my host, (1) the GPU device node was duplicated, and maybe (2) there was a separate HDMI audio device node, and I had to pass through all of them in order to give the guest full control over the actual GPU device, instead of splitting it between host and guest.

If this is the same for you, then I'll update the readme; that at least won't require a PC with VS.NET from me :)
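On a Linux host, one way to see what a node such as 0000:c5:00.6 is would be to read its sysfs attributes. A minimal sketch, assuming the standard `/sys/bus/pci/devices/<address>/` layout with `vendor`, `device`, and `class` files; the function name and the `sysfs_root` parameter are illustrative.

```python
from pathlib import Path


def describe_pci_device(address, sysfs_root="/sys/bus/pci/devices"):
    """Read vendor, device and class codes for one PCI function from sysfs."""
    device_dir = Path(sysfs_root) / address
    info = {"address": address}
    for attribute in ("vendor", "device", "class"):
        attribute_file = device_dir / attribute
        # Each file holds a hex string such as "0x1002"; None if it is absent.
        info[attribute] = (
            attribute_file.read_text().strip() if attribute_file.is_file() else None
        )
    return info


if __name__ == "__main__":
    # 0000:c5:00.6 is the address mentioned above; it only exists on that host.
    print(describe_pci_device("0000:c5:00.6"))
```

`lspci -s c5:00.6` prints the same information with human-readable names.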

oznakn commented 1 month ago

Let me get back to you about this. However, if I passthrough all GPU devices, the host goes into a bootloop. I tried many combinations, and the only one that works is passing through 3 devices (main GPU, audio, and one more).

I also want to note that even with RadeonResetBugFix, rebooting the VM does not work. So it's like:

inga-lovinde commented 1 month ago

> However, if I passthrough all GPU devices, the host goes into a bootloop.

Not sure if I understand you correctly. Of course you should not pass through all GPU devices that are there (if you have multiple physical GPUs); you should pass through all device nodes that come from the physical AMD GPU you're trying to passthrough.

I can easily imagine that the host doesn't boot if it doesn't have any available GPUs. You'd need to somehow configure your host to boot in headless mode, and I'm not sure that Windows hosts even support this. So if your host is Windows-based, you'll need at least two physical GPUs (one to be used by the host, and another to be passed through to the VM). And you should only pass through device nodes related to the second GPU, not to the first GPU.
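To enumerate candidate device nodes for one physical GPU on a Linux host, a rough sketch is to list every PCI function that shares the same domain, bus, and slot. This assumes the GPU's nodes are functions of a single slot, which may not hold for every system, and the slot may also host functions unrelated to the GPU; `sibling_functions` and `sysfs_root` are illustrative names.

```python
from pathlib import Path


def sibling_functions(address, sysfs_root="/sys/bus/pci/devices"):
    """List PCI functions sharing the domain:bus:slot of `address` (e.g. 0000:c5:00.6)."""
    # "0000:c5:00.6" -> "0000:c5:00." so that every function of the slot matches.
    slot_prefix = address.rsplit(".", 1)[0] + "."
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(
        entry.name for entry in root.iterdir() if entry.name.startswith(slot_prefix)
    )
```

The result is only a list of candidates to inspect, not a recommendation to pass through every entry.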

> Passthrough 3 devices + With RadeonResetBugFix: No reboot on vm, but can reboot on host

What do you mean by "no reboot on VM, but can reboot on host"?

oznakn commented 1 month ago

oh, let me provide more information.

I'm using Debian as the host OS. It does not have any graphics output; I just use Proxmox through a browser. With this setup, I don't need an extra GPU. (Btw, I'm using the same system as the issue author.)

When I pass through only 2 devices, I cannot reboot the host computer, with or without RadeonResetBugFix. But if I pass through 3 devices, I can reboot the host computer. However, it's still problematic to reboot the VM.

So what's happening is that, since the host computer does not use the GPU at all, when we try to reboot the Windows VM, the GPU gets stuck due to the reset bug. However, if I reboot the host computer directly, then it works (only if I pass through 3 devices).

What I was thinking is that, like you said, the GPU itself is probably not shut down gracefully. And my gut says that if RadeonResetBugFix could also reset the third device I passed through, it would probably be possible to reboot the Windows VM.

inga-lovinde commented 1 month ago

@oznakn so what do you mean by

> However, if I passthrough all GPU devices, the host goes into a bootloop.

?

This conflicts with your words that the host can be rebooted without any problems if you passthrough three devices. I'm just trying to understand what's going on here.

And also, this is a weird behavior you're seeing with passing through two devices, because IIRC the essence of the bug is that trying to initialize the GPU twice in the same host session (unless the GPU was shut down gracefully) causes the host to lock up. But if you just start the VM once and then try to reboot the host, this shouldn't be a problem (as long as it is a full reboot and not just reinit/reroot, not sure what's the right term for Linux).

So with "No reboot on both host and vm", what exactly is the problem? When does it hang up? On shutdown, or on the next startup?

The only reason I can think of for why it would hang on shutdown is that the host, for some reason, tries to use or initialize the GPU after the guest has released it. Then it would also hang when you shut down the guest without trying to reboot the host, or when you ungracefully "power off" the guest. Does this match your experience?

xiaomujiayou commented 1 month ago

> Hello,
>
> I semi-solved the problem by adding a third device, specifically 0000:c5:00.6 to the PCI passthrough config. Now, I can successfully reboot the host machine without any problem (this was not possible before). So no rebooting in the windows VM, and issue is semi-solved.

Following your hint, I tested on an 8845HS: Win10 and Win11 can power on, shut down, and reboot normally. Thanks!