Explore Venus + MoltenVK for GPU acceleration

osy commented 1 year ago

Currently we use VirGL + ANGLE to translate GL (guest) to Metal (host). This works decently (on Linux) but the downside is that it’s buggy (crashes) and more modern Linux applications and games are moving to Vulkan.

Venus translates guest Vulkan calls to host Vulkan calls.

MoltenVK translates host Vulkan calls to Metal calls.

It is worth exploring this pairing to see if it’s a) more stable and b) more performant.

Note that neither solution currently has Windows guest support so that will have to be developed separately.

tifasoftware commented 1 year ago

Could DXVK be used to also translate DirectX to Vulkan?

osy commented 1 year ago

Yes, but that requires significantly more work on the Windows side.

IComplainInComments commented 1 year ago

> Could DXVK be used to also translate DirectX to Vulkan?

It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would have everything needed already.

tifasoftware commented 1 year ago

> > Could DXVK be used to also translate DirectX to Vulkan?

> It's more beneficial to just use DXVK on a Linux VM using Steam's Proton, as it would have everything needed already.

Yeah, that's one way to go. However, I think there should be something that benefits programs that only work on Windows (and not in WINE/Proton), as well as support for Aero in Vista/7.

osy commented 1 year ago

Attempted this in https://github.com/utmapp/UTM/tree/feature/venus-support and hit a blocker. I managed to build everything, but support is missing on the macOS/HVF side.

From the Venus docs in Mesa:

The Venus renderer makes assumptions about VkDeviceMemory that has VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT. The assumptions are illegal and rely on the current behaviors of the host drivers. It should be possible to remove some of the assumptions and incrementally improve compatibilities with more host drivers by imposing platform-specific requirements. But the long-term plan is to create a new Vulkan extension for the host drivers to address this specific use case.

The Venus renderer assumes a device memory that has VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT can be exported as a mmapable dma-buf (in the future, the plan is to export the device memory as an opaque fd). It chains VkExportMemoryAllocateInfo to VkMemoryAllocateInfo without checking if the host driver can export the device memory.

The dma-buf is mapped (in the future, the plan is to import the opaque fd and call vkMapMemory) but the mapping is not accessed. Instead, the mapping is passed to KVM_SET_USER_MEMORY_REGION. The hypervisor, host KVM, and the guest kernel work together to set up a write-back or write-combined guest mapping (see virtio_gpu_vram_mmap of the virtio-gpu kernel driver). CPU accesses to the device memory are via the guest mapping, and are assumed to be coherent when the device memory also has VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.

While the Venus renderer can force a VkDeviceMemory external, it does not force a VkImage or a VkBuffer external. As a result, it can bind an external device memory to a non-external resource.

What this means is that it requires a feature in the Linux kernel (UDMA buffers) which allows QEMU to DMA map memory in a way that GBM/minigbm can access. This way Vulkan can render directly to host device memory.

There's missing support for this all across the board from macOS to MoltenVK. So significant effort would have to be put in to either 1) change the render target to a Metal surface and do some weird guest->host passthrough or 2) port minigbm to use the Metal APIs. There may be other ways but I'm not experienced in the Linux graphics stack.

I think the more promising approach is to use gfxstream, the technology from Google's Android Emulator that allows Vulkan commands to be serialized and streamed directly from guest to host. Since it already has M1 support, it could be easier. However, the challenge is to get it working 1) on QEMU and 2) on vanilla Linux (there are a lot of Android ifdefs in the code).
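To illustrate what "serialized and streamed" means here, below is a toy Python sketch of the general idea: the guest packs each command into a length-prefixed packet, and the host walks the byte stream and unpacks the commands in order. This is not gfxstream's actual wire format. The header layout, the opcodes, and the function names are all hypothetical and exist only for illustration.

```python
import struct

# Hypothetical packet header: (opcode, total packet size in bytes), little-endian.
HEADER = struct.Struct("<II")

# Hypothetical opcodes, invented for this sketch.
OP_CREATE_BUFFER = 1
OP_DRAW = 2


def encode_command(opcode, payload=b""):
    """Guest side: wrap one command and its arguments in a length-prefixed packet."""
    return HEADER.pack(opcode, HEADER.size + len(payload)) + payload


def decode_stream(buf):
    """Host side: split a byte stream back into (opcode, payload) pairs."""
    commands = []
    offset = 0
    while offset < len(buf):
        opcode, size = HEADER.unpack_from(buf, offset)
        # Reject packets that are shorter than a header or run past the buffer.
        if size < HEADER.size or offset + size > len(buf):
            raise ValueError("malformed packet")
        commands.append((opcode, buf[offset + HEADER.size:offset + size]))
        offset += size
    return commands


# Guest builds a stream of two commands; host decodes them in order.
stream = encode_command(OP_CREATE_BUFFER, struct.pack("<Q", 4096))
stream += encode_command(OP_DRAW, struct.pack("<II", 3, 1))
for opcode, payload in decode_stream(stream):
    print(opcode, payload.hex())
```

The point of this style is that the host only needs a transport for bytes plus a decoder, rather than guest-visible host device memory.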

DUOLabs333 commented 1 year ago

@osy I tried building from your fork of virglrenderer, but I couldn't get Venus to compile: gbm.h is missing. Is this what you meant by a "lack of support"?

zaptrem commented 1 year ago

@osy Could Apple's new D3DMetal make graphics acceleration support any easier?

tifasoftware commented 1 year ago

As long as Apple's license permits it.

osy commented 1 year ago

@zaptrem It doesn't change anything for our purposes. However, in theory it may open up a path for using ParavirtualizedGraphics (used in macOS guests for GPU virtualization through Metal) with Linux/Windows guests via D3DMetal. My hunch is that this would be much, much harder than Venus + MoltenVK or gfxstream + MoltenVK (the current plan of action).

DUOLabs333 commented 1 year ago

Hey @osy, I've been following the work on gfxstream (I've been trying independently to add Vulkan by patching virglrenderer). For vkcube to work, mesa's Venus driver needs some extensions that MoltenVK can't implement. How are you planning to get around that? (I'm seeing some references to opengl-goldfish. Is that the replacement for mesa?)

osy commented 1 year ago

@DUOLabs333 No, that’s why I said “gfxstream + MoltenVK (the current plan of action)”.

DUOLabs333 commented 1 year ago

I've been working on this for a while now, and I got far enough that I can see the Draw commands being executed in the log (nothing on screen though, only a black window). However, when I updated mesa from 23.0 to 23.1, everything broke and I had to start all over. I was able to fix some of the problems, but I now hit an assertion failure at the assert(isv) line in target/arm/hvf/hvf.c, which occurs after the guest requests a blob to be mapped. I determined that the mapping operation itself is not the problem, but rather something on the guest side. Do you know of any situations where such a crash would occur?

osy commented 1 year ago

Check out https://gitlab.com/qemu-project/qemu/-/issues/1611

DUOLabs333 commented 1 year ago

Ah, I see (I've been following the issue, but I've been skimming it). This is obviously much outside my area of expertise, but from my understanding, what seems to be happening is this: when virgl_renderer_map_blob is called, and the shmem is mapped, the physical address corresponding to the blob on the host isn't exposed as mapped to the guest. So, when an instruction tries to operate on the address, it returns some error. QEMU catches the error, and figures out how to apply the instruction on the corresponding host memory address.

The problem is that QEMU isn't doing that last part, and is just erroring. Did I get it right?

In any case, I wonder what changed between the two versions to trigger this.

osy commented 1 year ago

The problem is that when memory is mapped as MMIO, every access traps, and the hypervisor fails to decode what to do (ISV=0) if the access comes from an uncommon instruction such as an atomic store, LDP, or a cache line copy. Therefore it needs to be mapped as direct memory, which should not trap at all.
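To make the ISV=0 case concrete, here is a minimal Python sketch that decodes the data-abort syndrome a hypervisor receives when a guest access traps. The field positions (EC, ISV, SAS, SRT, WnR) follow the Arm architecture's ESR_EL2 layout for data aborts. The function name and the returned dictionary are made up for illustration and are not QEMU code.

```python
# Field positions in ESR_EL2 for a data abort (per the Arm architecture).
EC_SHIFT = 26
EC_MASK = 0x3F
EC_DATA_ABORT_LOWER_EL = 0x24  # data abort taken from a lower exception level (the guest)
ISV_BIT = 1 << 24              # Instruction Syndrome Valid
SAS_SHIFT = 22                 # access size: 0=byte, 1=halfword, 2=word, 3=doubleword
SRT_SHIFT = 16                 # index of the register used by the access
WNR_BIT = 1 << 6               # 1 = write, 0 = read


def decode_data_abort(esr):
    """Return what the syndrome says about a trapped access.

    Returns None if the exception is not a guest data abort, {"isv": False}
    if the hardware provided no decode information, and the decoded fields
    otherwise. (Hypothetical helper, for illustration only.)
    """
    ec = (esr >> EC_SHIFT) & EC_MASK
    if ec != EC_DATA_ABORT_LOWER_EL:
        return None
    if not esr & ISV_BIT:
        # No size, register, or direction: the access cannot be emulated
        # from the syndrome alone.
        return {"isv": False}
    return {
        "isv": True,
        "size": 1 << ((esr >> SAS_SHIFT) & 0x3),  # access size in bytes
        "reg": (esr >> SRT_SHIFT) & 0x1F,
        "write": bool(esr & WNR_BIT),
    }


# A 4-byte store from register 5 decodes fine; the same abort with ISV=0 does not.
simple_store = (0x24 << 26) | ISV_BIT | (2 << SAS_SHIFT) | (5 << SRT_SHIFT) | WNR_BIT
print(decode_data_abort(simple_store))
print(decode_data_abort(0x24 << 26))
```

With ISV=0 the hypervisor would have to fetch and decode the guest instruction itself to emulate the access, which is why avoiding the trap altogether is the more practical route.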

DUOLabs333 commented 1 year ago

How would I do this on macOS? I looked at it briefly when first starting (I got very confused, and just used shmem instead). There doesn't seem to be anything analogous to Linux's dma-buf, and I can't find a way to get a memory address that I can use memcpy and friends on directly (I'm guessing it has something to do with IODMACommand, but I have no idea what to do with that).

osy commented 1 year ago

It would be a lot of work. I'm afraid I am no help there. I also took a look and gave up due to the amount of work that would be required.

DUOLabs333 commented 1 year ago

Ok, here's what I got:

1. Create an IODMACommand instance.
2. Init the instance.
3. Call getMemoryDescriptor on the instance.
4. Call getPhysicalAddress on the descriptor.

I'm not sure how to convert this address into a file though, so virglrenderer can mmap seamlessly.

DUOLabs333 commented 1 year ago

I think I got it: I can use funopen to make a pseudo-file, which can implement the descriptor's operations transparently when being mmapped.

DUOLabs333 commented 1 year ago

This is weird though: the code path that leads to the error (which, notably, I never reached before, which explains why I never got this error before) specifically wants shmem. If this was a problem with QEMU, why hasn't it been caught before?

DUOLabs333 commented 1 year ago

The problem exists even if you use a temporary file (tmpfile) instead of a shmem.
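For reference, the temporary-file variant boils down to something like the following Python sketch (the real code is C; `make_blob` is a hypothetical helper name). It creates an unnamed temporary file, sizes it, and maps it shared through its file descriptor, which is the same shape as the shmem path with a different backing object.

```python
import mmap
import tempfile


def make_blob(size):
    """Create an unnamed temporary file of `size` bytes and map it read/write."""
    backing = tempfile.TemporaryFile()            # analogous to C's tmpfile()
    backing.truncate(size)                        # analogous to ftruncate(fd, size)
    mapping = mmap.mmap(backing.fileno(), size)   # shared mapping by default on Unix
    return backing, mapping


backing, mapping = make_blob(4096)
mapping[0:5] = b"hello"   # write through the mapping
mapping.flush()
backing.seek(0)
print(backing.read(5))    # read the same bytes back through the file
mapping.close()
backing.close()
```

Swapping the backing object only changes where the bytes live on the host. It does not by itself change how the region is presented to the guest.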

DUOLabs333 commented 1 year ago

Ok, I made a first version of using DMA instead of shmems, but I'm stuck at including the path from Kernel.framework.

If I include <Kernel/IOKit/IODMACommand.h>, then compilation fails, because that file includes <IOKit/IOCommand.h>. The problem is that macOS looks for IOCommand.h in IOKit, where it doesn't exist. However, it does exist at Kernel/IOKit/IOCommand.h.

DUOLabs333 commented 1 year ago

Apparently, I had to clean out my virglrenderer build folder before the -I option took effect. However, I immediately ran into another blocker: IODMACommand is C++ only, but QEMU is written in C. We would have to compile a wrapper.

DUOLabs333 commented 1 year ago

I've written the wrapper, but I've gotten some errors around APPLE_KEXT_OVERRIDE. This might mean that we would have to make a kext for UTM/qemu, which might not be desirable.

DUOLabs333 commented 1 year ago

Ok, I rewrote it to use DriverKit, but now IOBufferMemoryDescriptor::Create fails with kIOReturnNotReady, which is a weird message to get (I thought I would have gotten something about permissions). I added the com.apple.developer.driverkit entitlement, just to be safe.

DUOLabs333 commented 1 year ago

Ok, since DriverKit seems to REQUIRE a driver to use any of its functions (or at least some special setup), I rewrote the DMA code once again to use IOSurface.

However, I now realize that fileno can't produce a file descriptor for these file pointers. So, is there a way to get a file descriptor for either a void pointer or an IOSurface?

DUOLabs333 commented 1 year ago

I got IOSurface to work with mesa, but I still get the assertion error. Is there something else I'm not doing?

upintheairsheep commented 1 year ago

A full implementation of DirectX 12 to 9 is present in Apple's Game Porting Toolkit:

https://www.reddit.com/r/macgaming/comments/142tomx/apples_game_porting_toolkit_seems_to_have_a/

tifasoftware commented 1 year ago

We should take a look at this if the license allows.

On Aug 22, 2023, at 6:03 PM, upintheairsheep wrote:

> A full implementation of DirectX 12 to 9 is present via Apple Game Porting Toolkit https://www.reddit.com/r/macgaming/comments/142tomx/apples_game_porting_toolkit_seems_to_have_a/


baryluk commented 9 months ago

I am new to Mac (I was using Linux for over 20 years), but I got Debian Linux running on UTM, and it works nicely.

I am also interested in Venus.

Another benefit of Venus over virgl would be better handling of multiple OpenGL apps in the guest (when running Zink on top of Venus to provide GL). With virgl, they are all funneled into a single host-side OpenGL context. This causes issues with buffer flips and sync and, because of OpenGL's highly synchronous nature (at least in the virgl world), stutter when two OpenGL apps are open (e.g. glxgears plus a benchmark). I have seen this with a Linux guest on a Linux host. With Venus, each open of a device instance in the guest is mapped to an open on the host, so all contexts are separate, as they would be natively, and the stutter goes away.

Also, Zink implements OpenGL 4.6 on a suitably modern Vulkan driver. (I do not know whether Zink works on MoltenVK at the moment. There was some work on this in the past, but it got blocked, mostly because Mesa requires some features that are simply not available on macOS. If Zink runs in the guest and MoltenVK on the host, this should not be a problem, though.)

Of course dxvk and others should work too (with suitable work on the guest side, for things like page size differences).

DUOLabs333 commented 8 months ago

QEMU 8.2.0 was just released, with the Android Emulator's rutabaga merged. rutabaga supports Vulkan; however, from what I can tell, macOS support hasn't been fully finalized (it's likely that it will come eventually).

I am working on another approach that doesn't require changes to QEMU. The tradeoff is that it is slower (how much slower remains to be seen).

utmapp / UTM

Explore Venus + MoltenVK for GPU acceleration #4551