talex5 / wayland-proxy-virtwl

Allow guest VMs to open windows on the host
Apache License 2.0
115 stars 12 forks source link

wl_drm/dmabuf support #43

Open puckipedia opened 2 years ago

puckipedia commented 2 years ago

I've been trialing wayland-proxy-virtwl today (for my spectrum-related project of getting it to support security contexts), and noticed that it doesn't do dmabuf passthrough at all yet, how much work do you suspect it'd be to add that in?

As a note re #33, running tests/test.exe here was smooth with no glitches (on crosvm eefbf6da, 5.17.0 kernel)

talex5 commented 2 years ago

It's probably fairly easy, if you can get Linux/crosvm to stop crashing!

What does test.exe output when you run it? On my (non-drm) system, I get:

$ dune exec -- ./tests/test.exe
test.exe: [INFO] alloc: strides = 4, offsets = 0, host_size = 4096, blob_id = 4
test.exe: [INFO] probe_drm: FAILED - we cannot use video memory
[ runs without video memory ]

If used with video memory (by recompiling crosvm to support that), it usually crashes after a few seconds of resizing the window. I'd be very happy if someone figures out what's causing that!

(https://spectrum-os.org/lists/hyperkitty/list/discuss@spectrum-os.org/message/BK4WYBFHJ264NRFOJZBPYW6XV6RWSCGE/ has some more context, if you haven't seen it already)

puckipedia commented 2 years ago

..I think i figured out what happens, but not entirely how to solve it: you're running out of VRAM! KVM_RUN returns EFAULT shortly after allocating a memory buffer into the GPU (and I don't see it unmapping memory at all), and I *can* replicate this, it's just that my GPU has a lot more memory at its disposal (12GB).

puckipedia commented 2 years ago

..oh, actually, that might've been a bit of a wrong guess (i was not tracing unmaps at all): I found someone else with a very similar issue, and a patch that i can't tell how well it's supposed to work, but it does: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343 -- I applied this (backported) and now it seems to reliably not EFAULT crosvm :)

talex5 commented 2 years ago

Ah, good find! I bet that's it, indeed. I seem to remember that it was OK when the image was smaller than one page.

puckipedia commented 2 years ago

I hooked up a Zwp_linux_dmabuf_v1 proxy (which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for create_immed), stubbed wl_drm to make mesa happy, and got it to believe virgl was usable! And after patching crosvm to not depend on sandbox status for the external_blob flag, it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....

talex5 commented 2 years ago

I hooked up a Zwp_linux_dmabuf_v1 proxy

Wow, that was quick work!

which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for create_immed

Oh, yeah. When you create an object in Wayland it's supposed to have the same version as its parent. But then create_immed in zwp_linux_buffer_params_v1 creates a wl_buffer, which isn't even in the same specification! I guess you discovered cast_version to override that.

it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....

Interesting. I wonder if applications need to use a special libdrm plugin that understands the cross-domain protocol? I know very little about using GPUs though... I hope to get some time to learn how this stuff works soon!

puckipedia commented 2 years ago

I hooked up a Zwp_linux_dmabuf_v1 proxy

Wow, that was quick work!

Well, you did the hard work of figuring out how to transfer the dmabuf over the bridge to begin with, the code was literally just a 1:1 relay other than that :p

which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for create_immed

Oh, yeah. When you create an object in Wayland it's supposed to have the same version as its parent. But then create_immed in zwp_linux_buffer_params_v1 creates a wl_buffer, which isn't even in the same specification! I guess you discovered cast_version to override that.

I uh. absolutely struggled with where to apply cast_version in the definition of the client->host impl of create_immed, so i decided to just postPatch the ocaml-wayland drv to drop that restriction, lol.

it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....

Interesting. I wonder if applications need to use a special libdrm plugin that understands the cross-domain protocol? I know very little about using GPUs though... I hope to get some time to learn how this stuff works soon!

Well, there's the interesting thing: mesa knows how to talk to virgl over virtio-gpu for crosvm (I believe, but haven't verified fully, this is how chromebooks get their GPU acceleration!), but: I had to patch minigbm to support amdgpu and use virglrenderer built with it. The issue I hit also wasn't about transfering the GPU buffers, but something about their stride. I've got a patched Mesa on the compositor end, and stepped through, and I can see why it broke, but don't exactly understand how it's supposed to work, yet.. I'll probably enable more virgl debugging and see how far I can get :)

puckipedia commented 2 years ago

Manually fudging the stride in create_immed seems to have worked, so I think the issue is likely not virtwl! it's probably in virgl or mesa, and both codebases are a bit bigger to go through. Based on my tracing so far, I think mesa inside the VM is likely responsible for the stride being dropped..

talex5 commented 2 years ago

I uh. absolutely struggled with where to apply cast_version in the definition of the client->host impl of create_immed, so i decided to just postPatch the ocaml-wayland drv to drop that restriction, lol.

It's not easy! This is how I did it:

https://github.com/talex5/wayland-proxy-virtwl/blob/fa566c85ce7f75e98d66b9fe617dcc4d565cd48a/virtio_gpu/wayland_dmabuf.ml#L84-L91

I see from the mli that I made the executive decision that the Wl_buffer will be version 1. If they ever add a version 2 with extra events it will crash, but that seems unavoidable with the way they've written the spec.

puckipedia commented 2 years ago

An interesting issue i've found for sommelier is https://bugs.chromium.org/p/chromium/issues/detail?id=892242 - which seems to likely be the same issue i've had.

madushan1000 commented 2 years ago

Did you ever figure out how to fix this? Looks like sommelier is broken too probably due to the same bug, if forwards the wayland stuff okay and I can see a window being created, but it doesn't have any content. And the guest kernel has this in it's dmesg

[  125.172639] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)

Crosvm prints this if I enable debug logging

[2022-10-25T23:57:27.348741997+02:00 DEBUG devices::virtio::gpu] Some(CmdSubmit3d) -> ErrRutabaga(InvalidResourceId)

I added some logging of my own and figured out that sommelier creates two blob resources(GpuCommand::ResourceCreateBlob) with consecutive resource IDs (say 327, and then 328)m and then the host tries to write(in CrossDomainContext::send) to the next blob resource ID(329).

wayland-proxy-virtwl works okay I guess because it doesn't try to share the dmabufs.

madushan1000 commented 2 years ago

Oh my findings were irrelevant, I think latest sommelier is trying to use the new "drm native contexts" feature here, and it will only work for qualcomm msm graphics for now.

DemiMarie commented 5 months ago

@puckipedia do you mind sharing the patches you have so far?

DemiMarie commented 5 months ago

Once this gets implemented, explicit synchronization also needs to be added. Implicit sync will work (except with Nvidia proprietary drivers), but it has worse performance and interferes with the ability of drivers to improve (which is why Nvidia doesn’t want to implement it).

DemiMarie commented 5 months ago

@puckipedia want to work together on finishing this? Qubes OS needs this too.