Open puckipedia opened 2 years ago
It's probably fairly easy, if you can get Linux/crosvm to stop crashing!
What does test.exe output when you run it? On my (non-drm) system, I get:
$ dune exec -- ./tests/test.exe
test.exe: [INFO] alloc: strides = 4, offsets = 0, host_size = 4096, blob_id = 4
test.exe: [INFO] probe_drm: FAILED - we cannot use video memory
[ runs without video memory ]
If used with video memory (by recompiling crosvm to support that), it usually crashes after a few seconds of resizing the window. I'd be very happy if someone figures out what's causing that!
(https://spectrum-os.org/lists/hyperkitty/list/discuss@spectrum-os.org/message/BK4WYBFHJ264NRFOJZBPYW6XV6RWSCGE/ has some more context, if you haven't seen it already)
..I think i figured out what happens, but not entirely how to solve it: you're running out of VRAM! KVM_RUN
returns EFAULT
shortly after allocating a memory buffer into the GPU (and I don't see it unmapping memory at all), and I *can* replicate this, it's just that my GPU has a lot more memory at its disposal (12GB).
..oh, actually, that might've been a bit of a wrong guess (i was not tracing unmaps at all): I found someone else with a very similar issue, and a patch that i can't tell how well it's supposed to work, but it does: https://www.collabora.com/news-and-blog/blog/2021/11/26/venus-on-qemu-enabling-new-virtual-vulkan-driver/#qcom1343 -- I applied this (backported) and now it seems to reliably not EFAULT crosvm :)
Ah, good find! I bet that's it, indeed. I seem to remember that it was OK when the image was smaller than one page.
I hooked up a Zwp_linux_dmabuf_v1
proxy (which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for create_immed
), stubbed wl_drm
to make mesa happy, and got it to believe virgl was usable! And after patching crosvm to not depend on sandbox status for the external_blob flag, it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....
I hooked up a Zwp_linux_dmabuf_v1 proxy
Wow, that was quick work!
which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for
create_immed
Oh, yeah. When you create an object in Wayland it's supposed to have the same version as its parent. But then create_immed
in zwp_linux_buffer_params_v1
creates a wl_buffer
, which isn't even in the same specification! I guess you discovered cast_version
to override that.
it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....
Interesting. I wonder if applications need to use a special libdrm plugin that understands the cross-domain protocol? I know very little about using GPUs though... I hope to get some time to learn how this stuff works soon!
I hooked up a Zwp_linux_dmabuf_v1 proxy
Wow, that was quick work!
Well, you did the hard work of figuring out how to transfer the dmabuf over the bridge to begin with, the code was literally just a 1:1 relay other than that :p
which took longer due to something in the ocaml-wayland stuff wanting a V2 wl_buffer for
create_immed
Oh, yeah. When you create an object in Wayland it's supposed to have the same version as its parent. But then
create_immed
inzwp_linux_buffer_params_v1
creates awl_buffer
, which isn't even in the same specification! I guess you discoveredcast_version
to override that.
I uh. absolutely struggled with where to apply cast_version
in the definition of the client->host
impl of create_immed
, so i decided to just postPatch
the ocaml-wayland
drv to drop that restriction, lol.
it actually got as far as to send the dmabuf all the way to the compositor, at which point it failed to import it....
Interesting. I wonder if applications need to use a special libdrm plugin that understands the cross-domain protocol? I know very little about using GPUs though... I hope to get some time to learn how this stuff works soon!
Well, there's the interesting thing: mesa
knows how to talk to virgl
over virtio-gpu
for crosvm (I believe, but haven't verified fully, this is how chromebooks get their GPU acceleration!), but: I had to patch minigbm to support amdgpu
and use virglrenderer built with it. The issue I hit also wasn't about transfering the GPU buffers, but something about their stride. I've got a patched Mesa on the compositor end, and stepped through, and I can see why it broke, but don't exactly understand how it's supposed to work, yet.. I'll probably enable more virgl debugging and see how far I can get :)
Manually fudging the stride in create_immed
seems to have worked, so I think the issue is likely not virtwl! it's probably in virgl or mesa, and both codebases are a bit bigger to go through. Based on my tracing so far, I think mesa inside the VM is likely responsible for the stride being dropped..
I uh. absolutely struggled with where to apply cast_version in the definition of the client->host impl of create_immed, so i decided to just postPatch the ocaml-wayland drv to drop that restriction, lol.
It's not easy! This is how I did it:
I see from the mli that I made the executive decision that the Wl_buffer
will be version 1. If they ever add a version 2 with extra events it will crash, but that seems unavoidable with the way they've written the spec.
An interesting issue i've found for sommelier is https://bugs.chromium.org/p/chromium/issues/detail?id=892242 - which seems to likely be the same issue i've had.
Did you ever figure out how to fix this? Looks like sommelier is broken too probably due to the same bug, if forwards the wayland stuff okay and I can see a window being created, but it doesn't have any content. And the guest kernel has this in it's dmesg
[ 125.172639] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)
Crosvm prints this if I enable debug logging
[2022-10-25T23:57:27.348741997+02:00 DEBUG devices::virtio::gpu] Some(CmdSubmit3d) -> ErrRutabaga(InvalidResourceId)
I added some logging of my own and figured out that sommelier creates two blob resources(GpuCommand::ResourceCreateBlob
) with consecutive resource IDs (say 327, and then 328)m and then the host tries to write(in CrossDomainContext::send
) to the next blob resource ID(329).
wayland-proxy-virtwl works okay I guess because it doesn't try to share the dmabufs.
Oh my findings were irrelevant, I think latest sommelier is trying to use the new "drm native contexts" feature here, and it will only work for qualcomm msm graphics for now.
@puckipedia do you mind sharing the patches you have so far?
Once this gets implemented, explicit synchronization also needs to be added. Implicit sync will work (except with Nvidia proprietary drivers), but it has worse performance and interferes with the ability of drivers to improve (which is why Nvidia doesn’t want to implement it).
@puckipedia want to work together on finishing this? Qubes OS needs this too.
I've been trialing
wayland-proxy-virtwl
today (for my spectrum-related project of getting it to support security contexts), and noticed that it doesn't do dmabuf passthrough at all yet, how much work do you suspect it'd be to add that in?As a note re #33, running
tests/test.exe
here was smooth with no glitches (on crosvm eefbf6da, 5.17.0 kernel)