talex5 / wayland-proxy-virtwl

Allow guest VMs to open windows on the host
Apache License 2.0

*ERROR* response 0x1200 (command 0x207) #53

Closed hissssst closed 1 year ago

hissssst commented 1 year ago

I tried to set up your qubes-lite GitLab project and, after some tweaking, came across this issue.

When run in the guest:

user@untrusted ~> dmesg | grep drm
[    2.777042] ACPI: bus type drm_connector registered
[    2.807181] [drm] pci: virtio-gpu-pci detected at 0000:00:03.0
[    2.808565] [drm] Host memory window: 0x200000000 +0x200000000
[    2.809273] [drm] features: +virgl -edid +resource_blob +host_visible
[    2.809275] [drm] features: +context_init
[    2.811139] [drm] number of scanouts: 1
[    2.811540] [drm] number of cap sets: 2
[    2.870889] [drm] cap set 0: id 2, max-version 2, max-size 1376
[    2.871818] [drm] cap set 1: id 5, max-version 0, max-size 16
[    2.872593] [drm] Initialized virtio_gpu 0.1.0 0 for 0000:00:03.0 on minor 0
[    2.880611] virtio-pci 0000:00:03.0: [drm] drm_plane_enable_fb_damage_clips() not called
[    2.884879] virtio-pci 0000:00:03.0: [drm] fb0: virtio_gpudrmfb frame buffer device
user@untrusted ~> wayland-proxy-virtwl --virtio-gpu
[  407.524408] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)

After this error, wayland-proxy-virtwl keeps running, but every attempt to run a Wayland program generates the same error, for example:

user@untrusted ~> weston-terminal
[  407.524408] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)
[  407.524408] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)
[  407.524408] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)

I have pinned nixpkgs to the same revision as my host; otherwise loading the Mesa drivers failed with an incompatible GLIBC_VERSION error:

{
  inputs = {
    nixpkgs.url = "github:nixos/nixpkgs/08e4dc3a907a6dfec8bb3bbf1540d8abbffea22b";

    flake-utils.url = "github:numtide/flake-utils";
    wayland-proxy-virtwl = {
      url = "github:talex5/wayland-proxy-virtwl";
      inputs.nixpkgs.follows = "nixpkgs";
      inputs.flake-utils.follows = "flake-utils";
    };
    crosvm = {
      url = "git+https://gitlab.com/talex5/crosvm.git?submodules=1";
      inputs.nixpkgs.follows = "nixpkgs";
    };
  };
}

alyssais commented 1 year ago

What compositor are you using on your host?

talex5 commented 1 year ago

I see these errors in my logs too from time to time. Not sure what causes it. 0x1200 is VIRTIO_GPU_RESP_ERR_UNSPEC, which is rather unhelpful:

https://gitlab.com/talex5/crosvm/-/blob/854971404634be44c47aeeff3ae2ca5b46ddd5cd/devices/src/virtio/gpu/protocol.rs#L82
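
For reference, the two numbers in the kernel message can be decoded with a small lookup. This is a minimal sketch, assuming the values defined in the kernel's include/uapi/linux/virtio_gpu.h; the helper names (`command_name`, `response_name`, `describe`) are hypothetical and only a subset of the codes is listed, so check the header before relying on it.

```rust
// Hypothetical helpers for reading "response 0x1200 (command 0x207)".
// The numeric values are assumed to match include/uapi/linux/virtio_gpu.h.

/// Name a virtio-gpu 3D control command, for the subset listed here.
fn command_name(code: u32) -> Option<&'static str> {
    match code {
        0x0200 => Some("VIRTIO_GPU_CMD_CTX_CREATE"),
        0x0201 => Some("VIRTIO_GPU_CMD_CTX_DESTROY"),
        0x0202 => Some("VIRTIO_GPU_CMD_CTX_ATTACH_RESOURCE"),
        0x0203 => Some("VIRTIO_GPU_CMD_CTX_DETACH_RESOURCE"),
        0x0204 => Some("VIRTIO_GPU_CMD_RESOURCE_CREATE_3D"),
        0x0205 => Some("VIRTIO_GPU_CMD_TRANSFER_TO_HOST_3D"),
        0x0206 => Some("VIRTIO_GPU_CMD_TRANSFER_FROM_HOST_3D"),
        0x0207 => Some("VIRTIO_GPU_CMD_SUBMIT_3D"),
        _ => None,
    }
}

/// Name a virtio-gpu error response code.
fn response_name(code: u32) -> Option<&'static str> {
    match code {
        0x1200 => Some("VIRTIO_GPU_RESP_ERR_UNSPEC"),
        0x1201 => Some("VIRTIO_GPU_RESP_ERR_OUT_OF_MEMORY"),
        0x1202 => Some("VIRTIO_GPU_RESP_ERR_INVALID_SCANOUT_ID"),
        0x1203 => Some("VIRTIO_GPU_RESP_ERR_INVALID_RESOURCE_ID"),
        0x1204 => Some("VIRTIO_GPU_RESP_ERR_INVALID_CONTEXT_ID"),
        0x1205 => Some("VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER"),
        _ => None,
    }
}

/// Turn the (response, command) pair from the dmesg line into readable text.
fn describe(response: u32, command: u32) -> String {
    format!(
        "{} in reply to {}",
        response_name(response).unwrap_or("unknown response"),
        command_name(command).unwrap_or("unknown command"),
    )
}

fn main() {
    // The pair reported in this issue.
    println!("{}", describe(0x1200, 0x207));
}
```

Under those assumptions, the message in this issue reads as an unspecified error returned for a 3D command submission, which narrows down the command but not the cause.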

crosvm maps various errors to VIRTIO_GPU_RESP_ERR_UNSPEC:

            GpuResponse::ErrUnspec => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrTube(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrBase(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrRutabaga(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrDisplay(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrMapping(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrUdmabuf(_) => VIRTIO_GPU_RESP_ERR_UNSPEC,
            GpuResponse::ErrScanout { num_scanouts: _ } => VIRTIO_GPU_RESP_ERR_UNSPEC,

Probably worth adding some extra debug to log the original problem.
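
One way such debugging could look is to log the original variant just before it is flattened to the generic code. The following is only a sketch under stated assumptions: `GpuResponse` here is a cut-down, hypothetical stand-in with two variants (the real crosvm type wraps its own error types, not `std::io::Error`), and `describe_flattened` and `wire_code` are made-up names, not crosvm functions.

```rust
use std::io;

// Value of the generic virtio-gpu error response discussed above.
const VIRTIO_GPU_RESP_ERR_UNSPEC: u32 = 0x1200;

// Hypothetical, simplified stand-in for crosvm's GpuResponse.
#[derive(Debug)]
enum GpuResponse {
    ErrUnspec,
    ErrRutabaga(io::Error),
}

/// Build a message that records which variant is about to be flattened.
fn describe_flattened(resp: &GpuResponse) -> String {
    format!("Returning VIRTIO_GPU_RESP_ERR_UNSPEC for error {:?}", resp)
}

/// Map a response to its wire code, logging the original error first.
fn wire_code(resp: &GpuResponse) -> u32 {
    match resp {
        GpuResponse::ErrUnspec | GpuResponse::ErrRutabaga(_) => {
            // The guest only ever sees 0x1200, so keep the detail on the host.
            eprintln!("WARN {}", describe_flattened(resp));
            VIRTIO_GPU_RESP_ERR_UNSPEC
        }
    }
}

fn main() {
    // errno 13 is used here purely as sample input.
    let resp = GpuResponse::ErrRutabaga(io::Error::from_raw_os_error(13));
    println!("wire code: {:#x}", wire_code(&resp));
    println!("wire code: {:#x}", wire_code(&GpuResponse::ErrUnspec));
}
```

Unlike changing the numeric codes, this keeps the value sent to the guest unchanged, so the guest driver sees the same protocol while the host log gains the detail.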

Does the proxy always fail for you, or just sometimes?

hissssst commented 1 year ago

> What compositor are you using on your host?

$ sway --version
sway version 1.8.1

> Not sure what causes it. 0x1200 is VIRTIO_GPU_RESP_ERR_UNSPEC, which is rather unhelpful:

I've traced it to https://github.com/torvalds/linux/blob/bb7c241fae6228e89c0286ffd6f249b3b0dea225/drivers/gpu/drm/virtio/virtgpu_vq.c#L221. So yeah, it's one of those. Perhaps changing it to

            GpuResponse::ErrUnspec => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x100,
            GpuResponse::ErrTube(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x101,
            GpuResponse::ErrBase(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x102,
            GpuResponse::ErrRutabaga(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x103,
            GpuResponse::ErrDisplay(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x104,
            GpuResponse::ErrMapping(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x105,
            GpuResponse::ErrUdmabuf(_) => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x106,
            GpuResponse::ErrScanout { num_scanouts: _ } => VIRTIO_GPU_RESP_ERR_UNSPEC + 0x107,

may help to find the exact error.

> Probably worth adding some extra debug to log the original problem.

How do I do this?

> Does the proxy always fail for you, or just sometimes?

Always fails

talex5 commented 1 year ago

I rebased my crosvm branch on version 113 and attempted to add some debugging here: https://gitlab.com/talex5/crosvm/-/commit/05a90f3e66cfcc584d15ee810cd117ac0d00b9a3

Does that do anything?

hissssst commented 1 year ago

[2023-05-14T22:01:26.985453558+04:00 ERROR devices::virtio::sys::unix::net] net: tx: failed to write frame to tap: Input/output error (os error 5)
[2023-05-14T22:01:26.992337430+04:00 ERROR devices::virtio::sys::unix::net] net: tx: failed to write frame to tap: Input/output error (os error 5)
[2023-05-14T22:01:27.040260463+04:00 WARN  devices::virtio::gpu::protocol] Returning VIRTIO_GPU_RESP_ERR_UNSPEC for error ErrRutabaga(IoError(Os { code: 13, kind: PermissionDenied, message: "Permission denied" }))
[    2.516511] [drm:virtio_gpu_dequeue_ctrl_func [virtio_gpu]] *ERROR* response 0x1200 (command 0x207)

hissssst commented 1 year ago

I think it is somehow related to my being unable to mount a shared directory into crosvm as virtiofs. That fails with exactly the same exception.

talex5 commented 1 year ago

Well, it should be easy enough to trace crosvm and find out where the EACCES is coming from.
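
The EACCES reading comes from the `code: 13` field in the log line. A minimal sketch, assuming a Linux host (errno numbering is platform-specific) and using a hypothetical helper name `kind_of_errno`, shows how the Rust standard library renders that errno:

```rust
use std::io;

/// Return the ErrorKind that std assigns to a raw OS error number.
/// (Hypothetical helper name; the std calls themselves are real.)
fn kind_of_errno(code: i32) -> io::ErrorKind {
    io::Error::from_raw_os_error(code).kind()
}

fn main() {
    // 13 is EACCES on Linux; the Debug output has the same shape as the
    // `Os { code: 13, kind: PermissionDenied, .. }` value in the crosvm log.
    let err = io::Error::from_raw_os_error(13);
    println!("{:?}", err);

    // EPERM (1) also maps to PermissionDenied, so the numeric code, not the
    // kind, is what distinguishes EACCES from EPERM.
    println!("{:?} {:?}", kind_of_errno(13), kind_of_errno(1));
}
```

Since both errno values collapse to the same `kind`, the `code` field in the log is the part worth searching for when tracing which system call failed.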

hissssst commented 1 year ago

How do I do this?

hissssst commented 1 year ago

I have reproduced it with this setup too: https://github.com/astro/microvm.nix

talex5 commented 1 year ago

> How do I do this?

I'm not an expert on Rust or crosvm, but maybe strace on one of the processes would do it? Probably the parent process, since I disabled the GUI sandbox for debugging. Could be a permissions problem with the graphics device.

You might need to prevent it from using io_uring to give useful results, though. Not sure how to do that, but there's a use_uring function in common/cros_asyncv2/src/unix/io_driver/uring.rs that looks like it could be modified easily.

> I have reproduced it with this setup too: https://github.com/astro/microvm.nix

Looks like an interesting project - I should probably use that instead of my hacky scripts!

hissssst commented 1 year ago

I've managed to get it running. For some reason, running your script as root created some permission problems, while running it as a regular user was impossible because mktuntap required cap_net_admin.

I managed to grant cap_net_admin using NixOS's built-in capability wrapper, and running your script as a regular user then succeeded without any rutabaga exception.

However, I am still seeing these errors, though the UI works correctly:

[2023-05-20T16:53:31.538355063+04:00 ERROR devices::virtio::sys::unix::net] net: tx: failed to write frame to tap: Input/output error (os error 5)

talex5 commented 1 year ago

Ah, I create the tap devices in my configuration.nix. mktuntap just opens them. See the example at https://roscidus.com/blog/blog/2021/03/07/qubes-lite-with-kvm-and-wayland/#thoughts-on-nixos.