kas-gui / kas

Another GUI toolkit
Apache License 2.0
893 stars 27 forks source link

Panic on Linux Wayland #367

Closed Tim-Paik closed 1 year ago

Tim-Paik commented 1 year ago

command:

RUST_LOG=warn RUST_BACKTRACE=1 cargo run --release --example counter

I extracted some logs that may be useful:

[2022-12-17T09:09:17Z WARN  wgpu_hal::vulkan::instance] Disabling presentation on 'Intel(R) UHD Graphics 630 (CFL GT2)' (id 0x564f61c4f9d0) because of NV Optimus (on Linux)
[2022-12-17T09:09:17Z WARN  wgpu_hal::gles::adapter] Detected skylake derivative running on mesa i915. Clears to srgb textures will use manual shader clears.
[2022-12-17T09:09:17Z WARN  naga::front::spv] Unknown decoration Block
[2022-12-17T09:09:17Z WARN  naga::front::spv] Unknown decoration RelaxedPrecision
[2022-12-17T09:09:17Z WARN  naga::front::spv] Unused item decorations: **Too many, omitted here**
[2022-12-17T09:09:17Z WARN  wgpu_hal::gles::egl] Re-initializing Gles context due to Wayland window
thread 'main' panicked at 'Error in Surface::configure: surface does not support the adapter's queue family', /home/********/.cargo/registry/src/mirrors.****.com/wgpu-0.14.2/src/backend/direct.rs:274:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: wgpu::backend::direct::Context::handle_error_fatal
   3: <wgpu::backend::direct::Context as wgpu::Context>::surface_configure
   4: kas_wgpu::window::Window<C,T>::new
   5: kas_wgpu::Toolkit<C,T>::add_boxed
   6: counter::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

System: Archlinux DE: GNOME + Wayland

If anything else is needed, I'm happy to provide it

dhardy commented 1 year ago

Looks like wgpu failed. It requires some form of accelerated graphics support (see its readme for details; you may also try wgpu examples).

KAS doesn't currently have any alternative (see #287), so unless you can confirm that WGPU examples work on the same setup there's nothing to do here, sorry.

Tim-Paik commented 1 year ago

wgpu

I'm well aware, so I just raised an issue in KAS after I tried the wgpu's example. I also tried to check the code of KAS, but I couldn't find any difference from the demo I wrote myself. If I have to say something different, it may be this:

let adapter = instance
            .request_adapter(&wgpu::RequestAdapterOptions {
                power_preference: wgpu::PowerPreference::default(),
                force_fallback_adapter: false,
                compatible_surface: Some(&surface),
            })
            .await
            .unwrap();

I set compatible_surface to Some(&surface) instead of None, because that would cause surface.get_supported_formats(&adapter) to be of length 0

@dhardy

dhardy commented 1 year ago

Doing the same in KAS is awkward because the adapter is shared and created before any surface or window. It's not immediately clear how to solve this — delay construction of the adapter, or perhaps changes to wgpu are required?

Tim-Paik commented 1 year ago

Because it panics at surface.configure, I’m actually not sure if it’s an error caused by compatible_surface (even though it does panic in my demo) I tried a few things but couldn't solve this problem. Perhaps, these things should not be shared? For example, the scale_factor of each screen may be different, the window should adapt to the scale_factor of the current screen? I think having an Instance per window might be the right thing to do

Tim-Paik commented 1 year ago

Now that I've identified the culprit as compatible_surface: None, I reproduced it with a small demo:

fn main() {
    env_logger::init();
    pollster::block_on(run());
}

pub async fn run() {
    let event_loop = winit::event_loop::EventLoop::new();
    let window = winit::window::WindowBuilder::new()
        .build(&event_loop)
        .unwrap();
    let instance = wgpu::Instance::new(wgpu::Backends::all());
    let surface = unsafe { instance.create_surface(&window) };
    let adapter = instance
        .request_adapter(&wgpu::RequestAdapterOptions {
            compatible_surface: Some(&surface), /* Change this to None to cause Panic */
            ..Default::default()
        })
        .await
        .unwrap();
    let (device, _queue) = adapter
        .request_device(&wgpu::DeviceDescriptor::default(), None)
        .await
        .unwrap();

    let config = wgpu::SurfaceConfiguration {
        usage: wgpu::TextureUsages::RENDER_ATTACHMENT,
        format: wgpu::TextureFormat::Rgba8UnormSrgb,
        width: 800,
        height: 600,
        present_mode: wgpu::PresentMode::Fifo,
        alpha_mode: wgpu::CompositeAlphaMode::Auto,
    };
    surface.configure(&device, &config);
}
dhardy commented 1 year ago

For example, the scale_factor of each screen may be different

It's not "shared", it's simply a hint of the last known value. The window's target size must be specified when constructing it, which can only be calculated with a known scale factor (unless you use Wayland's method of scaling up from an "unscaled size" which requires re-sizing and isn't quite accurate). Both options are rough hacks.

So it looks like your issue may be solvable by delaying request_adapter until the first window is needed — another hack.

dhardy commented 1 year ago

I guess there are two lessons out of this:

  1. Multiple adapters are often available (e.g. on my system, Vulkan/Radeon, Vulkan/LLVMpipe, GL/Radeon). instance.request_adapter is a helper to select an appropriate adapter (e.g. using power preferences), and as a convenience it can also filter by compatibility with a given surface. Ideally we'd either delay adapter choice until we have a surface (not easy in KAS) or re-write this logic with additional constraints, though this may be unnecessary (see second point).
  2. When constructing a surface, we should check adapter capabilities and potentially alter the surface (e.g. format, present mode). The present mode should in any case be selectable.
Tim-Paik commented 1 year ago

Since I'm new to rust, if you solve this problem, please also tell me how, I'd like to learn!

dhardy commented 1 year ago

@Tim-Paik from what I understand:

I don't exactly understand what this message means; it could be due to the TextureUsages.

Could you please set compatible_surface: None then tweak the SurfaceConfiguration to find what is/isn't supported? Thanks.

Tim-Paik commented 1 year ago

I do use a notebook with dual graphics cards, but I haven't installed software like optimus-manager.

I tried using other backends in my demo above, and it turns out that the reason for using GL is simple: everything else will panic

When I set KAS_BACKENDS to GL, things start to change:

timpaikx@timpaikx ~/t/kas (master) [SIGSEGV]> RUST_BACKTRACE=1 KAS_BACKENDS=GL cargo run --example hello
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/examples/hello`
thread 'main' panicked at 'Error in Surface::configure: requested present mode Mailbox is not in the list of supported present modes: [Fifo]', /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/backend/direct.rs:274:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: wgpu::backend::direct::Context::handle_error_fatal
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/backend/direct.rs:274:9
   3: <wgpu::backend::direct::Context as wgpu::Context>::surface_configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/backend/direct.rs:1017:13
   4: wgpu::Surface::configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/lib.rs:3715:9
   5: kas_wgpu::window::Window<C,T>::new
             at ./crates/kas-wgpu/src/window.rs:144:9
   6: kas_wgpu::Toolkit<C,T>::add_boxed
             at ./crates/kas-wgpu/src/lib.rs:228:19
   7: kas_wgpu::Toolkit<C,T>::with
             at ./crates/kas-wgpu/src/lib.rs:221:9
   8: hello::main
             at ./examples/hello.rs:14:5
   9: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fish: Job 1, 'RUST_BACKTRACE=1 KAS_BACKENDS=G…' terminated by signal SIGSEGV (Address boundary error)

I tried to get it in my demo, in wgpu::PresentMode, AutoNoVsync/AutoVsync/Fifo can run normally (GL backend)

I tried changing this to Fifo: https://github.com/kas-gui/kas/blob/3c41cc9a8eb673d2631c6b00ae25af21cbd403c0/crates/kas-wgpu/src/window.rs#L141

then

timpaikx@timpaikx ~/t/kas (master) [SIGSEGV]> RUST_BACKTRACE=1 KAS_BACKENDS=GL cargo run --example hello
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/examples/hello`
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: BadDisplay', /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:294:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/panicking.rs:65:14
   2: core::result::unwrap_failed
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1791:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/result.rs:1113:23
   4: wgpu_hal::gles::egl::EglContext::make_current
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:292:9
   5: wgpu_hal::gles::egl::AdapterContext::lock::{{closure}}
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:406:13
   6: core::option::Option<T>::map
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/option.rs:925:29
   7: wgpu_hal::gles::egl::AdapterContext::lock
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:405:19
   8: wgpu_hal::gles::egl::Surface::unconfigure_impl
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:1019:19
   9: <wgpu_hal::gles::egl::Surface as wgpu_hal::Surface<wgpu_hal::gles::Api>>::configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-hal-0.14.1/src/gles/egl.rs:1046:42
  10: wgpu_core::device::<impl wgpu_core::hub::Global<G>>::surface_configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.14.2/src/device/mod.rs:5220:17
  11: <wgpu::backend::direct::Context as wgpu::Context>::surface_configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/backend/direct.rs:1015:13
  12: wgpu::Surface::configure
             at /home/timpaikx/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.14.2/src/lib.rs:3715:9
  13: kas_wgpu::window::Window<C,T>::new
             at ./crates/kas-wgpu/src/window.rs:144:9
  14: kas_wgpu::Toolkit<C,T>::add_boxed
             at ./crates/kas-wgpu/src/lib.rs:228:19
  15: kas_wgpu::Toolkit<C,T>::with
             at ./crates/kas-wgpu/src/lib.rs:221:9
  16: hello::main
             at ./examples/hello.rs:14:5
  17: core::ops::function::FnOnce::call_once
             at /rustc/69f9c33d71c871fc16ac445211281c6e7a340943/library/core/src/ops/function.rs:251:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fish: Job 1, 'RUST_BACKTRACE=1 KAS_BACKENDS=G…' terminated by signal SIGSEGV (Address boundary error)
dhardy commented 1 year ago

requested present mode Mailbox is not in the list of supported present modes: [Fifo]

I used Mailbox to work around a different bug, but probably Fifo is the better supported mode.

Result::unwrap()on anErr` value: BadDisplay

No idea what that is...


But, did you see my previous question about tweaking SurfaceConfiguration from your example above? I'd still like to know what does/doesn't work.

Tim-Paik commented 1 year ago

Among other parameters, alpha_mode seems to only support [Opaque] (Auto) , format supports [Rgba8UnormSrgb, Bgra8UnormSrgb, Rgba16Float], usage only supports RENDER_ATTACHMENT.

If you want to use Mailbox, can AutoNoVsync solve that problem?

dhardy commented 1 year ago

Among other parameters, alpha_mode seems to only support [Opaque] (Auto) , format supports [Rgba8UnormSrgb, Bgra8UnormSrgb, Rgba16Float], usage only supports RENDER_ATTACHMENT.

These are what KAS uses anyway.

If you want to use Mailbox

This was selected to work around a bug; Fifo should work fine (aside from hang on exit after multiple windows are created on X11). So looks like this is the same issue as #368.