parasyte / pixels

A tiny hardware-accelerated pixel frame buffer. 🦀
https://docs.rs/pixels
MIT License

render() call causes a Vulkan crash [Ubuntu, AMD, i3, Mesa] #242

Closed aegroto closed 2 years ago

aegroto commented 2 years ago

I'm running Ubuntu 20.04 with i3 and the open-source Mesa drivers. I'm not able to run a simple pixels application: the render() call causes a Vulkan failure.

The source code of the program may be found here: https://github.com/aegroto/remotia/blob/winit-port/examples/pixels_example/main.rs

The 'master' branch of this project works, but relies on an older version of beryllium. I tried removing that layer and using bare winit, but the problem persisted. I suspect it may be due to an incompatibility with i3 or with my drivers. I would like to fix it myself, but I have no idea how to debug this error or what may be causing it.

Full logs:

    Finished release [optimized] target(s) in 0.12s
     Running `target/release/examples/pixels_example`
[2021-12-09T12:12:17Z INFO  winit::platform_impl::platform::x11::window] Guessed window scale factor: 1.25
[2021-12-09T12:12:17Z INFO  wgpu_hal::vulkan::instance] Instance version: 0x402083
[2021-12-09T12:12:17Z INFO  wgpu_hal::vulkan::instance] Enabling device properties2
[2021-12-09T12:12:20Z INFO  wgpu_core::instance] Adapter Vulkan AdapterInfo { name: "AMD RADV RAVEN", vendor: 4098, device: 5592, device_type: IntegratedGpu, backend: Vulkan }
[2021-12-09T12:12:20Z INFO  wgpu_hal::vulkan::adapter] Private capabilities: PrivateCapabilities { flip_y_requires_shift: true, imageless_framebuffers: true, image_view_usage: true, timeline_semaphores: true, texture_d24: false, texture_d24_s8: false, can_present: true, non_coherent_map_mask: 63, robust_buffer_access: true, robust_image_access: true }
[2021-12-09T12:12:20Z INFO  wgpu_core::device] Created texture Valid((0, 1, Vulkan)) with TextureDescriptor { label: Some("pixels_source_texture"), size: Extent3d { width: 320, height: 240, depth_or_array_layers: 1 }, mip_level_count: 1, sample_count: 1, dimension: D2, format: Rgba8UnormSrgb, usage: COPY_DST | TEXTURE_BINDING }
[2021-12-09T12:12:20Z INFO  wgpu_core::device] Created buffer Valid((0, 1, Vulkan)) with BufferDescriptor { label: Some("pixels_scaling_renderer_vertex_buffer"), size: 48, usage: VERTEX, mapped_at_creation: true }
[2021-12-09T12:12:20Z INFO  wgpu_core::device] Created buffer Valid((1, 1, Vulkan)) with BufferDescriptor { label: Some("pixels_scaling_renderer_matrix_uniform_buffer"), size: 64, usage: COPY_DST | UNIFORM, mapped_at_creation: true }
[2021-12-09T12:12:20Z INFO  wgpu_core::device] configuring surface with SurfaceConfiguration { usage: RENDER_ATTACHMENT, format: Bgra8UnormSrgb, width: 400, height: 300, present_mode: Fifo }
thread 'main' panicked at 'Error in Surface::present: parent device is lost', /home/lorenzo/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.11.1/src/backend/direct.rs:204:9
stack backtrace:
   0:     0x558a23cba4bc - std::backtrace_rs::backtrace::libunwind::trace::hc6c3491277866fea
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x558a23cba4bc - std::backtrace_rs::backtrace::trace_unsynchronized::h4524f073368a5b13
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x558a23cba4bc - std::sys_common::backtrace::_print_fmt::h0d0cace6159902af
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x558a23cba4bc - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3e6af6f05919a7fc
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x558a23cdc0dc - core::fmt::write::h72801a82c94e6ff1
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/core/src/fmt/mod.rs:1149:17
   5:     0x558a23cb7135 - std::io::Write::write_fmt::ha4f5d34aaccbac84
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/io/mod.rs:1697:15
   6:     0x558a23cbbe20 - std::sys_common::backtrace::_print::heed69f5ce9a8e189
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x558a23cbbe20 - std::sys_common::backtrace::print::h5f3918bd80c09252
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x558a23cbbe20 - std::panicking::default_hook::{{closure}}::h5af30648530eb3d0
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:211:50
   9:     0x558a23cbb9cb - std::panicking::default_hook::he88d5fb1ba1b4c19
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:228:9
  10:     0x558a23cbc4d4 - std::panicking::rust_panic_with_hook::h01febc308b2b313b
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:606:17
  11:     0x558a23cbbfb0 - std::panicking::begin_panic_handler::{{closure}}::h24a6d13f5560b71f
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:502:13
  12:     0x558a23cba964 - std::sys_common::backtrace::__rust_end_short_backtrace::h3e2917f0da9fbc5c
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/sys_common/backtrace.rs:139:18
  13:     0x558a23cbbf19 - rust_begin_unwind
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:498:5
  14:     0x558a238abbb1 - core::panicking::panic_fmt::h7b8580d81fcbbacd
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/core/src/panicking.rs:106:14
  15:     0x558a23af5e72 - wgpu::backend::direct::Context::handle_error_fatal::hecdc4b1d25555d89
  16:     0x558a23af6ca5 - <wgpu::backend::direct::Context as wgpu::Context>::surface_present::hbea9471fdb62de30
  17:     0x558a23ab47b0 - wgpu::SurfaceTexture::present::h41507557c18e4d3d
  18:     0x558a23a54a6b - pixels::Pixels::render::h855c4a629cb48033
  19:     0x558a238fa2db - pixels_example::main::h6c22c3c20bbe7c48
  20:     0x558a238ae186 - std::sys_common::backtrace::__rust_begin_short_backtrace::hfb07b7064d13a8c4
  21:     0x558a238c1ef5 - std::rt::lang_start::{{closure}}::h625b9fc901241467
  22:     0x558a23cb9d01 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h6743157f0325d450
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/core/src/ops/function.rs:259:13
  23:     0x558a23cb9d01 - std::panicking::try::do_call::hc65378359d322d46
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:406:40
  24:     0x558a23cb9d01 - std::panicking::try::h52b83ca0140efb28
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:370:19
  25:     0x558a23cb9d01 - std::panic::catch_unwind::h0ba25f4b0d3448dc
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panic.rs:133:14
  26:     0x558a23cb9d01 - std::rt::lang_start_internal::{{closure}}::ha65f28100c5ad390
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/rt.rs:128:48
  27:     0x558a23cb9d01 - std::panicking::try::do_call::h5db5edfaf5b749d9
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:406:40
  28:     0x558a23cb9d01 - std::panicking::try::h62409771d6cd0419
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panicking.rs:370:19
  29:     0x558a23cb9d01 - std::panic::catch_unwind::h386261fb8f018fab
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/panic.rs:133:14
  30:     0x558a23cb9d01 - std::rt::lang_start_internal::h699f3530566c1833
                               at /rustc/baba6687df3e83fdb15cc6ec239b4a1c75a30505/library/std/src/rt.rs:128:20
  31:     0x558a238fa6c2 - main
  32:     0x7f623313f0b3 - __libc_start_main
  33:     0x558a238ac32e - _start
  34:                0x0 - <unknown>
[2021-12-09T12:12:20Z INFO  wgpu_core::hub] Dropping Global
parasyte commented 2 years ago

Hi! Thanks for the report.

I'm tentatively labeling this issue as an upstream-bug, since that is most likely to be the case. I haven't been able to build your project (on Windows 11) because it wants OpenSSL, and I'm a little hesitant to install the 85th incarnation of Heartbleed.

FWIW, here's the build error on my system:

```
error: failed to run custom build command for `openssl-sys v0.9.71`

Caused by:
  process didn't exit successfully: `C:\Users\jay\other-projects\remotia\target\debug\build\openssl-sys-21d63f36006bc9de\build-script-main` (exit code: 101)
  --- stdout
  cargo:rustc-cfg=const_fn
  cargo:rerun-if-env-changed=X86_64_PC_WINDOWS_MSVC_OPENSSL_NO_VENDOR
  X86_64_PC_WINDOWS_MSVC_OPENSSL_NO_VENDOR unset
  cargo:rerun-if-env-changed=OPENSSL_NO_VENDOR
  OPENSSL_NO_VENDOR unset
  running "perl" "./Configure" "--prefix=C:\\Users\\jay\\other-projects\\remotia\\target\\debug\\build\\openssl-sys-acfb4bb755dab41f\\out\\openssl-build\\install" "no-dso" "no-shared" "no-ssl3" "no-tests" "no-comp" "no-zlib" "no-zlib-dynamic" "--libdir=lib" "no-legacy" "no-md2" "no-rc5" "no-weak-ssl-ciphers" "no-camellia" "no-idea" "no-seed" "no-capieng" "no-asm" "VC-WIN64A"

  --- stderr
  Can't locate Locale/Maketext/Simple.pm in @INC (you may need to install the Locale::Maketext::Simple module) (@INC contains: /c/Users/jay/other-projects/remotia/target/debug/build/openssl-sys-acfb4bb755dab41f/out/openssl-build/build/src/util/perl /usr/lib/perl5/site_perl /usr/share/perl5/site_perl /usr/lib/perl5/vendor_perl /usr/share/perl5/vendor_perl /usr/lib/perl5/core_perl /usr/share/perl5/core_perl /c/Users/jay/other-projects/remotia/target/debug/build/openssl-sys-acfb4bb755dab41f/out/openssl-build/build/src/external/perl/Text-Template-1.56/lib) at /usr/share/perl5/core_perl/Params/Check.pm line 6.
  BEGIN failed--compilation aborted at /usr/share/perl5/core_perl/Params/Check.pm line 6.
  Compilation failed in require at /usr/share/perl5/core_perl/IPC/Cmd.pm line 59.
  BEGIN failed--compilation aborted at /usr/share/perl5/core_perl/IPC/Cmd.pm line 59.
  Compilation failed in require at /c/Users/jay/other-projects/remotia/target/debug/build/openssl-sys-acfb4bb755dab41f/out/openssl-build/build/src/util/perl/OpenSSL/config.pm line 18.
  BEGIN failed--compilation aborted at /c/Users/jay/other-projects/remotia/target/debug/build/openssl-sys-acfb4bb755dab41f/out/openssl-build/build/src/util/perl/OpenSSL/config.pm line 18.
  Compilation failed in require at ./Configure line 23.
  BEGIN failed--compilation aborted at ./Configure line 23.
  thread 'main' panicked at '

  Error configuring OpenSSL build:
      Command: "perl" "./Configure" "--prefix=C:\\Users\\jay\\other-projects\\remotia\\target\\debug\\build\\openssl-sys-acfb4bb755dab41f\\out\\openssl-build\\install" "no-dso" "no-shared" "no-ssl3" "no-tests" "no-comp" "no-zlib" "no-zlib-dynamic" "--libdir=lib" "no-legacy" "no-md2" "no-rc5" "no-weak-ssl-ciphers" "no-camellia" "no-idea" "no-seed" "no-capieng" "no-asm" "VC-WIN64A"
      Exit status: exit code: 2

  ', C:\Users\jay\.cargo\registry\src\github.com-1ecc6299db9ec823\openssl-src-300.0.2+3.0.0\src\lib.rs:492:13
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

I commented out the OpenSSL dev-dependencies, and now I'm fighting with the FFmpeg dep. I'll report back when I have something useful.

parasyte commented 2 years ago

😬 So I finally got a build to work. This was the magical incantation:

$ git clone https://github.com/microsoft/vcpkg
$ ./vcpkg/bootstrap-vcpkg.sh
$ ./vcpkg/vcpkg --triplet=x64-windows-static-md install ffmpeg
$ ./vcpkg/vcpkg --triplet=x64-windows-static-md install llvm
$ LIBCLANG_PATH=$PWD/vcpkg/packages/llvm_x64-windows-static-md/bin/ VCPKG_ROOT=$PWD/vcpkg cargo build --example pixels_example

And LLVM took 33 minutes to build on a 12-core CPU. 🙄

Anyway, now I can run it and...

$ LIBCLANG_PATH=$PWD/../vcpkg/packages/llvm_x64-windows-static-md/bin/ VCPKG_ROOT=$PWD/../vcpkg cargo run --example pixels_example
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target\debug\examples\pixels_example.exe`
$ echo $?
0

Well, it didn't crash or panic; it just exits. Admittedly, this is the first time I'm looking at your code. It doesn't spin up the event loop, so AFAICT this code is running as well as it can on my system. I'm running Windows 11 with an RTX 3090 (and the Vulkan backend).

I think the most relevant difference is that window handling on Windows is pretty much a synchronous process, so calling pixels.render() immediately is probably not going to cause too many problems. But with Wayland (and probably X11?) all window handling is asynchronous, which leads to some problematic behavior that violates expectations, like https://github.com/rust-windowing/winit/issues/2080

In short, you really need to run the event loop before you try to draw anything.
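To make that advice concrete without tying it to a particular winit or pixels version, here is a small stdlib-only Rust model of the control flow. Every name in it (`Event`, `Surface`, `run`) is a hypothetical stand-in: `Surface::render` plays the role of `pixels.render()`, `WindowMapped` models the window system finishing its asynchronous setup, and `RedrawRequested` mirrors winit's `Event::RedrawRequested`, which is where the pixels examples call `render()`. It is a sketch of the ordering problem only; it does not reproduce the Vulkan error itself.

```rust
// Stdlib-only model: rendering eagerly vs. rendering from the event loop.
// All types here are hypothetical stand-ins, not winit/wgpu/pixels APIs.

#[derive(Debug, Clone, Copy, PartialEq)]
enum Event {
    /// The window system has finished (asynchronously) setting up the window.
    WindowMapped,
    /// Analogous to winit's `Event::RedrawRequested`.
    RedrawRequested,
    CloseRequested,
}

#[derive(Default)]
struct Surface {
    ready: bool,
    frames_presented: u32,
}

impl Surface {
    /// Stand-in for `pixels.render()`: fails while the surface is not usable.
    fn render(&mut self) -> Result<(), &'static str> {
        if !self.ready {
            return Err("surface not ready");
        }
        self.frames_presented += 1;
        Ok(())
    }
}

/// Processes events in order and renders only in response to a redraw.
/// Returns (frames presented, failed render attempts).
fn run(events: &[Event]) -> (u32, u32) {
    let mut surface = Surface::default();
    let mut failures = 0;
    for &event in events {
        match event {
            Event::WindowMapped => surface.ready = true,
            Event::RedrawRequested => {
                if surface.render().is_err() {
                    failures += 1;
                }
            }
            Event::CloseRequested => break,
        }
    }
    (surface.frames_presented, failures)
}

fn main() {
    // Eager: calling render() right after setup, before any events arrive.
    let mut eager = Surface::default();
    println!("eager render: {:?}", eager.render()); // Err("surface not ready")

    // Event-driven: render only when the loop delivers a redraw.
    let (presented, failures) = run(&[
        Event::WindowMapped,
        Event::RedrawRequested,
        Event::RedrawRequested,
        Event::CloseRequested,
    ]);
    println!("presented {} frames, {} failed renders", presented, failures); // 2, 0
}
```

In a real winit program the event loop, not the application, decides when a redraw is delivered, which is why moving the `render()` call into the `RedrawRequested` handler sidesteps the ordering problem on asynchronous window systems.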

Out of curiosity, have you experienced the same error when running the pixels examples? If yes, then the issue may be outside of your control (like a driver issue). If our examples work, then the problem is in your code. And it's very likely related to not starting the event loop.

aegroto commented 2 years ago

Hello, I'm not able to run the examples due to a compilation error in the winit dependency (0.25.0). I think the problem does not occur on Windows and is specific to my setup. I assume the behaviour you saw when running my example is correct: there is no loop, so it's supposed to close immediately.

I'm able to run pixels using an old version of beryllium; you can check the code on the "master" branch to verify. The way I call render() there is not much different.

I already suspected this was not a bug in the pixels crate but something at a lower level. Could you please suggest where I should report this issue?

parasyte commented 2 years ago

I'm not convinced the issue is outside of your code. To begin, you should probably not be calling pixels.render() immediately after creating the pixel buffer: https://github.com/aegroto/remotia/blob/f9f718a6c31c6630f4f4e1522cb44bf1647bee5c/src/client/pipeline/waterfall.rs#L101

This function needs a surface that is ready to draw on, and, as I mentioned in my last comment, that is not always the case. It is also unusual to want to draw at this point, since there is nothing to show: you haven't populated the pixel buffer with any image (it defaults to transparent black).
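As a stdlib-only illustration of "populate the buffer before drawing", the sketch below models the pixel buffer as a flat RGBA8 byte slice (4 bytes per pixel), zero-initialized, i.e. transparent black. The 320x240 size is taken from the `pixels_source_texture` line in the log above. In the pixels release of this era the frame is exposed as `&mut [u8]` through `Pixels::get_frame()`; here `new_frame` and `fill` are hypothetical helpers, and the color is arbitrary.

```rust
// Hypothetical model of the pixels frame: RGBA8, 4 bytes per pixel,
// zero-initialized, so every pixel starts as transparent black [0, 0, 0, 0].
const WIDTH: usize = 320;
const HEIGHT: usize = 240;

fn new_frame() -> Vec<u8> {
    vec![0u8; WIDTH * HEIGHT * 4]
}

/// Writes the same RGBA color into every pixel of the frame.
fn fill(frame: &mut [u8], rgba: [u8; 4]) {
    for pixel in frame.chunks_exact_mut(4) {
        pixel.copy_from_slice(&rgba);
    }
}

fn main() {
    let mut frame = new_frame();
    // Nothing has been drawn yet: every byte is zero.
    assert!(frame.iter().all(|&b| b == 0));

    // Populate the frame with an opaque color before it would be rendered.
    let color = [0x48u8, 0xb2, 0xe8, 0xff];
    fill(&mut frame, color);
    assert!(frame.chunks_exact(4).all(|p| p == &color[..]));

    // prints: 307200 bytes, first pixel [72, 178, 232, 255]
    println!("{} bytes, first pixel {:?}", frame.len(), &frame[..4]);
}
```

With a real `Pixels` instance, the equivalent step would be writing into the frame slice inside the redraw handler and only then calling `render()`.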

The actual task of drawing involves recording a command buffer and sending it to the GPU to do the work. Depending on the present mode, the thread could wait on the GPU to finish. But if the GPU doesn't have a valid surface, then it can't really do anything other than raise an error. The precise error you receive is this one: https://github.com/gfx-rs/wgpu/blob/c1c855bb9812d0c1703bd112d09e3eba58e45b6b/wgpu-core/src/device/mod.rs#L2759-L2760

Unfortunately, this error is pretty nebulous. It looks like a catchall for "Something went wrong, and we don't know a better way to categorize it". Here's what the spec has to say: https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#devsandqueues-lost-device Regardless, it is consistent with attempting to draw to a surface that is not yet available.

Other than that, I can't comment on SDL2 or why it seems to do what you expect. I think it's the exception rather than the rule, in this case.

aegroto commented 2 years ago

Update: as of two hours ago, the master branch has started crashing on my machine as well. I'll try to figure out what's causing this issue and let you know.

aegroto commented 2 years ago

For completeness, this is the compilation output of minimal-winit on my system:

lorenzo@lorenzo-Alpha-15-A3DDK:/media/lorenzo/ext/repository/pixels$ cargo run --package minimal-winit
   Compiling cfg-if v1.0.0
   Compiling bitflags v1.3.2
   Compiling once_cell v1.8.0
   Compiling lazy_static v1.4.0
   Compiling smallvec v1.7.0
   Compiling scopeguard v1.1.0
   Compiling cty v0.2.2
   Compiling termcolor v1.1.2
   Compiling ttf-parser v0.6.2
   Compiling minimal-lexical v0.2.1
   Compiling unicode-width v0.1.9
   Compiling downcast-rs v1.2.0
   Compiling byteorder v1.4.3
   Compiling scoped-tls v1.0.0
   Compiling bit-vec v0.6.3
   Compiling same-file v1.0.6
   Compiling hexf-parse v0.2.1
   Compiling ab_glyph_rasterizer v0.1.5
   Compiling cfg-if v0.1.10
   Compiling bytemuck v1.7.2
   Compiling profiling v1.0.4
   Compiling arrayvec v0.7.2
   Compiling inplace_it v0.3.3
   Compiling renderdoc-sys v0.7.1
   Compiling glow v0.11.0
   Compiling copyless v0.1.5
   Compiling percent-encoding v2.1.0
   Compiling regex-syntax v0.6.25
   Compiling humantime v2.1.0
   Compiling pollster v0.2.4
   Compiling ahash v0.7.6
   Compiling libloading v0.7.2
   Compiling instant v0.1.12
   Compiling libloading v0.6.7
   Compiling raw-window-handle v0.4.2
   Compiling gpu-descriptor-types v0.1.1
   Compiling gpu-alloc-types v0.2.0
   Compiling wgpu-types v0.11.0
   Compiling lock_api v0.4.5
   Compiling codespan-reporting v0.11.1
   Compiling walkdir v2.3.2
   Compiling fxhash v0.2.1
   Compiling bit-set v0.5.2
   Compiling safe_arch v0.5.2
   Compiling libc v0.2.109
   Compiling log v0.4.14
   Compiling memchr v2.4.1
   Compiling crossbeam-utils v0.8.5
   Compiling dlib v0.5.0
   Compiling ash v0.33.3+1.2.191
   Compiling owned_ttf_parser v0.6.0
   Compiling dlib v0.4.2
   Compiling gpu-alloc v0.5.2
   Compiling num-traits v0.2.14
   Compiling memoffset v0.6.5
   Compiling wide v0.6.5
   Compiling wayland-sys v0.28.6
   Compiling rusttype v0.9.2
   Compiling crossbeam-channel v0.5.1
   Compiling crossbeam-queue v0.3.2
   Compiling nom v7.1.0
   Compiling aho-corasick v0.7.18
   Compiling crossbeam-epoch v0.9.5
   Compiling getrandom v0.2.3
   Compiling parking_lot_core v0.8.5
   Compiling nix v0.20.0
   Compiling raw-window-handle v0.3.4
   Compiling dirs-sys v0.3.6
   Compiling nix v0.18.0
   Compiling khronos-egl v4.1.0
   Compiling memmap2 v0.1.0
   Compiling mio v0.8.0
   Compiling mio v0.7.14
   Compiling x11-dl v2.19.1
   Compiling atty v0.2.14
   Compiling spirv v0.2.0+1.5.4
   Compiling crossbeam-deque v0.8.1
   Compiling parking_lot v0.11.2
   Compiling ultraviolet v0.8.1
   Compiling dirs v3.0.2
   Compiling regex v1.5.4
   Compiling xcursor v0.3.4
   Compiling thiserror v1.0.30
   Compiling hashbrown v0.11.2
   Compiling crossbeam v0.8.1
   Compiling xdg v2.4.0
   Compiling env_logger v0.9.0
   Compiling wayland-commons v0.28.6
   Compiling mio-misc v1.3.2
   Compiling andrew v0.3.1
   Compiling calloop v0.6.5
   Compiling indexmap v1.7.0
   Compiling gpu-descriptor v0.2.2
   Compiling wayland-client v0.28.6
   Compiling naga v0.7.2
   Compiling wayland-cursor v0.28.6
   Compiling wayland-protocols v0.28.6
   Compiling wgpu-hal v0.11.5
   Compiling smithay-client-toolkit v0.12.3
   Compiling wgpu-core v0.11.3
   Compiling winit v0.25.0
error[E0308]: mismatched types
   --> /home/lorenzo/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.25.0/src/platform_impl/linux/x11/mod.rs:188:53
    |
188 |         let queue = Arc::new(NotificationQueue::new(waker));
    |                                                     ^^^^^ expected struct `mio::waker::Waker`, found struct `mio::Waker`
    |
    = note: expected struct `Arc<mio::waker::Waker>`
               found struct `Arc<mio::Waker>`
    = note: perhaps two different versions of crate `mio` are being used?

For more information about this error, try `rustc --explain E0308`.
error: could not compile `winit` due to previous error
warning: build failed, waiting for other jobs to finish...
error: build failed
parasyte commented 2 years ago

Before you do anything else, make a copy of Cargo.lock ...

That build error looks like something that can be fixed by running cargo update... Just guessing. We don't commit the lock file any more, but Cargo does keep one in the repo root directory. I've seen issues with some dependencies causing breaking changes in minor releases, and some other weird behavior. But we do build and test on Linux in our CI, so we know that Linux builds are working as of 2 days ago: https://github.com/parasyte/pixels/runs/4451823614?check_suite_focus=true

What stands out to me is that you are building two copies of mio, and our CI build only has one. If you really want to investigate that, cargo tree can tell you what is pulling in that stray mio version. Your Cargo.lock backup can, as well.
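For reference, `cargo tree --duplicates` (short `-d`) lists packages that appear at more than one version, and `cargo tree --invert <spec>` shows what depends on a given package. A Cargo.lock backup can be checked the same way by hand. The sketch below is a hypothetical, stdlib-only scanner that assumes the usual Cargo.lock layout (`[[package]]` tables with `name = "..."` followed by `version = "..."`) and reports crates locked at more than one version. It is line-based, not a real TOML parser, and the sample lockfile is illustrative: it reuses the two mio versions from the build log above but is not the reporter's actual Cargo.lock.

```rust
use std::collections::BTreeMap;

/// Returns every package name that appears with more than one version
/// in the given Cargo.lock text. Line-based sketch, not a TOML parser.
fn duplicate_crates(lock: &str) -> BTreeMap<String, Vec<String>> {
    let mut versions: BTreeMap<String, Vec<String>> = BTreeMap::new();
    let mut current_name: Option<String> = None;

    for line in lock.lines().map(str::trim) {
        if line == "[[package]]" {
            current_name = None;
        } else if let Some(name) = quoted_value(line, "name") {
            current_name = Some(name.to_string());
        } else if let Some(version) = quoted_value(line, "version") {
            // Only count a version that follows a package name; this skips
            // the top-level `version = 3` lockfile format line.
            if let Some(name) = current_name.take() {
                versions.entry(name).or_default().push(version.to_string());
            }
        }
    }

    versions.retain(|_, v| v.len() > 1);
    versions
}

/// Extracts `value` from a line of the form `key = "value"`.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start().strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}

fn main() {
    let lock = r#"
version = 3

[[package]]
name = "mio"
version = "0.7.14"

[[package]]
name = "mio"
version = "0.8.0"

[[package]]
name = "winit"
version = "0.25.0"
"#;
    // prints: mio -> ["0.7.14", "0.8.0"]
    for (name, versions) in duplicate_crates(lock) {
        println!("{} -> {:?}", name, versions);
    }
}
```

Running `cargo tree` remains the more complete option, since it also shows which dependency path pulls in each copy.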

aegroto commented 2 years ago

Running 'cargo clean' and 'cargo update' solved the demo problem, thanks! I think this could be added to the README of each example, since it may be necessary to run an update on operating systems other than Windows.

Also, switching the Mesa PPA from oibaf to kisak solved the error. Something is still not working, but I think it's related to my Xorg configuration, so I guess we can close this issue. Thanks for all the support!