vulkano-rs / vulkano

Safe and rich Rust wrapper around the Vulkan API
Apache License 2.0
4.53k stars 436 forks

STATUS_ACCESS_VIOLATION on AMD GPU #2390

Closed Michaelschnabel-DM closed 1 year ago

Michaelschnabel-DM commented 1 year ago

Issue

image

Got a STATUS_ACCESS_VIOLATION when trying to run both triangle examples.

marc0246 commented 1 year ago

And this doesn't happen in 0.33?

Michaelschnabel-DM commented 1 year ago

Yes, tag v0.33.0 works totally fine.

marc0246 commented 1 year ago

Alright. In that case, since I can't reproduce this myself, would you be willing to do a git bisect? I can give moral support.
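For readers unfamiliar with the technique: git bisect performs a binary search over the commits between a known-good and a known-bad revision, asking at each step whether the checked-out commit shows the bug. A minimal Rust sketch of that search follows. The names `first_bad_commit` and `is_bad` are hypothetical and not part of git or vulkano, and the sketch assumes the history flips from good to bad exactly once.

```rust
/// Returns the index of the first commit for which `is_bad` returns true.
///
/// Assumption (hypothetical model of a bisect run): `commits` is ordered
/// oldest to newest, every commit before the culprit is good, and every
/// commit from the culprit onward is bad.
/// Returns `None` if no commit in the slice is bad.
fn first_bad_commit<T>(commits: &[T], mut is_bad: impl FnMut(&T) -> bool) -> Option<usize> {
    let (mut lo, mut hi) = (0, commits.len());
    // Invariant: everything in commits[..lo] is good,
    // everything in commits[hi..] is bad.
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if is_bad(&commits[mid]) {
            // The culprit is at `mid` or earlier.
            hi = mid;
        } else {
            // The culprit is strictly after `mid`.
            lo = mid + 1;
        }
    }
    if lo < commits.len() {
        Some(lo)
    } else {
        None
    }
}
```

In a real bisect the predicate is "build this commit and run the triangle example", so each probe is expensive; halving the range on every step keeps the number of builds logarithmic in the number of commits between v0.33.0 and the failing revision.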

Michaelschnabel-DM commented 1 year ago

Sure, I'll have more time to investigate tomorrow.

Michaelschnabel-DM commented 1 year ago

Looks like I found the commit with bisect:

b7679f8bbbcb62b41793dfaca243546ccf2879a0 is the first bad commit
commit b7679f8bbbcb62b41793dfaca243546ccf2879a0
Author: Rua ruawhitepaw@gmail.com
Date: Tue Apr 18 20:53:08 2023 +0200

Rewrite shader and specialization handling in pipelines (#2181)

* Rewrite shader and specialization handling in pipelines

* Make the shader loading in examples a bit cleaner

* Forgot some

* Fix incorrect color blend states in examples

* Nicer fix

* Use mem::discriminant

Co-authored-by: marc0246 <40955683+marc0246@users.noreply.github.com>

---------

Co-authored-by: marc0246 <40955683+marc0246@users.noreply.github.com>

examples/src/bin/basic-compute-shader.rs | 9 +-
examples/src/bin/buffer-allocator.rs | 29 +-
.../bin/deferred/frame/ambient_lighting_system.rs | 23 +-
.../deferred/frame/directional_lighting_system.rs | 23 +-
.../bin/deferred/frame/point_lighting_system.rs | 23 +-
examples/src/bin/deferred/triangle_draw_system.rs | 25 +-
examples/src/bin/dynamic-buffers.rs | 9 +-
examples/src/bin/dynamic-local-size.rs | 29 +-
examples/src/bin/gl-interop.rs | 25 +-
examples/src/bin/image-self-copy-blit/main.rs | 23 +-
examples/src/bin/image/main.rs | 22 +-
examples/src/bin/immutable-sampler/main.rs | 22 +-
examples/src/bin/indirect.rs | 37 +-
examples/src/bin/instancing.rs | 29 +-
.../fractal_compute_pipeline.rs | 9 +-
.../interactive_fractal/pixels_draw_pipeline.rs | 26 +-
examples/src/bin/msaa-renderpass.rs | 24 +-
examples/src/bin/multi-window.rs | 29 +-
.../bin/multi_window_game_of_life/game_of_life.rs | 9 +-
.../bin/multi_window_game_of_life/pixels_draw.rs | 26 +-
examples/src/bin/multiview.rs | 29 +-
examples/src/bin/occlusion-query.rs | 29 +-
examples/src/bin/pipeline-caching.rs | 9 +-
examples/src/bin/push-constants.rs | 9 +-
examples/src/bin/push-descriptors/main.rs | 22 +-
examples/src/bin/runtime-shader/main.rs | 24 +-
examples/src/bin/runtime_array/main.rs | 38 +-
examples/src/bin/self-copy-buffer.rs | 9 +-
examples/src/bin/shader-include/main.rs | 9 +-
examples/src/bin/shader-types-sharing.rs | 35 +-
examples/src/bin/simple-particles.rs | 35 +-
examples/src/bin/specialization-constants.rs | 20 +-
examples/src/bin/teapot/main.rs | 48 +-
examples/src/bin/tessellation.rs | 46 +-
examples/src/bin/texture_array/main.rs | 22 +-
examples/src/bin/triangle-v1_3.rs | 71 +-
examples/src/bin/triangle.rs | 55 +-
vulkano-shaders/src/codegen.rs | 15 +-
vulkano-shaders/src/entry_point.rs | 93 +-
vulkano-shaders/src/lib.rs | 43 +-
vulkano-shaders/src/structs.rs | 154 -
vulkano/src/pipeline/cache.rs | 79 +-
vulkano/src/pipeline/compute.rs | 408 +-
vulkano/src/pipeline/graphics/builder.rs | 5835 +++++++++----------
vulkano/src/pipeline/graphics/creation_error.rs | 100 +-
vulkano/src/pipeline/graphics/mod.rs | 14 +-
vulkano/src/shader/mod.rs | 793 +--
vulkano/src/shader/reflect.rs | 161 +-
48 files changed, 4225 insertions(+), 4431 deletions(-)

Rua commented 1 year ago

At what point in the program does the crash happen?

Michaelschnabel-DM commented 1 year ago

It's line 440 in the triangle-v1_3 example, where the GraphicsPipeline is being created.

Rua commented 1 year ago

I'm unable to reproduce the issue. Valgrind doesn't detect any memory problems when I run the triangle example. Does it report anything when you try? Or maybe if you run with address sanitizer?

daigennki commented 1 year ago

I too was getting STATUS_ACCESS_VIOLATION with the AMD integrated graphics (Ryzen 7 5800H; Vega-based) in my laptop on Windows 10. It stopped happening after I updated the AMD graphics driver to 23.11.1, just released on Nov. 2nd, so that might be worth a try.

Rua commented 1 year ago

In case the problem also exists on Vulkano's end, it would still be useful to know the exact cause of the crash. Especially since the problem was triggered by changes made to Vulkano.

Michaelschnabel-DM commented 1 year ago

Hi @Rua, I did the following:

set RUSTFLAGS=-Zsanitizer=address
cargo clean
cargo build
cargo run --bin triangle-v1_3

But I did not gather any additional information by doing so. Did I do anything wrong, or is there something else I could try?

Michaelschnabel-DM commented 1 year ago

Setting all of the other flags also had no effect:

image

marc0246 commented 1 year ago

I think it's clear that the segfault happens in the driver, otherwise I and everyone else with an NVIDIA card would observe the same on Windows (but I don't). I'm pretty sure that turning on sanitizers won't do anything to catch memory bugs happening inside the driver, because they rely on the compiler emitting special code for the sanitizer to work, and you don't control the code of the driver. That said, don't ask me how to debug this because I have no idea myself.

Michaelschnabel-DM commented 1 year ago

Hi @marc0246, thanks for your response. I have no issue with updating the driver. I just wanted to provide help / information in case this was caused by vulkano :)

marc0246 commented 1 year ago

I would be interested in what changed on our end as well, it's just that I'm not sure how to go about finding out. Every time I find a bug in the NVIDIA driver for Linux (and I have been traumatized by the number of them at this point) I stop at that. I've also been staring at the diff for hours and I can't make out anything. I guess the only way would be to brute-force it, which is probably not worth the effort.

Michaelschnabel-DM commented 1 year ago

Updating to the latest pro drivers solved the issue:

image

I think this can be closed then.

marc0246 commented 1 year ago

Thanks for reporting anyway. :)