linebender / vello

A GPU compute-centric 2D renderer.
http://linebender.org/vello/
Apache License 2.0

Renderer stops functioning under stress without generating error messages #720

Open beholdnec opened 1 month ago

beholdnec commented 1 month ago

I decided to stress-test Vello by modifying the with_winit example. Instead of just one tiger, I changed it to draw hundreds of animated tigers.

My branch with the modified test can be found here: https://github.com/beholdnec/vello/tree/stress-test

I found certain conditions where the renderer would simply stop generating frames. In my branch, 128 tigers are drawn.

When the renderer stops functioning, no error messages are generated and the program doesn't crash. It responds to user input but doesn't generate video.

I am running on an RTX 4080 with 16GB of video memory. My theory is I've hit a memory limit. It would be helpful if Vello would report an error in this condition rather than failing silently.

dominikh commented 1 month ago

The first one is https://github.com/linebender/vello/issues/366. I've experienced the issue with zooming in before, too. I don't recall if there's an open issue for that already.

beholdnec commented 1 month ago

Running the sample with --async-pipeline exposes this error message when I zoom in too far:

[2024-10-20T16:03:43.932Z ERROR wgpu_core::device::global] Buffer::map_async error: Buffer access out of bounds: last index 50583552 would overrun the buffer (limit: 50331648)
[2024-10-20T16:03:43.932Z ERROR wgpu::backend::wgpu_core] Handling wgpu errors as fatal by default
thread 'main' panicked at C:\Users\nolan\.cargo\registry\src\index.crates.io-6f17d22bba15001f\wgpu-22.1.0\src\backend\wgpu_core.rs:3411:5:
wgpu error: Validation Error

Caused by:
  In Buffer::map_async
    Buffer access out of bounds: last index 50583552 would overrun the buffer (limit: 50331648)

note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\release\with_winit_bin.exe --async-pipeline` (exit code: 101)
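
To put the numbers in the log above in perspective, here is a minimal sketch of the arithmetic. It assumes that "last index" means the end of the requested mapping range and that "limit" is the buffer's size in bytes; the helper `overrun_bytes` and the constant `MIB` are hypothetical names used for illustration and are not part of wgpu or Vello.

```rust
/// Bytes in one mebibyte.
const MIB: u64 = 1024 * 1024;

/// Returns how many bytes a range ending at `last_index` extends past
/// `buffer_size`, or `None` if the range fits inside the buffer.
/// (Hypothetical helper; it only mirrors the comparison described in the log.)
fn overrun_bytes(last_index: u64, buffer_size: u64) -> Option<u64> {
    // `checked_sub` yields `None` when `last_index < buffer_size`;
    // the filter also treats an exact fit (difference of 0) as "no overrun".
    last_index.checked_sub(buffer_size).filter(|&n| n > 0)
}
```

With the values from the log, `50_331_648 / MIB` is 48, so the limit is exactly 48 MiB, and `overrun_bytes(50_583_552, 50_331_648)` gives `Some(251_904)`: the request exceeded the buffer by 246 KiB.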

DJMcNab commented 1 month ago

These both sound like intentional behaviour added to handle #366 in #537. The alternative to skipping rendering that frame would be for there to be rendering artefacts.

Progress on resolving this has stalled due to a shortage of review bandwidth; I can't prioritise this work at the moment. A sketch of a solution can be found in #606.

The async pipeline is undermaintained (which is why we deprecated it). That failure is just suggesting that #366 is to blame again.

Is there a real world use case where you are running into these memory limits? That would help us to prioritise working on a fix.

beholdnec commented 1 month ago

Thanks. I don't really have a real-world use case; I was just experimenting with Vello. I can imagine certain use cases (say, a drawing program where the user draws overly complex geometry) where the error could be triggered in the wild. An error message would help: I wasn't familiar with Vello's internals, so I couldn't easily tell that this issue was caused by a limitation in Vello.
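
As a rough illustration of the kind of reporting requested here, the sketch below returns a descriptive error when a frame would need more buffer space than is available, instead of skipping it silently. Everything in it is an assumption made for illustration: `FrameError` and `check_frame_budget` are hypothetical names, and this is not how Vello tracks or reports its memory limits.

```rust
use std::fmt;

/// Hypothetical error type; not part of Vello's API.
#[derive(Debug, PartialEq)]
enum FrameError {
    /// The scene needs more bytes than the buffer can hold.
    BufferTooSmall { needed: u64, available: u64 },
}

impl fmt::Display for FrameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FrameError::BufferTooSmall { needed, available } => write!(
                f,
                "frame skipped: scene needs {needed} bytes but the buffer holds {available}"
            ),
        }
    }
}

/// Returns an error (rather than silently dropping the frame) when the
/// required size exceeds the available size. Both sizes are in bytes.
fn check_frame_budget(needed: u64, available: u64) -> Result<(), FrameError> {
    if needed > available {
        Err(FrameError::BufferTooSmall { needed, available })
    } else {
        Ok(())
    }
}
```

A caller could log the `Display` output once per skipped frame, which would make it clear to someone unfamiliar with the internals that a renderer limit, not an application bug, stopped the output.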