zeux / niagara

A Vulkan renderer written from scratch on stream
MIT License

Kitten starts to flicker when frustum culling and LOD are both turned off #36

Closed jsjtxietian closed 1 year ago

jsjtxietian commented 1 year ago

When I press C to turn off frustum culling and then press L to disable LOD, the screen starts flickering, as the video shows. If I disable only one of the two (either frustum culling or LOD), it is stable again.

https://github.com/zeux/niagara/assets/8315986/018056fb-1af2-457b-bd8b-871c4252e0ac

zeux commented 1 year ago

Thanks - yeah I noticed this before but didn't act on it. Some fixed limit somewhere is probably being exhausted, will take a look.

jsjtxietian commented 1 year ago

> Some fixed limit somewhere is probably being exhausted.

After some investigation, I found that the bug was probably introduced by this commit: https://github.com/zeux/niagara/commit/0eae88480ad110d5ecabed74b4452ab6d7136881

zeux commented 1 year ago

Yeah, with task command submission and all culling/LOD disabled we need ~8M commands (each command submits ~4K triangles), for a total of ~32B triangles/frame. (Edit: there's a little bit of overallocation happening here due to task padding. The kitten mesh is 29K triangles, so the total should be 29B triangles for 1M kittens, but meshlets get grouped into 64-meshlet tasks, so we have a little bit of redundancy.)
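As a rough check of the figures above, the padding arithmetic can be sketched in a few lines of C++. The names are hypothetical and this is not code from niagara. It also assumes 64 triangles per meshlet, which is only inferred from "~4K triangles" per 64-meshlet command, and it assumes every meshlet is completely full, which real meshlets generally are not.

```cpp
#include <cstdint>

// Number of fixed-size groups needed to hold `count` items (rounds up).
constexpr uint64_t groupsNeeded(uint64_t count, uint64_t groupSize)
{
    return (count + groupSize - 1) / groupSize;
}

// Figures quoted in the comment above.
constexpr uint64_t kTrianglesPerMesh = 29000; // "kitten mesh is 29K triangles"
constexpr uint64_t kMeshletsPerTask = 64;     // "64-meshlet tasks"
constexpr uint64_t kInstances = 1000000;      // "1M kittens"

// ASSUMPTION: 64 triangles per meshlet, inferred from ~4K triangles per command.
constexpr uint64_t kTrianglesPerMeshlet = 64;

// Meshlets per kitten if every meshlet were full: ceil(29000 / 64) = 454.
constexpr uint64_t kMeshletsPerMesh = groupsNeeded(kTrianglesPerMesh, kTrianglesPerMeshlet);

// Tasks (commands) per kitten after padding to 64-meshlet groups: ceil(454 / 64) = 8.
constexpr uint64_t kTasksPerMesh = groupsNeeded(kMeshletsPerMesh, kMeshletsPerTask);

// Commands per frame with all culling and LOD disabled: 8 * 1M = 8M.
constexpr uint64_t kCommands = kTasksPerMesh * kInstances;

// Triangle slots those commands cover, padding included: 8M * 64 * 64.
constexpr uint64_t kTriangleSlots = kCommands * kMeshletsPerTask * kTrianglesPerMeshlet;
```

Under these assumptions the padded total is about 32.8B triangle slots for 29B real triangles, which lines up with the "~8M commands" and "~32B triangles/frame" estimates.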

The buffer we allocate for commands is enough for ~6.7M commands, and the encoding in tasksubmit.comp.glsl restricts it further to ~4M. The flickering is due to variability in GPU scheduling of writes to the task list, which results in different geometry being rendered each frame, as the list isn't constructed in a deterministic order.
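The effect of a capped list filled in nondeterministic order can be illustrated with a small CPU stand-in. This is only a sketch: `appendWithCap` is a hypothetical helper, not niagara's code, and the loop order stands in for whatever order the GPU happens to schedule the writes in.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends ids to a list of fixed capacity in arrival order.
// Anything that arrives after the list is full is dropped.
std::vector<uint32_t> appendWithCap(const std::vector<uint32_t>& arrivalOrder, size_t capacity)
{
    std::vector<uint32_t> list;
    list.reserve(capacity);

    for (uint32_t id : arrivalOrder)
    {
        if (list.size() < capacity)
            list.push_back(id);
        // Otherwise the command is lost and its geometry is missing this frame.
    }

    return list;
}
```

With a capacity of 2, the arrival order `{0, 1, 2, 3}` keeps commands 0 and 1, while `{3, 2, 1, 0}` keeps 3 and 2. When the capacity covers every command, the same set survives regardless of order, which is consistent with the scene being stable whenever culling or LOD keeps the command count under the limit.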

zeux commented 1 year ago

Ah, right, and part of the issue is that this is a fundamental limitation of EXT_mesh_shader, as maxTaskWorkGroupTotalCount is only guaranteed to be 4M (and is actually 4M on the NV driver). So this isn't really avoidable in a meaningful sense with the approach of using a single draw call. Multiple draw calls work, but we had to stop doing that to avoid inefficiencies in GPU submission on AMD hardware.

It looks like we should do some code cleanup around this, e.g. to avoid buffer overrun and overallocation (we currently reserve 128 MB for draw commands, and yet the maximum amount of memory 4M commands can use is ~80 MB), but fundamentally it feels like a limitation that is fine to keep, as drawing more than 32B triangles in a single scene is impractical for performance.
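The memory figures in this thread are mutually consistent if each command occupies 20 bytes. That size is an inference from "4M commands use ~80MB", not something stated here, and the constant names below are hypothetical. A minimal sketch of the budget, treating MB as MiB and 4M as 2^22:

```cpp
#include <cstdint>

constexpr uint64_t kMiB = 1024 * 1024;

// "we currently reserve 128 MB for draw commands"
constexpr uint64_t kReservedBytes = 128 * kMiB;

// 4M = 2^22, the guaranteed minimum for maxTaskWorkGroupTotalCount.
constexpr uint64_t kMaxTaskGroups = 1ull << 22;

// ASSUMPTION: 20 bytes per command, inferred from the ~80 MB figure.
constexpr uint64_t kCommandBytes = 20;

// How many commands the reserved buffer can hold: ~6.7M.
constexpr uint64_t kBufferCapacity = kReservedBytes / kCommandBytes;

// Memory that 4M commands can actually use: 80 MiB.
constexpr uint64_t kUsefulBytes = kMaxTaskGroups * kCommandBytes;

// The effective per-frame limit is the smaller of the two caps.
constexpr uint64_t kUsableCommands =
    kBufferCapacity < kMaxTaskGroups ? kBufferCapacity : kMaxTaskGroups;
```

Under that assumption the buffer holds 6,710,886 commands (the "~6.7M" above), 4M commands occupy exactly 80 MiB, and the remaining 48 MiB of the reservation can never be reached through a single draw call.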

jsjtxietian commented 1 year ago

Thank you for the explanation and clarification!