zeux / meshoptimizer

Mesh optimization library that makes meshes smaller and faster to render
MIT License

clusterizer: Implement experimental meshlet optimizer #673

Closed: zeux closed this 5 months ago

zeux commented 5 months ago

So far we have mostly been concerned with meshlet clustering from the perspective of treating meshlets as an unordered set of triangles; while this matches the computational and documented model, it may not be optimal for a given GPU.

Notably, NVIDIA GPUs are much more sensitive to the order of triangles within a meshlet than to the meshlet count and fill percentage; so much so that, in terms of pure rasterization performance, scan-based meshlet building may win over proper clustering because it implicitly produces a better order.
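For illustration, here is a minimal sketch of scan-style meshlet building. It assumes a simplified, hypothetical `Meshlet` struct (global vertex indices plus 8-bit meshlet-local triangle indices) and a hypothetical `buildMeshletsScanSketch` function; neither is the library's API. The index buffer is split strictly in input order, so if the input was already optimized for vertex cache locality, each meshlet inherits that triangle order for free.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical meshlet representation for illustration (not the library's struct):
// global vertex ids used by the meshlet, plus 3 meshlet-local indices per triangle.
struct Meshlet {
    std::vector<unsigned int> vertices;
    std::vector<unsigned char> triangles;
};

// Hypothetical helper: split an index buffer into meshlets strictly in input order.
// Triangle order inside each meshlet is inherited unchanged from the index buffer.
std::vector<Meshlet> buildMeshletsScanSketch(const std::vector<unsigned int>& indices,
                                             size_t max_vertices, size_t max_triangles) {
    // Local indices are 8-bit, so at most 256 vertices per meshlet.
    assert(max_vertices >= 3 && max_vertices <= 256 && max_triangles >= 1);
    std::vector<Meshlet> result;
    Meshlet current;
    std::unordered_map<unsigned int, unsigned char> local; // global id -> local index

    for (size_t i = 0; i + 2 < indices.size(); i += 3) {
        // Count how many new vertices this triangle would add (ignoring repeats within it).
        size_t added = 0;
        for (size_t k = 0; k < 3; ++k) {
            unsigned int v = indices[i + k];
            bool repeat = (k > 0 && v == indices[i]) || (k > 1 && v == indices[i + 1]);
            if (!repeat && local.count(v) == 0) ++added;
        }
        // Start a new meshlet if the triangle would exceed either limit.
        if (current.vertices.size() + added > max_vertices ||
            current.triangles.size() / 3 + 1 > max_triangles) {
            result.push_back(current);
            current = Meshlet();
            local.clear();
        }
        // Append the triangle, assigning local indices to first-seen vertices.
        for (size_t k = 0; k < 3; ++k) {
            unsigned int v = indices[i + k];
            auto it = local.find(v);
            if (it == local.end()) {
                it = local.emplace(v, static_cast<unsigned char>(current.vertices.size())).first;
                current.vertices.push_back(v);
            }
            current.triangles.push_back(it->second);
        }
    }
    if (!current.triangles.empty()) result.push_back(current);
    return result;
}
```

Because meshlet boundaries are decided purely by position in the index buffer, this approach can produce more, less-full meshlets than a clustering builder, which is the trade-off the paragraph above describes.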

We do not know the precise criteria or mechanism that NVIDIA GPUs use here, but locality optimization helps: triangle order matters most, and reordering meshlet-local vertices helps a little as well.
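One simple way to reorder meshlet-local vertices is to renumber them in order of first use by the triangle list, so consecutive triangles tend to reference nearby local indices. The sketch below assumes a hypothetical `Meshlet` layout (global vertex indices plus 8-bit local triangle indices) and a hypothetical `reorderMeshletVerticesSketch` function; it is not necessarily how this change implements vertex reordering.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical meshlet layout for illustration (not the library's struct).
struct Meshlet {
    std::vector<unsigned int> vertices;   // global vertex ids
    std::vector<unsigned char> triangles; // 3 local indices per triangle
};

// Hypothetical helper: renumber local vertices in first-use order.
// Triangles keep referencing the same global vertices, only local ids change.
void reorderMeshletVerticesSketch(Meshlet& m) {
    std::vector<int> remap(m.vertices.size(), -1); // old local -> new local
    std::vector<unsigned int> reordered;
    reordered.reserve(m.vertices.size());

    for (unsigned char& idx : m.triangles) {
        if (remap[idx] < 0) {
            remap[idx] = static_cast<int>(reordered.size());
            reordered.push_back(m.vertices[idx]);
        }
        idx = static_cast<unsigned char>(remap[idx]);
    }
    // Vertices not referenced by any triangle (unusual) are kept at the end.
    for (size_t i = 0; i < m.vertices.size(); ++i)
        if (remap[i] < 0) reordered.push_back(m.vertices[i]);

    m.vertices.swap(reordered);
}
```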

This change implements a simple meshlet optimizer. While the same effect could be achieved by running the existing optimization algorithms (vcache / vfetch) on meshlet data, a custom optimizer is faster even with a quadratic implementation, and because the input patch is small, it may allow us to implement better locality reordering algorithms in the future.
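To illustrate why a quadratic approach is affordable on a small patch, the sketch below greedily reorders the triangles of a single meshlet: after emitting a triangle, it scans every remaining triangle and picks the one sharing the most vertices with it, with ties going to the earliest triangle. This is an assumed heuristic chosen for brevity, not the optimizer from this change; `reorderMeshletTrianglesSketch` is a hypothetical name, and triangles are assumed to be stored as three 8-bit meshlet-local indices each.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: greedy O(n^2) triangle reordering for one meshlet.
// Each triangle is 3 consecutive local indices; the vector is reordered in place.
void reorderMeshletTrianglesSketch(std::vector<unsigned char>& triangles) {
    size_t count = triangles.size() / 3;
    if (count < 2) return;

    std::vector<bool> emitted(count, false);
    std::vector<unsigned char> result;
    result.reserve(triangles.size());

    size_t current = 0; // start from the first triangle
    for (size_t step = 0; step < count; ++step) {
        emitted[current] = true;
        // Copy indices verbatim so triangle winding is preserved.
        result.insert(result.end(), triangles.begin() + 3 * current,
                      triangles.begin() + 3 * current + 3);

        // Pick the remaining triangle sharing the most vertices with the one just emitted.
        int bestScore = -1;
        size_t best = 0;
        for (size_t t = 0; t < count; ++t) {
            if (emitted[t]) continue;
            int score = 0;
            for (size_t a = 0; a < 3; ++a) {
                unsigned char v = triangles[3 * t + a];
                if (v == triangles[3 * current] || v == triangles[3 * current + 1] ||
                    v == triangles[3 * current + 2])
                    ++score;
            }
            if (score > bestScore) { // strict '>' keeps the earliest triangle on ties
                bestScore = score;
                best = t;
            }
        }
        current = best;
    }
    triangles.swap(result);
}
```

For example, the scrambled strip `{0,1,2, 3,4,5, 1,2,3, 2,3,4}` comes out as `{0,1,2, 1,2,3, 2,3,4, 3,4,5}`. The full scan per step is cheap here because a meshlet holds few triangles, which is what makes a quadratic implementation practical.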

On an NVIDIA RTX 4090, this change can yield up to a 15% speedup over using buildMeshlets alone when workloads are raster-bound; the gains depend on the workload and mesh. niagara sees a 5% speedup when software triangle culling is disabled.