Closed godlikepanos closed 7 months ago
The code would technically work with 126, but because each meshlet aligns the index data by 4 bytes for performance, using values that aren't divisible by 4 is dangerous as they complicate the computation for worst case index buffer size and the caller might not correctly perform this; for example, for max_triangles 126, the worst case effective amount of index data per meshlet is not 126*3=378 bytes - it's 380 bytes. If the caller allocates 378 bytes per meshlet, degenerate inputs may result in an overflow of that buffer.
This limitation could of course be removed - it's just the assertion that fails; the documentation could be adjusted to reflect this (the documentation doesn't seem to mention this restriction so it needs to be updated either way...). However, in practice I think you should not observe performance degradation between 126 and the next lowest value divisible by 4 (124) because with a configuration that NVidia recommends (64v 126t), meshlets should almost never reach 124 triangles because of topology constraints.
Thanks for the explanation. An update in the README might be enough then.
Hi,
nVidia recommends meshlets with 126 primitives according to this page https://developer.nvidia.com/blog/advanced-api-performance-mesh-shaders. The problem is that the meshopt_buildMeshletsXXX functions are not designed to work with a value like 126. Passing 126 will hit the following assertions:
Not sure if it would be a good idea to remove that limitation.