zeux / meshoptimizer

Mesh optimization library that makes meshes smaller and faster to render
MIT License

Improve support for cluster simplification #704

Closed zeux closed 2 months ago

zeux commented 2 months ago

When simplifying a small subset of the larger mesh, all computations that go over the entire vertex buffer become expensive; this adds up even when done once, and especially when done for every pass. This is a critical part of some workflows that combine clusterization and simplification, notably Nanite-style virtual geometry renderers.

This change introduces a sparse simplification mode that instructs the simplifier to optimize based on the assumption that the subset of the mesh being simplified is small. In that case it's worth spending extra time to convert indices into a small 0..U subrange, do all internal processing as if we were working with a small vertex/index buffer, and remap the indices at the end. While this processing could be done externally, that is less efficient as it requires constant copying of position/attribute data; in contrast, the simplifier can do it internally fairly cheaply.
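To make the index compaction concrete, here is a minimal sketch of the idea in Rust. It is not the library's `buildSparseRemap`; the function names and the use of a `HashMap` are assumptions chosen for illustration. It only shows how a subset's indices can be mapped into a dense 0..U range and mapped back afterwards.

```rust
use std::collections::HashMap;

/// Compacts `indices` (which reference a large vertex buffer) into a dense
/// 0..U range, where U is the number of unique vertices in the subset.
/// Returns the compacted indices and a table that maps each local index
/// back to the original vertex index.
/// Hypothetical helper for illustration, not meshoptimizer's code.
fn build_sparse_remap(indices: &[u32]) -> (Vec<u32>, Vec<u32>) {
    let mut to_local: HashMap<u32, u32> = HashMap::new();
    let mut to_global: Vec<u32> = Vec::new();
    let mut local_indices = Vec::with_capacity(indices.len());

    for &index in indices {
        // Assign the next free local slot the first time a vertex is seen.
        let local = *to_local.entry(index).or_insert_with(|| {
            to_global.push(index);
            (to_global.len() - 1) as u32
        });
        local_indices.push(local);
    }

    (local_indices, to_global)
}

/// Maps indices produced by processing the compact subset back to the
/// original vertex buffer.
fn remap_to_global(local_indices: &[u32], to_global: &[u32]) -> Vec<u32> {
    local_indices.iter().map(|&i| to_global[i as usize]).collect()
}
```

With this shape, any per-vertex working memory is sized by the number of unique vertices in the subset rather than by the full vertex buffer, which is the property the sparse mode relies on.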

When using sparse simplification, the error is treated as relative to the mesh subset. This is a performance requirement as computing the full mesh extents is too expensive when the subset is small relative to the mesh, but it means that it can be difficult to rely on exact error metrics.

There are also cases, even without sparse simplification, where an absolute error is more convenient. This can be achieved right now via meshopt_simplifyScale, but that is an extra step that is not always necessary.
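The relationship between relative and absolute error can be sketched as follows. This assumes the scale factor is the largest axis-aligned bounding-box extent of the positions being considered; `extent_scale` and `relative_to_absolute` are hypothetical helpers written for this example and are not part of the library API.

```rust
/// Largest axis-aligned bounding-box extent of `positions`.
/// Returns 0.0 for an empty slice.
/// Assumption: this stands in for the scale used to convert between
/// relative and absolute error; it is not the library's implementation.
fn extent_scale(positions: &[[f32; 3]]) -> f32 {
    if positions.is_empty() {
        return 0.0;
    }
    let mut min = positions[0];
    let mut max = positions[0];
    for p in positions {
        for axis in 0..3 {
            min[axis] = min[axis].min(p[axis]);
            max[axis] = max[axis].max(p[axis]);
        }
    }
    // Pick the widest of the three axis extents.
    (0..3).map(|a| max[a] - min[a]).fold(0.0, f32::max)
}

/// Converts an error expressed as a fraction of `scale` into object-space units.
fn relative_to_absolute(relative_error: f32, scale: f32) -> f32 {
    relative_error * scale
}
```

Under this assumption, the same relative error of 0.01 means something different for a cluster whose extent is 1 unit than for a full mesh whose extent is 8 units, which is why an absolute error is easier to reason about when simplifying subsets.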

The new features can be accessed by adding meshopt_SimplifySparse and meshopt_SimplifyErrorAbsolute bit flags to simplification options.
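As a rough illustration of how such bit flags combine, the sketch below uses Rust constants that mirror the option names. The constant names and the specific bit values here are assumptions made for the example; the authoritative definitions are the ones in meshoptimizer.h.

```rust
// Hypothetical stand-ins for the simplification option bits.
// The bit positions are illustrative; consult meshoptimizer.h for the real values.
const SIMPLIFY_LOCK_BORDER: u32 = 1 << 0;
const SIMPLIFY_SPARSE: u32 = 1 << 1;
const SIMPLIFY_ERROR_ABSOLUTE: u32 = 1 << 2;

/// Options a cluster simplification pass might request: keep cluster borders
/// fixed, treat the index buffer as a small subset, and interpret the target
/// error in absolute units.
fn cluster_simplify_options() -> u32 {
    SIMPLIFY_LOCK_BORDER | SIMPLIFY_SPARSE | SIMPLIFY_ERROR_ABSOLUTE
}

/// Returns true if `flag` is set in `options`.
fn has_flag(options: u32, flag: u32) -> bool {
    (options & flag) != 0
}
```

Because the options are independent bits, the two new flags can be used separately: absolute error without sparse mode, or sparse mode with a subset-relative error.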

As an example of the performance delta, the newly added simplifyClusters demo call takes 17.7 seconds to simplify an 870K-triangle mesh one cluster at a time; with the new sparse mode it takes ~150 msec (roughly 100x faster).

zeux commented 2 months ago

I've tested this in Bevy using https://github.com/bevyengine/bevy/pull/13431 and the following diff (I probably could have left the 0.5 factor in, but I am not sure it's correct to apply it!). After that change, simplification is barely visible in the profile. The overall process of data preparation is still not very fast because Bevy's meshlet connectivity analysis (find_connected_meshlets) is slow, but I'm sure that can be made faster separately.

```patch
diff --git a/crates/bevy_pbr/src/meshlet/from_mesh.rs b/crates/bevy_pbr/src/meshlet/from_mesh.rs
index a5ff00fad..9d95978ee 100644
--- a/crates/bevy_pbr/src/meshlet/from_mesh.rs
+++ b/crates/bevy_pbr/src/meshlet/from_mesh.rs
@@ -58,6 +58,8 @@ impl MeshletMesh {
             .map(|m| m.triangle_count as u64)
             .sum();
 
+        let scale = simplify_scale(&vertices);
+
         // Build further LODs
         let mut simplification_queue = 0..meshlets.len();
         let mut lod_level = 1;
@@ -82,7 +84,7 @@ impl MeshletMesh {
             for group_meshlets in groups.values().filter(|group| group.len() > 1) {
                 // Simplify the group to ~50% triangle count
                 let Some((simplified_group_indices, mut group_error)) =
-                    simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level)
+                    simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level, scale)
                 else {
                     continue;
                 };
@@ -287,6 +289,7 @@ fn simplify_meshlet_groups(
     meshlets: &Meshlets,
     vertices: &VertexDataAdapter<'_>,
     lod_level: u32,
+    scale: f32,
 ) -> Option<(Vec<u32>, f32)> {
     // Build a new index buffer into the mesh vertex data by combining all meshlet data in the group
     let mut group_indices = Vec::new();
@@ -299,7 +302,8 @@ fn simplify_meshlet_groups(
 
     // Allow more deformation for high LOD levels (1% at LOD 1, 10% at LOD 20+)
     let t = (lod_level - 1) as f32 / 19.0;
-    let target_error = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error_rel = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error = target_error_rel * scale;
 
     // Simplify the group to ~50% triangle count
     // TODO: Use simplify_with_locks()
@@ -309,7 +313,7 @@ fn simplify_meshlet_groups(
         vertices,
         group_indices.len() / 2,
         target_error,
-        SimplifyOptions::LockBorder,
+        SimplifyOptions::LockBorder | SimplifyOptions::Sparse | SimplifyOptions::ErrorAbsolute,
         Some(&mut error),
     );
 
@@ -318,9 +322,6 @@ fn simplify_meshlet_groups(
         return None;
     }
 
-    // Convert error to object-space and convert from diameter to radius
-    error *= simplify_scale(vertices) * 0.5;
-
     Some((simplified_group_indices, error))
 }
 
```
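The error schedule in the diff can be read in isolation as a linear interpolation between 1% and 10% of the mesh scale. The standalone function below restates that arithmetic; the function name is hypothetical, and it assumes `lod_level` starts at 1 as it does in the patched code.

```rust
/// Absolute target error for a LOD level, following the arithmetic in the diff:
/// a relative error interpolated linearly from 1% at LOD 1 to 10% at LOD 20,
/// multiplied by the whole-mesh scale.
/// Assumes `lod_level >= 1`; a value of 0 would underflow the subtraction.
fn target_error_absolute(lod_level: u32, scale: f32) -> f32 {
    let t = (lod_level - 1) as f32 / 19.0;
    let target_error_rel = 0.1 * t + 0.01 * (1.0 - t);
    target_error_rel * scale
}
```

For a mesh with scale 2.0 this gives a target of about 0.02 units at LOD 1 and about 0.2 units at LOD 20. Because the scale is computed once for the whole mesh, every group at the same LOD level receives the same object-space budget regardless of how small the group is.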

For the above to work, meshopt-rs needs to get two extra enum entries and that's it.

zeux commented 2 months ago

Going to mark this as ready to merge, although I want to look into using a hash map for the second part of buildSparseRemap, as on Windows the large allocation is not as fast as I'd like it to be (buildSparseRemap accounts for ~30% of cluster simplification there, compared to ~2% on Linux).