zeux / meshoptimizer

Mesh optimization library that makes meshes smaller and faster to render
MIT License
5.49k stars 473 forks source link

Large perf improvement #713

Closed Sluggernot closed 2 months ago

Sluggernot commented 2 months ago

I was working in Godots main branch, trying to improve load times and made a ton of changes to this file. I ultimately found that this one small change accounted for 90% or more of my total improvements. In godot, loading a ~34 MB glb file went from 18 seconds to 4.9 seconds.

zeux commented 2 months ago

This completely breaks the simplifier since you no longer rank collapses.

Sluggernot commented 2 months ago

How so? I'm missing something about this function, then. Without my change: We loop through the collapses array, setting Collapse c to be a reference to collapses[i]. Quadric Error calculations. Then we set data to our referenced Collapse c. Then c falls out of scope and is recreated again, in the next iteration, and set to the next Collapse.

If we declare c above the for loop and simply change which item in the array it is referencing every iteration, what breaks?

zeux commented 2 months ago

If we declare c above the for loop and simply change which item in the array it is referencing every iteration, what breaks?

That is not how your code works. Your code change always references element 0 in the loop body. The assignment overwrites collapse 0 with collapse on each iteration, but the error on any other collapses is never updated. You would note that this change breaks the tests (https://github.com/zeux/meshoptimizer/actions/runs/9704155260/job/26783830103).

Sluggernot commented 2 months ago

I did see that test failing from the results of my pull request. Unfortunately I am failing on line 880 without any changes. I'll keep at it.

Sluggernot commented 2 months ago

And sorry. I didn't say thank you for enlightening me. I really appreciate it. I still have a lot to learn. Related, I commented out the tests that I couldn't pass (with an unchanged copy of the repo). Now I am able to break on line 955 with that change. Perfect.

So, out of curiosity I tried:

Collapse* c = &collapses[0];
for (size_t i = 0; i < collapse_count; ++i)
{
    c = &collapses[i];

    unsigned int i0 = c->v0;
    unsigned int i1 = c->v1;

    // most edges are bidirectional which means we need to evaluate errors for two collapses
    // to keep this code branchless we just use the same edge for unidirectional edges
    unsigned int j0 = c->bidi ? i1 : i0;
    unsigned int j1 = c->bidi ? i0 : i1;

    float ei = quadricError(vertex_quadrics[remap[i0]], vertex_positions[i1]);
    float ej = quadricError(vertex_quadrics[remap[j0]], vertex_positions[j1]);

    if (attribute_count)
    {
        ei += quadricError(attribute_quadrics[remap[i0]], &attribute_gradients[remap[i0] * attribute_count], attribute_count, vertex_positions[i1], &vertex_attributes[i1 * attribute_count]);
        ej += quadricError(attribute_quadrics[remap[j0]], &attribute_gradients[remap[j0] * attribute_count], attribute_count, vertex_positions[j1], &vertex_attributes[j1 * attribute_count]);
    }

    // pick edge direction with minimal error
    c->v0 = ei <= ej ? i0 : j0;
    c->v1 = ei <= ej ? i1 : j1;
    c->error = ei <= ej ? ei : ej;
}

This appears to pass. I didn't want to spam you with another potentially terrible pull request. Do you see any issues here?

zeux commented 2 months ago

This code looks correct but it should not be any faster.

Sluggernot commented 2 months ago

I agree. Unfortunately I am seeing a load time in Godot for a ~34 MB glb file go from: EditorFileSystem: "res://chess_set.glb" import took 18206 ms. Down to: EditorFileSystem: "res://chess_set.glb" import took 4813 ms.

And I also don't see how. But I have tested it about a dozen times on two copies of the repo.