vpenades / SharpGLTF

glTF reader and writer for .NET Standard
MIT License

Adding normals slows AddTriangle to a crawl #82

Closed wlinna closed 3 years ago

wlinna commented 3 years ago

I am trying to convert a rather large (350 MiB) IFC model to glTF. The problem is that the conversion had already taken close to 60 minutes, and according to various profiles I took, at least 85% of the time was spent in AddTriangle. Eventually I just interrupted the conversion (it was still only about 6% complete!).

Later I stopped adding normals, and the conversion time dropped to 11 minutes! By also setting VertexPreprocessor to null, it dropped to 9:50.

(Even then, 90% of the time is spent in AddTriangle, which seems very heavy to me, considering that I basically just want to write data, and the resulting glTF + bin files amount to 23 MiB.)

//using VERTEX = SharpGLTF.Geometry.VertexTypes.VertexPositionNormal;
using VERTEX = SharpGLTF.Geometry.VertexTypes.VertexPosition;
..
foreach(var t in p.triangles) {
    //prim.AddTriangle(
    //    new VERTEX( <positions>, <normals> ),
    //    new VERTEX( <positions>, <normals> ),
    //    new VERTEX( <positions>, <normals> ));
    prim.AddTriangle(
        new VERTEX( <positions> ),
        new VERTEX( <positions> ),
        new VERTEX( <positions> ));
}

I am using .NET 5 and VS 2019 on Windows 10.

vpenades commented 3 years ago

How many vertices and triangles does the input mesh have?

Under the hood, AddTriangle maintains a huge dictionary of vertices so that every distinct vertex is stored only once. As the mesh grows, every new addition checks whether the vertices already exist in the collection. Computing the vertex hash for the dictionary is quite heavy, and I suspect that code can be optimized a bit.
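
Conceptually, the bookkeeping behind that looks something like the sketch below. This is only a simplified illustration of the idea, not SharpGLTF's actual internal code, but it shows why every AddTriangle call pays for hashing and dictionary lookups:

```c#
using System.Collections.Generic;
using System.Numerics;

// Illustrative sketch only: a primitive that deduplicates vertices through a dictionary.
struct SketchVertex
{
    public Vector3 Position;
    public Vector3 Normal;
}

class SketchPrimitive
{
    private readonly Dictionary<SketchVertex, int> _VertexIndices = new Dictionary<SketchVertex, int>();
    private readonly List<SketchVertex> _Vertices = new List<SketchVertex>();
    private readonly List<(int A, int B, int C)> _Triangles = new List<(int A, int B, int C)>();

    private int UseVertex(SketchVertex v)
    {
        // Every vertex is hashed and looked up against all previously added vertices,
        // so each distinct vertex is stored only once; this lookup is the per-call cost.
        if (!_VertexIndices.TryGetValue(v, out var index))
        {
            index = _Vertices.Count;
            _Vertices.Add(v);
            _VertexIndices.Add(v, index);
        }

        return index;
    }

    public void AddTriangle(SketchVertex a, SketchVertex b, SketchVertex c)
    {
        _Triangles.Add((UseVertex(a), UseVertex(b), UseVertex(c)));
    }
}
```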

Would it be possible for you to write a tiny command-line benchmark app with your use case?

vpenades commented 3 years ago

I've made a small test that uses VertexPositionNormal, and I've been able to add 10 million triangles into a single mesh primitive in just 1 minute, so maybe the performance loss is somewhere else. I would need more information to reproduce it.

wlinna commented 3 years ago

Thank you for the quick response!

I forgot to mention that I use alpha0020.

In the final gltf there are (according to Blender):

Some of the meshes are weirdly large considering their visual output. They have 250k triangles each, many of which have an area of zero or very close to zero. Does your test have triangles like that? I just noticed that there are some limited forms of degeneracy checks, and I'm wondering if those might cause lots of branch prediction failures or something. This is just a wild guess of course, and I doubt that's the cause, since those degenerate triangles end up in the final glTF anyway.

The memory consumption of the application (I am not referring to SharpGLTF here) is quite high, about 4 GiB much of the time. It makes me wonder if that can cause lots of memory pressure. Otherwise I'm not sure why adding normals would cause such a big performance degradation, unless there are still some checks being run. I'm trying to figure it out.

Can I have your test code? Maybe I'll be able to modify it to exhibit the same kind of slowness that my application is exhibiting.

wlinna commented 3 years ago

It turns out that even one of those weirdly complex meshes takes minutes to create. Here's one such specimen:

https://drive.google.com/file/d/14EKbICJ9DwuTjcc4OAn2PHkcEAYW1cpe/view?usp=sharing

vpenades commented 3 years ago

900k vertices and 2m triangles should be fine to process within a few minutes at worst.

The glTF standard states that the burden of writing a proper glTF model resides on the exporter, that's why the library does so many checks everywhere. It's almost the only way to guarantee the data is correctly written.

That includes checking for degenerated triangles, which are not added to the final model, so if your source model has degenerated triangles, the resulting glTF model will have fewer triangles than the source model.

The memory consumption is huge; that's most probably caused by the internal vertex dictionary, which grows a lot.

This is the example I've written to test performance:

```c#
using System;
using System.Collections.Generic;
using System.Numerics;

using VERTEX = SharpGLTF.Geometry.VertexTypes.VertexPositionNormal;

namespace ExportMassiveMesh
{
    class Program
    {
        static void Main(string[] args)
        {
            var srcMesh = SourceMesh.CreateRandomMesh(10000000);

            var dstMesh = new SharpGLTF.Geometry.MeshBuilder<VERTEX>();
            var dstPrim = dstMesh.UsePrimitive(SharpGLTF.Materials.MaterialBuilder.CreateDefault());

            var timer = System.Diagnostics.Stopwatch.StartNew();

            foreach (var (a, b, c) in srcMesh.TriangleIndices)
            {
                var va = srcMesh.GetVertex(a);
                var vb = srcMesh.GetVertex(b);
                var vc = srcMesh.GetVertex(c);

                dstPrim.AddTriangle(va, vb, vc);
            }

            timer.Stop();

            System.Console.WriteLine($"Added {dstPrim.Triangles.Count} in {timer.Elapsed}");

            var dstScene = new SharpGLTF.Scenes.SceneBuilder();
            dstScene.AddRigidMesh(dstMesh, Matrix4x4.Identity);
            dstScene.ToGltf2().Save("result.glb");
        }
    }

    class SourceMesh
    {
        public static SourceMesh CreateRandomMesh(int triangleCount)
        {
            var rnd = new Random(117);

            var mesh = new SourceMesh();

            int tri = 0;

            while (triangleCount > 0)
            {
                --triangleCount;

                for (int i = 0; i < 3; ++i)
                {
                    var x = (float)rnd.NextDouble();
                    var y = (float)rnd.NextDouble();
                    var z = (float)rnd.NextDouble();

                    var p = new Vector3(x, y, z);
                    var n = Vector3.Normalize(p);

                    var v = new VERTEX(p, n);

                    mesh._Vertices.Add(v);
                }

                mesh._Indices.Add((tri * 3 + 0, tri * 3 + 1, tri * 3 + 2));
                ++tri;
            }

            return mesh;
        }

        private readonly List<VERTEX> _Vertices = new List<VERTEX>();
        private readonly List<(int, int, int)> _Indices = new List<(int, int, int)>();

        public IEnumerable<(int A, int B, int C)> TriangleIndices => _Indices;

        public VERTEX GetVertex(int vertexIndex) { return _Vertices[vertexIndex]; }
    }
}
```

Notice that I've isolated the timer benchmarking only to the AddTriangle method.

vpenades commented 3 years ago

https://drive.google.com/file/d/14EKbICJ9DwuTjcc4OAn2PHkcEAYW1cpe/view?usp=sharing

I took your model and modified the example to import the triangles and re-export it... it took around 3 seconds.

I have an Intel Core i7 at 3.6 GHz... I guess you're not running on a Celeron or Atom...

wlinna commented 3 years ago

That includes checking for degenerated triangles, which are not added to the final model, so if your source model has degenerated triangles, the resulting glTF model will have fewer triangles than the source model.

I have trouble finding a mention of degenerate triangles in the specification. And the resulting glTF file seems to pass glTF validation despite its many remaining degenerate triangles. And removing degenerate triangles would itself kind of break the model, at least for our use, because it would leave holes in the model (even if a hole has an area of zero, it's still a hole). Those holes prevent us from doing certain types of analysis on the model.

If the library simply removes the degenerate triangles, it denies the possibility of fixing them further down the pipeline, like we do with another program, written in another programming language.

Fortunately for us though, most degenerate triangles are left alone.

The glTF standard states that the burden of writing a proper glTF model resides on the exporter, that's why the library does so many checks everywhere. It's almost the only way to guarantee the data is correctly written.

The way I read this is that the runtime showing the models is not required to fix broken meshes. It doesn't say though where those fixes should be made, or who/what is responsible for the correctness. Is it the modeler? Is it the modeling tool? Or the code that turns input data into output data (the exporter)?

wlinna commented 3 years ago

Curious. Note that the file I shared was exported by Blender. It might have been modified already. I'll try to create a model exported directly via SharpGLTF.

I have an i5-6200U CPU @ 2.30 GHz and 32 GiB of RAM.

I think the memory consumption is because of all the data that the IFC importer loads. That alone pushes memory usage to almost 4 GiB, where it plateaus, at least until the problematic object has been exported.

vpenades commented 3 years ago

Regarding degenerated triangles:

SharpGLTF considers a triangle degenerated when the positions of two of its vertices are exactly the same. If the vertices are very, very close but not identical, the triangle is not considered degenerated and is kept in the mesh.
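
The check itself is essentially just an exact-equality comparison of positions; a minimal sketch of that rule (my illustration, not the actual library code) would be:

```c#
using System.Numerics;

static class DegeneracySketch
{
    // Sketch of the rule described above: only exact position duplicates are rejected;
    // collinear or near-zero-area triangles are NOT considered degenerated.
    public static bool IsDegenerated(Vector3 a, Vector3 b, Vector3 c)
    {
        return a == b || a == c || b == c;
    }
}
```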

The way I read this is that the runtime showing the models is not required to fix broken meshes. It doesn't say though where those fixes should be made, or who/what is responsible for the correctness. Is it the modeler? Is it the modeling tool? Or the code that turns input data into output data (the exporter)?

It's more like: an exporter should not produce broken meshes, so a runtime does not need to check for them. Who is responsible for this? Clearly it's the library writing the glTF, at which point a glTF export library being fed broken data has two options: throw exceptions, or try to fix the data. But what a glTF export library cannot do is export a broken glTF (which is, unfortunately, what's currently happening with most libraries).

Also keep in mind that glTF is not an intermediate format, it's a "last mile" format intended to deliver "performant visual assets", so, to some degree, exporters are free to clean and modify the geometry to optimize it for performance, while keeping the visual appearance.

Btw, if you want to discuss this further, you can also find me on Discord.

wlinna commented 3 years ago

Here's a direct export via SharpGLTF

https://drive.google.com/file/d/1vl6H7QqiVC8G1iwCmA40liU-nqgd9O7S/view?usp=sharing

I tried logging some information about the processing time. I wanted to see if there's a pattern. I noticed that some primitives take a disproportionate amount of time to process. I also tried to record the maximum time a single AddTriangle call takes. The code to measure AddTriangle time is this:

var watch = System.Diagnostics.Stopwatch.StartNew();
var addResult = prim.AddTriangle(v1, v2, v3);
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
primitiveTime += elapsedMs;
if (elapsedMs > maxProcessingTime) {
    maxProcessingTime = elapsedMs;
    Console.WriteLine("New record time on AddTriangle {0}", elapsedMs);
}

Primitive processing time is the time it takes to call AddTriangle for all the triangles of the given primitive. TotalPrimitiveTime is the sum of all primitive processing times. New record time is the longest a single AddTriangle call has ever taken.

The times are in milliseconds.

RUN 1

Primitive processing time 8. Triangles: 76
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 88
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 12
New record time on AddTriangle 17
Primitive processing time 232. Triangles: 6276
Primitive processing time 25831. Triangles: 37012
Primitive processing time 7292. Triangles: 5944
Primitive processing time 30. Triangles: 25902
Primitive processing time 194. Triangles: 2908
New record time on AddTriangle 24
New record time on AddTriangle 25
New record time on AddTriangle 26
New record time on AddTriangle 27
New record time on AddTriangle 33
New record time on AddTriangle 47
New record time on AddTriangle 75
Primitive processing time 300946. Triangles: 119846
Primitive processing time 31806. Triangles: 19842
Primitive processing time 0. Triangles: 56
TotalPrimitiveTime: 366339

RUN 2

Primitive processing time 7. Triangles: 76
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 88
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 12
New record time on AddTriangle 12
Primitive processing time 248. Triangles: 6276
Primitive processing time 20394. Triangles: 37012
Primitive processing time 5821. Triangles: 5944
Primitive processing time 23. Triangles: 25902
Primitive processing time 166. Triangles: 2908
New record time on AddTriangle 16
New record time on AddTriangle 18
New record time on AddTriangle 35
New record time on AddTriangle 44
New record time on AddTriangle 53
New record time on AddTriangle 65
New record time on AddTriangle 78
New record time on AddTriangle 80
New record time on AddTriangle 126
Primitive processing time 305068. Triangles: 119846
Primitive processing time 28935. Triangles: 19842
Primitive processing time 0. Triangles: 56
TotalPrimitiveTime: 360662

For example, it seems that some primitives simply take longer to process even though they are not particularly big.

RUN 1
...
Primitive processing time 30. Triangles: 25902
Primitive processing time 194. Triangles: 2908
...

RUN 2:
...
Primitive processing time 23. Triangles: 25902
Primitive processing time 166. Triangles: 2908
...

I was also able to cut down the total application memory usage to 2 GiB for testing purposes (by not importing other geometry), but it didn't change this pattern.

I suppose I have to take a look at the test next

vpenades commented 3 years ago

I've modified the example to load your meshes and dump the triangles to a new MeshBuilder, and the result is still within 2 minutes.

Example of loading a glTF and dumping its triangles to another glTF:

```c#
using System;
using System.Collections.Generic;
using System.Numerics;

using VERTEX = SharpGLTF.Geometry.VertexTypes.VertexPositionNormal;

using SharpGLTF.Schema2;

namespace ExportMassiveMesh
{
    class Program
    {
        static void Main(string[] args)
        {
            // var srcMesh = SourceMesh.CreateRandomMesh(10000000);
            var srcMesh = SourceMesh.CreateFromFile(System.IO.Path.GetFullPath("Palkkikatu4.gltf"));

            var dstMesh = new SharpGLTF.Geometry.MeshBuilder<VERTEX>();
            var dstPrim = dstMesh.UsePrimitive(SharpGLTF.Materials.MaterialBuilder.CreateDefault());

            var timer = System.Diagnostics.Stopwatch.StartNew();

            foreach (var (a, b, c) in srcMesh.TriangleIndices)
            {
                var va = srcMesh.GetVertex(a);
                var vb = srcMesh.GetVertex(b);
                var vc = srcMesh.GetVertex(c);

                dstPrim.AddTriangle(va, vb, vc);
            }

            timer.Stop();

            System.Console.WriteLine($"Added {dstPrim.Triangles.Count} in {timer.Elapsed}");

            var dstScene = new SharpGLTF.Scenes.SceneBuilder();
            dstScene.AddRigidMesh(dstMesh, Matrix4x4.Identity);
            dstScene.ToGltf2().Save("result.glb");
        }
    }

    class SourceMesh
    {
        public static SourceMesh CreateFromFile(string filePath)
        {
            var model = ModelRoot.Load(filePath);

            var tris = model.DefaultScene.EvaluateTriangles();

            var mesh = new SourceMesh();

            int idx = 0;

            foreach (var tri in tris)
            {
                mesh._Vertices.Add(tri.A.Geometry);
                mesh._Vertices.Add(tri.B.Geometry);
                mesh._Vertices.Add(tri.C.Geometry);

                mesh._Indices.Add((idx * 3 + 0, idx * 3 + 1, idx * 3 + 2));
                ++idx;
            }

            return mesh;
        }

        public static SourceMesh CreateRandomMesh(int triangleCount)
        {
            var rnd = new Random(117);

            var mesh = new SourceMesh();

            int idx = 0;

            while (triangleCount > 0)
            {
                --triangleCount;

                for (int i = 0; i < 3; ++i)
                {
                    var x = (float)rnd.NextDouble();
                    var y = (float)rnd.NextDouble();
                    var z = (float)rnd.NextDouble();

                    var p = new Vector3(x, y, z);
                    var n = Vector3.Normalize(p);

                    var v = new VERTEX(p, n);

                    mesh._Vertices.Add(v);
                }

                mesh._Indices.Add((idx * 3 + 0, idx * 3 + 1, idx * 3 + 2));
                ++idx;
            }

            return mesh;
        }

        private readonly List<VERTEX> _Vertices = new List<VERTEX>();
        private readonly List<(int, int, int)> _Indices = new List<(int, int, int)>();

        public IEnumerable<(int A, int B, int C)> TriangleIndices => _Indices;

        public VERTEX GetVertex(int vertexIndex) { return _Vertices[vertexIndex]; }
    }
}
```

wlinna commented 3 years ago

Thanks for the test! I will try that myself soon, hopefully this week.

It seems that it'll take two hours until I can write on Discord (according to the timer) even though it says 5 minutes, so I'll write one more off-topic comment here (because I need to go before 2 hours):

You are right, the output glTF should not be broken. But is a degenerate triangle broken according to the glTF standard? It seems like the spec doesn't take a stance on that, which is equivalent to silent acceptance.

But I guess there's no reason to continue discussing the topic of "broken models". At least in my case, the degenerate triangles usually have their vertices on the same line rather than sharing identical points.

wlinna commented 3 years ago

Okay, I couldn't resist running it. It took 6 minutes for me to run the test.

Added 218034 in 00:06:06.2301567

That seems rather long. But isn't two minutes also a rather long time to wait for this number of triangles? Assimp managed to import the model, run a bunch of optimizations on it, and export it back to glTF in under one second.

Two culprits come to mind:

  1. (re)allocations
  2. vertex dictionaries.

--

  1. could be mitigated a little bit if there were a possibility to pass multiple triangles at once and/or provide a size hint.
  2. I've only seen that there's a vertex dictionary, but I'm not sure yet how it works. I suppose the hashing could be changed, or maybe it would be possible to opt out of it? The models can be post-processed anyway with many different, automatable tools.

vpenades commented 3 years ago

Hmm... it seems the processor has a huge impact on the performance?

In my I7: Added 218034 in 00:00:01.0172687

I will take a look and see if I can improve the performance of adding a vertex; although more than speed, I'm even more concerned about memory usage, which is a lot. Most probably I can get some improvements by limiting the hashing only to the position of the vertex, instead of the whole vertex.

The MeshBuilder API heavily depends on the premise that a geometric vertex is never repeated, so the dictionary usage cannot be opted out of.

If you're going to do heavy-duty operations and you have confidence in your own data, there's also an alternative path:

MeshBuilder essentially implements the SharpGLTF.Geometry.IMeshBuilder<TMaterial> interface... so you can create your own implementation that is filled with your own data. It can then be added to a SceneBuilder, letting the rest of the framework do the job. It's not something I've ever done, but the whole API is designed so that something like this should be possible.

Doing a custom implementation of SharpGLTF.Geometry.IMeshBuilder<TMaterial> and SharpGLTF.Geometry.IPrimitiveReader<TMaterial> is not as difficult as it might seem, because you don't need exotic stuff like morph targets, skinning, etc.

The other option is to create the Schema2 meshes directly, by filling the vertex attributes, indices, etc. Certainly, the API allows you to do that, but then you would have to deal with some low-level details of the glTF mesh architecture.
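
For reference, the direct Schema2 path looks roughly like the sketch below. The method names (CreateModel, UseScene, CreateMesh, CreatePrimitive, WithVertexAccessor, WithIndicesAccessor, WithMesh) are the Toolkit extensions as I remember them; treat them as assumptions and check them against the SharpGLTF version you are actually using:

```c#
using System.Collections.Generic;
using System.Numerics;
using SharpGLTF.Schema2;

static class DirectSchema2Sketch
{
    // Rough sketch: build a glTF mesh by filling vertex attributes and indices directly,
    // bypassing MeshBuilder and its vertex dictionary entirely.
    public static ModelRoot Build(IReadOnlyList<Vector3> positions,
                                  IReadOnlyList<Vector3> normals,
                                  IReadOnlyList<int> indices)
    {
        var model = ModelRoot.CreateModel();
        var scene = model.UseScene("default");
        var mesh = model.CreateMesh("mesh");

        mesh.CreatePrimitive()
            .WithVertexAccessor("POSITION", positions)
            .WithVertexAccessor("NORMAL", normals)
            .WithIndicesAccessor(PrimitiveType.TRIANGLES, indices);

        scene.CreateNode().WithMesh(mesh);

        return model;
    }
}
```

On this path the caller is responsible for making sure the positions, normals and indices are already deduplicated and consistent, since the MeshBuilder checks are skipped.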

wlinna commented 3 years ago

Curious. The test completed in two seconds with the net48 target.

vpenades commented 3 years ago

@wlinna I've been running the example in net5 too, and it completed in under 2 seconds... so maybe we don't have exactly the same runtimes.

When measuring performance, it's important to try different runtimes because they're optimized differently. If the time difference between two runtimes is too big, then it's probably an optimization miss, or even a bug in the slower runtime.

I usually skip the very latest runtime for production, so I'll probably stick to .NET Core 3.1 for this whole year.

wlinna commented 3 years ago

Well, for us .NET Core 3.1 was worse, and we only switched to .NET 5 when it seemed like it was already working... but now it seems like the problem is still there.

https://github.com/vpenades/SharpGLTF/issues/57

I'm probably going to try this on Linux too, once I'm able to handle our native dependencies

vpenades commented 3 years ago

I've been able to improve the performance a bit by doing these steps:

I'll upload the changes later today.

vpenades commented 3 years ago

Do you have other machines to try it on?

wlinna commented 3 years ago

Can you send me your binary, btw? I'd like to see if it performs similarly on my machine.

vpenades commented 3 years ago

Here they are:

SharpGLTF Perf Test Binaries.zip

wlinna commented 3 years ago

The conversion completes in 1.34 seconds with your .NET 5 version! Your .NET Core version completes in 1.47 seconds. Your .NET 4.7.2 version completes in 1.57 seconds.

Are you using your package as a NuGet package or as a project dependency?

vpenades commented 3 years ago

As a project dependency... but I can compile a version with Alpha20:

SharpGLTF Perf Test Binaries - Alpha20.zip

wlinna commented 3 years ago

SharpGLTF Perf Test Binaries - Alpha20.zip

With this, it took 5 minutes to complete. It's not any faster than my binary, because I was also able to reach that performance with some configuration and/or luck.

So I guess my temporary solution will be to add SharpGLTF alpha 0020 as a project dependency instead.

vpenades commented 3 years ago

The binaries using the project reference use the latest optimizations I've added, but they were not supposed to have that much of an effect.

Jumping from "seconds" to "minutes" is definitely not normal, and seeing such huge differences just by switching runtimes... it's fishy to say the least.

wlinna commented 3 years ago

I added SharpGLTF Alpha 0020 as a project dependency (except that I changed the target to net5.0, removed some packaging stuff like icons, strong naming etc.), compiled everything and tested... 5 minutes.

Then I did the same with the master branch, and the time plummeted to 1.32 seconds! 🎉

But yeah, definitely fishy!

wlinna commented 3 years ago

As for the original problem I had: the conversion of the whole model, which originally would have taken over an hour with normals and ten minutes without, took something like one minute this time!

wlinna commented 3 years ago

Anyway, thanks a lot for the help. I'm not exactly sure what to do with this issue though. I have a temporary fix for my use-case of course: use SharpGLTF-master as a project dependency.

Did this problem really have anything to do with normals anyway?

vpenades commented 3 years ago

Did this problem really have anything to do with normals anyway?

The biggest difference is that the new code explicitly uses the vertex position to compute the hash of the whole vertex, while the old code uses the default struct implementation for getting the hash (which scans all the properties/fields of the structure). Theoretically, the mapping to get the implicit hash is done only once... but with the new optimizations being added to the newer runtimes, who knows what's happening under the hood.
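
Roughly, the difference looks like this (an illustrative sketch, not the library's actual vertex types):

```c#
using System;
using System.Numerics;

// Sketch of the hashing strategy described above: hash only the position,
// and let Equals resolve the rare collisions between vertices that share a
// position but differ in other attributes. The default struct GetHashCode,
// by contrast, scans every field.
struct HashSketchVertex : IEquatable<HashSketchVertex>
{
    public Vector3 Position;
    public Vector3 Normal;

    public override int GetHashCode() => Position.GetHashCode();

    public bool Equals(HashSketchVertex other)
    {
        return Position == other.Position && Normal == other.Normal;
    }

    public override bool Equals(object obj) => obj is HashSketchVertex other && Equals(other);
}
```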

Also, different runtimes use different implementations of Dictionary<,>: the old .NET Framework uses a dictionary written in C#, while the newer frameworks map to a dictionary written in native code.

Plus, using System.Numerics.Vectors extensively also hits lots of edge optimization cases in the newer runtimes, which are being sorted out as we speak.

As I said before, I am more concerned about the memory consumption of the Dictionary, which is too much for my taste, so most probably sometime in the future I will revisit the vertex hashing to find a balanced solution between performance and memory usage.

Btw, you can also try building preview packages from the master by running this batch.

FrankenApps commented 3 years ago

I have the same issue with v1.0.0-alpha20 from NuGet. I generated 18000 triangles with only one simple material into one mesh. Without normals it took 2.4 s, with well over 2.3 s spent in the AddTriangle method. When adding normals, it took 17 s to generate the same mesh. I will try to add the current master as a project dependency, as @wlinna seems to have done.

vpenades commented 3 years ago

Fixed in the new package v1.0.0-alpha21.