Closed. wlinna closed this issue 3 years ago.
How many vertices and triangles does the input mesh have?
Under the hood, AddTriangle maintains a huge dictionary of vertices, so that every unique vertex is stored only once. As the mesh grows, every new addition checks whether its vertices already exist in the collection. Computing the vertex hash for the dictionary is quite heavy, and I suspect that code can be optimized a bit.
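Conceptually it works along these lines (an illustrative sketch only, not the library's actual internals; all names here are made up):

```csharp
// Illustrative sketch of dictionary-based vertex deduplication;
// not SharpGLTF's actual implementation.
using System.Collections.Generic;

class VertexCache<TVertex>
{
    private readonly Dictionary<TVertex, int> _indices = new Dictionary<TVertex, int>();
    private readonly List<TVertex> _vertices = new List<TVertex>();

    // Returns the index of the vertex, adding it only if it was not seen before.
    // Every call hashes the vertex, so an expensive GetHashCode dominates AddTriangle.
    public int UseVertex(TVertex v)
    {
        if (_indices.TryGetValue(v, out var index)) return index;
        index = _vertices.Count;
        _vertices.Add(v);
        _indices[v] = index;
        return index;
    }
}
```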
Could you write a tiny command-line benchmark app with your use case?
I've made a small test that uses VertexPositionNormal, and I was able to add 10 million triangles to a single mesh primitive in just 1 minute, so maybe the performance loss is somewhere else. I would need more information to reproduce.
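Roughly like this sketch, based on the public MeshBuilder API as shown in the README (the vertex data here is synthetic):

```csharp
// Rough benchmark sketch using the public MeshBuilder API; the vertex
// values are synthetic placeholders.
using System;
using System.Numerics;
using SharpGLTF.Geometry;
using SharpGLTF.Geometry.VertexTypes;
using SharpGLTF.Materials;

var material = new MaterialBuilder("default");
var mesh = new MeshBuilder<VertexPositionNormal>("benchmark");
var prim = mesh.UsePrimitive(material);

var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 10_000_000; ++i)
{
    var n = Vector3.UnitZ;
    prim.AddTriangle(
        new VertexPositionNormal(new Vector3(i, 0, 0), n),
        new VertexPositionNormal(new Vector3(i, 1, 0), n),
        new VertexPositionNormal(new Vector3(i + 1, 0, 0), n));
}
watch.Stop();

Console.WriteLine($"Added 10M triangles in {watch.Elapsed}");
```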
Thank you for the quick response!
I forgot to mention that I use alpha0020.
In the final glTF there are, according to Blender, roughly 900k vertices and 2 million triangles.
Some of the meshes are weirdly large considering their visual output. They have 250k triangles each, many of which have an area of zero or very close to zero. Does your test have triangles like that? I just noticed that there are some limited forms of degeneracy checks, and I'm wondering if they might cause lots of branch-prediction failures or something. This is just a wild guess of course, and I doubt that's the cause, since those degenerate triangles end up in the final glTF anyway.
The memory consumption of the application (I am not referring to SharpGLTF here) is quite high, about 4 GiB much of the time. It makes me wonder if that causes a lot of memory pressure. Otherwise I'm not sure why adding normals would cause such a big performance degradation, unless some checks are still running. I'm trying to figure it out.
Can I have your test code? Maybe I'll be able to modify it to reproduce the same kind of slowness my application is showing.
It turns out that even one of those weirdly complex meshes takes minutes to create. Here's one such specimen:
https://drive.google.com/file/d/14EKbICJ9DwuTjcc4OAn2PHkcEAYW1cpe/view?usp=sharing
900k vertices and 2m triangles should be fine to process within a few minutes at worst.
The glTF standard states that the burden of writing a proper glTF model resides on the exporter; that's why the library does so many checks everywhere. It's almost the only way to guarantee the data is written correctly.

That includes checking for degenerate triangles, which are not added to the final model, so if your source model has degenerate triangles, the resulting glTF model will have fewer triangles than the source model.

The memory consumption is huge; that's most probably caused by the internal vertex dictionary, which grows a lot.
Notice that I've isolated the timer benchmarking only to the AddTriangle method.
I took your model and modified the example to import the triangles and re-export it... it took around 3 seconds.
I have an Intel Core i7 at 3.6 GHz... I guess you're not running on a Celeron or Atom...
> That includes checking for degenerate triangles, which are not added to the final model, so if your source model has degenerate triangles, the resulting glTF model will have fewer triangles than the source model.
I have trouble finding any mention of degenerate triangles in the specification. And the resulting glTF file seems to pass glTF validation despite its many remaining degenerate triangles. Removing degenerate triangles would also somewhat break the model, at least for our use, because it would leave holes in it (even if a hole has an area of zero, it's still a hole). Those holes prevent us from doing certain types of analysis on the model.

If the library simply removes the degenerate triangles, it denies us the possibility of fixing them further down the pipeline, as we do with another program written in another programming language.

Fortunately for us, though, most degenerate triangles are left alone.
> The glTF standard states that the burden of writing a proper glTF model resides on the exporter; that's why the library does so many checks everywhere. It's almost the only way to guarantee the data is written correctly.
The way I read this is that the runtime showing the models is not required to fix broken meshes. It doesn't say, though, where those fixes should be made, or who/what is responsible for correctness. Is it the modeler? The modeling tool? Or the code that turns input data into output data (the exporter)?

Curious. Note that the data I exported was itself exported by Blender, so it might have been modified already. I'll try to create a model exported directly via SharpGLTF.
I have an i5-6200U CPU @ 2.30 GHz and 32 GiB of RAM.
I think the memory consumption is because of all the data the IFC importer loads. Memory soars to almost 4 GiB from that alone and plateaus there, at least until the problematic object has been exported.
Regarding degenerate triangles:

SharpGLTF considers a triangle degenerate when the positions of two of its vertices are exactly the same. If the vertices are very, very close but not identical, the triangle is not considered degenerate and is kept in the mesh.
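In other words, the check is an exact positional comparison, something along these lines (an illustrative sketch, not the library's actual code):

```csharp
using System.Numerics;

static class GeometryChecks
{
    // Illustrative sketch of the exact-equality degeneracy test described
    // above; SharpGLTF's actual check may differ in detail.
    public static bool IsDegenerate(in Vector3 a, in Vector3 b, in Vector3 c)
    {
        // Only exact coincidence of two positions counts as degenerate;
        // nearly-equal positions and zero-area (collinear) triangles pass.
        return a == b || b == c || c == a;
    }
}
```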
> The way I read this is that the runtime showing the models is not required to fix broken meshes. It doesn't say, though, where those fixes should be made, or who/what is responsible for correctness. Is it the modeler? The modeling tool? Or the code that turns input data into output data (the exporter)?
It's more like: an exporter should not produce broken meshes, so a runtime does not need to check for them. Who is responsible for this? Clearly the library writing the glTF, at which point a glTF export library being fed broken data has two options: throw an exception, or try to fix the data. But what a glTF export library cannot do is export a broken glTF (which, unfortunately, is what currently happens with most libraries).

Also keep in mind that glTF is not an intermediate format; it's a "last mile" format intended to deliver "performant visual assets". So, to some degree, exporters are free to clean and modify the geometry to optimize it for performance while keeping the visual appearance.
Btw, if you want to discuss further, you can also find me on Discord.
Here's a direct export via SharpGLTF
https://drive.google.com/file/d/1vl6H7QqiVC8G1iwCmA40liU-nqgd9O7S/view?usp=sharing
I tried logging some information about the processing time. I wanted to see if there's a pattern. I noticed that some primitives take a disproportionate amount of time to process. I also tried to record the maximum time a single AddTriangle call takes. The code to measure AddTriangle time is this:
```csharp
var watch = System.Diagnostics.Stopwatch.StartNew();
var addResult = prim.AddTriangle(v1, v2, v3);
watch.Stop();

var elapsedMs = watch.ElapsedMilliseconds;
primitiveTime += elapsedMs;

if (elapsedMs > maxProcessingTime)
{
    maxProcessingTime = elapsedMs;
    Console.WriteLine("New record time on AddTriangle {0}", elapsedMs);
}
```
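One caveat with this measurement: Stopwatch.ElapsedMilliseconds truncates to whole milliseconds, so summing it per call undercounts the many sub-millisecond AddTriangle calls. Accumulating raw ticks would be more accurate, as in this sketch:

```csharp
// Same measurement, but accumulating raw ticks to avoid the per-call
// truncation of ElapsedMilliseconds.
long totalTicks = 0;

var watch = System.Diagnostics.Stopwatch.StartNew();
var addResult = prim.AddTriangle(v1, v2, v3);
watch.Stop();

totalTicks += watch.ElapsedTicks;

// Convert once at the end; Stopwatch.Frequency is ticks per second.
double totalMs = totalTicks * 1000.0 / System.Diagnostics.Stopwatch.Frequency;
```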
- Primitive processing time: the time it takes to call AddTriangle for all the triangles of the given primitive.
- TotalPrimitiveTime: the sum of all the primitive processing times.
- New record time: the longest that a single AddTriangle call has ever taken.

All times are in milliseconds.
RUN 1

```
Primitive processing time 8. Triangles: 76
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 88
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 12
New record time on AddTriangle 17
Primitive processing time 232. Triangles: 6276
Primitive processing time 25831. Triangles: 37012
Primitive processing time 7292. Triangles: 5944
Primitive processing time 30. Triangles: 25902
Primitive processing time 194. Triangles: 2908
New record time on AddTriangle 24
New record time on AddTriangle 25
New record time on AddTriangle 26
New record time on AddTriangle 27
New record time on AddTriangle 33
New record time on AddTriangle 47
New record time on AddTriangle 75
Primitive processing time 300946. Triangles: 119846
Primitive processing time 31806. Triangles: 19842
Primitive processing time 0. Triangles: 56
TotalPrimitiveTime: 366339
```

RUN 2

```
Primitive processing time 7. Triangles: 76
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 88
Primitive processing time 0. Triangles: 36
Primitive processing time 0. Triangles: 12
New record time on AddTriangle 12
Primitive processing time 248. Triangles: 6276
Primitive processing time 20394. Triangles: 37012
Primitive processing time 5821. Triangles: 5944
Primitive processing time 23. Triangles: 25902
Primitive processing time 166. Triangles: 2908
New record time on AddTriangle 16
New record time on AddTriangle 18
New record time on AddTriangle 35
New record time on AddTriangle 44
New record time on AddTriangle 53
New record time on AddTriangle 65
New record time on AddTriangle 78
New record time on AddTriangle 80
New record time on AddTriangle 126
Primitive processing time 305068. Triangles: 119846
Primitive processing time 28935. Triangles: 19842
Primitive processing time 0. Triangles: 56
TotalPrimitiveTime: 360662
```
For example, it seems that some primitives simply take longer to process even if they are not particularly big:
RUN 1

```
...
Primitive processing time 30. Triangles: 25902
Primitive processing time 194. Triangles: 2908
...
```

RUN 2

```
...
Primitive processing time 23. Triangles: 25902
Primitive processing time 166. Triangles: 2908
...
```
I was also able to cut the total application memory usage down to 2 GiB for testing purposes (by not importing other geometry), but it didn't change this pattern.

I suppose I have to take a look at the test next.
I've modified the example to load your meshes and dump the triangles into a new MeshBuilder, and the result is still within 2 minutes.
Thanks for the test! I will try that myself soon, hopefully this week.
It seems it'll take two hours until I can write on Discord (according to the timer), even though it says 5 minutes, so I'll write one more off-topic comment here (because I have to leave before those two hours are up):

You are right, the output glTF should not be broken. But is a degenerate triangle "broken" according to the glTF standard? The spec doesn't seem to take a stance on that, which is equivalent to silent acceptance.

But I guess there's no reason to keep discussing "broken models". At least in my case, the degenerate triangles usually have collinear vertices rather than coincident ones.
Okay, I couldn't resist running it. It took 6 minutes for me to run the test.
Added 218034 in 00:06:06.2301567
That seems rather long. But isn't two minutes also a rather long time for this number of triangles? Assimp managed to import the model, run a bunch of optimizations on it, and export it back to glTF in under one second.
Two culprits come to mind:
Hmm... it seems the processor has a huge impact on the performance?

On my i7: Added 218034 in 00:00:01.0172687
I will take a look and see if I can improve the performance of adding a vertex, although, more than speed, I'm concerned about memory usage, which is a lot. Most probably I can get some improvement by limiting the hashing to only the position of the vertex instead of the whole vertex.

The MeshBuilder API heavily depends on the premise that a geometric vertex is never repeated, so the dictionary usage cannot be opted out of.
If you're going to do heavy-duty operations and you have confidence in your own data, there's also an alternative path: MeshBuilder essentially implements the SharpGLTF.Geometry.IMeshBuilder interface, so you can create your own implementation filled with your own data. Then it can be added to a SceneBuilder, and the rest of the framework does the job. It's not something I've ever done, but the whole API is designed so that something like this should be possible.
Doing a custom implementation of SharpGLTF.Geometry.IMeshBuilder and SharpGLTF.Geometry.IPrimitiveReader<TMaterial> is not as difficult as it might seem, because you don't need exotic stuff like morph targets, skinning, etc.
The other option is to create the Schema2 meshes directly, by filling the vertex attributes, indices, etc. Certainly the API allows you to do that, but then you would have to deal with some low-level details of the glTF mesh architecture.
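A rough sketch of that route, assuming the toolkit's WithVertexAccessor / WithIndicesAccessor helper extensions (untested; the geometry here is a placeholder):

```csharp
// Rough sketch of building a Schema2 mesh directly; assumes the toolkit's
// WithVertexAccessor / WithIndicesAccessor extension methods.
using System.Numerics;
using SharpGLTF.Schema2;

var model = ModelRoot.CreateModel();
var mesh = model.CreateMesh("mesh");

// Placeholder data: one triangle.
var positions = new[] { new Vector3(0, 0, 0), new Vector3(1, 0, 0), new Vector3(0, 1, 0) };
var indices = new[] { 0, 1, 2 };

mesh.CreatePrimitive()
    .WithVertexAccessor("POSITION", positions)
    .WithIndicesAccessor(PrimitiveType.TRIANGLES, indices);
```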
Curious. The test completed in two seconds with the net48 target.
@wlinna I've been running the example on net5 too, and it completed in under 2 seconds... so maybe we don't have exactly the same runtimes.

When measuring performance, it's important to try different runtimes, because they're optimized differently. If the time difference between two runtimes is too big, it's probably a missed optimization, or even a bug, in the slower runtime.

I usually skip the very latest runtime for production, so I'll probably stick to .NET Core 3.1 for this whole year.
Well, for us .NET Core 3.1 was worse, and we only switched to .NET 5 when it seemed like everything was already working... but now it seems the problem is still there.
https://github.com/vpenades/SharpGLTF/issues/57
I'm probably going to try this on Linux too, once I'm able to handle our native dependencies.
I've been able to improve the performance a bit with a few changes. I'll upload them later today.
Do you have other machines where you could try it?
Can you send me your binary, btw? I'd like to see if it performs similarly on my machine.
Here they are:
The conversion completes in 1.34 seconds with your .NET 5 version! Your .NET Core version completes in 1.47 seconds, and your .NET 4.7.2 version in 1.57 seconds.
Are you using your package as a NuGet package or as a project dependency?
As a project dependency... but I can compile a version with Alpha20.
SharpGLTF Perf Test Binaries - Alpha20.zip
With this one, it took 5 minutes to complete. It's not any faster than my binary, because I was also able to reach that performance with some configuration and/or luck.
So I guess my temporary solution will be to add SharpGLTF alpha 0020 as a project dependency instead.
The binaries using the project reference include the latest optimizations I've added, but they weren't supposed to have that much effect.

Jumping from seconds to minutes is definitely not normal, and seeing such huge differences just by switching runtimes... it's fishy to say the least.
I added SharpGLTF Alpha 0020 as a project dependency (except that I changed the target to net5.0 and removed some packaging stuff like icons, strong naming, etc.), compiled everything, and tested... 5 minutes.
Then I did the same with the master branch, and the time plummeted to 1.32 seconds! 🎉
But yeah, definitely fishy!
The original problem I had, converting the whole model, which originally would have taken over an hour with normals and about ten minutes without, took something like one minute this time!

Anyway, thanks a lot for the help. I'm not exactly sure what to do with this issue, though. I have a temporary fix for my use case, of course: use SharpGLTF master as a project dependency.
Did this problem really have anything to do with normals anyway?
> Did this problem really have anything to do with normals anyway?
The biggest difference is that the new code explicitly uses the vertex position to compute the hash of the whole vertex, while the old code used the default struct implementation (which scans all the properties/fields of the structure). Theoretically, the mapping to get the implicit hash is done only once... but with the new optimizations being added to the newer runtimes, who knows what's happening under the hood.
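In sketch form, the difference looks like this (an illustrative struct, not the exact library code):

```csharp
using System;
using System.Numerics;

// Illustrative sketch: hash only the position, but keep full equality over
// all attributes so distinct vertices never merge. Not the library's code.
struct DemoVertex : IEquatable<DemoVertex>
{
    public Vector3 Position;
    public Vector3 Normal;

    // Old behavior: the default struct GetHashCode scans every field.
    // New behavior: hash only the position; colliding vertices fall back
    // to the cheap Equals comparison below.
    public override int GetHashCode() => Position.GetHashCode();

    public bool Equals(DemoVertex other)
        => Position == other.Position && Normal == other.Normal;

    public override bool Equals(object obj)
        => obj is DemoVertex other && Equals(other);
}
```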
Also, different runtimes use different implementations of Dictionary<,>: the old .NET Framework uses a dictionary written in C#, while the newer frameworks map to a dictionary written in native code.

Plus, using System.Numerics.Vectors extensively also hits lots of edge optimization cases in the newer runtimes, which are being sorted out as we speak.

As I said before, I am more concerned about the memory consumption of the dictionary, which is too much for my taste, so most probably sometime in the future I will revisit the vertex hashing to find a balanced solution between performance and memory usage.
Btw, you can also try building preview packages from master by running this batch file.
I have the same issue with v1.0.0-alpha20 from NuGet.
I generated a mesh of 18,000 triangles with only one simple material. Without normals it took 2.4 s, with well over 2.3 s spent in the AddTriangle method.

When adding normals, it took 17 s to generate the same mesh.
I will try to add the current master as a project dependency, as @wlinna seems to have done.
Fixed in the new package v1.0.0-alpha21.
I am trying to convert a rather large (350 MiB) IFC model to glTF. The problem is that the conversion had already taken close to 60 minutes, and according to various profiles I took, at least 85% of that time was spent running AddTriangle. Eventually I just interrupted the conversion (it was still only about 6% complete!).

Later I stopped adding normals, and the conversion time dropped to 11 minutes! And by setting VertexPreprocessor to null, it dropped to 9:50.

(Even then, 90% of the time is spent in AddTriangle, which seems very heavy to me, considering that I basically just want to write data, and the resulting glTF + bin files amount to 23 MiB.)
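For reference, a sketch of the second mitigation (I'm assuming here that VertexPreprocessor is exposed on the mesh builder instance; `mesh` is a placeholder name):

```csharp
// Sketch: disable per-vertex validation/preprocessing entirely.
// Assumes "mesh" is a MeshBuilder instance and that VertexPreprocessor
// is the property on it, as used in my conversion code.
mesh.VertexPreprocessor = null;
```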
I am using .NET 5, VS 2019 on Windows 10