jonnenauha / obj-simplify

Object File (.obj) simplifier
MIT License

question: suitable sizes, expected processing times? #1

Closed · antont closed this issue 7 years ago

antont commented 7 years ago

Or put differently, will this run ever complete, or is the input too big? :) The scene has 30k objects in Blender (imported from an FBX).

39 000 000 lines parsed - Forced GC took 1.82s

processor #1: Duplicates
  - Using epsilon of 1e-06
antont commented 7 years ago

Without deduping it was fast even for that, and merged the scene down to 2 objects, so this can be great. I don't think it had any duplicates anyway; it didn't find a single one during the ~10 hours it ran (i7 4-core/8-thread laptop).

39 000 000 lines parsed - Forced GC took 1.73s

processor #1: Duplicates - Disabled

processor #2: Merge
  - Found 2 unique materials

Parse                 1m 36.32s    38%
Merge                     0.41s    0.16%
Write                 2m 34.89s    61%
Total                 4m 11.61s

Vertices              8 497 719
Normals               5 415 022
UVs                  11 735 801

Faces                13 890 657

Objects                       2    -30563     -100%

Lines input          39 630 897
Lines output         39 569 790    -61 107    -0%

File input              1.54 GB
File output             1.51 GB    -33.39 MB  -2%
jonnenauha commented 7 years ago

Yes, I have some trouble with very large files (i.e. a lot of vertices/normals/UVs). The scanning time does not grow linearly with the file size :)

I'm working on a much simpler algorithm that will catch 99% of duplicates in most files. At the moment it is very fast even without parallel execution, which I will probably still add. The ones that slip through it are two vectors that are exactly epsilon apart, e.g. 0.000001. The technique uses string formatting and compares the strings, which is a lot faster than doing abs comparisons on 3-4 floats: the string serialization can be done once, while the vector comparisons would need to be done every time for all vector pairs. I might be able to optimize this further later.
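
To illustrate the idea (a minimal sketch only, with made-up names like `dedup`, not the actual obj-simplify code): each vector is serialized once to a fixed-precision string that matches the epsilon, and duplicate detection becomes a single map lookup.

```go
package main

import "fmt"

// Vec is a stand-in for a vertex/normal/UV value.
type Vec struct {
	X, Y, Z float64
}

// key serializes the vector once, with a precision matching the 1e-6
// epsilon, so duplicate detection is a map lookup instead of pairwise
// float comparisons.
func (v Vec) key() string {
	return fmt.Sprintf("%.6f %.6f %.6f", v.X, v.Y, v.Z)
}

// dedup returns the unique vectors plus a remap table from old index
// to new index, which a later face-rewrite step could use.
func dedup(vecs []Vec) (unique []Vec, remap []int) {
	seen := make(map[string]int, len(vecs))
	remap = make([]int, len(vecs))
	for i, v := range vecs {
		k := v.key()
		if idx, ok := seen[k]; ok {
			remap[i] = idx
			continue
		}
		seen[k] = len(unique)
		remap[i] = len(unique)
		unique = append(unique, v)
	}
	return unique, remap
}

func main() {
	vecs := []Vec{{1, 2, 3}, {1.0000004, 2, 3}, {4, 5, 6}}
	u, r := dedup(vecs)
	fmt.Println(len(u), r) // 2 [0 0 1]
}
```

This also shows the caveat mentioned above: two values that are within epsilon of each other but straddle a rounding boundary get different keys and slip through.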

But yeah, the faster algo should be better for pretty much all runs. I will probably add a boolean -some-flag to enable the much more time-consuming check; the default would be the new, faster one.

I suppose you found the command line flag for disabling duplicate checking. As you can see, the time-consuming part is then the parsing. If the file is very large and you don't have enough RAM, it will start to swap and performance will be very bad. There might still be some room to reduce the struct size per vertex/normal/UV to lower the runtime memory cost.
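
As a rough, hypothetical illustration of that per-element memory cost (these are not the repo's actual structs): a float64 triple costs 24 bytes per element and a float32 triple 12, which adds up at the ~8.5M vertices reported above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Hypothetical element layouts, only to show the size difference.
type Vec64 struct{ X, Y, Z float64 }
type Vec32 struct{ X, Y, Z float32 }

func main() {
	const vertices = 8497719 // vertex count from the run above
	fmt.Println(unsafe.Sizeof(Vec64{}), "vs", unsafe.Sizeof(Vec32{}), "bytes per element")
	fmt.Printf("~%.0f MB vs ~%.0f MB just for vertices\n",
		float64(vertices)*float64(unsafe.Sizeof(Vec64{}))/1e6,
		float64(vertices)*float64(unsafe.Sizeof(Vec32{}))/1e6)
}
```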

I'll ping you once I get the faster algo in; maybe you can try how long it takes. I don't want to ask you to upload the 1.5 GB file if it's not already on the web somewhere? :) I do have one 4.5 GB file myself for perf testing.

antont commented 7 years ago

Yep, I got the merge done by using -no-duplicates for the later run pasted above.

I did have enough RAM for much of the deduping, actually, at least for almost all of the ~10 hours it ran. IIRC it was at first just 1.2 GB or so, but later 12 GB, and things started to get tight on the 16 GB laptop. I think it started swapping at that point, and the system SSD happened to be very full, so I had to stop it.

Very cool already that the merge got done -- I was just able to test-publish the scene on the web now (internally only, the model is not for public access (yet); it is however from a c3po pilot where your company is also involved, so I'll ask and will most probably get permission to share it with you). The first compressed test asset is now 280 MB; I hope basic poly reduction tools can make it nice enough (down to some 80-120 MB perhaps).

I'm curious to learn about your optimizations later; I also figure there are clever mechanisms in existing optimization tools.

BTW, afterwards I tested the merge with Blender too (which had been used to get the .obj from the FBX -- Autodesk's FBX Converter borked on it), but it crashed (eventually); I think it used a lot of memory pretty quickly (30k objects is a lot for it). So thanks a lot already, and there might well be potential here for cool things :)

jonnenauha commented 7 years ago

There is no great magic here. I'm just doing simple duplicate removal, since duplicates are a big problem in many .obj files, especially with UVs and normals.

The grouping into a single draw call per material is quite trivial as well. This is the full code: https://github.com/jonnenauha/obj-simplify/blob/master/process-merge.go#L25-L105

Essentially it's piling all the faces into fewer objects :)
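
The shape of that grouping, as a standalone sketch (simplified placeholder types, not the linked process-merge.go): bucket every face by its material and emit one object per material.

```go
package main

import "fmt"

// Face and Object are simplified placeholders for the parser's real types.
type Face struct {
	Material string
	Indices  []int
}

type Object struct {
	Material string
	Faces    []Face
}

// mergeByMaterial piles all faces that share a material into a single
// object, so the scene ends up with one draw call per material.
func mergeByMaterial(objects []Object) []Object {
	byMaterial := make(map[string]*Object)
	var order []string // keep a deterministic output order
	for _, obj := range objects {
		for _, f := range obj.Faces {
			dst, ok := byMaterial[f.Material]
			if !ok {
				dst = &Object{Material: f.Material}
				byMaterial[f.Material] = dst
				order = append(order, f.Material)
			}
			dst.Faces = append(dst.Faces, f)
		}
	}
	merged := make([]Object, 0, len(order))
	for _, m := range order {
		merged = append(merged, *byMaterial[m])
	}
	return merged
}

func main() {
	in := []Object{
		{Faces: []Face{{Material: "wall", Indices: []int{1, 2, 3}}}},
		{Faces: []Face{{Material: "wall", Indices: []int{4, 5, 6}}}},
		{Faces: []Face{{Material: "glass", Indices: []int{7, 8, 9}}}},
	}
	out := mergeByMaterial(in)
	fmt.Println(len(in), "objects in,", len(out), "objects out") // 3 objects in, 2 objects out
}
```

With the 2 unique materials reported in the run above, this is how 30k+ input objects collapse to 2 output objects.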

I don't have any geometry reducers/simplifiers and don't plan on adding them either. That would be out of scope and is better done in Blender etc.

jonnenauha commented 7 years ago

Btw, I also implemented a -gzip option if you want to compress for permanent storage without hassle. Just make sure to correctly set the Content-Encoding header to gzip when serving the file.
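
For example, if the pre-compressed file were served by a small custom handler, the header could be set along these lines (a minimal sketch, not part of obj-simplify; the paths are made up):

```go
package main

import (
	"log"
	"net/http"
)

func main() {
	// Serve a pre-compressed scene.obj.gz under the plain .obj URL;
	// the client decompresses it transparently thanks to Content-Encoding.
	http.HandleFunc("/assets/scene.obj", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		w.Header().Set("Content-Encoding", "gzip")
		http.ServeFile(w, r, "assets/scene.obj.gz")
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```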

antont commented 7 years ago

Yep, that's what I meant too: I'm hoping that doing normal poly reduction in Blender or other tools will sort out that scene.

But this tool was already helpful for doing the simple merge, because the scene was so big that e.g. Blender was not able to do it (it crashed when I tried).