Performance Ideas: 2 pass ?

thisistherk / fast_obj

Fast C OBJ parser

MIT License


Performance Ideas: 2 pass ? #12

Open Kuranes opened 4 years ago

Kuranes commented 4 years ago

Hi,

Wondering if you have tried, or have any experience with, these two ideas:

A prepass to make one big allocation upfront and avoid realloc? (Count the v/vn/vt lines first, then allocate once and for all.)
A prepass for "SIMD token tricks" as in https://github.com/lemire/simdjson (blog post https://branchfree.org/2019/02/25/paper-parsing-gigabytes-of-json-per-second/, paper https://arxiv.org/abs/1902.08318)? The idea has been applied to CSV parsing, for example, in https://github.com/geofflangdale/simdcsv
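A minimal sketch of the first idea, assuming the whole file is already in memory. The names `obj_counts`, `obj_arrays`, `count_vertex_lines` and `alloc_vertex_arrays` are hypothetical and are not part of fast_obj; the sketch also assumes 3 floats per `v`/`vn` line and 2 per `vt` line, which a real file may not respect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counters for the three vertex-data line types. */
typedef struct {
    size_t positions; /* "v"  lines */
    size_t texcoords; /* "vt" lines */
    size_t normals;   /* "vn" lines */
} obj_counts;

/* Hypothetical storage: one block, carved into three arrays. */
typedef struct {
    float *block;     /* the single allocation; free this one */
    float *positions;
    float *texcoords;
    float *normals;
} obj_arrays;

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Pass 1: walk the buffer line by line and count v/vt/vn lines. */
static obj_counts count_vertex_lines(const char *buf, size_t len)
{
    obj_counts c = {0, 0, 0};
    const char *p = buf;
    const char *end = buf + len;

    assert(buf != NULL);

    while (p < end) {
        /* Skip indentation at the start of the line. */
        while (p < end && is_blank(*p)) p++;

        size_t left = (size_t)(end - p);
        if (left >= 2 && p[0] == 'v') {
            if (is_blank(p[1])) c.positions++;
            else if (left >= 3 && p[1] == 't' && is_blank(p[2])) c.texcoords++;
            else if (left >= 3 && p[1] == 'n' && is_blank(p[2])) c.normals++;
        }

        /* Jump to the character after the next newline, if any. */
        const char *nl = memchr(p, '\n', left);
        if (nl == NULL) break;
        p = nl + 1;
    }
    return c;
}

/* Allocate everything once, sized from the counts (assumed 3/2/3 floats). */
static int alloc_vertex_arrays(obj_arrays *a, obj_counts c)
{
    size_t total = 3 * c.positions + 2 * c.texcoords + 3 * c.normals;

    a->block = malloc((total ? total : 1) * sizeof(float));
    if (a->block == NULL) return 0;

    a->positions = a->block;
    a->texcoords = a->positions + 3 * c.positions;
    a->normals   = a->texcoords + 2 * c.texcoords;
    return 1;
}
```

The second pass would then parse values straight into these arrays. The cost is that every byte of the file is touched twice, which is the trade-off discussed in the reply below.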

thisistherk commented 4 years ago

Not really. I'd assume two passes would slow it down - I think reading and parsing in a single pass is where most of the performance benefit over other libs comes from. Last time I profiled it, parsing of floats was the most time-consuming bit of the code, and I'm not really sure there's any easy way of speeding that up (having said that, the simdjson code has a number parser in it that might be worth trying, although it looks fairly similar).
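To make the float-parsing cost concrete, here is a rough sketch of the usual hand-rolled approach: accumulate the digits into an integer, then scale once by a power of ten, and fall back to `strtod` for anything unusual. This is an illustration only, not the code used by fast_obj or simdjson; `parse_simple_float` is a hypothetical name, and the fast path is not guaranteed to be correctly rounded in every case.

```c
#include <assert.h>
#include <stdlib.h>

/* Exact powers of ten, indexed by the number of fractional digits. */
static const double POW10[19] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,
    1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
};

/* Parse a decimal number at p, store it in *out, and return a pointer to
 * the first character after it. Plain "123.456" style input takes the fast
 * path; exponents, very long digit runs, and non-numeric input go to strtod. */
static const char *parse_simple_float(const char *p, float *out)
{
    const char *start = p;
    unsigned long long mantissa = 0;
    int negative = 0;
    int digits = 0;      /* all digits seen */
    int frac_digits = 0; /* digits after the decimal point */

    assert(p != NULL && out != NULL);

    if (*p == '-' || *p == '+') {
        negative = (*p == '-');
        p++;
    }
    while (*p >= '0' && *p <= '9') {
        if (digits < 18) mantissa = mantissa * 10 + (unsigned)(*p - '0');
        digits++;
        p++;
    }
    if (*p == '.') {
        p++;
        while (*p >= '0' && *p <= '9') {
            if (digits < 18) mantissa = mantissa * 10 + (unsigned)(*p - '0');
            digits++;
            frac_digits++;
            p++;
        }
    }

    /* Slow path: let the C library handle the cases we do not. */
    if (digits == 0 || digits > 18 || *p == 'e' || *p == 'E') {
        char *end;
        *out = (float)strtod(start, &end);
        return end;
    }

    /* Fast path: one integer-to-double conversion and one division. */
    double value = (double)mantissa / POW10[frac_digits];
    *out = (float)(negative ? -value : value);
    return p;
}
```

Note that `strtod` depends on the current locale, so the slow path assumes the default "C" locale where `.` is the decimal separator.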

Did have a crazy plan way back of attempting to multithread it - one thread reads the file into buffers, farms them off to workers to parse, then puts the results back together - just to see how fast I could get it to go, but that's (a) crazy, and (b) way too much for a simple single header lib :)

Kuranes commented 4 years ago

Indeed, number parsing is big in the VTune profile; I didn't expect it to be that much! It's mostly float and int parsing, which makes sense.

I also got big memory-operation hotspots in my profile (here, for example, the "parse_face" hotspot is in fact mostly array_push), which is why I thought it could be interesting.

Agreed that multithreading is more of a last resort: it rarely fits in user code, and it's definitely a hard API to offer in a header lib.
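A small sketch of why the push can dominate, using a generic growable array rather than fast_obj's actual `array_push`. The `index_array` type, its functions, and the `reallocs` counter are hypothetical and exist only to make the number of reallocations visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of face indices. */
typedef struct {
    unsigned *data;
    size_t size;
    size_t capacity;
    size_t reallocs; /* instrumentation: how many times storage was resized */
} index_array;

/* Make room for at least `capacity` elements. Returns 0 on failure. */
static int index_array_reserve(index_array *a, size_t capacity)
{
    assert(a != NULL);
    if (capacity <= a->capacity) return 1;

    unsigned *p = realloc(a->data, capacity * sizeof *p);
    if (p == NULL) return 0;

    a->data = p;
    a->capacity = capacity;
    a->reallocs++;
    return 1;
}

/* Append one value, doubling the capacity whenever the array is full. */
static int index_array_push(index_array *a, unsigned value)
{
    assert(a != NULL);
    if (a->size == a->capacity) {
        size_t grown = a->capacity ? a->capacity * 2 : 8;
        if (!index_array_reserve(a, grown)) return 0;
    }
    a->data[a->size++] = value;
    return 1;
}

static void index_array_free(index_array *a)
{
    free(a->data);
    a->data = NULL;
    a->size = 0;
    a->capacity = 0;
}
```

With doubling, pushing 1000 values into an empty array resizes the storage 8 times (capacities 8 through 1024), and each resize may copy everything stored so far. Calling `index_array_reserve(&a, 1000)` first reduces that to a single allocation, which is what a counting prepass would make possible.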

Thanks for the answer!

stgatilov commented 3 years ago

Most users won't like multithreading, since they have a bunch of models to load, and they can do much more efficient multithreading simply by distributing files across threads.

What is the target application that would really benefit from a multithreaded reader? Loading a single huge .obj file?

Kuranes commented 3 years ago

Yes, huge .obj files.

Since OBJ is so "easy to write", many hardware/software scanners output OBJ as their "raw source", and those files are often huge.

fast_obj already performs way better than all the other libs, and indeed, complicating the code base with MT might not be worth it.