Open lfcnassif opened 3 years ago
I'm seeing a performance cost while trying to solve #486, apparently not for extra hash calculations, but for the extra indexed searches for media files. If you have this implemented, maybe it could be a good idea to test and see if it improves.
I will push the experimental branch tomorrow. Another idea is to query multiple items at the same time using an OR query, I don't know if you have tried this. I use jvisualvm's "Sampler->CPU" feature to measure method call costs, it helps a lot to identify bottlenecks.
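The batching idea above can be sketched roughly like this: instead of issuing one indexed search per item, combine the item IDs into a single OR query string. This is a minimal, hypothetical sketch (the class and method names are not from IPED), just to illustrate the shape of the query:

```java
import java.util.List;
import java.util.stream.Collectors;

public class BatchQuery {
    /**
     * Builds a single Lucene-style OR query, e.g. "id:(1 OR 2 OR 3)",
     * instead of issuing one "id:N" search per item.
     */
    static String orQuery(String field, List<Integer> ids) {
        return field + ":(" + ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR ")) + ")";
    }

    public static void main(String[] args) {
        // one round trip to the index instead of three
        System.out.println(orQuery("id", List.of(1, 2, 3)));
    }
}
```

One combined query amortizes the fixed per-search overhead across all items in the batch.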
I was thinking exactly that a few minutes ago: do one big query in advance, instead of one query per item, and see what happens. Will try that tomorrow.
@fmpfeifer I just pushed the experimental lazy_load_fields branch after resolving some merge conflicts with master
Tested it here on top of my whatsapp-parser-bugfix branch. It improved the processing time.
p.s.: this test was done processing only "ChatStorage.sqlite" files on top of a pre-processed case with all other data (using the --append option).
Good, thank you! And what about the specific ParsingTask time a bit above those lines?
not so much:
didn't understand the math here:
Those stats were designed for cases with many items. Total task time per thread is measured; all are added and divided by numThreads at the end. As your case is just 1 item, the reported time should be multiplied by numThreads to recover the real time. Alternatively, the performance test could be done with just 1 thread. I asked about ParsingTask time to exclude other things, like graph generation.
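A tiny numeric sketch of the averaging described above, with hypothetical per-thread timings (the class name and numbers are made up for illustration):

```java
public class TaskStatsExample {
    /** Sums per-thread task times and divides by the thread count,
     *  as the reported stats do. */
    static long reported(long[] perThreadTime, int numThreads) {
        long sum = 0;
        for (long t : perThreadTime) sum += t;
        return sum / numThreads;
    }

    public static void main(String[] args) {
        int numThreads = 4;
        // with a single item, only one thread actually does the work
        long[] perThreadTime = {1200, 0, 0, 0};

        long stat = reported(perThreadTime, numThreads); // stats print 300
        long realTime = stat * numThreads;               // recovers 1200

        System.out.println(stat + " " + realTime);
    }
}
```

So for a single-item test, the printed figure understates the real task time by a factor of numThreads.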
I see.. I disabled graph generation to test this
So, will not merge this for now, thanks!
ok.. one more thing.. I compared the results of both processing runs, and with the lazy_load branch I noticed that item.getTypeExt() is not returning the correct file extension.
The function dpf.sp.gpinf.indexer.parsers.util.Util.getExportPath(IItemBase) is always returning the file without extension. Either item.getTypeExt() is returning null or "".
Hmm, will take a look, thanks!
Commit above should fix this.
This is an old idea. When a new item is created, all indexed and stored fields are loaded at once. But sometimes parsers just want to query one or two item properties, possibly wasting time. I did not detect large bottlenecks with the "load everything" approach in past tests, and because this is a sensitive change, I did not push it. @fmpfeifer if you think the current approach is a bottleneck on #486, I can push the experimental local branch for testing.
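The lazy alternative described above can be sketched as follows. This is a minimal illustration, not IPED's actual implementation: the field map is only fetched from the index on the first property access, and subsequent reads reuse it (all names here are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyFields {
    private Map<String, String> fields;                 // null until first use
    private final Supplier<Map<String, String>> loader; // runs the index query

    public LazyFields(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    /** Runs the (possibly expensive) index lookup only on first access. */
    public String get(String name) {
        if (fields == null) {
            fields = loader.get();
        }
        return fields.get(name);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        LazyFields item = new LazyFields(() -> {
            loads.incrementAndGet(); // count index queries
            return Map.of("typeExt", "sqlite");
        });
        // two reads still trigger only one index query
        item.get("typeExt");
        System.out.println(item.get("typeExt") + " loads=" + loads.get());
    }
}
```

A parser that never touches the item's stored fields then pays no index-lookup cost at all, which is exactly the saving the lazy_load_fields branch targets. Note this sketch is not thread-safe; a real implementation would need synchronization or memoization.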