joxeankoret closed this 6 years ago.
I managed to fix the problem some days ago but forgot to push the changes (commit f07f0b6ad576cd52d5da0f1f306f795ddb3420b6). And yes, the changes are exactly what you describe. This is how it performs now, with proper parallelism:
Building zlib-1.2.11 with parallelism (~2 seconds):
$ time.py srcbindiff.py -parallel -export
[i] Removing existing file zlib-1.2.11.sqlite
Using a total of 8 thread(s)
[+] CC gzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
[+] CC compress.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
(...)
[+] Building the callgraphs...
[+] Creating indexes...
Time 0:00:02.597428
Building zlib-1.2.11 without parallelism (~6 seconds):
joxean@joxean-2017:~/devel/zlib/zlib-1.2.11$ time.py srcbindiff.py -export
[i] Removing existing file zlib-1.2.11.sqlite
[+] CC gzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
[+] CC compress.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
(...)
[+] Building the callgraphs...
[+] Creating indexes...
1 warning(s), 19 error(s), 1 fatal error(s)
Time 0:00:06.893240
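As a rough sanity check on those two timings, Amdahl's law can turn them into an estimate of how much of the export actually runs in parallel. This is a back-of-the-envelope sketch, assuming both runs do the same amount of work and taking N = 8 from the "Using a total of 8 thread(s)" line; `amdahl_parallel_fraction` is a hypothetical helper, not part of srcbindiff.

```python
def amdahl_parallel_fraction(t_serial, t_parallel, workers):
    """Return (speedup, p), where p is the parallelizable fraction.

    Amdahl's law: S = 1 / ((1 - p) + p / N), solved here for p.
    """
    speedup = t_serial / t_parallel
    p = (1 - 1 / speedup) / (1 - 1 / workers)
    return speedup, p


# Wall-clock times from the two runs (seconds) and the reported thread count.
speedup, p = amdahl_parallel_fraction(6.893240, 2.597428, 8)
print(f"speedup ~{speedup:.2f}x, parallel fraction ~{p:.0%}")
```

With these numbers it prints `speedup ~2.65x, parallel fraction ~71%`. A speedup well below 8x is consistent with part of the run (for example the callgraph and index steps) staying serial, though the estimate alone does not say where the remaining time goes.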
Looking at `CBaseExporter.do_export_one` and `CClangExporter.export_one`, it looks like most of the time spent in those functions will be spent in Python land. The sqlite functions drop into C and release the GIL, as will some of the clang code since it goes through ctypes, but outside of that everything will run single-threaded. It may be faster to use a multiprocessing `Pool` (not a `ThreadPool`), but you'd have to make sure the SQL inserts all happen from one process, since SQLite allows only one writer at a time and concurrent writes from several processes will block or fail with "database is locked".
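A minimal sketch of that layout, assuming Python's standard `multiprocessing` and `sqlite3` modules: worker processes do the CPU-heavy parsing and return plain rows, and only the parent process writes to SQLite. `parse_source`, `export_all` and the `functions` table are hypothetical stand-ins, not srcbindiff's actual exporter or schema, and the explicit `"fork"` start method makes it POSIX-only.

```python
import multiprocessing as mp
import sqlite3


def parse_source(path):
    """Hypothetical stand-in for the per-file clang export work.

    Runs in a worker process, so it never touches the database; it only
    returns plain, picklable rows for the parent to insert.
    """
    # Placeholder work: pretend each file defines one function whose
    # "size" is the length of its name.
    name = path.rsplit(".", 1)[0]
    return [(path, name, len(name))]


def export_all(paths, db_path=":memory:", processes=4):
    """Parse files in parallel, but write to SQLite from this process only."""
    ctx = mp.get_context("fork")  # POSIX-only assumption
    # Start the workers before opening the database so that no child
    # process inherits an open SQLite connection.
    with ctx.Pool(processes=processes) as pool:
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS functions "
            "(filename TEXT, name TEXT, size INTEGER)"
        )
        # Results stream back as workers finish; the single writer is here.
        for rows in pool.imap_unordered(parse_source, paths):
            conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
        conn.commit()
    return conn
```

Because results come back through `imap_unordered`, rows are inserted in completion order rather than input order; if ordering matters, `pool.imap` preserves the input order at the cost of waiting on slower files.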