joxeankoret / pigaios

A tool for matching and diffing source codes directly against binaries.
GNU General Public License v3.0

Check why the hell the export process goes slower using parallelism than doing it from a single thread #6

Closed by joxeankoret 6 years ago

WanderingGlitch commented 6 years ago

Looking at CBaseExporter.do_export_one and CClangExporter.export_one, it looks like most of the time in those functions is spent in Python land. The sqlite functions will hit C and release the GIL, as will some of the clang code since it's going through ctypes, but everything else holds the GIL, so the threads effectively run one at a time.

It may be faster to use a multiprocessing Pool (not a ThreadPool), but you'd have to make sure the SQL inserts all happen from one process, since sqlite only allows one writer at a time and writers in different processes would just contend for the database lock.
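Something along these lines is what I have in mind. It's only a rough sketch, not pigaios code: `export_one`, `write_batches`, `export_parallel` and the `functions` table are made-up names, and the per-file work is a placeholder for the real clang parsing. The point is that workers only return plain rows and the parent process is the single sqlite writer.

```python
import multiprocessing
import os
import sqlite3
import tempfile


def export_one(filename):
    # Hypothetical stand-in for the CPU-bound per-file export (clang parsing,
    # feature extraction). Runs in a worker process, returns picklable rows
    # and never touches the database.
    checksum = sum(ord(c) for c in filename) % 1000  # placeholder work
    return [(filename, "func_%d" % i, checksum + i) for i in range(3)]


def write_batches(conn, batches):
    # Single writer: every INSERT goes through this one connection.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS functions "
        "(source TEXT, name TEXT, features INTEGER)"
    )
    total = 0
    for rows in batches:
        conn.executemany("INSERT INTO functions VALUES (?, ?, ?)", rows)
        total += len(rows)
    conn.commit()
    return total


def export_parallel(filenames, db_path, processes=None):
    conn = sqlite3.connect(db_path)
    try:
        with multiprocessing.Pool(processes=processes) as pool:
            # imap_unordered hands back each file's rows as soon as a worker
            # finishes, so the parent can insert while others still parse.
            results = pool.imap_unordered(export_one, filenames)
            return write_batches(conn, results)
    finally:
        conn.close()


if __name__ == "__main__":
    sources = ["gzlib.c", "compress.c"]
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "example.sqlite")
        print(export_parallel(sources, db_path))
```

The catch is that whatever the workers return has to be picklable, so clang cursor objects would need to be flattened into tuples or dicts before leaving the worker.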

joxeankoret commented 6 years ago

I managed to fix the problem some days ago but forgot to push the changes (commit f07f0b6ad576cd52d5da0f1f306f795ddb3420b6). And yes, the changes are just what you say. This is how it performs now with proper parallelism:

Building zlib-1.2.11 with parallelism, ~2.6 seconds:

$ time.py srcbindiff.py -parallel -export
[i] Removing existing file zlib-1.2.11.sqlite
Using a total of 8 thread(s)
[+] CC gzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
[+] CC compress.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
(...)
[+] Building the callgraphs...
[+] Creating indexes...

Time 0:00:02.597428

Building zlib-1.2.11 without parallelism, ~6.9 seconds:

joxean@joxean-2017:~/devel/zlib/zlib-1.2.11$ time.py srcbindiff.py -export
[i] Removing existing file zlib-1.2.11.sqlite
[+] CC gzlib.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
[+] CC compress.c -I/usr/lib/llvm-3.8/bin/../lib/clang/3.8.0/include -I. -I./include
(...)
[+] Building the callgraphs...
[+] Creating indexes...

1 warning(s), 19 error(s), 1 fatal error(s)

Time 0:00:06.893240
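
For reference, the speedup implied by the two `Time` lines above can be worked out with a few lines of Python. This is just arithmetic on the two single measurements quoted in this thread (the helper name `parse_elapsed` is made up here), so treat the result as a rough indication rather than a benchmark.

```python
from datetime import timedelta


def parse_elapsed(text):
    # Parse an "H:MM:SS.ffffff" string as printed in the Time lines above.
    hours, minutes, seconds = text.split(":")
    return timedelta(
        hours=int(hours), minutes=int(minutes), seconds=float(seconds)
    )


parallel = parse_elapsed("0:00:02.597428")  # run with -parallel
serial = parse_elapsed("0:00:06.893240")    # run without -parallel

# Dividing two timedeltas yields a plain float ratio.
speedup = serial / parallel
print("speedup: %.2fx" % speedup)
```

The ratio comes out well below the 8 workers reported in the log, which would be consistent with part of the export (callgraph building, index creation, the database writes) still running serially.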