Parallel export - Githubissues

joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.

http://diaphora.re

GNU Affero General Public License v3.0

3.62k stars, 373 forks

Parallel export #265

Open clslgrnc opened 1 year ago

clslgrnc commented 1 year ago

Following our brief exchange on Mastodon, here is a complete parallel export implementation, covering both functions and the call graph.

With 5 workers I get a 2x speedup on a 1MB binary, and the exported call graph and functions match those of a regular export 100%.

Each job is assigned a subset of the functions to export.
All resulting databases are merged.
A final job exports the remaining data.
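The first step above (assigning functions to jobs) can be sketched as follows. This is only an illustration, not the code from the PR: the helper name `split_functions` and the choice of contiguous, near-equal chunks of sorted addresses are assumptions made for the example.

```python
from typing import List


def split_functions(func_addrs: List[int], nbr_jobs: int) -> List[List[int]]:
    """Split function addresses into nbr_jobs contiguous, near-equal chunks.

    Hypothetical helper: the real script may distribute work differently.
    """
    ordered = sorted(func_addrs)
    size, extra = divmod(len(ordered), nbr_jobs)
    chunks = []
    start = 0
    for job_id in range(nbr_jobs):
        # The first `extra` jobs take one additional function each.
        end = start + size + (1 if job_id < extra else 0)
        chunks.append(ordered[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    funcs = [0x401200, 0x401000, 0x401050, 0x4010A0, 0x401100]
    for job_id, chunk in enumerate(split_functions(funcs, 2)):
        print(job_id, [hex(addr) for addr in chunk])
```

Each function ends up in exactly one chunk, so the per-job databases never describe the same function twice.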

I refactored all SQL insertions related to functions in order to easily switch between:

sequential export: rows are inserted sequentially
parallel export: each job only uses rowids such that rowid % nbr_jobs == job_id, in order to avoid collisions when merging
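To illustrate why the rowid rule makes the merge collision-free, here is a minimal sketch using Python's standard `sqlite3` module. The table layout, the `export_job` and `merge` helpers, and the in-memory databases are all assumptions for the example; the actual Diaphora schema and the PR's code are more involved.

```python
import sqlite3
from typing import List

SCHEMA = "CREATE TABLE functions (id INTEGER PRIMARY KEY, name TEXT)"


def export_job(job_id: int, nbr_jobs: int, names: List[str]) -> sqlite3.Connection:
    """Hypothetical worker: every id it inserts satisfies id % nbr_jobs == job_id."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    for n, name in enumerate(names):
        rowid = (n + 1) * nbr_jobs + job_id  # explicit id in this job's residue class
        db.execute("INSERT INTO functions (id, name) VALUES (?, ?)", (rowid, name))
    db.commit()
    return db


def merge(dbs: List[sqlite3.Connection]) -> sqlite3.Connection:
    """Copy all rows into one database, keeping the ids chosen by the workers."""
    merged = sqlite3.connect(":memory:")
    merged.execute(SCHEMA)
    for db in dbs:
        rows = db.execute("SELECT id, name FROM functions").fetchall()
        # No IntegrityError is raised: ids from different jobs never overlap.
        merged.executemany("INSERT INTO functions (id, name) VALUES (?, ?)", rows)
    merged.commit()
    return merged


if __name__ == "__main__":
    jobs = [export_job(0, 2, ["sub_a", "sub_b"]), export_job(1, 2, ["sub_c"])]
    merged = merge(jobs)
    print(merged.execute("SELECT id, name FROM functions ORDER BY id").fetchall())
```

Because the ids are preserved as-is, any other table that references a function by id stays valid after the merge without renumbering.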

run with: IDADIR=<path-to-ida> ./diaphora_parallel_export.py <path-to-target-binary>

Two potential improvements that remain to be done:

[x] distribute address ranges to workers without relaunching IDA (needs a communication channel with the ida scripts)
[ ] use a thread-safe database in order to parallelize merges

Sequential export still seems to work.

joxeankoret commented 1 year ago

I will not integrate it into the main branch for obvious reasons, but I will leave this PR here so people can use it if they like.

clslgrnc commented 1 year ago

I can think of several obvious reasons not to integrate this into the main branch :smiley:

If you could integrate any (or all) of the first three commits (a1b769d404d55cbb03179db1bd252d089c1047cf, d28da05ac1e1ba4cece8846e0b68642bfeab80da, 6f9ae8605f54073f0bf98e5685e871ec55239bef) it would make maintaining my fork easier though. If you are interested, let me know which commits would be acceptable so that I can prepare a PR (if not, fair enough).

For anyone interested please post any issue about diaphora_parallel_export.py over there: https://github.com/clslgrnc/diaphora-parallel-export/issues

Edit:
I just expanded the commit messages (and updated the commit SHAs above accordingly).

clslgrnc commented 1 year ago

From #264

Not a feature so much, but optimizing the export speed would be nice. I have an arm64 binary that takes over 2 and a half hours to export at the moment. [Diaphora: Wed Jul 19 21:07:13 2023] Database exported, time taken: 2:43:39.113402.

@Myles1 the diaphora_parallel_export.py script in this PR might be able to speed this up. It launches several instances of idat64 on copies of the target idb, so RAM might be a limiting factor. Let me know if you have any questions.

It's just so nice to be able to automate function matching in this way. You've made an enormously helpful program.

Indeed, thank you @joxeankoret

joxeankoret commented 1 year ago

I will review the commits as soon as I can and integrate them where possible. I will probably do it this weekend. Thank you!