redballoonsecurity / ofrak

OFRAK: unpack, modify, and repack binaries.
https://ofrak.com

Concurrent analyses of files with Ghidra backend clobber each other #344

Open EdwardLarson opened 1 year ago

EdwardLarson commented 1 year ago

What is the problem? (Here is where you provide a complete Traceback.)

The ofrak_ghidra backend package only supports analyzing one file with Ghidra at a time. However, in OFRAK, components (including Ghidra components) are allowed to run concurrently in some situations. It is possible for OFRAK to try to run multiple Ghidra analyses concurrently, which end up clobbering each other and causing a hard error.

The exact traceback logs will vary a bit depending on when the Ghidra processes are clobbered, but it should end with something like this:

  File "/root/repos/rbs-ofrak/ofrak/disassemblers/ofrak_ghidra/ofrak_ghidra/components/ghidra_analyzer.py", line 109, in analyze
    program_name = await self._do_ghidra_import(ghidra_project, full_fname)
  File "/root/repos/rbs-ofrak/ofrak/disassemblers/ofrak_ghidra/ofrak_ghidra/components/ghidra_analyzer.py", line 154, in _do_ghidra_import
    "Disconnected from Ghidra repository before file import succeeded!"
ofrak_ghidra.ghidra_model.GhidraComponentException: Disconnected from Ghidra repository before file import succeeded!

Please provide some information about your environment. At minimum we would like the following information on your platform and Python environment:

This likely happens in any environment. It definitely occurs in the dev Docker image.

If you've discovered it, what is the root cause of the problem?

The ofrak_ghidra backend package only supports analyzing one file with Ghidra at a time. However, in OFRAK, components (including Ghidra components) are allowed to run concurrently in some situations. It is possible for OFRAK to try to run multiple Ghidra analyses concurrently, which end up clobbering each other and causing a hard error.

How often does the issue happen?

Consistently, whenever OFRAK starts Ghidra analyses concurrently. For example, unpack_recursively is one call that runs components in parallel. Recursively unpacking a filesystem that has multiple executables at the same level (e.g. in the same directory) will trigger this error, because those executables are unpacked concurrently with Ghidra.
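
To make the failure mode concrete, here is a minimal sketch in plain asyncio. It does not use OFRAK or Ghidra; `SingleProgramBackend` and `run_concurrently` are hypothetical names, and the single `current` slot is only an assumed stand-in for a backend that can hold one program at a time. Two analyses started together interleave at the first await, and the later one replaces the earlier one.

```python
import asyncio


class SingleProgramBackend:
    """Hypothetical stand-in for a backend that holds one program at a time."""

    def __init__(self):
        self.current = None

    async def analyze(self, name):
        # Starting a new analysis discards whatever was there before.
        self.current = name
        # Yield to the event loop, as a real import would while waiting on I/O.
        await asyncio.sleep(0)
        if self.current != name:
            raise RuntimeError(f"{name}: analysis was replaced by {self.current}")
        return name


async def run_concurrently(names):
    backend = SingleProgramBackend()
    # Both analyses share one backend and are scheduled at the same time.
    return await asyncio.gather(
        *(backend.analyze(n) for n in names), return_exceptions=True
    )


results = asyncio.run(run_concurrently(["a1.out", "a2.out"]))
for r in results:
    print(repr(r))
```

In this sketch the first analysis fails and the second succeeds, which mirrors the pattern in the report: the error surfaces in whichever analysis was interrupted.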

What are the steps to reproduce the issue? Ideally, give us a short script that reproduces the issue.

The following script creates two ELFs in the same directory in a tar archive, then unpacks them recursively with Ghidra, replicating the error:

mkdir ghidra_bug_example/
echo '
#include <stdio.h>

int main(int argc, char** argv){
    printf("Hello World!");
    return 0;
}
' > ghidra_bug_example/hello_world.c
gcc ghidra_bug_example/hello_world.c -o ghidra_bug_example/a1.out
gcc ghidra_bug_example/hello_world.c -o ghidra_bug_example/a2.out
tar -czf ghidra_bug_example.tar.gz ghidra_bug_example/
ofrak unpack ghidra_bug_example.tar.gz -r -b ghidra

How would you implement this fix?

The ofrak_ghidra backend should ideally support analyzing multiple files, rather than clearing all existing analyses when a new file is added.
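
As a rough illustration of that idea, the sketch below keeps one entry per program instead of a single shared slot. It is plain asyncio, not the ofrak_ghidra code; `MultiProgramBackend`, `programs`, and the `program_id` key are hypothetical, and how a real backend would name or isolate programs inside Ghidra is not covered here.

```python
import asyncio


class MultiProgramBackend:
    """Hypothetical backend that tracks one analysis per program."""

    def __init__(self):
        self.programs = {}

    async def analyze(self, program_id):
        # Adding a program leaves every other entry untouched.
        self.programs[program_id] = "importing"
        # Yield to the event loop so other analyses can interleave here.
        await asyncio.sleep(0)
        if program_id not in self.programs:
            raise RuntimeError(f"{program_id}: analysis was removed")
        self.programs[program_id] = "analyzed"
        return program_id


async def main():
    backend = MultiProgramBackend()
    results = await asyncio.gather(
        backend.analyze("a1.out"), backend.analyze("a2.out")
    )
    return results, backend.programs


results, programs = asyncio.run(main())
print(results, programs)
```

Because each analysis only touches its own key, interleaving at the await no longer causes one analysis to overwrite another.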

Are there any (reasonable) alternative approaches?

Alternatively, the OFRAK auto-run infrastructure could be extended to handle categories of components that cannot be run in parallel. However, this adds complexity and still leaves the ofrak_ghidra backend with strict limitations.
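
One simple way to picture this alternative is a shared lock that every component in the non-parallel category must hold while it runs. The sketch below uses `asyncio.Lock` from the standard library; `run_exclusive` and the `log` list are hypothetical, and this is not how OFRAK's scheduler is actually structured.

```python
import asyncio


async def run_exclusive(lock, name, log):
    # Only one holder of the lock runs at a time; the others wait their turn.
    async with lock:
        log.append(("start", name))
        # Yield to the event loop, as a long-running analysis would.
        await asyncio.sleep(0)
        log.append(("end", name))


async def main():
    # Create the lock inside the running event loop.
    lock = asyncio.Lock()
    log = []
    await asyncio.gather(
        *(run_exclusive(lock, n, log) for n in ["a1.out", "a2.out"])
    )
    return log


log = asyncio.run(main())
print(log)
```

The trade-off is the one noted above: the analyses no longer clobber each other, but they also no longer run in parallel, and the backend itself is still limited to one file at a time.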

Are you interested in implementing it yourself?