quarkslab / qbindiff

Quarkslab Bindiffer but not only !
https://diffing.quarkslab.com
Apache License 2.0

BinExport files result in many keyerror issues #35

Open maximus12793 opened 1 year ago

maximus12793 commented 1 year ago

Following the documentation, I've tried secondary = Program(LoaderType.binexport, "./objcopy.BinExport") and p1 = Program(Path("objcopy.BinExport")), both of which crash in unexpected ways:

...
     91     dst = cg.vertex[edge.target_vertex_index].address
     92     self.callgraph.add_edge(src, dst)
---> 93     self[src].children.add(self[dst])
     94     self[dst].parents.add(self[src])
     96 # Create a map of function names for quick lookup later on
...
  self.be_prog = binexport.ProgramBinExport(file)
  File ".../python3.10/site-packages/binexport/program.py", line 93, in __init__
    self[src].children.add(self[dst])
KeyError: 4751856

When using the CLI via qbindiff -l 'binexport' file1.BinExport file2.BinExport, I get the same sort of KeyError, even though these files work fine with BinDiff as-is.

Note: if I do not explicitly ask for binexport, I get this error instead:

     59     self._backend = ProgramBackendQuokka(*args, **kwargs)
     61 else:
---> 62     raise NotImplementedError("Loader: %s not implemented" % loader)
     64 self._filter = lambda x: True
     65 self._load_functions()

Any idea what could be causing the issue? I am using BinDiff 8 and Ghidra 10.3.

Fenrisfulsur commented 1 year ago

Hello maximus12793,

We are aware of this problem. It happens because Ghidra creates references to unknown functions that don't exist in the exported file. The problem is actually in python-binexport, and we are currently working on a patch.

If you want to have it working now, you can apply the fix from here to your local install of python-binexport.
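The failure mode described above can be illustrated with a toy call graph. This is only a minimal sketch under stated assumptions: the Function class, the link_call_graph helper, and the choice to skip dangling edges are all hypothetical and are not taken from python-binexport or from the linked fix, which may handle the situation differently.

```python
# Toy reproduction of the KeyError scenario. Every name below is hypothetical
# and does not mirror python-binexport's real classes.

class Function:
    def __init__(self, address):
        self.address = address
        self.children = set()  # functions called by this one
        self.parents = set()   # functions calling this one


def link_call_graph(functions, edges):
    """Link callers to callees, skipping edges with an unknown endpoint.

    functions: dict mapping address -> Function
    edges: iterable of (src_address, dst_address) pairs
    Returns the list of edges that were skipped.
    """
    skipped = []
    for src, dst in edges:
        if src not in functions or dst not in functions:
            # An unguarded functions[dst] lookup would raise KeyError here.
            skipped.append((src, dst))
            continue
        functions[src].children.add(functions[dst])
        functions[dst].parents.add(functions[src])
    return skipped


functions = {0x1000: Function(0x1000), 0x2000: Function(0x2000)}
# The second edge targets an address that has no exported function.
edges = [(0x1000, 0x2000), (0x1000, 4751856)]
skipped = link_call_graph(functions, edges)
print(skipped)
```

Returning the skipped edges rather than silently dropping them keeps the number of dangling references visible, which helps when judging how much of the call graph was lost.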

For the second issue, you can try without the -l option; it will use BinExport by default.

If you still have issues, let me know :)

maximus12793 commented 1 year ago

@Fenrisfulsur thanks for the quick reply! Just to confirm, the documentation of python-binexport says it requires IDA. Based on the commit you linked, it looks like this should just help process the file, and I am OK to keep using the Ghidra setup I have today? I do not have an IDA instance currently.

As python-binexport entirely relies on Binexport, it has to be installed first. The project is available at: https://github.com/google/binexport

Note that python-binexport requires IDA >=7.2 (as it calls the BinExportBinary IDC function).

Fenrisfulsur commented 1 year ago

No, there is no need to use IDA, you can just use the Ghidra exporter.

RobinDavid commented 1 year ago

Thanks @maximus12793 for the feedback. Yes, at the time of writing, BinExport only had an IDA plugin. I just updated the README to make it more explicit: https://github.com/quarkslab/python-binexport/commit/2fe404cd88db9f2016ad35669ab70269be1ccfe1.

As for the issue, yes, we are aware of it; it should be fixed soon.

maximus12793 commented 1 year ago

@Fenrisfulsur that did the trick, thanks!

One quick follow-up question: is it expected that we get a lot of missing function address errors when loading BinExport files? With the two I've processed via Ghidra's BinExport extension, I get quite a lot of the following, and it takes ~10+ min to process two exports. These load in BinDiff in ~5-10 sec, so I am curious whether there's anything I might need to change from the example snippet provided in the README.

ERROR:root:Missing function address: 0x12f228 (3)
ERROR:root:Missing function address: 0x12f230 (3)
ERROR:root:Missing function address: 0x12f238 (3)
ERROR:root:Missing function address: 0x12f240 (3)
ERROR:root:Missing function address: 0x12f248 (3)
ERROR:root:Missing function address: 0x12f250 (3)
ERROR:root:Missing function address: 0x12f258 (3)
ERROR:root:Missing function address: 0x12f260 (3)
ERROR:root:Missing function address: 0x12f268 (3)
ERROR:root:Missing function address: 0x12f270 (3)
ERROR:root:Missing function address: 0x12f278 (3)
ERROR:root:Missing function address: 0x12f280 (3)
ERROR:root:Missing function address: 0x12f288 (3)

...
305 lines
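To gauge how widespread these errors are before sharing samples, the log can be tallied with a few lines of Python. This is a minimal sketch that assumes only the line format shown above; the summarize_missing helper is hypothetical, and since the meaning of the trailing "(3)" is not explained in this thread, it is treated as an opaque code.

```python
import re
from collections import Counter

# Matches lines such as "ERROR:root:Missing function address: 0x12f228 (3)".
# The trailing number in parentheses is kept verbatim as an opaque code.
PATTERN = re.compile(r"Missing function address: (0x[0-9a-fA-F]+) \((\d+)\)")


def summarize_missing(log_text):
    """Return the set of missing addresses and a count per trailing code."""
    addresses = set()
    codes = Counter()
    for match in PATTERN.finditer(log_text):
        addresses.add(int(match.group(1), 16))
        codes[match.group(2)] += 1
    return addresses, codes


sample = """\
ERROR:root:Missing function address: 0x12f228 (3)
ERROR:root:Missing function address: 0x12f230 (3)
ERROR:root:Missing function address: 0x12f238 (3)
"""
addresses, codes = summarize_missing(sample)
print(f"{len(addresses)} unique addresses, "
      f"from {min(addresses):#x} to {max(addresses):#x}")
print(dict(codes))
```

In the excerpt above the addresses are spaced 8 bytes apart, so the address range reported by such a summary may hint at which region of the binary is affected.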

Also happy to close this thread and upload some examples to debug further. This is with small binaries (< 700 KB). For context, I'd like to load two binaries and get the top-level similarity score.

Fenrisfulsur commented 1 year ago

Sorry for the delay, I did not receive the notification. Yes, this issue tends to happen often when using the Ghidra exporter. It is mainly due to errors in Ghidra's analysis that are not correctly handled by the BinExport plugin. We are currently working on a Quokka exporter for Ghidra to address these issues.