google / binexport

Export disassemblies into Protocol Buffers
Apache License 2.0
1.05k stars 206 forks source link

Function are split into smaller parts #48

Open psylocn opened 4 years ago

psylocn commented 4 years ago

I did some patch diffing using BinDiff and I noticed that there is a bug, which I guess is located in BinExport.

The problem is that BinExport splits functions into several small parts, each recognized as an one function. This make the analysis using BinDiff very difficult if many function have changed.

As an example I used BinExport on the following binary: win32kfull.sys 10.0.18362.592_x64 (4D6691EEBC1FE5DB0FF4691DE10F429779B44D4208219A53A6116ADBA5484A5B)

Using the 'Text Dump Export' Button, I created a list of functions:

1C0002C40          EngPlgBlt
1C0002CE4          sub_1C0002CE4
1C0002CFE          sub_1C0002CFE
1C0002D10          sub_1C0002D10

It should have only found the function 'EngPlgBlt' and not the other three. Its easier to see if you have symbols for the binaries, because there shouldn't be many function without symbol names.

The bug occurs around code, which looks like this:

.text:00000001C0002CD8 48 FF 15 79 2A 35 00                    call    cs:__imp_SURFOBJ_TO_SURFACE
.text:00000001C0002CDF 0F 1F 44 00 00                          nop     dword ptr [rax+rax+00h]
.text:00000001C0002CE4 48 8B F0                                mov     rsi, rax             ;<--- BinExport wrongly detects this a the beginning of a new function sub_1C0002CE4
.text:00000001C0002CE7 48 89 84 24 30 02 00 00                 mov     [rsp+5A8h+var_378], rax
.text:00000001C0002CEF 48 8B CF                                mov     rcx, rdi

The bug occurs also on other binaries, which are compiled for Windows 10 19H1, at least also ws2ifsl.sys

The same binaries on previous windows versions do not trigger the bug, so I guess it is related to some new compiler feature/optimization

I tried to reproduce this case on sample driver with the latest visual studio compiler, but I failed to do so. I hope the information I provide are enough to reproduce the issue on your side.

I've tested this on the latest version of IDA (7.4.191112) with: BinExport Google 11 (@297836223, Feb 28 2020) com.google.binexport

I've encountered this bug also on IDA 7.3 with BinExport 10

cblichmann commented 4 years ago

Hi there!

Thanks for your report. A similar issue seems to occur in b/151242195 . See there for more details.

TL;DR - this is likely or heuristic for detecting non-returning functions.

The heuristic is implemented around here: https://github.com/google/binexport/blob/a156e279be8a64a26ee46589880a9fc6a7c865d4/flow_graph.cc#L92

Another related issue is b/151859042.

As optimizations become more aggressive, we seem to trigger this case more often than I would like. While this is working as intended, the intention is by now at least outdated ;-) -- marking this as a bug. If you have good ideas on how to improve the heuristic, I'm all ears :-)

cblichmann commented 4 years ago

Also happens in IDA 7.4 and BinExport 11. Both BinDiff 5 and 6.

Another workaround might be to use the Ghidra exporter (also in this repo or in the release version of BinDiff 6). That one does not have the heuristic and relies exclusively on the disassembler to figure out non-returning functions.

psylocn commented 4 years ago

Thanks, the workaround with -OBinExportX86NoReturnHeuristic:FALSE helped me alot!

Sorry, I do not know how to improve the heuristic. But why not let IDA figure out how to deal with non-returning functions? Similar to Ghidra :)

cblichmann commented 4 years ago

[...] Sorry, I do not know how to improve the heuristic. But why not let IDA figure out how to deal with non-returning functions? Similar to Ghidra :)

Yeah that will likely be the solution. The heuristic is there in the first place, because BinExport is basically a full disassembler and only uses IDA Pro to provide hints where the initial entry points are (and as instruction decoder).

cblichmann commented 3 years ago

BinDiff 7 sets -OBinExportX86NoReturnHeuristic:FALSE by default. We should still propagate any "No Return" attribute that a disassembler sets.