pfalcon / ScratchABit

Easily retargetable and hackable interactive disassembler with IDAPython-compatible plugin API
GNU General Public License v3.0
393 stars 47 forks source link

Switch function traversal from "breadth first" to "depth first" #14

Open pfalcon opened 7 years ago

pfalcon commented 7 years ago

https://github.com/pfalcon/ida-xtensa2/issues/1 represents an issue with the current code traversal process of SAB: it tries to follow all code paths of the current function, before considering calls the current function makes. Unfortunately, some calls are effectively jumps and never return. Trying to decode bytes after such a call may lead to conflicts in byte types, which may affect decoding of other functions following such instructions. So, instead, should try to emulate how a real CPU does - do a depth-first traversal of as many functions as possible first, and consider return addresses from calls only after that. This should allow to detect natural stop-gaps for unconstrained and incorrect code propagation. E.g. if following a call instruction there's a literal data belonging to another function, then it doesn't make sense to try to interpret this data as an instruction.

This approach will however have problems with function boundary detection (such detection was the reason to adopt breadth-first scheme). So, it will need to be reworked.

mewmew commented 7 years ago

Related to function boundary detection, some inspiration may be sought from rev.ng

Talk at https://youtu.be/5CbuU4KwBCE

From slides at http://llvm.org/devmtg/2016-11/Slides/DiFederico-rev.ng.pdf

The function detection process

  1. Identify function calls and return instructions
  2. Create a set of candidate function entry points (CFEP):
    1. called basic blocks
    2. unused code pointers in global data (e.g., not jump tables)
    3. code pointers embedded in the code
  3. Compute the basic blocks reachable from each CFEP
  4. Keep a CFEP only if:
    1. it’s a called basic block, or
    2. it’s reached by a skipping jump instruction

Skipping jump instruction in this case refers to a jump instruction (think tail call), which jumps over the basic blocks of another function.

pfalcon commented 7 years ago

@mewmew , Thanks, I noticed rev.ng, though was surprised (not really) that instead of being "suite of tools for static binary analysis", it's currently just a binary translator. Didn't see those slides, looked thru. Well... I saw a similar paper recently (https://syssec.mistakenot.net/papers/eurosp-2017.pdf), that one caught my attention with a title "Compiler-Agnostic Function Detection in Binaries". "Compiler-agnostic, really? How else could it be?"

So well, both that paper and rev.ng's try to deal with detection of all functions in a binary. Well, good (luck). That's something completely different than what ScratchABit does - it's purposed is to be a dumb tool, and detect only what guaranteedly can be detected. Beyond that, everything is offloaded to a human (which can do further analysis manually, or using heuristic plugins).

The topic of this ticket is how to do that automatic detection in a better way. Depth-first was implemented btw, I don't close this ticket because I'd like to review if there're some TODOs still left.

On related note, I recently did switch basic blocks in ScratchABlocks to end on a call (https://github.com/pfalcon/ScratchABlock/commit/1446a8380de3e6074b21ed423b039b491926b311), following the idea that in machine code, a call is much closer to a jump, than to a mathematical definition of function. That's of course how even Cifuentes suggested to do it, I just still find that other people like the idea that call == function too, e.g. http://x64dbg.com/blog/2016/07/27/Control-flow-graph.html