disasm: level up

This issue tracks approaches for a future level up hackathon :)

The notes are essentially a summary of the disassembly approach taken by angr as detailed in their (State of) The Art of War paper.

In short, angr uses four approaches for disassembly, each with strengths and drawbacks, and iterates between them during control flow graph generation (i.e. data/code separation) until it reaches a fixed point (none of the methods adds any further basic blocks or yields new information that could guide the other approaches to more insights).

The four approaches are as follows (using their terminology):

forced execution
lightweight backward slicing
symbolic execution
value set analysis

Essentially, forced execution follows both outcomes of every conditional branch in the assembly. It maintains a list of visited basic blocks and can handle direct jumps, but not indirect jumps.
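To make that concrete, here is a minimal sketch of forced execution over a toy program. The `BLOCKS` table, the block kinds and the addresses are all made up for illustration; this is not angr's implementation or API, just the worklist idea: take every direct successor, remember what was visited, and park indirect jumps for the other techniques.

```python
from collections import deque

# Hypothetical toy program: block address -> (kind, direct successors).
# "cond" has two direct targets, "jmp" has one, "ret" has none, and
# "indirect" stands for something like `jmp rax` with no known target.
BLOCKS = {
    0x1000: ("cond", [0x1010, 0x1020]),
    0x1010: ("jmp", [0x1030]),
    0x1020: ("indirect", []),
    0x1030: ("cond", [0x1000, 0x1040]),
    0x1040: ("ret", []),
}


def forced_execution(blocks, entry):
    """Visit every block reachable through direct control flow.

    Returns the set of visited blocks and the list of blocks that end in
    an indirect jump, which this technique cannot resolve on its own.
    """
    visited = set()
    unresolved = []
    worklist = deque([entry])
    while worklist:
        addr = worklist.popleft()
        if addr in visited or addr not in blocks:
            continue  # already handled, or outside the toy program
        visited.add(addr)
        kind, successors = blocks[addr]
        if kind == "indirect":
            unresolved.append(addr)  # leave it for the other approaches
            continue
        # Take *all* direct successors, regardless of the branch condition.
        worklist.extend(successors)
    return visited, unresolved


visited, unresolved = forced_execution(BLOCKS, 0x1000)
print(sorted(hex(a) for a in visited), [hex(a) for a in unresolved])
```

The `unresolved` list is the interesting output here, since it is the input to the slower techniques described below.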

Symbolic execution is used whenever we get stuck at an indirect branch. It works backwards from the branch until it reaches a merge point (a part of the control flow where multiple paths join to later reach the indirect jump), and from the merge point performs forward (regular) symbolic execution until it reaches the indirect jump. Once there, a constraint solver is used to solve for the target(s) of the indirect jump.
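The backward walk to the merge point can be sketched on a toy CFG as below. This only covers the "where do we start from" part; the forward symbolic execution and the constraint solver are left out on purpose. The CFG, the addresses and the function names are hypothetical, and treating "the first block with more than one predecessor" as the merge point is my simplifying assumption.

```python
def build_predecessors(successors):
    """Invert a successor map (block -> list of successors)."""
    preds = {addr: [] for addr in successors}
    for src, dsts in successors.items():
        for dst in dsts:
            preds.setdefault(dst, []).append(src)
    return preds


def backward_to_merge_point(successors, jump_block):
    """Walk backwards from jump_block while there is exactly one predecessor.

    Stops at the first block with several predecessors (the merge point)
    or with none (e.g. the entry block). Returns that block and the
    straight-line path from it to jump_block, which is what a forward
    symbolic execution would then have to cover.
    """
    preds = build_predecessors(successors)
    path = [jump_block]
    seen = {jump_block}
    current = jump_block
    while len(preds.get(current, [])) == 1:
        prev = preds[current][0]
        if prev in seen:
            break  # guard against walking around a loop forever
        path.append(prev)
        seen.add(prev)
        current = prev
    path.reverse()
    return current, path


# Hypothetical CFG: a diamond that joins at 0x1030, followed by a
# straight line down to an indirect jump at 0x1050.
CFG = {
    0x1000: [0x1010, 0x1020],
    0x1010: [0x1030],
    0x1020: [0x1030],
    0x1030: [0x1040],
    0x1040: [0x1050],
    0x1050: [],  # ends in an indirect jump
}

merge_point, path = backward_to_merge_point(CFG, 0x1050)
print(hex(merge_point), [hex(a) for a in path])
```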

Lightweight backward slicing is a fancy way of saying "sometimes we need more context". Essentially, it tracks control flow backwards across function call boundaries, since function pointers may be passed as arguments to function calls, and these function pointers then show up as indirect jumps in the assembly of the called functions.
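A very rough sketch of the "more context" idea: if a function jumps through one of its parameters, look at the call sites to see which constants were passed in. The call-site records, function names and addresses below are invented for the example, and a real slicer would track data flow through registers and the stack rather than read a ready-made table.

```python
# Hypothetical call-site records: (caller, callee, {parameter: constant or None}).
# None means the argument is not a statically known constant at that site.
CALL_SITES = [
    ("main", "apply", {"callback": 0x401000}),
    ("init", "apply", {"callback": 0x402000}),
    ("main", "log", {"msg": 0x600000}),
    ("worker", "apply", {"callback": None}),
]


def targets_from_callers(call_sites, callee, param):
    """Collect constant values passed for `param` at every call to `callee`.

    Returns the set of candidate jump targets and the callers where the
    argument could not be determined, which would need deeper slicing.
    """
    targets = set()
    unknown_callers = []
    for caller, called, args in call_sites:
        if called != callee:
            continue
        value = args.get(param)
        if value is None:
            unknown_callers.append(caller)
        else:
            targets.add(value)
    return targets, unknown_callers


targets, unknown_callers = targets_from_callers(CALL_SITES, "apply", "callback")
print(sorted(hex(t) for t in targets), unknown_callers)
```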

Value set analysis (VSA) is what it has always been: a conservative analysis of the possible set of values that a register or memory location may take, where conservative in this sense means that we can have false positives (i.e. values that VSA reports as valid but that e.g. a register cannot actually hold during execution) but never false negatives (i.e. there are never values that a register can take during execution that are not included in the set of possible values by VSA).

Of course, with VSA, we can infer a set of potential targets for any given indirect jump, but these targets may not actually be targets during execution.
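A tiny example of where the false positives come from. The sketch below uses plain Python sets as the abstract values, which is a simplification on my part (real VSA implementations typically use more compact abstractions such as strided intervals), and the branch, the variables and the `0x2000` table base are made up. The point is that joining the two branches per variable loses the correlation between `a` and `b`.

```python
from itertools import product

TABLE_BASE = 0x2000  # hypothetical base address of a jump table


def join(a, b):
    """Merge the value sets coming from two control-flow paths."""
    return a | b


def add(a, b):
    """Abstract addition: every pairwise sum of the two value sets."""
    return frozenset(x + y for x, y in product(a, b))


def concrete_targets():
    """What actually happens: a and b are set together on each path."""
    results = set()
    for cond in (True, False):
        a, b = (1, 10) if cond else (2, 20)
        results.add(TABLE_BASE + a + b)
    return results


def vsa_targets():
    """Set-based over-approximation: a and b are joined independently."""
    a = join(frozenset({1}), frozenset({2}))
    b = join(frozenset({10}), frozenset({20}))
    return add(add(frozenset({TABLE_BASE}), a), b)


actual = concrete_targets()
approx = vsa_targets()
print(sorted(hex(t) for t in actual))
print(sorted(hex(t) for t in approx))
```

Every real target is in `approx` (no false negatives), but `approx` also contains `TABLE_BASE + 12` and `TABLE_BASE + 21`, which no execution can produce.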

For these reasons, we combine the methods, as each has strengths and drawbacks: VSA never misses a potential target, forced execution is blazingly fast, symbolic execution is slow but gives us actual targets for indirect branches, and backward slicing maintains state across function call boundaries.
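The iteration to a fixed point mentioned at the top could look roughly like the loop below. The two "analyses" are stand-ins I made up (one follows direct edges, one pretends to resolve indirect jumps from a lookup table); in practice each would be one of the four techniques. Nothing here is angr's actual code.

```python
def recover_blocks(entry, analyses):
    """Run every analysis over the known blocks until none adds a new one."""
    blocks = {entry}
    changed = True
    while changed:
        changed = False
        for analysis in analyses:
            new_blocks = analysis(blocks) - blocks
            if new_blocks:
                blocks |= new_blocks
                changed = True  # something new: give the others another go
    return blocks


# Hypothetical edges of a toy binary.
DIRECT_EDGES = {0x1000: {0x1010}, 0x1010: {0x1020}, 0x2000: {0x2010}}
INDIRECT_EDGES = {0x1020: {0x2000}}  # pretend a slower technique found this


def follow_direct(blocks):
    """Stand-in for the fast technique: direct successors only."""
    return {dst for b in blocks for dst in DIRECT_EDGES.get(b, ())}


def resolve_indirect(blocks):
    """Stand-in for the slower techniques that resolve indirect jumps."""
    return {dst for b in blocks for dst in INDIRECT_EDGES.get(b, ())}


recovered = recover_blocks(0x1000, [follow_direct, resolve_indirect])
print(sorted(hex(b) for b in recovered))
```

Note that `0x2010` is only found because resolving the indirect jump at `0x1020` gave the direct-edge pass a new block to start from, which is exactly why the methods are run in a loop rather than once each.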

Oh well, this is at least some food for thought. We can look for and combine with other approaches for disassembly as well. Just wanted to summarize the key take-aways from SoTAoW.

lapsang-boys / pippi

disasm: level up #56