lifting-bits / mcsema

Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode
https://www.trailofbits.com/expertise/mcsema
GNU Affero General Public License v3.0

Sort-of lift function arguments #105

Open pgoodman opened 7 years ago

pgoodman commented 7 years ago

OK, so my idea for sort-of argument recovery involves a few phases. It wouldn't necessarily lift arguments as such, and what it recovers would be in no meaningful order, though a semi-meaningful order may be imposed. I will probably limit the scope of the analysis to programs following the amd64 ABI for the time being.

The first phase is to identify and distinguish registers whose values are preserved by a function (saved on entry, restored on exit) and live registers (registers whose values are read in the callee without first being initialized). Preserved registers are necessarily live, because saving one reads its incoming value, but I want to treat them independently.
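A minimal sketch of the "live on entry" part, assuming a toy model where each instruction is reduced to the set of registers it reads and the set it writes. The `Inst` encoding, the block names, and the function names are hypothetical and not part of McSema; this is ordinary backward liveness over a CFG, nothing more.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding: (registers read, registers written).
Inst = Tuple[Set[str], Set[str]]


def block_use_def(insts: List[Inst]) -> Tuple[Set[str], Set[str]]:
    """Return (use, defs) for one basic block.

    use  = registers read before any write inside the block
    defs = registers written anywhere in the block
    """
    use: Set[str] = set()
    defs: Set[str] = set()
    for reads, writes in insts:
        use |= reads - defs
        defs |= writes
    return use, defs


def live_on_entry(blocks: Dict[str, List[Inst]],
                  succs: Dict[str, List[str]],
                  entry: str) -> Set[str]:
    """Iterate live_in = use | (live_out - defs) to a fixed point."""
    use_def = {name: block_use_def(insts) for name, insts in blocks.items()}
    live_in: Dict[str, Set[str]] = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name in blocks:
            use, defs = use_def[name]
            live_out: Set[str] = set()
            for succ in succs.get(name, []):
                live_out |= live_in[succ]
            new_in = use | (live_out - defs)
            if new_in != live_in[name]:
                live_in[name] = new_in
                changed = True
    return live_in[entry]


# Toy function: push rbp; mov rax, rdi; add rax, rsi; pop rbp
# (stack pointer effects are ignored in this sketch).
blocks = {
    "entry": [({"rbp"}, set()), ({"rdi"}, {"rax"})],
    "exit": [({"rax", "rsi"}, {"rax"}), (set(), {"rbp"})],
}
succs = {"entry": ["exit"], "exit": []}
print(sorted(live_on_entry(blocks, succs, "entry")))
```

In this toy function `rbp` shows up as live alongside `rdi` and `rsi`, which is the superset behaviour described above: the save reads the register, so liveness alone cannot tell a preserved register from a real input.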

The gist of the analysis is to start with a simple liveness analysis. This will establish a superset of the registers that I may want to pass as arguments. The next step is to refine this with a simplistic simulation of operations on the stack, checking that some values saved to the stack are restored accordingly by the end of the function.
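A rough sketch of the stack-simulation refinement, assuming straight-line code and a hypothetical three-opcode encoding (`push`, `pop`, `write`). It tracks symbolic values only: a register counts as preserved if its entry value was pushed and the register holds that same entry value at the end. Real code would also need to handle branches, `mov`-based spills, and stack pointer arithmetic, none of which is modelled here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding: (opcode, register), opcode in {"push", "pop", "write"}.
Inst = Tuple[str, str]


def preserved_registers(insts: List[Inst]) -> Set[str]:
    value: Dict[str, tuple] = {}  # register -> symbolic value, absent = entry value
    stack: List[tuple] = []       # symbolic values currently on the stack
    saved: Set[str] = set()       # registers whose entry value was pushed
    fresh = 0                     # counter for distinct unknown values

    def current(reg: str) -> tuple:
        return value.get(reg, ("entry", reg))

    for op, reg in insts:
        if op == "push":
            v = current(reg)
            if v == ("entry", reg):
                saved.add(reg)
            stack.append(v)
        elif op == "pop":
            # Popping from an empty simulated stack yields an unknown value.
            value[reg] = stack.pop() if stack else ("unknown", fresh)
            fresh += 1
        elif op == "write":
            value[reg] = ("unknown", fresh)
            fresh += 1
        else:
            raise ValueError(f"unsupported opcode: {op}")

    return {r for r in saved if current(r) == ("entry", r)}


body = [
    ("push", "rbp"), ("write", "rbp"),
    ("push", "rbx"), ("write", "rbx"),
    ("write", "rax"),
    ("pop", "rbx"), ("pop", "rbp"),
]
print(sorted(preserved_registers(body)))
```

A register that is pushed but restored into a different register, or clobbered after the restore, falls out of the result, so it stays in the plain "live" set.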

The second phase is to inject loads and stores around function calls for the preserved registers, so where you'd normally see this:

...
call sub_foo
...

you'd now see this:

...
%rbp_pres = state->rbp
call sub_foo
state->rbp = %rbp_pres
...

The hope is that a mem2reg pass will do something semi-smart with this.
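The call-site rewrite above can be sketched on a toy textual IR, assuming each line is a string and a call looks like `call <name>`. The `wrap_calls` helper and the `preserved` map (callee name to preserved registers, as produced by some earlier analysis) are hypothetical; the emitted lines just mirror the pseudo-code in this comment and are not real LLVM IR.

```python
from typing import Dict, List


def wrap_calls(body: List[str], preserved: Dict[str, List[str]]) -> List[str]:
    """Surround each `call f` with a load before and a store after,
    one pair per register that `f` is known to preserve."""
    out: List[str] = []
    for line in body:
        parts = line.split()
        if len(parts) == 2 and parts[0] == "call":
            regs = preserved.get(parts[1], [])
            out += [f"%{r}_pres = state->{r}" for r in regs]
            out.append(line)
            out += [f"state->{r} = %{r}_pres" for r in regs]
        else:
            out.append(line)
    return out


for line in wrap_calls(["...", "call sub_foo", "..."], {"sub_foo": ["rbp"]}):
    print(line)
```

Calls to functions with no known preserved registers pass through unchanged, so the rewrite can be applied incrementally as the analysis covers more callees.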

With regs that are live on entry, say rdi, I want to do this:

...
rdi_arg = state->rdi
state->rdi = undef // maybe
call sub_foo(state, rdi_arg)
...

with sub_foo defined as:

def sub_foo(state, rdi_arg) {
  ... normal GEP prologue ...
  state->rdi = rdi_arg
  ... lifted instructions ...
}

Again, the hope is that the optimizer will do something sensible with this.
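A small sketch of generating both halves of that pattern from a set of live-on-entry registers. All helper names are hypothetical. The ordering helper is one possible "semi-meaningful order": it assumes the System V amd64 integer argument order (rdi, rsi, rdx, rcx, r8, r9) for registers that appear in it and falls back to alphabetical order for the rest. The output mirrors the pseudo-code above rather than real LLVM IR.

```python
from typing import Iterable, List

# System V amd64 integer argument registers, in calling-convention order.
SYSV_ORDER = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def order_args(live: Iterable[str]) -> List[str]:
    """Argument registers first (in ABI order), then everything else by name."""
    def key(reg: str):
        rank = SYSV_ORDER.index(reg) if reg in SYSV_ORDER else len(SYSV_ORDER)
        return (rank, reg)
    return sorted(live, key=key)


def call_site(callee: str, live: List[str], clobber: bool = True) -> List[str]:
    """Load each live register, optionally mark the state slot undef, then call."""
    lines = [f"{r}_arg = state->{r}" for r in live]
    if clobber:  # the "maybe" in the pseudo-code above
        lines += [f"state->{r} = undef" for r in live]
    args = ", ".join(["state"] + [f"{r}_arg" for r in live])
    lines.append(f"call {callee}({args})")
    return lines


def callee_prologue(callee: str, live: List[str]) -> List[str]:
    """Function header plus the stores that put arguments back into the state."""
    params = ", ".join(["state"] + [f"{r}_arg" for r in live])
    return [f"def {callee}({params}) {{"] + [f"  state->{r} = {r}_arg" for r in live]


live = order_args({"rsi", "rdi"})
print("\n".join(call_site("sub_foo", live)))
print("\n".join(callee_prologue("sub_foo", live)))
```

Keeping the call site and the prologue generated from the same ordered list is what keeps the two in agreement about argument positions.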

All of this is predicated on issue #91 landing, as that should enable the optimizer to do better alias analysis. After that, some simplistic evaluation will have to be done to validate that the optimizer can even do anything useful given the above patterns. If so, then I'll see if I can dive into the implementation.

pgoodman commented 7 years ago

This will probably be best done with some Binary Ninja support. XREF Issue #181.