lifting-bits / anvill

anvill forges beautiful LLVM bitcode out of raw machine code
GNU Affero General Public License v3.0

ELF external thunk recognition #5

Open pgoodman opened 4 years ago

pgoodman commented 4 years ago

The ELF thunk recognition code from McSema should be copied and adapted for Anvill. If a function references an ELF thunk, we should follow the thunk through to the external it refers to and use that external's name in the prototype, rather than the name of the function itself, which may be prefixed with junk.

That is, instead of the prototype of this function having the name _signal or .signal:

image

We should instead follow through to the .plt segment...

image

And take the info from here:

image

The relevant code to adapt from McSema is:

https://github.com/lifting-bits/mcsema/blob/master/tools/mcsema_disass/ida7/get_cfg.py#L334-L466
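As a rough illustration of the follow-through described above, here is a minimal Python sketch. It is not the McSema or Anvill code: the `Instruction` tuple, the `functions` map, and the `got_relocs` map are hypothetical stand-ins for whatever the disassembler actually exposes, and a "thunk" is assumed to be a function made of exactly one `jmp`.

```python
from typing import Dict, List, NamedTuple, Optional


class Instruction(NamedTuple):
    """Hypothetical, simplified instruction with a single address operand."""
    mnemonic: str      # e.g. "jmp", "call", "mov"
    target: int        # address the operand refers to
    is_indirect: bool  # True for `jmp [target]`, False for `jmp target`


def resolve_thunk_name(
    ea: int,
    functions: Dict[int, List[Instruction]],
    got_relocs: Dict[int, str],
    max_hops: int = 4,
) -> Optional[str]:
    """Follow single-jump thunks starting at `ea` and return the external's name.

    Returns None when `ea` is not a thunk that ends at a relocated slot.
    """
    for _ in range(max_hops):
        body = functions.get(ea)
        # Assumption: a thunk is a function consisting of exactly one jmp.
        if not body or len(body) != 1 or body[0].mnemonic != "jmp":
            return None
        jmp = body[0]
        if jmp.is_indirect:
            # `jmp [slot]`: the slot is named by a relocation, if there is one.
            return got_relocs.get(jmp.target)
        # `jmp other_thunk`: keep following direct jumps.
        ea = jmp.target
    return None  # gave up: too many hops, or a cycle of direct jumps


# Hypothetical layout: _signal jumps to a PLT stub, which jumps through a slot.
functions = {
    0x1000: [Instruction("jmp", 0x2000, False)],
    0x2000: [Instruction("jmp", 0x3000, True)],
}
got_relocs = {0x3000: "signal"}
print(resolve_thunk_name(0x1000, functions, got_relocs))  # prints "signal"
```

The hop limit is only there to keep the sketch from looping on a cycle of direct jumps; a real implementation would more likely track visited addresses.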

pgoodman commented 3 years ago

Here is a rough example of the patterns we want to recognize, and how we want to "re-interpret" them:

Redirection-based patterns

The following patterns describe "code redirection": cases where we want the lifter to redirect control flow to something other than what is actually in the binary. In practice, the idea is to redirect to the intended target, rather than the actual mechanical target.

Pattern 1

call [__libc_start_main@plt]

Replacement 1

call __libc_start_main

From an LLVM standpoint, this means the following:

Pattern 2

jmp [printf@plt]

Replacement 2

jmp printf

From an LLVM standpoint, this is similar to pattern and replacement (1). We want to have the same kind of redirection entry. Here, instead of lifting this as a call to the semantics followed by a __remill_jump, we want to lift it as a call to the semantics followed by a terminating tail call to the lifted external printf.

Pattern 3

_printf:
  jmp [printf@plt]

foo:
  ...
  call _printf

Replacement 3

foo:
  ...
  call printf

This is similar to (1), but instead of calling the lifted version of the internal _printf, we want to redirect execution to the lifted external printf.

Pattern 4

_printf:
  jmp [printf@plt]

foo:
  ...
  jmp _printf

Replacement 4

foo:
  ...
  jmp printf

Similar to pattern 3, but using a terminating tail call redirection.
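The four redirection patterns above can be summarized in a small Python sketch. This is only a model under stated assumptions, not Anvill's implementation: `Flow`, `slot_names` (memory slot address to external name, for patterns 1 and 2) and `thunk_names` (thunk function address to external name, for patterns 3 and 4) are hypothetical names invented for illustration.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Flow(NamedTuple):
    """Hypothetical control-flow instruction: call X, jmp X, call [X], jmp [X]."""
    mnemonic: str      # "call" or "jmp"
    target: int        # address operand
    is_indirect: bool  # True when the operand is a memory slot, e.g. [printf@plt]


def redirect(
    flow: Flow,
    slot_names: Dict[int, str],
    thunk_names: Dict[int, str],
) -> Optional[Tuple[str, str]]:
    """Return (kind, external_name) if a redirection applies, else None.

    kind is "call" for patterns 1 and 3, and "tail_call" for patterns 2 and 4.
    """
    if flow.mnemonic not in ("call", "jmp"):
        return None
    # Indirect flows are matched against slots, direct flows against thunks.
    table = slot_names if flow.is_indirect else thunk_names
    name = table.get(flow.target)
    if name is None:
        return None  # no redirection entry: lift the instruction as usual
    kind = "call" if flow.mnemonic == "call" else "tail_call"
    return kind, name


# Hypothetical addresses for the slots and the _printf thunk.
slot_names = {0x3000: "__libc_start_main", 0x3008: "printf"}
thunk_names = {0x1000: "printf"}

print(redirect(Flow("call", 0x3000, True), slot_names, thunk_names))   # pattern 1
print(redirect(Flow("jmp", 0x3008, True), slot_names, thunk_names))    # pattern 2
print(redirect(Flow("call", 0x1000, False), slot_names, thunk_names))  # pattern 3
print(redirect(Flow("jmp", 0x1000, False), slot_names, thunk_names))   # pattern 4
```

Keeping the slot and thunk tables separate means a direct jump to an address that happens to be a data slot is never mistaken for a thunk call.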

Relocation-based patterns

The following patterns describe data relocation-based patterns. This means operating on the actual operands of a lifted instruction, and substituting them with something else. Here are some examples of what we want to deal with.

Pattern 1

mov rax, [__libc_start_main@plt]
call rax

Replacement 1

tmp = alloca
store __libc_start_main, tmp
state->rax = load tmp 

This one is tricky. We want a relocation entry that says that a memory load from the address __libc_start_main@plt will load the address of the external __libc_start_main. By extending the instruction lifter class, in nearly the same way as McSema does, we can interpose on the operands and check whether they are used for memory reads or address generation, then try to figure out the effective loaded address and determine whether a relocation applies. If one does, we want to invent a new address to be loaded, based on an alloca that we pre-fill with the address of the external __libc_start_main.
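The operand interposition step can be sketched in Python as follows. This is a simplified model and not the instruction lifter class itself: `MemOperand`, `ExternalRef`, `effective_address`, and `interpose_load` are hypothetical names, and only absolute and RIP-relative operands are assumed to have a statically known effective address.

```python
from typing import Dict, NamedTuple, Optional, Union


class MemOperand(NamedTuple):
    """Hypothetical memory operand of the form [base + disp]."""
    base: Optional[str]  # base register name, or None for an absolute address
    disp: int            # displacement


class ExternalRef(NamedTuple):
    """Stand-in for 'load the address of this external instead'."""
    name: str


def effective_address(op: MemOperand, next_pc: int) -> Optional[int]:
    """Compute the loaded address when it is statically known, else None."""
    if op.base is None:
        return op.disp
    if op.base == "rip":
        # x86-64 RIP-relative addressing is relative to the next instruction.
        return next_pc + op.disp
    return None  # depends on a runtime register value


def interpose_load(
    op: MemOperand,
    next_pc: int,
    relocs: Dict[int, str],
) -> Union[ExternalRef, MemOperand]:
    """Substitute the operand if a relocation applies to its effective address."""
    ea = effective_address(op, next_pc)
    if ea is not None and ea in relocs:
        return ExternalRef(relocs[ea])
    return op  # no relocation applies: lift the memory read unchanged


# Hypothetical relocation: the slot at 0x4010 holds &__libc_start_main.
relocs = {0x4010: "__libc_start_main"}
print(interpose_load(MemOperand("rip", 0x10), 0x4000, relocs))
print(interpose_load(MemOperand("rbx", 0x10), 0x4000, relocs))
```

In the lifted code, an `ExternalRef` result would correspond to the alloca pre-filled with the external's address, while an unchanged `MemOperand` falls back to the ordinary memory read.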

artemdinaburg commented 3 years ago

I think we can finally close this.

pgoodman commented 3 years ago

We can't close just yet. Parts of this issue are done, but not all parts. What remains to be done: